# Website Scrape

Use **Website Scrape** when you want a workflow step to fetch a page from the web and return the parts of that page your workflow needs.

## Configuration

| Option              | Required | Description                                                                                          |
| ------------------- | -------- | ---------------------------------------------------------------------------------------------------- |
| Name                | No       | Label for the step on the workflow canvas.                                                           |
| URL                 | Yes      | Address of the page to scrape. Supports workflow variables through **Insert Variable**.              |
| HTML Output         | No       | Returns the page's raw **HTML** when enabled.                                                        |
| Markdown Output     | No       | Returns the page converted to **Markdown** when enabled.                                             |
| Links Output        | No       | Returns the **Links** extracted from the page when enabled.                                          |
| Subpages            | No       | Enables **Subpages** crawling from the main page.                                                    |
| Crawl Mode          | No       | How the page is retrieved: live crawl, cache, or a combination of the two.                           |
| Max Characters      | No       | Upper limit on the size of the returned output.                                                      |
| Max Retries         | No       | Number of times a failed scrape attempt is retried.                                                  |
| Timeout (ms)        | No       | How long, in milliseconds, to wait before a scrape attempt fails.                                    |
| Screenshot          | No       | Captures a screenshot of the page when enabled.                                                      |
| Screenshot Type     | No       | Whether the screenshot covers the **Viewport** or the **Full Page**; shown when screenshots are enabled. |
| When the step fails | No       | Whether the workflow should **Terminate Workflow** or **Continue** if this step fails.               |

Add this step from the **Research** group in **Search steps...**.

The **URL** field supports **Insert Variable**. Below that, the settings sheet lets you choose which output types to return: **HTML**, **Markdown**, **Links**, and **Subpages**.

Use **Crawl Mode** to control how Fetch Hive retrieves the page:

* **Preferred** tries a live crawl first, then falls back to cache.
* **Always** always uses a live crawl.
* **Fallback** uses cache first, then crawls if needed.
* **Never** only uses cache.
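The four modes can be summarized as a small decision sketch. This is illustrative only, not Fetch Hive's actual implementation; `fetch_live` and `fetch_cached` are hypothetical helpers standing in for a live crawl and a cache lookup:

```python
# Illustrative sketch of the four Crawl Mode behaviors.
# fetch_live and fetch_cached are hypothetical helpers, not Fetch Hive APIs.

def scrape(url, mode, fetch_live, fetch_cached):
    """Return page content according to the selected Crawl Mode."""
    if mode == "Always":        # live crawl only
        return fetch_live(url)
    if mode == "Never":         # cache only
        return fetch_cached(url)
    if mode == "Preferred":     # try a live crawl first, fall back to cache
        try:
            return fetch_live(url)
        except Exception:
            return fetch_cached(url)
    if mode == "Fallback":      # use cache first, crawl live if cache misses
        cached = fetch_cached(url)
        return cached if cached is not None else fetch_live(url)
    raise ValueError(f"unknown crawl mode: {mode}")
```

**Preferred** gives the freshest content while still succeeding when a live crawl fails; **Fallback** is the cheaper choice when slightly stale content is acceptable.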

If you turn **Screenshot** on, **Screenshot Type** appears with **Viewport** and **Full Page** options.

## Output

Click **Run** in the step header to test the step. Fetch Hive shows the scrape result in **Output** after the run completes.

Use the variable picker in a later step to insert the exact output path available for that run. The base reference is:

```
{{STEP_IDENTIFIER.output}}
```

The exact fields depend on which outputs you enabled. For example, HTML, markdown, links, subpage data, and screenshot-related fields only appear when those outputs are turned on. Use the variable picker after a test run to inspect the returned fields.
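Once you know which outputs are enabled, you can reference individual fields under `output`. The field names below are illustrative assumptions, not confirmed paths; always use the variable picker after a test run to get the exact paths for your workflow:

```
{{STEP_IDENTIFIER.output.markdown}}
{{STEP_IDENTIFIER.output.links}}
```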

## Example

Add **Website Scrape** from the **Research** group in **Search steps...**.

Set **Name** to something like `Scrape product page`.

Paste the page's address into **URL**. If the URL comes from an earlier workflow step, click **Insert Variable** and add that reference instead.

Turn on the outputs you need. For example, enable **Markdown** for clean content, **Links** for extracted links, and **Subpages** if you also want subpages crawled from the main page.

Choose a **Crawl Mode**, then set **Max Characters**, **Max Retries**, and **Timeout (ms)** for the run.
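Together, **Max Retries** and **Timeout (ms)** bound how long a scrape can run. The sketch below shows one plausible interpretation of those settings (each attempt gets the full timeout, and a failed scrape is retried up to the retry count); it is illustrative only, and `attempt_scrape` is a hypothetical helper, not a Fetch Hive API:

```python
# Illustrative sketch of how Max Retries and Timeout (ms) could interact.
# attempt_scrape is a hypothetical helper, not a Fetch Hive API.

def scrape_with_retries(url, attempt_scrape, max_retries=3, timeout_ms=30000):
    """Try the scrape up to max_retries + 1 times, bounding each attempt."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            # Each attempt waits at most timeout_ms before failing.
            return attempt_scrape(url, timeout=timeout_ms / 1000.0)
        except Exception as err:  # e.g. a timeout or network failure
            last_error = err
    raise last_error
```

In the worst case, total wall time is roughly `(max_retries + 1) * timeout_ms`, which is worth keeping in mind when a later workflow step waits on this one.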

If you need a visual capture, turn **Screenshot** on and choose **Viewport** or **Full Page** in **Screenshot Type**.

Click **Run** and review the scraped result in **Output** before sending it to later workflow steps.

## Notes

* The returned result depends on which output toggles you enable, so inspect the variable picker after a run if you need exact field names.
* The editor shows a warning for direct LinkedIn URLs. LinkedIn URLs are not supported for scraping.
* Use **Markdown** when you want cleaner page content, **HTML** when you need raw markup, and **Links** when you only need extracted URLs.

See also: [Creating and Editing](https://docs.fetchhive.com/workflows/creating-and-editing), [Testing and Iteration](https://docs.fetchhive.com/workflows/testing-and-iteration), and [Error Handling](https://docs.fetchhive.com/workflows/error-handling)
