Website Scrape

Configure a workflow step that scrapes a website URL and returns selected content types for later steps

Use Website Scrape when you want a workflow step to fetch a page from the web and return the parts of that page your workflow needs.

Configuration

  • Name (optional): Label for the step in the workflow canvas.

  • URL (required): The page URL to scrape. This field supports workflow variables through Insert Variable.

  • HTML Output (optional): Controls whether HTML output is returned.

  • Markdown Output (optional): Controls whether Markdown output is returned.

  • Links Output (optional): Controls whether extracted links are returned.

  • Subpages (optional): Controls whether subpage crawling is enabled.

  • Crawl Mode (optional): Crawl behavior: Preferred, Always, Fallback, or Never.

  • Max Characters (optional): Maximum number of characters returned in the output.

  • Max Retries (optional): Number of times the step retries a failed scrape.

  • Timeout (ms) (optional): Maximum time, in milliseconds, to wait for the scrape.

  • Screenshot (optional): Controls whether a page screenshot is captured.

  • Screenshot Type (optional): Screenshot mode (Viewport or Full Page), shown when screenshots are enabled.

  • When the step fails (optional): Controls whether the workflow should Terminate Workflow or Continue if this step fails.
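Max Retries and Timeout (ms) work together: each attempt waits up to the timeout, and a failed attempt is retried until the retry budget is spent. A minimal sketch of that pattern, assuming a hypothetical fetch_page callable (not part of Fetch Hive's API):

```python
import time


def scrape_with_retries(fetch_page, url, max_retries=3, timeout_ms=10000):
    """Attempt a scrape, retrying on failure up to max_retries extra times.

    fetch_page is a hypothetical callable: fetch_page(url, timeout_s) -> str.
    """
    last_error = None
    for attempt in range(1 + max_retries):  # first try plus max_retries retries
        try:
            return fetch_page(url, timeout_s=timeout_ms / 1000)
        except Exception as exc:
            last_error = exc
            time.sleep(0)  # a real implementation might back off here
    raise RuntimeError(f"scrape failed after {1 + max_retries} attempts") from last_error
```

The exact retry and timeout semantics inside Fetch Hive may differ; this only illustrates how the two settings interact.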

Add this step from the Research group in Search steps....

The URL field supports Insert Variable. Below that, the settings sheet lets you choose which output types to return: HTML, Markdown, Links, and Subpages.

Use Crawl Mode to control how Fetch Hive retrieves the page:

  • Preferred tries a live crawl first, then falls back to the cache.

  • Always performs a live crawl every time.

  • Fallback uses the cache first, then crawls if needed.

  • Never uses only the cache.
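The four modes reduce to a choice between a live crawl and the cache, with optional fallback. A sketch of that decision logic (the callables are illustrative stand-ins, not Fetch Hive's API):

```python
def resolve_page(mode, live_crawl, cache_lookup, url):
    """Return page content according to a crawl mode.

    live_crawl(url) fetches the live page (may raise on failure);
    cache_lookup(url) returns cached content or None. Both are
    hypothetical callables standing in for the real scraper.
    """
    if mode == "Preferred":          # live first, cache as fallback
        try:
            return live_crawl(url)
        except Exception:
            cached = cache_lookup(url)
            if cached is None:
                raise
            return cached
    if mode == "Always":             # live crawl only, no fallback
        return live_crawl(url)
    if mode == "Fallback":           # cache first, crawl if missing
        cached = cache_lookup(url)
        return cached if cached is not None else live_crawl(url)
    if mode == "Never":              # cache only
        cached = cache_lookup(url)
        if cached is None:
            raise LookupError(f"no cached copy of {url}")
        return cached
    raise ValueError(f"unknown crawl mode: {mode}")
```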

If you turn Screenshot on, Screenshot Type appears with Viewport and Full Page options.

Output

Click Run in the step header to test the step. Fetch Hive shows the scrape result in Output after the run completes.

Use the variable picker in a later step to insert the exact output path available for that run. The base reference is:

The exact fields depend on which outputs you enabled. For example, HTML, markdown, links, subpage data, and screenshot-related fields only appear when those outputs are turned on. Use the variable picker after a test run to inspect the returned fields.
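As an illustration only, a run with Markdown and Links enabled might return an object shaped like the sketch below. The field names here are hypothetical; always use the variable picker after a test run to get the real ones.

```python
# Hypothetical shape of a scrape result with Markdown and Links enabled.
# Field names are illustrative; inspect the variable picker for the real ones.
scrape_output = {
    "url": "https://example.com/product",
    "markdown": "# Product\n\nClean page content...",
    "links": [
        "https://example.com/pricing",
        "https://example.com/docs",
    ],
    # "html", "subpages", and "screenshot" fields appear only when
    # those outputs are enabled.
}
```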

Example

Add Website Scrape from the Research group in Search steps....

Set Name to something like Scrape product page.

Paste the page URL into URL. If the URL comes from an earlier workflow step, click Insert Variable and add that reference.

Turn on the outputs you need. For example, enable Markdown for clean content, Links for extracted links, and Subpages if you want one subpage crawled from the main page.

Choose a Crawl Mode, then set Max Characters, Max Retries, and Timeout (ms) for the run.

If you need a visual capture, turn Screenshot on and choose Viewport or Full Page in Screenshot Type.

Click Run and review the scraped result in Output before sending it to later workflow steps.

Notes

  • The returned result depends on which output toggles you enable, so inspect the variable picker after a run if you need exact field names.

  • The editor shows a warning for direct LinkedIn URLs. LinkedIn URLs are not supported for scraping.

  • Use Markdown when you want cleaner page content, HTML when you need raw markup, and Links when you only need extracted URLs.
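Because direct LinkedIn URLs are not supported, a workflow that receives arbitrary URLs from earlier steps may want to filter them before this step runs. A minimal sketch of such a guard (this helper is hypothetical, not part of Fetch Hive):

```python
from urllib.parse import urlparse


def is_linkedin_url(url):
    """Return True when the URL's host is linkedin.com or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == "linkedin.com" or host.endswith(".linkedin.com")
```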

See also: Creating and Editing, Testing and Iteration, and Error Handling
