Website Scrape

Configure a workflow step that scrapes a website URL and returns selected content types for later steps

Use Website Scrape when you want a workflow step to fetch a page from the web and return the parts of that page your workflow needs.

Configuration

  • Name (optional): Label for the step in the workflow canvas.

  • URL (required): The page URL to scrape. This field supports workflow variables through Insert Variable.

  • HTML Output (optional): Controls whether HTML output is returned.

  • Markdown Output (optional): Controls whether Markdown output is returned.

  • Links Output (optional): Controls whether extracted links are returned.

  • Subpages (optional): Controls whether subpage crawling is enabled.

  • Crawl Mode (optional): Crawl behavior: Preferred, Always, Fallback, or Never.

  • Max Characters (optional): Maximum number of characters returned in the output.

  • Max Retries (optional): Number of times the step retries a failed scrape.

  • Timeout (ms) (optional): Maximum time, in milliseconds, to wait for the scrape.

  • Screenshot (optional): Controls whether a page screenshot is captured.

  • Screenshot Type (optional): Screenshot mode (Viewport or Full Page), shown when screenshots are enabled.

  • When the step fails (optional): Controls whether the workflow should Terminate Workflow or Continue if this step fails.
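Max Retries and Timeout (ms) work together: each attempt waits up to the timeout, and a failed attempt is retried until the retry budget is spent. A minimal sketch of that pattern, assuming a hypothetical fetch_page callable (not part of Fetch Hive's API):

```python
import time


def scrape_with_retries(fetch_page, url, max_retries=3, timeout_ms=10000):
    """Attempt a scrape, retrying on failure up to max_retries extra times.

    fetch_page is a hypothetical callable: fetch_page(url, timeout_s) -> str.
    """
    last_error = None
    for attempt in range(1 + max_retries):  # first try plus max_retries retries
        try:
            return fetch_page(url, timeout_s=timeout_ms / 1000)
        except Exception as exc:
            last_error = exc
            time.sleep(0)  # a real implementation might back off here
    raise RuntimeError(f"scrape failed after {1 + max_retries} attempts") from last_error
```

The exact retry and timeout semantics inside Fetch Hive may differ; this only illustrates how the two settings interact.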

Add this step from the Research group in Search steps....

The URL field supports Insert Variable. Below that, the settings sheet lets you choose which output types to return: HTML, Markdown, Links, and Subpages.

Use Crawl Mode to control how Fetch Hive retrieves the page:

  • Preferred tries a live crawl first, then falls back to the cache.

  • Always performs a live crawl every time.

  • Fallback uses the cache first, then crawls if needed.

  • Never uses only the cache.
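The four modes reduce to a choice between a live crawl and the cache, with optional fallback. A sketch of that decision logic (the callables are illustrative stand-ins, not Fetch Hive's API):

```python
def resolve_page(mode, live_crawl, cache_lookup, url):
    """Return page content according to a crawl mode.

    live_crawl(url) fetches the live page (may raise on failure);
    cache_lookup(url) returns cached content or None. Both are
    hypothetical callables standing in for the real scraper.
    """
    if mode == "Preferred":          # live first, cache as fallback
        try:
            return live_crawl(url)
        except Exception:
            cached = cache_lookup(url)
            if cached is None:
                raise
            return cached
    if mode == "Always":             # live crawl only, no fallback
        return live_crawl(url)
    if mode == "Fallback":           # cache first, crawl if missing
        cached = cache_lookup(url)
        return cached if cached is not None else live_crawl(url)
    if mode == "Never":              # cache only
        cached = cache_lookup(url)
        if cached is None:
            raise LookupError(f"no cached copy of {url}")
        return cached
    raise ValueError(f"unknown crawl mode: {mode}")
```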

If you turn Screenshot on, Screenshot Type appears with Viewport and Full Page options.

Output

Click Run in the step header to test the step. Fetch Hive shows the scrape result in Output after the run completes.

Use the variable picker in a later step to insert the exact output path available for that run. The base reference is:

The exact fields depend on which outputs you enabled. For example, HTML, markdown, links, subpage data, and screenshot-related fields only appear when those outputs are turned on. Use the variable picker after a test run to inspect the returned fields.
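As an illustration only, a run with Markdown and Links enabled might return an object shaped like the sketch below. The field names here are hypothetical; always use the variable picker after a test run to get the real ones.

```python
# Hypothetical shape of a scrape result with Markdown and Links enabled.
# Field names are illustrative; inspect the variable picker for the real ones.
scrape_output = {
    "url": "https://example.com/product",
    "markdown": "# Product\n\nClean page content...",
    "links": [
        "https://example.com/pricing",
        "https://example.com/docs",
    ],
    # "html", "subpages", and "screenshot" fields appear only when
    # those outputs are enabled.
}
```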

Example

Add Website Scrape from the Research group in Search steps....

Set Name to something like Scrape product page.

Paste the page URL into URL. If the URL comes from an earlier workflow step, click Insert Variable and add that reference.

Turn on the outputs you need. For example, enable Markdown for clean content, Links for extracted links, and Subpages if you want one subpage crawled from the main page.

Choose a Crawl Mode, then set Max Characters, Max Retries, and Timeout (ms) for the run.

If you need a visual capture, turn Screenshot on and choose Viewport or Full Page in Screenshot Type.

Click Run and review the scraped result in Output before sending it to later workflow steps.

Notes

  • The returned result depends on which output toggles you enable, so inspect the variable picker after a run if you need exact field names.

  • The editor shows a warning for direct LinkedIn URLs. LinkedIn URLs are not supported for scraping.

  • Use Markdown when you want cleaner page content, HTML when you need raw markup, and Links when you only need extracted URLs.
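Because direct LinkedIn URLs are not supported, a workflow that receives arbitrary URLs from earlier steps may want to filter them before this step runs. A minimal sketch of such a guard (this helper is hypothetical, not part of Fetch Hive):

```python
from urllib.parse import urlparse


def is_linkedin_url(url):
    """Return True when the URL's host is linkedin.com or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == "linkedin.com" or host.endswith(".linkedin.com")
```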

See also: Creating and Editing, Testing and Iteration, and Error Handling
