Progressive extraction #1155

Open
mweichert opened this issue Nov 7, 2024 · 1 comment

Comments

@mweichert

Hi there!

First off, great project.

I wanted to confirm whether or not this capability was available, and if not, log it as a feature request.

I'd like to create a task which goes to a website and extracts an array of links.

  • Navigates to a URL
  • Loops until there's no next page button: 1) extracts all links containing "[text]", 2) clicks the next page button
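To make the requested behavior concrete, here is a minimal sketch of that loop in plain Python. It is not this project's API: `Page`, `extract_links`, and the sample data are hypothetical stand-ins, with a missing `next_page` modeling "no next page button".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical stand-in for a loaded page."""
    links: list[tuple[str, str]]        # (anchor text, href) pairs
    next_page: Optional[Page] = None    # None models "no next page button"


def extract_links(start: Page, text: str) -> list[str]:
    """Collect hrefs whose anchor text contains `text`, page after page."""
    found: list[str] = []
    page: Optional[Page] = start
    while page is not None:
        # Step 1: extract matching links on the current page.
        found.extend(href for label, href in page.links if text in label)
        # Step 2: "click" next; the loop ends when there is no next page.
        page = page.next_page
    return found


# Two hypothetical pages chained by a next page button.
last = Page(links=[("Report C", "/c"), ("About", "/about")])
first = Page(links=[("Report A", "/a"), ("Contact", "/contact")], next_page=last)
print(extract_links(first, "Report"))  # prints ['/a', '/c']
```

The key point is that results accumulate across pages instead of extraction ending the run.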

From my limited testing, it looks like it's impossible to extract some data, continue browsing, and then extract some more data. Is that right?

Thanks,
Mike

@suchintan
Contributor

Hey Mike,

This is possible with our workflows feature. You can basically set it up so that it does this:

  • Task block with a data extraction goal: extract the number of pages
  • Loop over the output above, once for each page
  • Task block inside the loop: navigate to that page and extract the links
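The shape of that workflow can be sketched in plain Python. This is only an illustration of the control flow, not the actual workflow definition format: `SITE`, `extract_page_count`, `extract_links_on_page`, and `run_workflow` are hypothetical names, and the site is faked as an in-memory dict.

```python
from __future__ import annotations

# Hypothetical site: page number -> list of (anchor text, href) pairs.
SITE: dict[int, list[tuple[str, str]]] = {
    1: [("Report A", "/a"), ("Contact", "/contact")],
    2: [("Report B", "/b")],
    3: [("About", "/about"), ("Report C", "/c")],
}


def extract_page_count(site: dict[int, list[tuple[str, str]]]) -> int:
    """Block 1: data extraction goal that returns the number of pages."""
    return len(site)


def extract_links_on_page(
    site: dict[int, list[tuple[str, str]]], page_number: int, text: str
) -> list[str]:
    """Block 3: navigate to one page and extract the matching links."""
    return [href for label, href in site[page_number] if text in label]


def run_workflow(site: dict[int, list[tuple[str, str]]], text: str) -> list[str]:
    page_count = extract_page_count(site)
    results: list[str] = []
    # Block 2: loop over the output of block 1, once per page.
    for page_number in range(1, page_count + 1):
        results.extend(extract_links_on_page(site, page_number, text))
    return results


print(run_workflow(SITE, "Report"))  # prints ['/a', '/b', '/c']
```

Unlike a "click next until it disappears" loop, this version needs the page count up front, so it assumes the site exposes that number and lets each page be reached directly.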

This will pretty much give you what you're looking for!
