How does skyvern integrate with ollama litellm #242

alexsiu398 · 2024-04-29T02:31:52Z

Is there a way or tutorial on how to configure ollama litellm to work with skyvern? How can skyvern work with a local llm?

suchintan · 2024-05-03T07:38:38Z

Here's an example where @ykeremy built out bedrock support within Skyvern

https://github.com/Skyvern-AI/skyvern/pull/251/files

Are you open to opening a PR for ollama + litellm? We'd love a contribution here!

suchintan · 2024-05-03T07:39:11Z

Ignore the files in the experimentation module. The other configs are all you need!

santiagoblanco22 · 2024-05-10T03:34:16Z

Nice. Now GPT4 60 USD in 3 days. :( Ollama is awesome! I don't know how to help!

suchintan · 2024-05-10T04:10:34Z

@santiagoblanco22 we would love a contribution here!! Or maybe we can ask for people's help in our discord?

GPT4 is super expensive. Try it with Claude 3 sonnet instead

OB42 · 2024-05-10T08:35:24Z

hi, I'm currently trying to add it. :)

Do you think we should allow all ollama models? in setup.sh should we ask the user for a specific model name(as a string)? or a numbered choice like for anthropic with just llama3/mistral maybe llava?

OB42 · 2024-05-12T18:44:36Z

FYI for now it seems that most models available on Ollama are not good enough for Skyvern , at least on my computer, so it seems pointless to add models that would not work well.

Maybe it could work with a 34/70B model with no quantization, but you would need a very beefy setup, at that point you'd probably be better off using bedrock/anthropic IMO

github-actions · 2024-06-13T01:48:50Z

This issue is stale because it has been open for 30 days with no activity.

github-actions · 2024-06-27T01:49:33Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions · 2024-08-25T01:57:19Z

This issue is stale because it has been open for 30 days with no activity.

HarryBak · 2024-09-05T15:42:40Z

FYI for now it seems that most models available on Ollama are not good enough for Skyvern , at least on my computer, so it seems pointless to add models that would not work well.

Maybe it could work with a 34/70B model with no quantization, but you would need a very beefy setup, at that point you'd probably be better off using bedrock/anthropic IMO

What models did you try using?

At what stage was it getting stuck?

Any other information you could share that may help the research into making this reality?

OB42 · 2024-09-05T15:52:30Z

FYI for now it seems that most models available on Ollama are not good enough for Skyvern , at least on my computer, so it seems pointless to add models that would not work well.
Maybe it could work with a 34/70B model with no quantization, but you would need a very beefy setup, at that point you'd probably be better off using bedrock/anthropic IMO

What models did you try using?

At what stage was it getting stuck?

Any other information you could share that may help the research into making this reality?

If I remember correctly, LLama-2, Mistral 7B, Phi3, maybe I'm forgetting some.
It was struggling to follow the prompt, to output a valid JSON(not sure if skyvern still uses JSON with LLMs?), and/or choose the correct id when deciding to click on something

maybe there was too much quantization, maybe I did not use adequate parameters, maybe we need completely different prompts for weaker models

Also I think @suchintan said that the screenshots are really important for the LLMs to correctly understand what's going on the page, and most of the models I tried were not multimodal(and when they were, they did not have a good enough understanding of the screenshots)

But this was months ago and I'm not really sure about this, also I didn't really follow the latest changes with Skyvern.

HarryBak · 2024-09-05T16:21:23Z

FYI for now it seems that most models available on Ollama are not good enough for Skyvern , at least on my computer, so it seems pointless to add models that would not work well.
Maybe it could work with a 34/70B model with no quantization, but you would need a very beefy setup, at that point you'd probably be better off using bedrock/anthropic IMO

What models did you try using?
At what stage was it getting stuck?
Any other information you could share that may help the research into making this reality?

If I remember correctly, LLama-2, Mistral 7B, Phi3, maybe I'm forgetting some. It was struggling to follow the prompt, to output a valid JSON(not sure if skyvern still uses JSON with LLMs?), and/or choose the correct id when deciding to click on something

maybe there was too much quantization, maybe I did not use adequate parameters, maybe we need completely different prompts for weaker models

Also I think @suchintan said that the screenshots are really important for the LLMs to correctly understand what's going on the page, and most of the models I tried were not multimodal(and when they were, they did not have a good enough understanding of the screenshots)

But this was months ago and I'm not really sure about this, also I didn't really follow the latest changes with Skyvern.

Didn't expect such a prompt reply, thank you. Do you know what GPU you were running at the time?

Awesome information, the models have come a very long way since you first tested this so I'm intrigued to see.

I would assume that for this to work on local models there will have to be a lot of optimizations to be done. Context length becomes an issue quickly with local models.

For repeating workflows where only the input JSON changes then caching the workflow locally will be a definite requirement.

First run through would be slow but after that it would be lightning fast.

If anyone has any inputs on ideas to optimize the workflow further feel free to add them in here.

suchintan · 2024-09-05T17:42:23Z

We've had some promising results doing internal testing of the intern vl 2 model series (https://huggingface.co/spaces/opencompass/open_vlm_leaderboard)

I'm not sure if that's available on ollama yet but it might be a good place to get started!

github-actions · 2024-10-06T02:03:56Z

This issue is stale because it has been open for 30 days with no activity.

brooksc · 2024-10-08T04:09:26Z

Hey all -- given the recent release of Llama 3.2 vision models, how would one evaluate if these are sufficiently good?

https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-vision-models-(11b/90b)-

full disclosure: I work for Meta, but I'm not asking as part of my work - I'm using Skyvern for personal projects... using it to unsubscribe from the too many emails I seem to be unsubscribed to... :)

suchintan · 2024-10-08T04:19:29Z

Hey all -- given the recent release of Llama 3.2 vision models, how would one evaluate if these are sufficiently good?

https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-vision-models-(11b/90b)-

full disclosure: I work for Meta, but I'm not asking as part of my work - I'm using Skyvern for personal projects... using it to unsubscribe from the too many emails I seem to be unsubscribed to... :)

OMG. We've been heads down building some features (specifically: caching to torpedo Skyvern's runtime costs) -- haven't had a chance to try it yet! I'm optimistic that the 90B Llama 3.2 will be good enough for Skyvern which would be HUGE

@brooksc are you open to opening a PR adding support for it??

brooksc · 2024-10-08T04:28:58Z

@brooksc are you open to opening a PR adding support for it??

I'd love to, but I'm not an engineer -- I know python but lately I've been using aider.chat and deepseek to code, so I'll see what I can do. I see you are using litellm already so I can at least evaluate myself it with the vision models.

Do you have some benchmarks or other process to evaluate whether a model is good enough to recommend? As an example Aider https://aider.chat/docs/leaderboards/ has a nice way to evaluate new models and see the accuracy. Is there an equivalent of the various text llm benchmarks (e.g. the equivalent of MMLU, Arc Challenge, etc?) for web browsing?

suchintan · 2024-10-08T04:34:11Z

@brooksc are you open to opening a PR adding support for it??

I'd love to, but I'm not an engineer -- I know python but lately I've been using aider.chat and deepseek to code, so I'll see what I can do. I see you are using litellm already so I can at least evaluate myself it with the vision models.

Do you have some benchmarks or other process to evaluate whether a model is good enough to recommend? As an example Aider https://aider.chat/docs/leaderboards/ has a nice way to evaluate new models and see the accuracy. Is there an equivalent of the various text llm benchmarks (e.g. the equivalent of MMLU, Arc Challenge, etc?) for web browsing?

We have an internal benchmark geared towards our existing users / customers, but can't open source it due to confidentiality agreements

We have plans to open source another benchmark likely in December

Our approach (for open source) has been to allow users to select whichever model they like by setting the LLM Key, and adding support for the latest and greatest

Here's how we added bedrock support: https://github.com/Skyvern-AI/skyvern/pull/251/files -- hopefully this is a good point to get started! If you do add a new LLM Key for llama 3.2 we would LOVE a contribution

DIGist · 2024-10-09T21:57:06Z

Maybe a good way to benchmark if open models might work is to have a known page that works with skyvern via closed models (say like youtube.com) and an example of the controls or data that is detected in the page. That way a person could feed the same page to an open source model to see what controls/data it can detect and respond with so we could quickly evaluate which models might be worth trying to setup for more robust testing.

suchintan · 2024-10-09T22:55:13Z

Maybe a good way to benchmark if open models might work is to have a known page that works with skyvern via closed models (say like youtube.com) and an example of the controls or data that is detected in the page. That way a person could feed the same page to an open source model to see what controls/data it can detect and respond with so we could quickly evaluate which models might be worth trying to setup for more robust testing.

Great idea -- this is how we do our own internal benchmarks but haven't had a chance to open source it yet! I think we will open source it soon though

suchintan mentioned this issue May 3, 2024

We are eagerly anticipating integrating Ollama into Skyvern. #239

Closed

suchintan added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed community Suggestions, discussions around how to build and elevate our community labels May 3, 2024

This was referenced May 3, 2024

How to use local vision model to replace gpt-4 turbo? #180

Closed

How to replace with a local large model? #267

Closed

suchintan mentioned this issue May 22, 2024

Integration with Ollama #92

Closed

github-actions bot added the Stale label Jun 13, 2024

github-actions bot closed this as completed Jun 27, 2024

suchintan reopened this Jun 30, 2024

suchintan mentioned this issue Jun 30, 2024

non-OpenAI LLM APIs configuration #487

Closed

github-actions bot removed the Stale label Jul 26, 2024

github-actions bot added the Stale label Aug 25, 2024

github-actions bot removed the Stale label Sep 6, 2024

github-actions bot added the Stale label Oct 6, 2024

suchintan removed the Stale label Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How does skyvern integrate with ollama litellm #242

How does skyvern integrate with ollama litellm #242

alexsiu398 commented Apr 29, 2024 •

edited

Loading

suchintan commented May 3, 2024

suchintan commented May 3, 2024

santiagoblanco22 commented May 10, 2024

suchintan commented May 10, 2024

OB42 commented May 10, 2024 •

edited

Loading

OB42 commented May 12, 2024 •

edited

Loading

github-actions bot commented Jun 13, 2024

github-actions bot commented Jun 27, 2024

github-actions bot commented Aug 25, 2024

HarryBak commented Sep 5, 2024

OB42 commented Sep 5, 2024 •

edited

Loading

HarryBak commented Sep 5, 2024

suchintan commented Sep 5, 2024

github-actions bot commented Oct 6, 2024

brooksc commented Oct 8, 2024 •

edited

Loading

suchintan commented Oct 8, 2024

brooksc commented Oct 8, 2024

suchintan commented Oct 8, 2024

DIGist commented Oct 9, 2024

suchintan commented Oct 9, 2024

How does skyvern integrate with ollama litellm #242

How does skyvern integrate with ollama litellm #242

Comments

alexsiu398 commented Apr 29, 2024 • edited Loading

suchintan commented May 3, 2024

suchintan commented May 3, 2024

santiagoblanco22 commented May 10, 2024

suchintan commented May 10, 2024

OB42 commented May 10, 2024 • edited Loading

OB42 commented May 12, 2024 • edited Loading

github-actions bot commented Jun 13, 2024

github-actions bot commented Jun 27, 2024

github-actions bot commented Aug 25, 2024

HarryBak commented Sep 5, 2024

OB42 commented Sep 5, 2024 • edited Loading

HarryBak commented Sep 5, 2024

suchintan commented Sep 5, 2024

github-actions bot commented Oct 6, 2024

brooksc commented Oct 8, 2024 • edited Loading

suchintan commented Oct 8, 2024

brooksc commented Oct 8, 2024

suchintan commented Oct 8, 2024

DIGist commented Oct 9, 2024

suchintan commented Oct 9, 2024

alexsiu398 commented Apr 29, 2024 •

edited

Loading

OB42 commented May 10, 2024 •

edited

Loading

OB42 commented May 12, 2024 •

edited

Loading

OB42 commented Sep 5, 2024 •

edited

Loading

brooksc commented Oct 8, 2024 •

edited

Loading