pip Dependency Resolver Selects Incompatible Package Version Leading to Installation Failure for AutoGluon #12990

Open
1 task done
tonyhoo opened this issue Oct 4, 2024 · 13 comments
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior

Comments

@tonyhoo

tonyhoo commented Oct 4, 2024

Description

Starting around this Tuesday (10/01/2024), we have received reports from users of installation failures for previously working AutoGluon releases. When installing autogluon==1.1.1 with pip, the dependency resolver selects an older version of onnx (1.10.0), which fails to build because of missing files (specifically requirements.txt). Previously, pip would select a compatible version of onnx (e.g., 1.16.2 or 1.17.0), and the installation would succeed. Meanwhile, we noticed that uv pip install autogluon==1.1.1 and pip install autogluon==1.1.1 --use-deprecated=legacy-resolver both work as expected.

This change in behavior suggests a potential issue with dependency resolution in pip, possibly related to recent updates either in pip itself or in one of the transitive dependencies. After digging into this on our side, we have not been able to pinpoint the root cause and would like to ask the pip community for guidance. Any pointers would be appreciated.

Expected behavior

  • pip should resolve and install compatible versions of all dependencies required by autogluon==1.1.1.
  • Specifically, it should select a version of onnx that successfully installs (e.g., onnx==1.17.0) instead of an older, incompatible version (onnx==1.10.0).

pip version

24.2

Python version

3.8, 3.9, 3.10, 3.11

OS

Linux x86_64/ARM, macOS Intel/ARM

How to Reproduce

  1. Create a new virtual environment:
python -m venv test_env
source test_env/bin/activate
  2. Upgrade pip to the latest version:
pip install --upgrade pip
  3. Attempt to install autogluon==1.1.1 or any earlier version:
pip install autogluon==1.1.1
  4. Observe the installation failure, particularly with the onnx package.

Output

Collecting autogluon==1.1.1
  Using cached autogluon-1.1.1-py3-none-any.whl (9.5 kB)
Collecting autogluon.core[all]==1.1.1
  Using cached autogluon.core-1.1.1-py3-none-any.whl (207 kB)
# ... (additional output truncated for brevity)
Collecting onnx
  Using cached onnx-1.10.0.tar.gz (10.0 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [11 lines of output]
      /tmp/pip-install-xxxxxx/onnx/setup.py:36: DeprecationWarning: Use shutil.which instead of find_executable
        CMAKE = find_executable('cmake3') or find_executable('cmake')
      /tmp/pip-install-xxxxxx/onnx/setup.py:37: DeprecationWarning: Use shutil.which instead of find_executable
        MAKE = find_executable('make')
      fatal: not a git repository (or any of the parent directories): .git
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-xxxxxx/onnx/setup.py", line 318, in <module>
          raise FileNotFoundError("Unable to find " + requirements_file)
      FileNotFoundError: Unable to find requirements.txt
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@tonyhoo tonyhoo added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Oct 4, 2024
@notatallshaw
Member

notatallshaw commented Oct 4, 2024

I am going to take a look at why there was a change. It was almost certainly something that happened in the dependency tree, since there hasn't been a pip release since July.

But first, to be clear: it's not possible to tell whether an sdist is compatible unless pip tries to build it. If the build fails, pip takes that as an indication that the user does not have the requirements to install it and stops backtracking instead of trying other versions.

The best practice suggestion is, in general, to add good lower bounds. Is that something you can do for your project? Add a good lower bound on onnx? Or at least recommend it to users when installing?
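As a rough illustration of what a lower bound buys you, the sketch below filters the onnx versions mentioned in this thread against a bound. The `onnx>=1.16` value is an assumption chosen for the example, not a recommendation from the thread or a value from AutoGluon's real setup.py, and the tuple-based version parsing only handles simple dotted releases.

```python
# Hypothetical dependency list. Only the optimum requirement comes from this
# thread; the onnx lower bound is an illustrative assumption.
INSTALL_REQUIRES = [
    "optimum[onnxruntime]>=1.17,<1.19",
    "onnx>=1.16",  # assumed bound, for illustration only
]


def parse_version(text):
    """Parse a simple dotted release such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def allowed_by_lower_bound(candidate, lower_bound):
    """Return True if the candidate version is at or above the lower bound."""
    return parse_version(candidate) >= parse_version(lower_bound)


# onnx versions mentioned in the issue description.
candidates = ["1.10.0", "1.16.2", "1.17.0"]
kept = [v for v in candidates if allowed_by_lower_bound(v, "1.16")]
print(kept)
```

With a bound like this, the resolver never considers the old sdist at all, so it never has to attempt the failing build.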

P.S. I believe there have been discussions about allowing packages to publish static metadata for sdists, but I'm not sure where that idea has got to or whether there's much traction for moving forward with it, and I think it would require projects to opt in. Also, uv does a trick where it assumes the dependency metadata is the same across all wheels and sdists for a given release and reads the dependency metadata out of a wheel; this is not something pip is ever likely to do.
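To make the difference between the two strategies concrete, here is a toy model. It is not pip's or uv's actual code: the release layout, the `BuildError` class, the function names, and the requirement strings are all made up for illustration.

```python
class BuildError(Exception):
    """Raised when a source distribution cannot be built to obtain metadata."""


# Toy release: a broken sdist plus one wheel for some other platform.
# All values here are hypothetical.
RELEASE = {
    "sdist": {"builds": False, "requires": None},
    "wheels": [{"tag": "some-other-platform", "requires": ["numpy>=1.16.6"]}],
}


def metadata_by_building(release):
    """Strict strategy: metadata is only known after the sdist builds."""
    if not release["sdist"]["builds"]:
        raise BuildError("metadata-generation-failed")
    return release["sdist"]["requires"]


def metadata_assuming_consistency(release):
    """Shortcut strategy: borrow metadata from any wheel in the same release,
    assuming every file in the release declares the same dependencies."""
    for wheel in release["wheels"]:
        return wheel["requires"]
    # No wheel to borrow from, so fall back to building the sdist.
    return metadata_by_building(release)


try:
    metadata_by_building(RELEASE)
    strict_result = "built"
except BuildError as exc:
    strict_result = f"failed: {exc}"

shortcut_result = metadata_assuming_consistency(RELEASE)
print(strict_result)
print(shortcut_result)
```

In this model the strict strategy stops at the build failure, while the shortcut returns the wheel's requirements without ever touching the sdist. The shortcut is only correct if the consistency assumption actually holds for the release.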

@pfmoore
Member

pfmoore commented Oct 4, 2024

Nothing has changed with pip since this Tuesday. Are you able to confirm which is the first version of pip to exhibit this behaviour? I tried pip 23.1 and, while it appears to take a different route through the dependency tree, it ultimately still arrives at onnx 1.10.0 and fails.

As a comparison, I tried with the legacy resolver. It installed, but failed pip check:

❯ pip check
aliyun-python-sdk-core 2.15.2 has requirement jmespath<1.0.0,>=0.9.3, but you have jmespath 1.0.1.
blis 1.0.1 has requirement numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4.
datasets 3.0.1 has requirement dill<0.3.9,>=0.3.0, but you have dill 0.3.9.
datasets 3.0.1 has requirement fsspec[http]<=2024.6.1,>=2023.1.0, but you have fsspec 2024.9.0.
openxlab 0.1.1 has requirement filelock~=3.14.0, but you have filelock 3.16.1.
openxlab 0.1.1 has requirement pytz~=2023.3, but you have pytz 2024.2.
openxlab 0.1.1 has requirement requests~=2.28.2, but you have requests 2.32.3.
openxlab 0.1.1 has requirement rich~=13.4.2, but you have rich 13.9.2.
openxlab 0.1.1 has requirement setuptools~=60.2.0, but you have setuptools 75.1.0.
openxlab 0.1.1 has requirement tqdm~=4.65.0, but you have tqdm 4.66.5.
optimum 1.18.1 has requirement transformers[sentencepiece]<4.40.0,>=4.26.0, but you have transformers 4.40.2.
thinc 8.3.2 has requirement numpy<2.1.0,>=2.0.0; python_version >= "3.9", but you have numpy 1.26.4.

I also tried uv, which does install, as you say, and does not produce a broken environment.

(This was on Windows with Python 3.11, FWIW).

I don't know what might be wrong here. Maybe @notatallshaw can help, as he's done a lot of work on complex resolution issues.

@pfmoore
Member

pfmoore commented Oct 4, 2024

(Whoops, @notatallshaw - our messages passed in the ether 🙂)

@notatallshaw
Member

notatallshaw commented Oct 4, 2024

> (Whoops, @notatallshaw - our messages passed in the ether 🙂)

😄

> I also tried uv, which does install, as you say, and does not produce a broken environment.

Note my comment on a trick uv does which helps it avoid trying to build incompatible sdists:

> uv does a trick where it assumes the dependency metadata is the same across all wheels and sdists for a given release and reads the dependency metadata out of a wheel

It might not be doing that here, but it could be. uv takes a different resolution path as it has different backtrack priorities (and in general they are a little lacking compared to pip's, but that could also be the difference here).

@pfmoore
Member

pfmoore commented Oct 4, 2024

Assuming consistent metadata could indeed be affecting things - good catch! As you say, that's not something pip is going to do. I guess it's possible that if pip had been able to build the failing version(s) of onnx, we would also have ultimately arrived at a valid resolve. But we rely on being able to get metadata; unlike uv, we never assume it.

> I believe there have been discussions about allowing packages to publish static metadata for sdists

This exists - it's metadata 2.2 (specifically, the "Dynamic" metadata item). But adoption of metadata 2.2 was delayed because PyPI didn't support it for some time, and so build backends didn't produce it. And pip doesn't have support yet because there's little point until packages start including it.
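For readers unfamiliar with the "Dynamic" field, the following is a minimal sketch of how a tool could decide whether an sdist's dependency list can be trusted without building. It assumes the PKG-INFO text has already been read from the sdist; the sample metadata and the function name are hypothetical, and this is not how pip currently behaves (as noted above, pip does not support this yet).

```python
from email import message_from_string

# Hypothetical PKG-INFO content for an sdist using metadata 2.2.
PKG_INFO = """\
Metadata-Version: 2.2
Name: example-pkg
Version: 1.0
Requires-Dist: numpy>=1.21
Dynamic: License
"""


def requires_dist_is_static(pkg_info_text):
    """Return True if Requires-Dist can be read from the sdist as-is.

    Under metadata 2.2 and later, any field not listed under "Dynamic"
    must have the same value in wheels built from the sdist.
    """
    msg = message_from_string(pkg_info_text)
    version_text = msg.get("Metadata-Version", "1.0")
    version = tuple(int(part) for part in version_text.split("."))
    if version < (2, 2):
        # Older metadata makes no promise about sdist/wheel consistency.
        return False
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    return "requires-dist" not in dynamic


static = requires_dist_is_static(PKG_INFO)
print(static)
```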

It wouldn't help in this case, though, as only new uploads will have modern metadata.

Ultimately, onnx needs to fix that broken sdist, though.

@tonyhoo
Author

tonyhoo commented Oct 4, 2024

Thank you for your prompt reply. Given that there have been no recent pip releases, the issue might be linked to other package releases in our dependency tree, possibly causing the later versions of onnx to be incompatible. Could you advise how to identify the problematic package during the installation process? I tried pip install autogluon==1.1.1 -vvv --debug, but it didn't help.

@Innixma

Innixma commented Oct 4, 2024

> Ultimately, onnx needs to fix that broken sdist, though.

That is what I somewhat feared. The AutoGluon dependency tree is more complex than most packages, since it is an AutoML system with many large dependencies that themselves have complex dependency trees.

For onnx, we don't explicitly depend on it; rather, we request a different package's optional dependency that installs onnx: "optimum[onnxruntime]>=1.17,<1.19", and this package provides no lower or upper bound on onnx.

So I suppose even though we don't explicitly depend on onnx, we need to include it in our setup.py with version ranges whenever we also install optimum[onnxruntime]. However, it seems strange to me that the same setup issue doesn't occur when installing optimum[onnxruntime] standalone. I'm not the most familiar with complex dependency resolution issues though.

@notatallshaw
Member

notatallshaw commented Oct 5, 2024

Here's what I've found so far:

Pip gets confused by the dependency chain spacy -> blis -> thinc. You'll notice that uv resolves to spacy 3.7.5. spacy has had a series of yanked releases (so you may have seen this issue intermittently) but finally released 3.8.2 on 2024-10-01: https://pypi.org/project/spacy/3.8.2/#history

If you add a constraint on blis < 1, thinc < 8.3, or spacy < 3.8, then pip can resolve. I wouldn't normally advise adding upper bounds (they tend to cause long-term problems), but it may be worthwhile as a temporary fix.
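One way to apply such a temporary bound without changing any published package is a pip constraints file. The sketch below writes one and assembles the matching install command without running it. The file name and helper functions are made up for the example, and `spacy<3.8` is just one of the three bounds mentioned above.

```python
import sys
import tempfile
from pathlib import Path

# Any one of blis<1, thinc<8.3 or spacy<3.8 was reported to be enough;
# spacy<3.8 is picked here purely as an example.
CONSTRAINTS = "spacy<3.8\n"


def write_constraints(directory):
    """Write the constraints file and return its path."""
    path = Path(directory) / "constraints.txt"
    path.write_text(CONSTRAINTS, encoding="utf-8")
    return path


def pip_install_command(requirement, constraints_path):
    """Build (but do not run) a pip command that applies the constraints."""
    return [
        sys.executable, "-m", "pip", "install",
        requirement,
        "--constraint", str(constraints_path),
    ]


with tempfile.TemporaryDirectory() as tmp:
    constraints_path = write_constraints(tmp)
    command = pip_install_command("autogluon==1.1.1", constraints_path)
    written = constraints_path.read_text(encoding="utf-8")
    print(" ".join(command))
```

A constraints file only restricts versions of packages that end up in the resolution; it does not add new requirements, which makes it easy to drop once the underlying issue is fixed.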

The good news is pip will be able to resolve this correctly and fast once #12317 lands (or at least the extracted logic from it). The bad news is that it's waiting on a vendor from resolvelib, and that vendor from resolvelib will initially make pip very slow at this resolution (due to correctness bug fixes), so once the vendor does happen I will push for the associated changes to make resolution quicker again.

Also, I might have a simpler idea to fix this kind of complicated resolution than the ones in #12317 that could help here, I will test once I get a chance (probably in a few days).

@tonyhoo
Author

tonyhoo commented Oct 6, 2024

Thank you for taking the time to investigate this issue. I can verify that downgrading the spaCy version to 3.7.5 is effective for us. To unblock the installation of all released AutoGluon versions without having to release a new one with a pinned upper limit on spaCy, could spaCy potentially lower its thinc lower limit at this location? Please inform us if this strategy is viable and if there is a quick method to validate it on our side before liaising with spaCy.

@notatallshaw
Member

notatallshaw commented Oct 6, 2024

> could spaCy potentially lower its thinc lower limit at this location? Could a new patch version be generated for this modification? Please inform us if this strategy is viable and if there is a quick method to validate it on our side before liaising with spaCy.

I don't know; you would need to ask spaCy why they have such a tight requirement on thinc.

For a library, being tightly coupled to specific versions of a dependency will inevitably cause resolution problems for its users. E.g., if two high-level data science libraries have tight dependencies on numpy and they require different versions, then a user can never install both libraries.
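The numpy example can be sketched as a set intersection. The two libraries and their bounds below are hypothetical, and version parsing is simplified to dotted integers.

```python
def parse_version(text):
    """Parse a simple dotted release such as '1.26.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


AVAILABLE_NUMPY = ["1.26.4", "2.0.0", "2.0.2"]

# Hypothetical libraries with tight, non-overlapping numpy requirements.
REQUIREMENTS = {
    "lib_a": lambda v: parse_version(v) < (2, 0, 0),   # numpy<2
    "lib_b": lambda v: parse_version(v) >= (2, 0, 0),  # numpy>=2
}


def versions_satisfying(*libraries):
    """Return the numpy versions accepted by every listed library."""
    return [
        version
        for version in AVAILABLE_NUMPY
        if all(REQUIREMENTS[lib](version) for lib in libraries)
    ]


only_a = versions_satisfying("lib_a")
only_b = versions_satisfying("lib_b")
both = versions_satisfying("lib_a", "lib_b")
print(only_a, only_b, both)
```

Each library installs fine on its own, but the set of numpy versions acceptable to both is empty, so no resolver can satisfy them together.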

In other news, I have found a way to improve pip to handle this resolution much better without too complicated of a change: #12993. I hope to make a PR within a few days.

@tonyhoo
Author

tonyhoo commented Oct 7, 2024

We have opened an issue with spaCy to see whether they can relax their version requirements on thinc and publish a patch release.

@tonyhoo
Author

tonyhoo commented Oct 10, 2024

Any update on the status of the fix and release? We are evaluating whether we should do a patch release to unblock users, depending on how soon the issue can be resolved.

@notatallshaw
Member

I rate the chances of making the Pip 24.3 release as possible but low.

Changes to the resolver preference should ideally come with a lot of evidence that they are a net benefit. I am working on a script that can automatically run a series of scenarios and give an objective measurement of whether this change made a difference. If I can get it ready in time, and if it convinces the pip release manager that it's a high-benefit, low-risk change, then it will get in.

Otherwise it will have to wait till pip 25.0 (scheduled for January 2025).

4 participants