Option to only provide causes of conflict when calling get_preference #131

Open
notatallshaw opened this issue May 29, 2023 · 2 comments

notatallshaw commented May 29, 2023

When #84 was implemented it gave the downstream library the information about whether something was the cause of a backtracking conflict or not.

However, using this information is problematic for performance because it can create an O(n²) situation: the entire list of names is iterated over, and for each name the entire list of backtrack causes is checked. Although in most cases n is small enough for this to be negligible, it has caused real-world performance issues (pypa/pip#10621).
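
To make the cost concrete, here is a minimal sketch of the nested scan described above. It does not use pip or resolvelib code; `count_cause_checks` and the representation of a cause as a `(requirement_name, parent_name)` tuple are assumptions made purely for illustration.

```python
def count_cause_checks(names, backtrack_causes):
    """Count how many causes are inspected when every name scans every cause.

    Hypothetical model: each cause is a (requirement_name, parent_name) tuple.
    """
    checks = 0
    for name in names:                   # outer loop: one pass per name
        for cause in backtrack_causes:   # inner loop: scan causes for that name
            checks += 1
            if name in cause:            # stop early once the name is found
                break
    return checks


# Worst case: no name appears in any cause, so nothing stops early.
names = [f"pkg{i}" for i in range(100)]
causes = [(f"dep{i}", f"parent{i}") for i in range(100)]
total_checks = count_cause_checks(names, causes)
```

With 100 names and 100 unrelated causes, the inner check runs 100 × 100 times, which is the quadratic growth in question.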

There have been a couple of unmerged PRs that attempt to fix this, either by adding some fancy caching logic to pip or by defining a formal structure for what a "Cause" object should look like and then adding a cache to that object.

However, a much simpler solution is to take the implicit loop out of get_preference: when there is a conflict, resolvelib passes the downstream library only the names that are part of the backtrack cause. It seems to me that this information is sufficiently generic for a resolving library that it can be an option of the algorithm itself and not just a preference of the downstream library.
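
A rough sketch of that idea, under stated assumptions: names are plain strings, each cause can be reduced to a `(requirement_name, parent_name)` pair, and `names_to_consider` is a hypothetical helper rather than anything in resolvelib. The set of cause names is built once, so the filtering costs one pass over the causes plus one pass over the names instead of a nested scan.

```python
def names_to_consider(unsatisfied_names, backtrack_causes):
    """Return only the names involved in the backtrack causes.

    Hypothetical helper: each cause is a (requirement_name, parent_name)
    tuple, where parent_name may be None.
    """
    cause_names = set()
    for requirement_name, parent_name in backtrack_causes:
        cause_names.add(requirement_name)
        if parent_name is not None:
            cause_names.add(parent_name)

    narrowed = [name for name in unsatisfied_names if name in cause_names]
    # If no unsatisfied name is part of a cause, fall back to every name so
    # the caller always has something to choose from.
    return narrowed or list(unsatisfied_names)


narrowed = names_to_consider(["a", "b", "c"], [("b", "x"), ("z", None)])
fallback = names_to_consider(["a", "b", "c"], [])
```

The resolver would then call the downstream preference function only for the narrowed names, so the downstream library no longer needs to scan the causes itself.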

I will start on a PR and see how it performs. I opened this issue now to see if there are any direct objections to this approach.

notatallshaw commented May 29, 2023

Initial testing for this looks good. I consistently see a ~2.5 second saving when running python -m pip download apache-airflow[all]==2.6.1 (with all downloads cached), or slightly above 1% of the total time. That might not sound like much, but most of the time is spent collecting requirements and building packages in isolated environments to extract the metadata, so the fact that the performance improvement is measurable and consistent is a big win here.

But I actually think there's an even bigger improvement from this approach: it allows setting a reporting event that lets the downstream library log the cause of backtracking. I think this will be very useful for users trying to debug why backtracking is happening in the first place.
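
As an illustration only, such a reporting event could look something like the sketch below. The class `BacktrackCauseReporter` and the hook `backtrack_causes_selected` are hypothetical names invented here; they are not part of resolvelib's reporter interface.

```python
import logging

logger = logging.getLogger("resolver.sketch")


class BacktrackCauseReporter:
    """Hypothetical reporter that logs which names caused backtracking."""

    def __init__(self):
        self.seen = []  # history of reported causes, kept for later inspection

    def backtrack_causes_selected(self, cause_names):
        # Hypothetical hook: assumed to be called by the resolver with the
        # names it narrowed the selection down to.
        names = sorted(cause_names)
        self.seen.append(names)
        logger.info("Backtracking caused by: %s", ", ".join(names))
        return names


reporter = BacktrackCauseReporter()
reported = reporter.backtrack_causes_selected({"b", "a"})
```

A downstream tool could surface these messages at a verbose log level so users can see which packages are driving the backtracking.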

I'll try and post a PR soon.

notatallshaw commented Dec 14, 2023

My initial approach, #132, made incorrect assumptions about what information resolvelib could guarantee it has access to.

This would now be solved by #145; the provider would need to implement on its side which causes to filter out.
