Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency trees get merged when they should not #3456

Open
dervoeti opened this issue Nov 18, 2024 · 2 comments · May be fixed by #3457
Open

Dependency trees get merged when they should not #3456

dervoeti opened this issue Nov 18, 2024 · 2 comments · May be fixed by #3457
Labels
bug Something isn't working

Comments

@dervoeti
Copy link
Contributor

dervoeti commented Nov 18, 2024

What happened:
When collecting multiple BOMs with the sbom-cataloger, the dependency trees of the BOMs might get mixed up.

Example:
My Rust app App1 has a direct dependency proc-macro2 1.0.89. This dependency has a transitive dependency unicode-ident 1.0.13.
App2 depends on proc-macro2 1.0.89 as well, plus also directly on unicode-ident 1.0.9 (a lower version than the transitive dependency in App1).

I generate SBOMs for both apps and package the apps alongside their SBOMs in a container. When I scan the container with the latest version of Syft and enable the sbom-cataloger, the dependency tree in the resulting SBOM looks like this:

image
This is because the same package ID is generated for proc-macro2 1.0.89 and both packages are merged into one, which is understandable. But I think since the transitive dependencies can differ, even if the package itself is the same, the package ID should depend on some identifier of the cataloged SBOM as well. This effectively means the dependency trees for all cataloged SBOMs will be treated separately instead of being merged into one tree.

I think that this problem is not specific to Rust and affects other languages as well, at least I can think of similar examples in Java.

The change in this PR is just to demonstrate a very hacky solution specific to CycloneDX SBOMs with a serialnumber, it's definitely not a proper solution, hence I marked this PR as draft: I just pass the bom object into collectPackages to be able to access the BOM's serialnumber. I then temporarily prepend that serialnumber to the package name to make the package ID dependant on the BOM it was found in (two equal packages in different BOMs will now have different IDs). If you think the change in general makes sense, we could discuss a proper solution.

I generated the SBOM for my example app with the modified version of Syft and it looks as expected, the dependency trees are separate (proc-macro2 1.0.89 is in there twice, with two different package IDs).

What you expected to happen:
An SBOM with a dependecy tree like this:

image

Steps to reproduce the issue:
See https://github.com/dervoeti/syft-reproduce-dependency-overlap

Anything else we need to know?:
SPDX would need this as well, maybe all catalogers that obtain information about dependency relationships would need adjustments.

Environment:

  • Output of syft version: 1.14
  • OS (e.g: cat /etc/os-release or similar): NixOS 24.05 (Uakari)
@dervoeti dervoeti added the bug Something isn't working label Nov 18, 2024
@dervoeti dervoeti linked a pull request Nov 18, 2024 that will close this issue
5 tasks
@wagoodman
Copy link
Contributor

A couple of points (but I haven't dug too deep yet):

  1. the sbom-cataloger specifically (not other catalogers) is merging package graphs together when multiple SBOMs are found with similar dependencies. Elsewhere in syft we attempt to keep these kinds of project dependency trees separate (as what you are suggesting should be done in this case). We look to be adding the contextual SBOM location on each package discovered from each SBOM, which means in theory we should have distinct package IDs (thus distinct graphs as you are asking for). This gets to the second point...

  2. This might only be an issue for cyclonedx SBOMs. I'm curious if we repeat this same setup again (thanks for the reproduction repo 🙌 ) but for spdx or syft json (or a mix) do we see the same merging of project dependency graphs?

@wagoodman wagoodman moved this to Ready in OSS Nov 18, 2024
@dervoeti
Copy link
Contributor Author

dervoeti commented Nov 19, 2024

Thanks for the quick response!
Yeah, makes sense that only sbom-cataloger is affected if that's the only one merging packages.

I tried to reproduce the behavior with SPDX, I added two Java apps to my repo and generated SBOMs for them with mvn spdx:createSPDX.

They both depend on the same version of Gson and a different version of error_prone_annotations (transitive dependency through Gson). The resulting SPDX SBOM produced by Syft contains only one Gson component. However it's not really problematic in that case, since the sbom-cataloger does not pick up all the dependency information from the SPDX SBOMs generated by Maven anyway. The resulting SBOM just contains a flat list of components for each cataloged SBOM.

Maybe I did something wrong, I don't know too much about SPDX, but I see the dependency relation between Gson and error_prone_annotations in the Maven generated SBOMs: https://github.com/dervoeti/syft-reproduce-dependency-overlap/blob/05ce5a52c4621d116a3a48b1cf1828644f3a200f/java/app1/target/site/com.example_java-app-1.0.0.spdx.json#L95-L98
They are missing in the final SBOM for the container.

But that's probably out of scope for this issue. So yes, it looks like this is currently only a problem for CycloneDX SBOMs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Ready
Development

Successfully merging a pull request may close this issue.

2 participants