Proposal: Enhanced purls for generic artifacts #692

brunoapimentel · 2024-10-21T01:17:35Z

This draft introduces a more generic alternative to solve the need for specific Maven purls covered in #663.

In summary, it is a mechanism that would allow us to support a subset of purl types that do not fit into existing package manager ecosystems. Cachi2 would receive purls as input, resolve them into a download URL, and report them in the SBOM.

Maintainers will complete the following section

Commit messages are descriptive enough
Code coverage from testing does not decrease and new code is covered
Docs updated (if applicable)
Docs links in the code are still valid (if docs were updated)

docs/design/generic-enhanced-purls.md

a-ovchinnikov · 2024-10-21T11:33:13Z

docs/design/generic-enhanced-purls.md

+### Cachi2 CLI usage
+
+```
+cachi2 fetch-deps --source /path/to/repo generic


Would we want to have an ability to add arbitrary configs to the mix?

cachi2 fetch-deps --source /path/to/repo generic --add-to-the-mix ../generic-artifacts.yaml --detect-components-with ../cachi2-config.yaml

I still cannot wrap my head around cachi2-specific configs in (potentially) unrelated repos.

I think we'll just use/enhance the existing config-file functionality, right?

I think Alexey is referring to the lockfile itself. I've yet to see users complaining about adding the lockfiles, but --add-to-the-mix is definitely a possibility.

The main concern I have against it is that, by having everything committed to the repo, we can improve the reproducibility of the requests.

Yes, we can default to a location within a repo and use the default configuration mechanism, but we would need a way to point to some external config which will be version-controlled elsewhere. Suppose someone needs to build ScaPy (or some other kilostarred package) and suppose they also need some prebuilt generic artifact. I don't think it is safe to assume that upstream ScaPy would just accept any cachi2 yaml. If it is a local clone then someone would still need to add the config and maintain it. Likely not a blocker right now, but I feel like we'll need to deal with this at some point.

I agree with some of @a-ovchinnikov's reasoning when it comes to checking in some random file, but then again, imagine a blackbox pipeline: where would such a file come from? It might easily become a devops engineer's nightmare. I also agree with @brunoapimentel that from a reproducibility POV and for ease of debugging you want users to commit the file to their repos. Now, why would any random big upstream care about cachi2? Exactly, they won't, that's why downstream exists, and then such a file is perfectly fine to be checked in to a repo. However, downstream isn't very popular in the K8s world, so users tend to solve it with git submodules instead to compensate for the need of downstream control, and here again, checking in the cachi2 lockfile is perfectly fine. I don't even think we need a different way of providing it ATM. If the need arises, we can discuss again.

docs/design/generic-enhanced-purls.md

eskultety · 2024-10-22T09:46:29Z

docs/design/generic-enhanced-purls.md

+
+## Enhanced purls overview
+
+- Implement an `enhancer` for every supported purl type, which is essentially a set of rules that will be applied to a generic artifact and, in case those rules can be matched, it will replace the generic purl for a more specific type.


Well, in that case we have to be prepared for the burden of pretty much being in sync with most of the PURL spec and keeping up with different pure data PURL types. I'm not a fan per se because we'll essentially help co-maintain the PURL spec from the shadows (or at least be one of the loudest complaining stakeholders) - IOW the PURL spec currently doesn't keep up with itself and many of their current user RFEs on spec enhancement stem from the fact that many PURL types are defined very vaguely, giving too much room for interpretation to the implementations. The biggest problem with that is that there ARE NO releases, so if the spec needs to fix something, it'll break our implementation. Additionally, there currently isn't ONE true implementation of the spec, which complicates the lives of the few spec readers out there, and now we'll become one of them.
Note it's not that much about the resulting PURLs themselves, but the qualifiers we'll need to keep up with.

You can also count on requests asking for adding support for their own super-duper qualifier (yep, syft) which doesn't exist in the spec, we'll have to be very careful about that and not accept anything that isn't in the spec! (then again, RPM PURLs suck so much that we've already done that, so there goes...)

Since the main motivation here is Maven the other alternative would truly be a different lockfile, but the same backend code for the fetcher. While that may have started to make sense to me in retrospect, I don't think we'd end there and I can already see requests for those various data-typed PURLs describing the artifacts which means we'd have to do what you propose at some point anyway.

This seems like a more complicated approach to me than what is proposed in #663

Besides dealing with complexity of recognizing component types and creating proper PURLs for them, this proposal is based on the assumption that users will actually want Cachi2 to do that.
If a user provides a PURL to fetch, how do you know that the user will prefer Cachi2 to change it to something else?
How can a user be in control of the recorded PURLs?
What if Cachi2 can't get it right (for some reason) but the user knows what it should be, how could a user provide the desired target PURL?

IMO, whatever a user provides should not be altered.

I think this is the core of the discussion that led us to the change. We, as Cachi2 maintainers, are not confident we should allow users to specify what the reported purls should be. And to clarify, this design proposes that only resolved download_urls are used as input, which means we wouldn't accept purls as input.

If we consider Cachi2 as a standalone project, its goal is to prefetch the dependencies necessary for a hermetic build, and report them the most accurate way possible, but only to the extent of the info it has at hand during the prefetch. Sometimes, the user will have a better purl than what Cachi2 can provide. That is not true in all cases, though. Allowing purls to be passed through creates a precedent that goes against our ultimate goal of accurate SBOMs.

Now, if we need better purls for our specific internal case, maybe they can be enhanced in other points of the pipeline? But we can lead this discussion somewhere else.

> this design proposes that only resolved download_urls are used as input

Thanks, I overlooked that.

> If we consider Cachi2 as a standalone project, its goal is to prefetch the dependencies necessary for a hermetic build, and report them the most accurate way possible, but only to the extent of the info it has at hand during the prefetch.

Agreed.

> Allowing purls to be passed through creates a precedent that goes against our ultimate goal of accurate SBOMs.

Could you please elaborate on that?

Whatever Cachi2 produces will be produced (ultimately) for the user who provides the input. The user will be responsible for the quality of the input and, as a consequence, the output.

> Whatever Cachi2 produces will be produced (ultimately) for the user who provides the input. The user will be responsible for the quality of the input and, as a consequence, the output.

I agree the user has substantial responsibility towards the quality of the input/output, but Cachi2 has traditionally been on the restrictive side of things. For instance, for pip, Cachi2 simply won't process repos that don't have a fully resolved requirements file (with pinned versions and hashes). Cachi2 will never execute a setup.py file because this allows the execution of arbitrary code, which undermines our certainty that the only downloaded files were exactly the ones that are described in a certain ref of a repo.

Having Cachi2 receive a purl as input, resolve it to download an artifact, and simply copy-paste the purl to the output SBOM seems to open a wide scope of things that could happen. From the accuracy perspective, it could allow anything from simple "typos" to non-compliant purls to be passed through. From a practical standpoint, we would still need to validate the purls to an extent, and to keep up with the purl-spec.

Looking from another perspective, what we're proposing here is already allowing the user to specify the purl (with a limitation of the subset of purl types and details about the output). The decision here is mostly to not have purls as input, and to limit what can be done as output. So my question is, would the proposed output purl for Maven be outside of what you expect? Are there any corner cases of concern?

> Sometimes, the user will have a better purl than what Cachi2 can provide

That's the sad reality of the PURL spec. In ~~an ideal~~ a normal world the spec (like any other spec) itself would define what a perfect PURL looks like for each artifact type and with cachi2 strictly following the spec this statement would hold no ground; PURL spec doesn't live in that world :( .

> Whatever Cachi2 produces will be produced (ultimately) for the user who provides the input. The user will be responsible for the quality of the input and, as a consequence, the output.

@aloubyansky this is AFAIK called tainting in our field, i.e. consuming user input with little to no validation and use it for output generation (in this case taking the PURL verbatim) and as such tainting is undesirable in general and so we'd like to avoid it by forcing users to provide the information in form of attributes leading to the desired PURL on the output. This is especially problematic with the PURL spec, which, as I mentioned earlier, is quite vague for some artifact types, so we'd have to trust the user's judgement on the quality of an input PURL.

I'll have ~6500 Maven artifacts in a lockfile. It'd be a lot easier to simply express essentially the same info in a single PURL.

Now I am not sure how different this would be for other component types.

@aloubyansky I guess you kinda answered your concerns with ^this - cachi2 cannot tailor a solution to a single use case knowing not all consumers would do their due diligence when it comes to providing input using PURLs straightaway.

I honestly don't see yet how this proposal is better from the "undesirable tainting" perspective.
Could we have an example side-by-side of a final PURL vs a generic one + extra info that would show the advantage of the latter?

Cachi2 is already hooked on PURLs either way.

> i.e. consuming user input with little to no validation and use it for output generation (in this case taking the PURL verbatim) and as such tainting is undesirable in general and so we'd like to avoid it by forcing users to provide the information in form of attributes leading to the desired PURL on the output.

I don't think this is true. Cachi2 is quite literally in control of how much validation it does. It does not have to take the purl verbatim, it is free to decompose it into individual attributes and validate those.

> I guess you kinda answered your concerns with ^this - cachi2 cannot tailor a solution to a single use case knowing not all consumers would do their due diligence when it comes to providing input using PURLs straightaway.

You're thinking about other use cases and that is good. However, those are so far all theoretical, while this is a very real one. I can do some research and give you the data, if other purl types convert to url in similar way, if that's the issue.

Other than that, in the case of the custom lockfile for the generic fetcher, you suggested that we use SBOMs instead, which would likely involve letting users supply the purl as part of those SBOM components. I can't see why now, when there's decision between an established format for specifying packages (purl) and a custom implementation that is essentially url + some custom attributes, your reasoning is different.

> I can do some research and give you the data, if other purl types convert to url in similar way, if that's the issue.

I think it'd be worth it

> I don't think this is true. Cachi2 is quite literally in control of how much validation it does. It does not have to take the purl verbatim, it is free to decompose it into individual attributes and validate those.

Does such decomposition involve detection of all extra non-spec qualifiers that we could flag and forbid? Because with YAML attributes this is as easy as validating the lockfile model and failing early and loudly about unrecognized attributes. Besides, I have a feeling that accepting a PURL is inviting the user to expect that, well, since we already consumed the PURL we just use it, right? Not really. Admittedly, either way this is an implementation detail.
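For illustration, a minimal sketch of the "fail early and loudly" check on lockfile attributes described above. The attribute names `download_url`, `checksums` and `target` come from the lockfile snippets in this thread; which of them are required, and the function name `validate_artifact`, are assumptions made for the example and not Cachi2's actual model.

```python
# Attribute names as they appear in the lockfile snippets in this thread.
ALLOWED_KEYS = {"download_url", "checksums", "target"}
# Assumption for this sketch: these two must always be present.
REQUIRED_KEYS = {"download_url", "checksums"}


def validate_artifact(entry: dict) -> dict:
    """Reject a lockfile entry that has unknown or missing attributes."""
    unknown = set(entry) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unrecognized attributes: {sorted(unknown)}")
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    return entry
```

The same idea applies to qualifiers extracted from a purl: once they are in a dictionary, an allow-list check is equally simple, so the difference between the two input formats is mostly about what users expect to happen to their input.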

> Other than that, in the case of the custom lockfile for the generic fetcher, you suggested that we use SBOMs instead, which would likely involve letting users supply the purl as part of those SBOM components. I can't see why now, when there's decision between an established format for specifying packages (purl) and a custom implementation that is essentially url + some custom attributes, your reasoning is different.

@kosciCZ I don't think it is different at all, actually. The proposal of input SBOMs was shut down as too complex and we moved on to a different approach. I don't think I ever advocated for input PURLs anywhere (going back to #652 and more specifically this thread: #652 (comment)), for me it was the future proof type of input when it comes to the plethora of standardized attributes we could consume and look at during artifact processing from an input SBOM, PURLs are just an inherent part of SBOMs which we'd get as a side effect we could not do anything about at all if we had decided to consume it, so I think we may have misunderstood each other in terms of my stance towards PURLs in general. However, this is a new proposal where we have the choice for all input data type/format.

> I can do some research and give you the data, if other purl types convert to url in similar way, if that's the issue.

I'm not sure I follow the outcome here, I guess you mean if other PURLs have a direct download URL mandatory? Is that what you're saying you'd like to check?

docs/design/generic-enhanced-purls.md


brunoapimentel · 2024-10-31T14:44:47Z

New pushes: rewrite of the design to consider consuming purls as input, and what the resolution would look like for specific purl types.

arewm

Do we have a record of what confidence Cachi2 has for the reported contents for the various supported package managers outside of this proposal?

arewm · 2024-10-31T17:32:10Z

docs/design/generic-enhanced-purls.md

+- Parse the purl by using the [packageurl-python](https://github.com/package-url/packageurl-python) library
+- Validate that the purl is within the supported types


Earlier you mentioned that purls are reported as generic. Would Cachi2 also support a generic purl as input? If the supported types mentioned below are not used, would Cachi2 try to just fall back to fetch a generic artifact?

I think we should keep the current lockfile option of providing download_url and checksums for the pkg:generic cases, but we can also support consuming purls as input for the sake of consistency.

As for the types, I think we need to be very explicit about which ones are supported, and anything that falls outside of that means a failed request.
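A minimal sketch of the "explicit allow-list, anything else fails the request" behaviour could look like the following. The design text names the packageurl-python library for real parsing; this example deliberately uses plain string handling so it stays dependency-free, and it only extracts the type. The allow-list contents and all names (`SUPPORTED_PURL_TYPES`, `ensure_supported`) are hypothetical.

```python
# Hypothetical allow-list; the thread only discusses Maven concretely.
SUPPORTED_PURL_TYPES = frozenset({"maven"})


class UnsupportedPurlType(ValueError):
    """Raised when a purl's type is outside the allow-list."""


def purl_type(purl: str) -> str:
    """Return the lower-cased type: the segment between 'pkg:' and the first '/'."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a purl: {purl!r}")
    # Tolerate the 'pkg://type/...' form by ignoring leading slashes.
    type_, sep, _ = rest.lstrip("/").partition("/")
    if not sep or not type_:
        raise ValueError(f"purl has no type or name: {purl!r}")
    return type_.lower()


def ensure_supported(purl: str) -> str:
    """Fail the request unless the purl type is explicitly supported."""
    type_ = purl_type(purl)
    if type_ not in SUPPORTED_PURL_TYPES:
        raise UnsupportedPurlType(
            f"purl type {type_!r} is not supported; "
            f"supported types: {sorted(SUPPORTED_PURL_TYPES)}"
        )
    return type_
```

With this shape, adding pkg:generic input later would be a one-line change to the allow-list plus a resolver for that type.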

So the only way that users would be able to prefetch in a way that produces a generic purl would be through the current lockfile option? There is no proposal to add a purl-supported generic package fetching?

There isn't, but we can add one. Do you think it's worth doing it now?

To be clear, will this proposal generate purls with a pkg:generic type? Or since the generic type isn't supported, we will never be able to produce those purls?

We can always add it later if you don't think that there is a specific need for it with this proposal.

No, I did not write anything about consuming input pkg:generic purls here because this is already supported by the current lockfile. So, initially, Cachi2 would not consume pkg:generic purls and for this reason, users wouldn't be able to produce pkg:generic SBOM components by providing input purls.

The mechanism to extend support to any purl type is already described here, though, so supporting input pkg:generic is trivial (and we probably should, for the sake of consistency).

ack. completeness for this might be beneficial, but it isn't relevant to the acceptance/rejection of the proposal.

If you support generic with this method, the question might be asked in the future if you should deprecate one of the two generic package fetching strategies. It is probably fine to leave the trivial implementation until it is requested specifically?

docs/design/generic-enhanced-purls.md

arewm · 2024-10-31T17:38:16Z

docs/design/generic-enhanced-purls.md

@@ -0,0 +1,233 @@
+# Support for different purl types in the generic artifact fetcher
+
+The generic artifact package manager is being added to Cachi2 as a means for users to introduce files that do not belong to traditional package manager ecosystems (e.g. pip, npm, golang) to their hermetic container builds. Since Cachi2 does not have any extra information about the file that's being fetched, the purls are always reported as [pkg:generic](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic).


Is this a generic purl type and/or a generic property, i.e.

"properties": [ { "name": "cachi2:found_by", "value": "cachi2:generic" }

In this case, a generic purl type. The property is a way to inform which backend/package manager in Cachi2 produced that SBOM component.

Would it be possible to add an example purl that Cachi2 might generate with an input that could produce it?

Do you need an example different from the ones in https://github.com/containerbuildsystem/cachi2/pull/692/files#diff-aadc5e0bf6d8c1eefd2742bad409f3f2fb7a684aa7cbe0d47ebfb752e07d9fd6R31?

I guess it is sufficient. I cannot remember why I was asking specifically. This can be resolved.

arewm · 2024-10-31T17:39:23Z

docs/design/generic-enhanced-purls.md

+
+## Initial thoughts
+
+From a Cachi2 perspective, we can separate purls types into ones that are part of existing package manager ecosystems (such as nuget, composer, maven) and ones that are not (github, huggingface, oci):


What benefit do we have from this separation? Do we do anything with this classification?

We don't do anything with them. I initially had a hard time wrapping my head around non-package manager types (oci, github, huggingface), because they don't simply point to a file in a URL. I initially assumed we don't want to allow this feature to cover other existing package managers, because we'd want full support for a package manager.

I think the question here is: how flexible do we want this feature to become? It started as a way to fetch files, but do we want to extend it to OCI artifacts or even git repos?

Any of these artifact types vary with how metadata is stored and associated with the content that is downloaded.

Being able to specify/support purl types that could match to future package manager support is one way to simplify the required work for fetching content. The downside to this (from a users' perspective) is that the native dependency files for the package managers are not supported. If, in the future, a package manager-native way to resolve dependencies is developed, it would be an option for the Cachi2 tooling to deprecate the generic purl-based fetching mechanism. This deprecation would, of course, require a breaking change.

I would consider flexibility to be driven by a case-by-case basis. If there is requested support for fetching some form of artifact, is it required to have the native dependency format (i.e. lockfiles)? Or is it possible/reasonable to implement the fetching with a generic purl-based approach? Implementation with a purl-based approach seems like it would at least be faster if that is an acceptable solution.

> simplify the required work for fetching content

We want to make sure, for anything with an existing package manager, that we don't break builds, i.e. anything which happens after the fetch.

For instance: if we have to handle some arbitrary level of complexity to resolve the dependency graph we need to download, and/or some arbitrarily complex setup step is involved (e.g. a specific filesystem layout)... we're better off simply using the existing package manager ecosystem and its lockfiles to resolve the dependency graph, get URLs, download to correct locations from those URLs, etc...

To a first approximation, cachi2 is a very smart "package manager wrapper" - our secondary brief is to make sure that e.g. go build, pip install -r foo, yarn (er... build? >_<) succeed, and our primary brief is that we control the downloads which the build relies on.

I don't think that it is Cachi2's job to ensure that builds are not broken if a purl-driven package manager was used. This should be handled by users' specifications of what content to download ... whether it is by specifying a path to a lockfile or a set of purls. It does, however, seem reasonable that users' expectations are set based on the choice of interface.

For example, you may have a lockfile that only has one tarball: https://github.com/stolostron/image-builder/blob/master/yarn-source/package-lock.json

A user should be able to use the npm package manager to resolve that if they want to maintain the package lock. If they don't want to maintain the package lock, they should be able to just fetch that tarball.

Similarly, if there is not support for a specific package manager in Cachi2 but a user wants to prefetch all content, they should be able to take the complexity to individually specify all resolved packages for their builds. Cachi2 wouldn't know that there is complete resolution of artifacts but that is irrelevant.

If Cachi2 wants to record how dependencies were defined (i.e. if they were resolved with a package manager or defined individually), it seems reasonable that the generated SBOM would have metadata to indicate how the specific dependency was resolved.
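As a sketch of what such metadata could look like, the snippet below builds a CycloneDX-style component as a plain dictionary. The `cachi2:found_by` property and its `cachi2:generic` value are taken from earlier in this thread; the second property, `cachi2:resolved_via`, and the function name are purely hypothetical and only illustrate the idea of recording how a dependency was defined.

```python
import json


def sbom_component(name: str, purl: str, resolved_via: str) -> dict:
    """Build a CycloneDX-style component that records how it was resolved."""
    return {
        "type": "library",
        "name": name,
        "purl": purl,
        "properties": [
            # Existing property mentioned in this thread.
            {"name": "cachi2:found_by", "value": "cachi2:generic"},
            # Hypothetical property: "lockfile" vs. "individual-purl", for example.
            {"name": "cachi2:resolved_via", "value": resolved_via},
        ],
    }


component = sbom_component(
    name="demo",
    purl="pkg:maven/org.example/demo@1.0.0?type=jar",
    resolved_via="individual-purl",
)
print(json.dumps(component, indent=2))
```

A consumer of the SBOM could then filter on that property without Cachi2 having to make any claim about whether the set of artifacts is complete.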

docs/design/generic-enhanced-purls.md

arewm · 2024-10-31T17:40:59Z

docs/design/generic-enhanced-purls.md

+
+### Decision points
+
+- Should the checksums be specified as part of the input purl?


What are the Cachi2 requirements for the checksums for other package managers? This should at least be a best practice for Cachi2 users if it isn't a requirement.

We will require checksums, it is one of the few validations we have in place. The question here is about where the checksum should be in the lockfile. A generic artifact looks like this:

    - download_url: https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors?download=true
      checksums:
        sha256: 90bffe1884b84d5e255f12ff0ecbd70f2edfc877b68d612dc6fb50638b3ac17c

The artifact with a specific type will have a purl instead of the download_url, so should we drop the extra checksums attribute and make it part of the purl?

If the checksum is a simple/standard addition to a purl, it makes sense to require it. The usability of keeping the checksum separate would just be to make it simpler to see/update it.

Would it be reasonable to support both flows? If the checksum is specified separately, Cachi2 would still generate the purl with the checksum to inject into the SBOM.

FWIW, CycloneDX components have dedicated fields for checksums (hashes). Perhaps both ways are supported. The CDX maven generator includes ~8 checksums. Probably more than necessary and it doesn't add them to purls, which is probably not a bad idea in such a case.

I changed the design so that checksums are only consumed as separate attributes and only a single one of them is reported in the output purl. This is mainly to keep consistency with other existing package managers: we always report a single checksum in case of file dependencies.

We can extend this to support checksums in the purl, in case the need comes up.

That makes sense. Presumably Cachi2 would verify all checksums that are provided and it would report the "most secure" one, i.e. sha256 over md5.
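A minimal sketch of that presumption: verify every checksum the user supplied, then report only the strongest one in the `algorithm:hex` form used by the purl `checksum` qualifier. The preference order, the accepted algorithms and the function name are assumptions for illustration; a real fetcher would hash the downloaded file in chunks rather than take bytes in memory.

```python
import hashlib

# Strongest first. Both the ordering and the accepted set are assumptions.
PREFERENCE = ("sha512", "sha384", "sha256", "sha1", "md5")


def verify_and_pick(data: bytes, checksums: dict) -> str:
    """Verify all provided checksums and return '<algorithm>:<hex>' for the strongest."""
    if not checksums:
        raise ValueError("at least one checksum is required")
    for algorithm, expected in checksums.items():
        if algorithm not in PREFERENCE:
            raise ValueError(f"unsupported algorithm: {algorithm}")
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(
                f"{algorithm} mismatch: expected {expected}, got {actual}"
            )
    # The lowest index in PREFERENCE is the strongest algorithm provided.
    strongest = min(checksums, key=PREFERENCE.index)
    return f"{strongest}:{checksums[strongest].lower()}"
```

The returned string could be placed in the output purl as `checksum=<value>`, while the full set of hashes remains available for the SBOM's dedicated hash fields if that is ever wanted.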

arewm · 2024-10-31T17:41:30Z

docs/design/generic-enhanced-purls.md

+### Decision points
+
+- Should the checksums be specified as part of the input purl?
+- Should we limit the qualifiers to the types that are strictly available to that type?


What do you mean by this? Are there known qualifiers that should be allowed in the taxonomy for the type?

Yes, the purl specification loosely determines what the qualifiers are for each type. See oci, for example.

There are also global qualifiers that can be applied to any type (check the qualifiers bullet here).

I think I would expect the qualifiers to be verified as part of the parser. Cachi2 would need to ensure that the required qualifiers are present and that the optional ones may be present.

Which of the package-specific qualifiers that are directly specified would Cachi2 use? For example, would OCI artifact support be able to handle the repository_url, namespace, version, ... or would the input purl provide those plus a download_url? Or would support for these qualifiers be considered more for "full" package manager support instead of generic?

The specification for oci seems to indicate the use of repository_url.

I changed the design so that it mentions we will validate the qualifiers depending on type, which is probably better than leaving it completely open.

I haven't read the content that you have changed yet, but that change makes sense.
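To make the per-type qualifier validation concrete, here is a hedged sketch. The qualifier tables below are illustrative and would need to be checked against the purl specification; the "required" table in particular is a hypothetical Cachi2 policy, not something the spec mandates. The parsing is a simplified stand-in for a real purl parser.

```python
from urllib.parse import unquote

# Qualifiers the purl spec lists as applicable to any type (to be double-checked).
GLOBAL_QUALIFIERS = {"repository_url", "download_url", "vcs_url", "file_name", "checksum"}
# Illustrative per-type tables; verify against the spec before relying on them.
TYPE_QUALIFIERS = {"maven": {"classifier", "type"}, "oci": {"arch", "tag"}}
# Hypothetical policy: what Cachi2 would need in order to locate the artifact.
REQUIRED_QUALIFIERS = {"maven": {"repository_url"}, "oci": {"repository_url"}}


def parse_qualifiers(purl: str) -> dict:
    """Extract qualifiers from a purl string (simplified, no full spec parsing)."""
    without_subpath = purl.split("#", 1)[0]
    _, _, query = without_subpath.partition("?")
    qualifiers = {}
    for item in query.split("&"):
        if not item:
            continue
        key, _, value = item.partition("=")
        key = key.lower()
        if key in qualifiers:
            raise ValueError(f"duplicate qualifier: {key}")
        qualifiers[key] = unquote(value)
    return qualifiers


def validate_qualifiers(purl_type: str, qualifiers: dict) -> None:
    """Reject qualifiers unknown for the type and require the mandatory ones."""
    allowed = GLOBAL_QUALIFIERS | TYPE_QUALIFIERS.get(purl_type, set())
    unknown = set(qualifiers) - allowed
    if unknown:
        raise ValueError(f"qualifiers not allowed for {purl_type}: {sorted(unknown)}")
    missing = REQUIRED_QUALIFIERS.get(purl_type, set()) - set(qualifiers)
    if missing:
        raise ValueError(f"qualifiers required for {purl_type}: {sorted(missing)}")
```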

docs/design/generic-enhanced-purls.md

arewm · 2024-10-31T17:46:47Z

docs/design/generic-enhanced-purls.md

+- Should the checksums be specified as part of the input purl?
+- Should we limit the qualifiers to the types that are strictly available to that type?
+- Should we allow types that are not files (git repos, OCI artifacts)? Should they be reported as different component types?
+- How should our policy be regarding extending the generic fetcher for other package managers we don't fully support? Would this impact the will of contributors to provide full support for a package manager?


What would the difference be between fetching some dependency using a supported package manager vs. using the generic artifact method for retrieving content? Is there any difference in the guarantees/fidelity/integrity of the data?

If there are differences, then I feel like I would tend towards failing the use of the generic artifact fetcher in favor of the more specific tool.

I was thinking about this. If we compare it to most package managers that allow file dependencies (e.g. npm, pip), in the end, we're still using aiohttp to download the files, the main difference is that the dependency is defined in the expected lockfile for that package manager (package.json, requirements.txt), which gives a little more confidence on the package type. I don't think this means we have any other guarantees, nor greater confidence in fidelity or integrity. Any file downloaded will have its checksum validated, so we're sure we're getting the right content.

From this reasoning, we could probably even accept the generic fetcher to download pip or other supported package manager artifacts, which would only be useful in case a user needed a single artifact and did not want to initialize a project in his repo.

My comment earlier is also relevant for this thread: #692 (comment)

What you say here makes sense and I think it matches with my mental model of the interface/results.

docs/design/generic-enhanced-purls.md

brunoapimentel · 2024-11-04T22:58:37Z

docs/design/generic-enhanced-purls.md

+}
+```
+
+## Alternative: consume options instead of a purl


@aloubyansky How do you feel about this alternate proposal? Would it be difficult to generate the lockfiles for PNC/Maven artifacts?

My initial feeling is that having the attributes broken down is clearer and avoids the need to parse the purl according to the spec. The flexibility to define the output purl would still exist, since we can add any needed attributes in order to generate the purl we want.

@arewm Wdyt?

It could work. To make the options complete for Maven, there could also be the classifier and the version. It'd be 6-7 times more lines per artifact compared to providing just the PURL (for Maven at least it's pretty straightforward).
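To show why the Maven case is straightforward in either direction, here is a sketch that turns broken-down attributes into both a download URL and a purl. The URL follows the standard Maven repository layout (`group/with/slashes/artifact/version/artifact-version[-classifier].type`). Function and parameter names are made up for the example, and the coordinates in the usage are placeholders rather than real artifacts.

```python
from typing import Optional
from urllib.parse import quote


def maven_download_url(repository_url: str, group_id: str, artifact_id: str,
                       version: str, type_: str = "jar",
                       classifier: Optional[str] = None) -> str:
    """Build the artifact URL using the standard Maven repository layout."""
    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"
    file_name += f".{type_}"
    group_path = group_id.replace(".", "/")
    return f"{repository_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{file_name}"


def maven_purl(repository_url: str, group_id: str, artifact_id: str,
               version: str, type_: str = "jar",
               classifier: Optional[str] = None) -> str:
    """Assemble a pkg:maven purl from the same attributes."""
    qualifiers = {"type": type_, "repository_url": repository_url}
    if classifier:
        qualifiers["classifier"] = classifier
    # Qualifiers are emitted sorted by key. ':' and '/' are left unescaped here
    # to mirror the form of the purl shown in this thread.
    query = "&".join(
        f"{key}={quote(value, safe=':/')}" for key, value in sorted(qualifiers.items())
    )
    return f"pkg:maven/{quote(group_id)}/{quote(artifact_id)}@{quote(version)}?{query}"
```

For example, `maven_download_url("https://repo.example.com/ga", "org.example", "demo", "1.0.0")` yields `https://repo.example.com/ga/org/example/demo/1.0.0/demo-1.0.0.jar`. Whether the input is the purl or the six attributes, the resolver needs the same pieces of information.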

I haven't read the alternative yet, but based on your comment, is it effectively what your comment above describes: #692 (comment)?

It is a recognition that different purl types will have different field requirements, so you would expose those fields as being required?

It seems like it would be harder to check the validity of fields because it could be possible for a user to add content to an attribute which would end up matching some other part of the purl spec (i.e. if control characters are not properly escaped). If you default to consuming a purl itself, then you could validate the parsing of the purl and then only update those specific sub-elements which are relevant for cachi2 to control (i.e. uri/hash).
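The escaping concern can be demonstrated in a few lines. This sketch assembles a qualifier string from a user-supplied attribute twice, once by naive concatenation and once with percent-encoding, and then splits it back the way a purl parser would. All names are hypothetical and the parser is a simplified stand-in.

```python
from urllib.parse import quote, unquote


def naive_qualifier(key: str, value: str) -> str:
    """Concatenate without escaping (the unsafe variant)."""
    return f"{key}={value}"


def encoded_qualifier(key: str, value: str) -> str:
    """Percent-encode the value so '&' and '=' cannot leak into the structure."""
    return f"{key}={quote(value, safe='')}"


def split_qualifiers(query: str) -> dict:
    """Split a qualifier string into a dict, decoding values."""
    result = {}
    for item in query.split("&"):
        key, _, value = item.partition("=")
        result[key] = unquote(value)
    return result


# A hostile attribute value that smuggles in a second qualifier.
hostile = "jar&repository_url=https://evil.example"

smuggled = split_qualifiers(naive_qualifier("type", hostile))
contained = split_qualifiers(encoded_qualifier("type", hostile))

print(smuggled)   # two qualifiers: the attacker controls repository_url
print(contained)  # one qualifier: the whole string stays inside 'type'
```

So the attribute-based input is only safer if every attribute is encoded on assembly and the assembled purl is validated afterwards, which is the point made in the following comments.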

To record some parts of a conversation that I had out-of-band:

I don't see an issue with this approach, but you should ensure that you are making the decision for the proper reason.

With this method, I think it would still be required to validate the entire purl after it is assembled. Therefore, I don't think that this would alleviate the need to parse a purl.

This seems to mostly affect user input. Is it a better user experience to provide values based on some set of required/optional keys? Would users be more likely to do this properly than if they were to specify the purl in its entirety?

Will the verbosity (i.e. the increased number of lines in the configuration file) be an issue? Or would it be a desired feature?

arewm · 2024-11-05T21:00:59Z

docs/design/generic-enhanced-purls.md

+  - purl: pkg:maven/io.quarkus/[email protected]?type=jar&repository_url=https://maven.repository.redhat.com/ga
+    checksums:
+      sha256: d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c
+    target: quarkus.jar


Is this target the on-disk location where the fetched artifacts would be put? Would all artifacts need to specify this so that they can be appropriately used by some later stage, or would there be defaults/assumptions for package managers about locations and file names?

Signed-off-by: Bruno Pimentel <[email protected]>

brunoapimentel force-pushed the purl-enhancer-proposal branch from ff79377 to 65d82ae October 21, 2024 01:18

a-ovchinnikov reviewed Oct 21, 2024

eskultety reviewed Oct 22, 2024

aloubyansky reviewed Oct 22, 2024

aloubyansky reviewed Oct 23, 2024

brunoapimentel force-pushed the purl-enhancer-proposal branch from 65d82ae to 5520fdd Compare October 23, 2024 13:20

brunoapimentel force-pushed the purl-enhancer-proposal branch 3 times, most recently from 3a1c1c3 to d1255d5 Compare October 30, 2024 20:58

brunoapimentel changed the title ~~Draft proposal: Enhanced purls for generic artifacts~~ Proposal: Enhanced purls for generic artifacts Oct 30, 2024

brunoapimentel force-pushed the purl-enhancer-proposal branch from d1255d5 to 1d0c50c Compare October 31, 2024 14:42

arewm reviewed Oct 31, 2024

brunoapimentel marked this pull request as draft October 31, 2024 19:06

aloubyansky reviewed Oct 31, 2024

aloubyansky reviewed Oct 31, 2024

brunoapimentel force-pushed the purl-enhancer-proposal branch 4 times, most recently from 060befa to edac7be Compare November 4, 2024 21:57

brunoapimentel commented Nov 4, 2024

arewm reviewed Nov 5, 2024

brunoapimentel force-pushed the purl-enhancer-proposal branch 2 times, most recently from 593deda to 62fb6d2 Compare November 6, 2024 12:25

brunoapimentel marked this pull request as ready for review November 6, 2024 12:25

brunoapimentel mentioned this pull request Nov 7, 2024

generic fetcher: Official support with ADR #728

Merged

4 tasks

brunoapimentel force-pushed the purl-enhancer-proposal branch 3 times, most recently from a5c2632 to 6c2a617 Compare November 12, 2024 03:54

Proposal to enhance the generic artifacts fetcher purls

4b43f7a

Signed-off-by: Bruno Pimentel <[email protected]>

brunoapimentel force-pushed the purl-enhancer-proposal branch from 6c2a617 to 4b43f7a Compare November 12, 2024 03:56

kosciCZ mentioned this pull request Nov 12, 2024

Add design for fetching maven artifacts #663

Closed

4 tasks

ligangty mentioned this pull request Nov 14, 2024

feat: Update prefetch-dependencies to 0.2 to support sub-paths konflux-ci/build-definitions#1605

Closed

kosciCZ mentioned this pull request Nov 15, 2024

Generic fetcher maven support #735

Merged

4 tasks


		## Enhanced purls overview

		- Implement an `enhancer` for every supported purl type. An enhancer is essentially a set of rules applied to a generic artifact; if those rules match, it replaces the generic purl with one of a more specific type.
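The enhancer idea in the bullet above can be sketched as a list of rule functions tried in order, falling back to the generic purl when none match. Everything here is hypothetical: the `GenericArtifact` shape, the `Enhancer` signature, and the GitHub release URL rule are illustrations of the pattern, not the design proposed in the document.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class GenericArtifact:
    download_url: str
    generic_purl: str  # the pkg:generic purl reported when nothing matches


# An enhancer returns a more specific purl, or None if its rules do not match.
Enhancer = Callable[[GenericArtifact], Optional[str]]

_GITHUB_RELEASE = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/releases/download/(?P<tag>[^/]+)/"
)


def github_release_enhancer(artifact: GenericArtifact) -> Optional[str]:
    """Hypothetical rule: map a GitHub release asset URL to a pkg:github purl."""
    match = _GITHUB_RELEASE.match(artifact.download_url)
    if match is None:
        return None
    owner = match["owner"].lower()
    repo = match["repo"].lower()
    return f"pkg:github/{owner}/{repo}@{quote(match['tag'], safe='')}"


def enhance(artifact: GenericArtifact, enhancers: Iterable[Enhancer]) -> str:
    """Return the first specific purl an enhancer produces, else the generic one."""
    for enhancer in enhancers:
        purl = enhancer(artifact)
        if purl is not None:
            return purl
    return artifact.generic_purl
```

Keeping each rule as an independent function means an unmatched artifact is still reported as `pkg:generic`, so adding an enhancer for a new type does not change the behaviour for existing inputs.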

		- Parse the purl by using the [packageurl-python](https://github.com/package-url/packageurl-python) library
		- Validate that the purl is within the supported types
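The supported-type check in the second bullet can be sketched as follows. The proposal delegates parsing to packageurl-python; to stay dependency-free, this illustration extracts only the type with plain string handling, so it is a stand-in for that step and not the library's API. `SUPPORTED_TYPES` and both function names are hypothetical.

```python
# Hypothetical allow-list; the real set of supported types is for the design to decide.
SUPPORTED_TYPES = frozenset({"maven"})


def purl_type(purl: str) -> str:
    """Extract the lowercased type from a purl of the form pkg:<type>/<rest>."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a purl: {purl!r}")
    type_, sep, _ = rest.lstrip("/").partition("/")
    if not type_ or not sep:
        raise ValueError(f"purl has no type or name: {purl!r}")
    return type_.lower()


def validate_supported(purl: str, supported=SUPPORTED_TYPES) -> str:
    """Return the purl type if it is supported, otherwise raise ValueError."""
    type_ = purl_type(purl)
    if type_ not in supported:
        raise ValueError(f"unsupported purl type {type_!r}; expected one of {sorted(supported)}")
    return type_
```

Failing early on an unsupported type gives the user a clear error before any download is attempted.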

		@@ -0,0 +1,233 @@
		# Support for different purl types in the generic artifact fetcher

		The generic artifact package manager is being added to Cachi2 as a means for users to introduce files that do not belong to traditional package manager ecosystems (e.g. pip, npm, golang) to their hermetic container builds. Since Cachi2 does not have any extra information about the file that's being fetched, the purls are always reported as [pkg:generic](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic).


		## Initial thoughts

		From a Cachi2 perspective, we can separate purl types into ones that are part of existing package manager ecosystems (such as nuget, composer, maven) and ones that are not (github, huggingface, oci):


		### Decision points

		- Should the checksums be specified as part of the input purl?
