Draft: Proposal to enhance the generic artifacts fetcher purls
Signed-off-by: Bruno Pimentel <[email protected]>
# Purl enhancer for the generic artifact fetcher

The generic artifact package manager is being added to Cachi2 as a means for users to introduce files that do not belong to traditional package manager ecosystems (e.g. pip, npm, golang) into their hermetic container builds. Since Cachi2 has no extra information about the file being fetched, the purls are always reported as [pkg:generic](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic).

There are use cases that would benefit from more accurate purls, though, such as the recent Maven artifacts [proposal]. Since the purl specification already defines several package types that don't fit into traditional package managers (e.g. github, docker, huggingface; see the [purl types spec](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst) for more info), this proposal builds on the fundamentals of the generic fetcher to provide an extensible mechanism that allows Cachi2 to fetch files from specific sources and report them with matching purl types.

## Enhanced purls overview

- Implement an `enhancer` for every supported purl type. An enhancer is essentially a set of rules applied to a generic artifact; if those rules match, the generic purl is replaced with one of the more specific type.
  - The most general rule seems to be the package's origin. For well-known, well-established public registries such as `maven.org` or `huggingface.co`, it is easy to correlate the package's origin with a more specific type.
  - Mapping only well-known public registries would rule out private registries. To solve this, users need to be able to change Cachi2's default configuration to enable additional source URLs for each purl type.
  - More rules might be useful or even necessary when generating specific purl types. For example, Maven identifies each artifact by its groupId, artifactId and version (GAV), and these values are fundamental in defining both the download URL and the purl.
- Extend the generic artifacts lockfile specification with a `type` attribute that allows users to hint at which purl type an artifact should have.
  - Only types whose `enhancers` have been implemented can be selected. Users should not be able to freely specify the purl type for a generic artifact; instead, this use is restricted to a specific subset of purl types.
  - A failure to match the hinted `type` will not cause the request to fail. The file will still be fetched, but its purl will fall back to `pkg:generic`.
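
A minimal Python sketch of how the enhancer dispatch and the `pkg:generic` fallback described above could fit together. The `Enhancer` callable type, the `resolve_purl` function, and the convention that an enhancer returns `None` when its rules don't match are all hypothetical names and choices for illustration; they are not part of Cachi2's existing code.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger(__name__)

# Hypothetical enhancer contract: receives the artifact's download URL and
# returns a specific purl, or None when its rules do not match.
Enhancer = Callable[[str], Optional[str]]


def resolve_purl(
    download_url: str,
    generic_purl: str,
    type_hint: Optional[str],
    enhancers: dict[str, Enhancer],
) -> str:
    """Return the enhanced purl if the hinted enhancer matches, else the generic purl."""
    if type_hint is None:
        # No hint in the lockfile: keep reporting pkg:generic.
        return generic_purl

    # Lockfile validation is assumed to have rejected unsupported hints already.
    enhancer = enhancers[type_hint]
    purl = enhancer(download_url)
    if purl is None:
        # A failed match never fails the request; it only downgrades the purl.
        log.warning(
            "%s did not match the %r rules; reporting it as pkg:generic",
            download_url,
            type_hint,
        )
        return generic_purl
    return purl
```

Keeping the fallback in one place means each enhancer only has to decide "match or no match", and the warning is issued consistently for every purl type.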

## A practical example

### Input files

**generic_artifacts.yaml**
```yaml
metadata:
  version: '1.0'
artifacts:
  - download_url: https://github.com/containerbuildsystem/cachi2/archive/refs/tags/0.11.0.tar.gz
    target: cachi2_0_11_0.tar.gz
    checksums:
      sha256: fa0d536389db15fb3dabdb3b3d08354f47f765a653178140bfbe1b3de1a6ee76
  - download_url: https://maven.repository.internal.com/ga/io/quarkus/quarkus-core/3.8.5.internal-00004/quarkus-core-3.8.5.internal-00004.jar
    target: quarkus.jar
    type: maven
    checksums:
      sha1: e4ca5fadf89e62fb29d0d008046489b2305295bf
  - download_url: https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/b919e5d07ce15f31ea741f2be99a00a33c3b427b/model-00001-of-00030.safetensors
    target: llama_3.1_1_of_30.safetensors
    type: huggingface
```
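
One possible way to enforce the restriction on the `type` attribute when the lockfile is loaded, sketched in Python. `SUPPORTED_TYPES` and `validate_type_hint` are hypothetical names, and rejecting an unknown `type` outright is an assumption about how the two overview rules interact: an unsupported hint is treated as invalid input, while a supported hint that fails to match is left to the `pkg:generic` fallback.

```python
from typing import Any

# Hypothetical: the purl types that have an enhancer implemented.
SUPPORTED_TYPES = frozenset({"maven", "huggingface"})


def validate_type_hint(entry: dict[str, Any]) -> None:
    """Raise ValueError if a lockfile artifact hints at an unsupported purl type."""
    hint = entry.get("type")
    if hint is None:
        return  # no hint: the artifact is reported as pkg:generic
    if hint not in SUPPORTED_TYPES:
        raise ValueError(
            f"unsupported artifact type {hint!r}; "
            f"expected one of {', '.join(sorted(SUPPORTED_TYPES))}"
        )
```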

**.cachi2-config.yaml**
```yaml
generic-artifact-sources:
  maven:
    # sample internal registry
    - maven.repository.internal.com
    - maven.org
  huggingface:
    - huggingface.co
```
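
The origin validation step could be a simple host lookup against the `generic-artifact-sources` mapping. The sketch below assumes the YAML above has already been parsed into a plain `dict`, and that a configured entry also covers its subdomains (so `maven.org` would accept `repo1.maven.org`); both the `is_allowed_origin` name and the subdomain rule are assumptions rather than settled behavior.

```python
from urllib.parse import urlsplit


def is_allowed_origin(
    download_url: str, purl_type: str, sources: dict[str, list[str]]
) -> bool:
    """Check whether the URL's host is configured as a source for purl_type."""
    host = urlsplit(download_url).hostname  # lowercased, without the port
    if host is None:
        return False
    for configured in sources.get(purl_type, []):
        allowed = configured.lower()
        # Accept the configured host itself or any of its subdomains.
        if host == allowed or host.endswith("." + allowed):
            return True
    return False
```

Matching on the host alone keeps this check independent of the URL path; interpreting the path is left to each enhancer.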

### Cachi2 CLI usage
```bash
cachi2 fetch-deps --source /path/to/repo generic
```

### Enhancer high-level definition

- MavenPurlEnhancer:
  - validates the origin URL
  - parses the download URL and converts it into the expected purl
    ```bash
    # sample URL
    https://maven.repository.internal.com/ga/io/quarkus/quarkus-core/3.8.5.internal-00004/quarkus-core-3.8.5.internal-00004.jar

    # how the parsing will be done
    {repository_url}/{as_dir(group_id)}/{artifact_id}/{version}/{artifact_id}-{version}.{extension}

    # resulting purl
    # note that the type will need to be inferred from the extension and potentially additional attributes
    pkg:maven/{group_id}/{artifact_id}@{version}?type={type}&repository_url={repository_url}&checksum={algorithm}:{checksum}
    ```
  - in case of failure, the file will be reported as generic and a warning will be issued
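
A sketch of the Maven parsing step, following the URL layout above. `maven_purl` is a hypothetical helper: it treats the scheme and host as `repository_url`, every directory between the host and the artifactId as part of the groupId, and whatever follows `{artifact_id}-{version}.` in the file name as the `type`. Classifiers (e.g. `-sources.jar`) are not handled, and qualifier values are left unencoded to stay close to the example purls.

```python
from typing import Optional
from urllib.parse import urlsplit


def maven_purl(download_url: str, checksum: str) -> Optional[str]:
    """Derive a pkg:maven purl from a repository URL, or None if the layout doesn't fit.

    `checksum` is expected in "algorithm:value" form, e.g. "sha1:e4ca...".
    """
    parts = urlsplit(download_url)
    segments = [s for s in parts.path.split("/") if s]
    # At least one groupId directory, plus artifactId, version and file name.
    if len(segments) < 4:
        return None
    *group_dirs, artifact_id, version, filename = segments

    # The file name must be {artifact_id}-{version}.{extension}.
    prefix = f"{artifact_id}-{version}."
    if not filename.startswith(prefix) or len(filename) == len(prefix):
        return None
    extension = filename[len(prefix):]

    group_id = ".".join(group_dirs)
    repository_url = f"{parts.scheme}://{parts.netloc}"
    return (
        f"pkg:maven/{group_id}/{artifact_id}@{version}"
        f"?type={extension}&repository_url={repository_url}&checksum={checksum}"
    )
```

Because `repository_url` is taken to be just the scheme and host, the `/ga` path prefix of the sample repository ends up in the groupId (`ga.io.quarkus`), as in the SBOM example. Avoiding that would require the configured sources to carry the full repository base URL so the prefix can be stripped before the groupId is derived.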

- HuggingFacePurlEnhancer:
  - validates the origin URL
  - parses the download URL and converts it into the expected purl
    ```bash
    # sample URL
    https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/b919e5d07ce15f31ea741f2be99a00a33c3b427b/model-00001-of-00030.safetensors

    # how the parsing will be done
    {repository_url}/{namespace}/{name}/blob/{commit_hash}/{file_name}

    # resulting purl
    pkg:huggingface/{namespace}/{name}@{commit_hash}?download_url={download_url}
    ```
  - in case of failure, the file will be reported as generic and a warning will be issued
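
The Hugging Face rules could be sketched the same way. `huggingface_purl` is a hypothetical helper: it expects the `{namespace}/{name}/blob/{commit_hash}/...` layout above, also accepts `resolve` in place of `blob` (Hugging Face serves raw file downloads under `/resolve/`), and only accepts a full 40-character commit hash as the version, on the assumption that a branch name such as `main` is not a pinned revision.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# A full, lowercase 40-character git commit hash.
COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def huggingface_purl(download_url: str) -> Optional[str]:
    """Derive a pkg:huggingface purl from a file URL, or None if the layout doesn't fit."""
    segments = [s for s in urlsplit(download_url).path.split("/") if s]
    # Expected: {namespace}/{name}/{blob|resolve}/{commit_hash}/{file path...}
    if len(segments) < 5 or segments[2] not in ("blob", "resolve"):
        return None
    namespace, name, _, revision = segments[:4]
    if not COMMIT_HASH.fullmatch(revision):
        return None  # e.g. "main": a branch name is not a pinned revision
    return f"pkg:huggingface/{namespace}/{name}@{revision}?download_url={download_url}"
```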

### Resulting SBOM
```json
{
  "components": [
    {
      "name": "cachi2_0_11_0.tar.gz",
      "purl": "pkg:generic/cachi2_0_11_0.tar.gz?checksum=sha256:fa0d536389db15fb3dabdb3b3d08354f47f765a653178140bfbe1b3de1a6ee76&download_url=https://github.com/containerbuildsystem/cachi2/archive/refs/tags/0.11.0.tar.gz",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2:generic"
        }
      ],
      "externalReferences": [
        {
          "type": "distribution",
          "url": "https://github.com/containerbuildsystem/cachi2/archive/refs/tags/0.11.0.tar.gz"
        }
      ],
      "type": "file"
    },
    {
      "name": "quarkus-core",
      "version": "3.8.5.internal-00004",
      "purl": "pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.internal-00004?type=jar&repository_url=https://maven.repository.internal.com&checksum=sha1:e4ca5fadf89e62fb29d0d008046489b2305295bf",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2:generic"
        }
      ],
      "externalReferences": [
        {
          "type": "distribution",
          "url": "https://maven.repository.internal.com/ga/io/quarkus/quarkus-core/3.8.5.internal-00004/quarkus-core-3.8.5.internal-00004.jar"
        }
      ],
      "type": "file"
    },
    {
      "name": "Llama-3.1-Nemotron-70B-Instruct-HF",
      "purl": "pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@b919e5d07ce15f31ea741f2be99a00a33c3b427b?download_url=https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/b919e5d07ce15f31ea741f2be99a00a33c3b427b/model-00001-of-00030.safetensors",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2:generic"
        }
      ],
      "externalReferences": [
        {
          "type": "distribution",
          "url": "https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/b919e5d07ce15f31ea741f2be99a00a33c3b427b/model-00001-of-00030.safetensors"
        }
      ],
      "type": "file"
    }
  ]
}
```
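
The component entries above share one shape and differ only in `name`, the optional `version`, `purl` and the download URL. A hypothetical helper that assembles them could look like the sketch below; `sbom_component` is an illustrative name, and `cachi2:found_by` stays `cachi2:generic` even for enhanced purls, mirroring the example.

```python
from typing import Any, Optional


def sbom_component(
    name: str, purl: str, download_url: str, version: Optional[str] = None
) -> dict[str, Any]:
    """Build a CycloneDX component entry shaped like the SBOM example."""
    component: dict[str, Any] = {"name": name}
    if version is not None:
        component["version"] = version
    component["purl"] = purl
    # Enhanced or not, the artifact was still found by the generic fetcher.
    component["properties"] = [{"name": "cachi2:found_by", "value": "cachi2:generic"}]
    component["externalReferences"] = [{"type": "distribution", "url": download_url}]
    component["type"] = "file"
    return component
```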