Determine minimum viable test data set #68

Open
TimothyStiles opened this issue Apr 4, 2023 · 3 comments

@TimothyStiles
Contributor

Our minimum viable test set should be small enough to allow local development but complete (and probably large enough) to do some actually meaningful work. One idea is to keep just a large collection of B. subtilis strains, or some other model organism, along with their affiliated genes and proteins. Rhea, ChEMBL, and reactome.org should all be small enough that we can keep their relevant data in the test set.

Right now the scraper is not sophisticated enough to fetch only B. subtilis genomes. It simply downloads the full GenBank and UniProt data dumps.
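
As a rough illustration of what organism-level filtering could look like, here is a minimal Python sketch. It is not the project's scraper. It relies only on two properties of the GenBank flat-file format: each record ends with a `//` line, and each record carries an `ORGANISM` line. The function names (`iter_genbank_records`, `organism_of`, `filter_by_organism`) are hypothetical, and the sketch assumes the organism name fits on a single line.

```python
from typing import Iterable, Iterator, List


def iter_genbank_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group flat-file lines into records; a record ends at a '//' line.

    Trailing lines that are not terminated by '//' are dropped.
    """
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.strip() == "//":
            yield record
            record = []


def organism_of(record: List[str]) -> str:
    """Return the text after the ORGANISM keyword, or '' if it is absent."""
    for line in record:
        stripped = line.strip()
        if stripped.startswith("ORGANISM"):
            return stripped[len("ORGANISM"):].strip()
    return ""


def filter_by_organism(lines: Iterable[str], prefix: str) -> Iterator[List[str]]:
    """Yield only the records whose organism name starts with `prefix`."""
    wanted = prefix.lower()
    for record in iter_genbank_records(lines):
        if organism_of(record).lower().startswith(wanted):
            yield record
```

Because the functions consume any iterable of lines, they could be fed from an open file handle and would stream through a large dump without loading it into memory.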

@TimothyStiles TimothyStiles converted this from a draft issue Apr 4, 2023
@rkrishnasanka

We should aim for something in the range of 100-300MB. Otherwise it will clog up everyone's LFS quotas when they fork the repo. Additionally, if we can avoid LFS, we should; that way there won't be any issues if someone forgets to check in LFS objects.
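
One way to keep a size target like this honest is a small check that could run locally or in CI. The sketch below is only an assumption about how that might look: the names `MAX_BYTES`, `dataset_size_bytes`, and `within_budget` are hypothetical, and treating 300MB as 300 * 10^6 bytes is a choice made here, not something stated in this thread.

```python
from pathlib import Path

# Assumed budget: the upper end of the 100-300MB range, counted in decimal megabytes.
MAX_BYTES = 300 * 1000 * 1000


def dataset_size_bytes(root) -> int:
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def within_budget(root, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the directory tree fits inside the size budget."""
    return dataset_size_bytes(root) <= max_bytes
```

A test suite could call `within_budget` on the test data directory and fail when the dataset grows past the agreed limit.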

@TimothyStiles
Contributor Author

Git LFS is a nightmare <- Krishna

minimum viable dataset <- most interesting reaction Isaac
Isaac 👇
protein -> metabolic pathway -> map against genome
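
One possible reading of that chain is that a dataset is "viable" when every protein in it can be linked both to at least one pathway and to a genome location. The sketch below illustrates that reading only. The lookup tables, identifiers (`protein_A`, `pathway_1`, `genome_X`), and function names are all hypothetical placeholders, not real accessions or project code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy records; a real dataset would load these from the source dumps.
PROTEIN_TO_PATHWAYS: Dict[str, List[str]] = {
    "protein_A": ["pathway_1"],
    "protein_B": ["pathway_1", "pathway_2"],
}
# Locus is (genome id, start, end); coordinates here are made up.
PROTEIN_TO_LOCUS: Dict[str, Tuple[str, int, int]] = {
    "protein_A": ("genome_X", 100, 400),
    "protein_B": ("genome_X", 900, 1500),
}


def trace(protein: str) -> dict:
    """Follow protein -> pathways -> genome locus; raises KeyError on a broken link."""
    return {
        "protein": protein,
        "pathways": PROTEIN_TO_PATHWAYS[protein],
        "locus": PROTEIN_TO_LOCUS[protein],
    }


def is_complete(proteins: Iterable[str]) -> bool:
    """True if every protein has both a pathway entry and a genome locus."""
    return all(p in PROTEIN_TO_PATHWAYS and p in PROTEIN_TO_LOCUS for p in proteins)
```

A completeness check of this kind could help decide which records are safe to drop when trimming the dataset down.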

@TimothyStiles
Contributor Author

Checking out @rkrishnasanka's minimum viable dataset on his fork.

@TimothyStiles TimothyStiles moved this to Todo in Ark roadmap Apr 5, 2023