Determine minimum viable test data set #68

TimothyStiles · 2023-04-04T19:37:40Z

Our minimum viable test set should be small enough to allow local development but complete (and probably large enough) to do some meaningful work. One thought is to keep just a large collection of B. subtilis strains, or some other model organism, along with their associated genes and proteins. Rhea, ChEMBL, and reactome.org should all be small enough that we can keep their relevant data in the test set.

Right now the scraper is not sophisticated enough to fetch only B. subtilis genomes. It simply downloads the full GenBank and UniProt data dumps.
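As a rough sketch of what organism-level filtering could look like (not the scraper's actual code): the snippet below streams a GenBank flat file and keeps only records whose `ORGANISM` line mentions the wanted organism. It relies only on two properties of the flat file format, that records start with `LOCUS` and end with a `//` line. The function names are hypothetical, and a real dump would probably be read through `gzip.open` rather than from a plain list of lines.

```python
from typing import Iterable, Iterator, List


def iter_genbank_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one record (a list of lines) per LOCUS ... // block."""
    record: List[str] = []
    for line in lines:
        # Skip release headers and blank lines that sit outside a record.
        if not record and not line.startswith("LOCUS"):
            continue
        record.append(line)
        if line.rstrip() == "//":
            yield record
            record = []


def organism_of(record: List[str]) -> str:
    """Return the text after the ORGANISM keyword, or "" if it is absent."""
    for line in record:
        if line.lstrip().startswith("ORGANISM"):
            return line.split("ORGANISM", 1)[1].strip()
    return ""


def filter_by_organism(
    lines: Iterable[str], wanted: str = "Bacillus subtilis"
) -> Iterator[str]:
    """Yield the full text of every record whose organism contains `wanted`."""
    needle = wanted.lower()
    for record in iter_genbank_records(lines):
        if needle in organism_of(record).lower():
            yield "".join(record)
```

Because it works line by line, this never holds more than one record in memory, which matters when the input is a full data dump.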

rkrishnasanka · 2023-04-04T20:45:31Z

We should aim for something in the range of 100-300 MB. Otherwise it will clog up everyone's LFS quotas when they fork the repo. Additionally, if we can avoid LFS altogether, we should; that way there won't be any issues if someone forgets to check in LFS objects.
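One way to keep the data set inside that range would be a small size check that could run locally or in CI. This is only a sketch: the function names and the 300 MB default are illustrative assumptions, not something that exists in the repo.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def within_budget(root: str, max_mb: int = 300) -> bool:
    """Return True if the files under `root` fit within `max_mb` mebibytes."""
    return directory_size_bytes(root) <= max_mb * 1024 * 1024
```

A check like `within_budget("path/to/test/data")` (the path is a placeholder) could fail the build whenever someone adds data that pushes the set over the limit.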

TimothyStiles · 2023-04-04T20:45:44Z

git lfs is a nightmare <- Krishna

minimum viable dataset <- most interesting reaction Isaac
isaac 👇
protein -> metabolic pathway -> map against genome

TimothyStiles · 2023-04-04T20:46:43Z

checking out @rkrishnasanka's minimal viable dataset on his fork.

TimothyStiles added this to Ark roadmap Apr 4, 2023

TimothyStiles converted this from a draft issue Apr 4, 2023

TimothyStiles assigned Koeng101 Apr 4, 2023

TimothyStiles moved this to Todo in Ark roadmap Apr 5, 2023
