Updated documentation and inventory #39
@@ -1,7 +1,7 @@
name value
citation doi: 10.1016/j.cell.2012.10.052
qiita-id NA
raw-data-url http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=1319
raw-data-url ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh3/
Why does this link have `Turnbaugh3` in it? Shouldn't it be `mock-X`?
@gregcaporaso In the microbio.me FTP, we had the original dataset names for all directories and never changed them. Turnbaugh-3 would be consistent with this. If we are moving the raw data elsewhere, we can rename the directories there... until we decide on the final home I will leave this as is. Does that sound ok?
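For anyone who wants to see which raw data links a metadata file declares, here is a minimal sketch. It assumes the name/value table shown in the diff above is tab-separated, and the helper names `raw_data_urls` and `looks_like_url` are hypothetical, not part of mockrobiota's own integrity checks. It only inspects the text of the link and does not contact the server.

```python
from urllib.parse import urlparse


def raw_data_urls(metadata_text):
    """Collect the values of all `raw-data-url` rows (assumes tab-separated name/value rows)."""
    urls = []
    for line in metadata_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("\t")
        if name.strip() == "raw-data-url":
            urls.append(value.strip())
    return urls


def looks_like_url(value):
    """Syntactic check only: a known scheme plus a host. No network request is made."""
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https", "ftp"} and bool(parsed.netloc)


# Example rows copied from the diff above, joined with tabs (an assumption).
example = (
    "name\tvalue\n"
    "citation\tdoi: 10.1016/j.cell.2012.10.052\n"
    "qiita-id\tNA\n"
    "raw-data-url\tftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh3/\n"
)
urls = raw_data_urls(example)
flags = [looks_like_url(u) for u in urls]
```

A placeholder value such as `NA` would fail `looks_like_url`, which is roughly the situation described later in this thread where a check fails because a link has not been filled in yet.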
@nbokulich, can you check that
mockrobiota does not host raw data files (e.g., sequencing files). All sequencing data and other raw data files must be deposited on public, external websites. Stable, public depositories are preferred, but this requirement is not enforced by mockrobiota. mockrobiota ensures that valid, accessible links are provided in the dataset metadata (if not, integrity checks will fail and your dataset will not be accepted), but does not manage these external resources and will not guarantee the validity of raw data that are contributed by outside users. When preparing raw data for linking to mockrobiota datasets, please observe the following regulations:

1. All raw sequence data should be deposited in .fastq format and archived using standard compression formats, e.g., .gz or .zip
Missing period at end of sentence.
Done reviewing this one @nbokulich.
Thanks @gregcaporaso, I will make all changes above and then merge. A couple of changes (e.g., physical-specimen-contact --> contact-email in all other dataset metadata) will be in a separate PR. I wanted to get this PR and PR#40 through before making more changes to these files.
Ok, makes sense. Thanks!
@gregcaporaso I have updated CONTRIBUTING.md to reflect that tabular data files no longer start with a leading # (i.e., the change made in #40). However, the document calls these "classic biom files"; without the leading "#", these are no longer classic biom format, correct?
Good point - let's just drop that as the description of those files.
Got it. Am dropping those and will squash/merge this PR. (In reply to Greg Caporaso, Thu, Sep 8, 2016 at 9:51 AM.)
I can do the final merge (generally someone else merges, not the person who submits).
### Expected taxonomy (``database-name/database-version/expected-taxonomy.tsv``)
Contains the known composition of the mock community (e.g., taxonomies or KEGG pathways), annotated according to a specific reference database. Compilation of expected composition data is not a trivial task, and requires careful review of database annotations to ensure that accurate annotations are applied to source data. See [Compiling expected taxonomy files](#compiling-expected-taxonomy-files) below for discussion of this topic.

This file must be in ["classic BIOM" (tab-separated text) format](#classic-biom-formatted-tables).
In these files, the first line must begin with the text ``Taxonomy``, followed by a tab-separated list of one or more sample identifiers. All sample identifiers provided here must be present in ``sample-metadata.tsv``. Each subsequent line should begin with the taxonomic name, followed by a tab-separated list of the relative abundances in each sample. The relative abundances must sum to 1.000 (to three decimal places) for each sample. See [example expected-taxonomy.tsv](./data/example-1/greengenes/13_8/expected-taxonomy.tsv) for an example file.
Mostly redundant with the text above for `source/taxonomy.tsv`, but since that is optional I want to make sure contributors read this.
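The format rules quoted above for ``expected-taxonomy.tsv`` could be checked with something like the sketch below. It is an illustration only: the function name `validate_expected_taxonomy` is hypothetical, the set of known sample identifiers is passed in rather than read from ``sample-metadata.tsv``, and "sum to 1.000 (to three decimal places)" is interpreted as rounding each column total to three decimals, which is an assumption rather than the project's actual test.

```python
def validate_expected_taxonomy(text, known_sample_ids):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    rows = [ln for ln in text.splitlines() if ln.strip()]  # ignore blank lines
    if not rows:
        return ["file is empty"]

    header = rows[0].split("\t")
    if header[0] != "Taxonomy":
        problems.append("first line must begin with 'Taxonomy'")
    sample_ids = header[1:]
    if not sample_ids:
        problems.append("header lists no sample identifiers")
    for sid in sample_ids:
        if sid not in known_sample_ids:
            problems.append(f"sample {sid!r} is not in sample-metadata.tsv")

    # Accumulate one running total per sample column.
    totals = [0.0] * len(sample_ids)
    for number, row in enumerate(rows[1:], start=2):
        fields = row.split("\t")
        if len(fields) != len(header):
            problems.append(f"row {number}: expected {len(header)} fields, found {len(fields)}")
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"row {number}: non-numeric abundance")
            continue
        for i, value in enumerate(values):
            totals[i] += value

    # Assumption: compare each total to 1 after rounding to three decimals.
    for sid, total in zip(sample_ids, totals):
        if round(total, 3) != 1.0:
            problems.append(f"sample {sid!r}: abundances sum to {total:.3f}, not 1.000")
    return problems


good = (
    "Taxonomy\tS1\tS2\n"
    "k__Bacteria; p__Firmicutes\t0.25\t0.5\n"
    "k__Bacteria; p__Bacteroidetes\t0.75\t0.5\n"
)
bad = "Taxonomy\tS1\nk__Bacteria\t0.9\n"
good_problems = validate_expected_taxonomy(good, {"S1", "S2"})
bad_problems = validate_expected_taxonomy(bad, {"S1"})
```

Returning a list of problems instead of raising on the first one lets a contributor fix everything in a single pass before resubmitting.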
@@ -75,19 +75,29 @@ This file lists metadata for each individual sample contained in a mock communit
### ``source/taxonomy.tsv`` (optional)
This file lists the taxonomic and (when possible) strain affiliation of each strain added to the mock community, as well as its relative abundance. This file does not need to adhere to a particular taxonomic reference database, but please include as much information as possible (e.g., if this strain is available through a public repository, please list the repository strain ID). This information is usually provided by the developer(s) of the mock community.
Note changes on lines below (remove mention of "classic BIOM format")
Got it, @gregcaporaso. I am done making the requested changes. I have added line notes to spots in
Checks fail because one FTP link is missing. This link is a placeholder for data to be added this week in response to issue-7.