Updated documentation and inventory #39

Merged
merged 11 commits into from
Sep 8, 2016

nbokulich
Collaborator

Checks fail because one FTP link is missing. This link is a placeholder for data to be added this week in response to issue-7.

@@ -1,7 +1,7 @@
name value
citation doi: 10.1016/j.cell.2012.10.052
qiita-id NA
raw-data-url http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=1319
raw-data-url ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh3/
Member

Why does this link have Turnbaugh3 in it? Shouldn't it be mock-X?

Collaborator Author

@gregcaporaso In the microbio.me FTP, we had the original dataset names for all directories and never changed them. Turnbaugh-3 would be consistent with this. If we are moving the raw data elsewhere, we can rename the directories there... until we decide on the final home I will leave this as is. Does that sound ok?

@gregcaporaso
Member

@nbokulich, can you check that CONTRIBUTING.md is updated to reflect that tabular data files no longer start with a leading # (i.e., the change made in #40)?


mockrobiota does not host raw data files (e.g., sequencing files). All sequencing data and other raw data files must be deposited on public, external websites. Stable, public depositories are preferred, but this requirement is not enforced by mockrobiota. mockrobiota ensures that valid, accessible links are provided in the dataset metadata (if not, integrity checks will fail and your dataset will not be accepted), but does not manage these external resources and will not guarantee the validity of raw data that are contributed by outside users. When preparing raw data for linking to mockrobiota datasets, please observe the following regulations:

1. All raw sequence data should be deposited in .fastq format and archived using standard compression formats, e.g., .gz or .zip
Member

Missing period at end of sentence.
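
For illustration only, the link requirement quoted above could be pre-checked locally along these lines. This is a minimal sketch, not mockrobiota's actual integrity check: the function names `raw_data_urls` and `malformed_urls`, the allowed scheme list, and the assumption that the metadata table is a two-column, tab-separated file with a `name`/`value` header are all hypothetical. It only tests that each `raw-data-url` is well formed; it does not contact the server.

```python
from urllib.parse import urlparse

# Assumption: these are the URL schemes a raw-data link may use.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def raw_data_urls(metadata_text):
    """Collect every raw-data-url value from a two-column, tab-separated table."""
    urls = []
    for line in metadata_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, value = line.partition("\t")
        if name == "raw-data-url":
            urls.append(value.strip())
    return urls


def malformed_urls(urls):
    """Return the URLs that lack an allowed scheme or a host name."""
    bad = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
            bad.append(url)
    return bad


# Rows copied from the metadata diff in this PR; tab separation is assumed.
example = (
    "name\tvalue\n"
    "citation\tdoi: 10.1016/j.cell.2012.10.052\n"
    "qiita-id\tNA\n"
    "raw-data-url\thttp://metagenomics.anl.gov/metagenomics.cgi"
    "?page=MetagenomeProject&project=1319\n"
    "raw-data-url\tftp://ftp.microbio.me/pub/"
    "illumina-mock-communities-raw-data/Turnbaugh3/\n"
)

if __name__ == "__main__":
    urls = raw_data_urls(example)
    print(f"{len(urls)} raw-data-url entries, {len(malformed_urls(urls))} malformed")
```

A syntactic check like this would catch an empty placeholder link before the real checks run, but an accessibility check still needs a network request against each URL.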

@gregcaporaso
Member

Done reviewing this one @nbokulich.

@nbokulich
Collaborator Author

Thanks, @gregcaporaso. I will make all changes above and then merge.

A couple of changes (e.g., physical-specimen-contact --> contact-email in all other dataset metadata) will be in a separate PR. I wanted to get this PR and PR#40 through before making more changes to these files.

@gregcaporaso
Member

Ok, makes sense. Thanks!

@nbokulich
Collaborator Author

@gregcaporaso I have updated CONTRIBUTING.md to reflect that tabular data files no longer start with a leading # (i.e., the change made in #40). However, the document calls these "classic biom files" — without the leading "#", these are no longer classic biom format, correct?

@gregcaporaso
Member

re: "classic biom files"

Good point - let's just drop that as the description of those files.

@nbokulich
Collaborator Author

Got it. Am dropping those and will squash/merge this PR.

@gregcaporaso
Member

I can do the final merge (generally someone else merges, not the person who submits).


### Expected taxonomy (``database-name/database-version/expected-taxonomy.tsv``)
Contains the known composition of the mock community (e.g., taxonomies or KEGG pathways), annotated according to a specific reference database. Compilation of expected composition data is not a trivial task, and requires careful review of database annotations to ensure that accurate annotations are applied to source data. See [Compiling expected taxonomy files](#compiling-expected-taxonomy-files) below for discussion of this topic.

This file must be in ["classic BIOM" (tab-separated text) format](#classic-biom-formatted-tables).
In these files, the first line must begin with the text ``Taxonomy``, followed by a tab-separated list of one or more sample identifiers. All sample identifiers provided here must be present in ``sample-metadata.tsv``. Each subsequent line should begin with the taxonomic name, followed by a tab-separated list of the relative abundances in each sample. The relative abundances must sum to 1.000 (to three decimal places) for each sample. See [example expected-taxonomy.tsv](./data/example-1/greengenes/13_8/expected-taxonomy.tsv) for an example file.
Collaborator Author

Mostly redundant with the text above for source/taxonomy.tsv, but since that is optional I want to make sure contributors read this.
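
The format rules quoted above for `expected-taxonomy.tsv` can be sketched as a small validator. This is an illustrative sketch under stated assumptions, not the project's actual check: the function name `validate_expected_taxonomy` is hypothetical, the caller is assumed to supply the sample identifiers already read from `sample-metadata.tsv`, and "sum to 1.000 (to three decimal places)" is interpreted here as `round(total, 3) == 1.0`.

```python
def validate_expected_taxonomy(text, known_sample_ids):
    """Return a list of problems found in the table; an empty list means it passed."""
    rows = [line for line in text.splitlines() if line.strip()]
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0].split("\t")
    if header[0] != "Taxonomy":
        problems.append("first line must begin with 'Taxonomy'")

    sample_ids = header[1:]
    if not sample_ids:
        problems.append("header lists no sample identifiers")
    for sample_id in sample_ids:
        if sample_id not in known_sample_ids:
            problems.append(f"sample {sample_id!r} is not in sample-metadata.tsv")

    # Accumulate the relative abundances column by column.
    totals = [0.0] * len(sample_ids)
    for row_number, row in enumerate(rows[1:], start=2):
        fields = row.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} fields, found {len(fields)}"
            )
            continue
        try:
            values = [float(value) for value in fields[1:]]
        except ValueError:
            problems.append(f"row {row_number}: non-numeric abundance")
            continue
        for index, value in enumerate(values):
            totals[index] += value

    # Assumption: "to three decimal places" means the rounded total equals 1.0.
    for sample_id, total in zip(sample_ids, totals):
        if round(total, 3) != 1.0:
            problems.append(
                f"sample {sample_id!r}: abundances sum to {total:.3f}, expected 1.000"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical two-sample table; the taxon names are placeholders.
    table = (
        "Taxonomy\tS1\tS2\n"
        "k__Bacteria; p__Firmicutes\t0.25\t0.5\n"
        "k__Bacteria; p__Bacteroidetes\t0.75\t0.5\n"
    )
    print(validate_expected_taxonomy(table, {"S1", "S2"}))
```

Returning a list of problems rather than raising on the first one lets a contributor fix every issue in a file in a single pass.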

@@ -75,19 +75,29 @@ This file lists metadata for each individual sample contained in a mock communit
### ``source/taxonomy.tsv`` (optional)
This file lists the taxonomic and (when possible) strain affiliation of each strain added to the mock community, as well as its relative abundance. This file does not need to adhere to a particular taxonomic reference database, but please include as much information as possible (e.g., if this strain is available through a public repository, please list the repository strain ID). This information is usually provided by the developer(s) of the mock community.

Collaborator Author

Note changes on lines below (remove mention of "classic BIOM format")

@nbokulich
Collaborator Author

Got it, @gregcaporaso. I am done making the requested changes. I have added line notes to spots in CONTRIBUTING.md where I made new changes that were not specifically mentioned in your line notes (e.g., removing the classic biom format details) to make it easier to spot all new changes.

@gregcaporaso gregcaporaso merged commit ced0025 into master Sep 8, 2016
@nbokulich nbokulich deleted the update-docs branch October 3, 2016 18:36