Updated documentation and inventory #39

nbokulich · 2016-09-06T16:26:40Z

Checks fail because one FTP link is missing. This link is a placeholder for data that will be added this week in response to issue-7.

gregcaporaso · 2016-09-08T13:45:28Z

data/mock-7/dataset-metadata.tsv

@@ -1,7 +1,7 @@
 name	value
 citation	doi: 10.1016/j.cell.2012.10.052
 qiita-id	NA
-raw-data-url	http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=1319
+raw-data-url	ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh3/


Why does this link have Turnbaugh3 in it? Shouldn't it be mock-X?

@gregcaporaso In the microbio.me FTP, all directories use the original dataset names, which we never changed, so Turnbaugh3 is consistent with the rest. If we move the raw data elsewhere, we can rename the directories there; until we decide on the final home, I will leave this as is. Does that sound OK?
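The dataset-metadata.tsv diff above is a two-column `name`/`value` table. As a minimal sketch, assuming the file is tab-separated with the `name	value` header shown, the hypothetical helpers below (not part of mockrobiota) read it into a dict and do a purely syntactic check that `raw-data-url` uses an FTP or HTTP(S) scheme. They never contact the server, so they cannot tell whether a link like the Turnbaugh3 directory is actually accessible.

```python
import csv
import io
from urllib.parse import urlparse


def read_dataset_metadata(text):
    """Parse a ``name<TAB>value`` table into a dict (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header != ["name", "value"]:
        raise ValueError(f"unexpected header: {header!r}")
    metadata = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        name, value = row
        metadata[name] = value
    return metadata


def has_plausible_url(metadata, key="raw-data-url"):
    """Syntactic check only: scheme is ftp/http/https and a host is present."""
    parsed = urlparse(metadata.get(key, ""))
    return parsed.scheme in ("ftp", "http", "https") and bool(parsed.netloc)


example = (
    "name\tvalue\n"
    "citation\tdoi: 10.1016/j.cell.2012.10.052\n"
    "qiita-id\tNA\n"
    "raw-data-url\tftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Turnbaugh3/\n"
)
meta = read_dataset_metadata(example)
print(meta["raw-data-url"])
print(has_plausible_url(meta))  # True
```

A value such as `NA` fails the syntactic check, which is roughly the situation behind the failing checks mentioned at the top of this PR; confirming that the FTP directory really exists would need a network request on top of this.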

gregcaporaso · 2016-09-08T13:54:39Z

@nbokulich, can you check that CONTRIBUTING.md is updated to reflect that tabular data files no longer start with a leading # (i.e., the change made in #40).
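After #40, tabular data files no longer start with a leading `#`. A minimal sketch of a check for the legacy header, assuming the header row is simply the first line of the file; `check_no_leading_hash` is a hypothetical helper, not mockrobiota's actual integrity test.

```python
def check_no_leading_hash(text):
    """Return True if the first line of a tabular file does not start with '#'.

    Hypothetical helper; assumes the header row is the first line of the file.
    """
    first_line = text.splitlines()[0] if text else ""
    return not first_line.startswith("#")


print(check_no_leading_hash("name\tvalue\nqiita-id\tNA\n"))   # True
print(check_no_leading_hash("#name\tvalue\nqiita-id\tNA\n"))  # False
```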

gregcaporaso · 2016-09-08T13:55:29Z

CONTRIBUTING.md

+
+mockrobiota does not host raw data files (e.g., sequencing files). All sequencing data and other raw data files must be deposited on public, external websites. Stable, public depositories are preferred, but this requirement is not enforced by mockrobiota. mockrobiota ensures that valid, accessible links are provided in the dataset metadata (if not, integrity checks will fail and your dataset will not be accepted), but does not manage these external resources and will not guarantee the validity of raw data that are contributed by outside users. When preparing raw data for linking to mockrobiota datasets, please observe the following regulations:
+
+1. All raw sequence data should be deposited in .fastq format and archived using standard compression formats, e.g., .gz or .zip


Missing period at end of sentence.
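The first raw-data guideline (`.fastq` files compressed with, e.g., `.gz` or `.zip`) can be approximated from file names alone. A minimal sketch, assuming each deposited file is named `<name>.fastq` plus a compression extension; `looks_like_compressed_fastq` is hypothetical, and since it never opens the file it cannot confirm the contents are valid FASTQ.

```python
ALLOWED_COMPRESSION = (".gz", ".zip")  # the examples named in CONTRIBUTING.md


def looks_like_compressed_fastq(filename):
    """Heuristic, filename-only check: '<name>.fastq' followed by '.gz' or '.zip'.

    Hypothetical helper; it does not inspect file contents.
    """
    name = filename.lower()
    for ext in ALLOWED_COMPRESSION:
        if name.endswith(ext):
            return name[: -len(ext)].endswith(".fastq")
    return False


for f in ["forward.fastq.gz", "barcodes.fastq.zip", "reads.fasta.gz", "reads.fastq"]:
    print(f, looks_like_compressed_fastq(f))
```

This heuristic would reject a `.zip` archive that bundles several FASTQ files under a name without `.fastq`; supporting that case would mean listing the archive contents instead of trusting the name.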

gregcaporaso · 2016-09-08T13:57:25Z

Done reviewing this one @nbokulich.

nbokulich · 2016-09-08T13:59:51Z

Thanks, @gregcaporaso. I will make all of the changes above and then merge.

A couple of changes (e.g., renaming physical-specimen-contact to contact-email in all other dataset metadata files) will go in a separate PR. I wanted to get this PR and PR #40 through before making more changes to these files.

gregcaporaso · 2016-09-08T14:01:24Z

Ok, makes sense. Thanks!

nbokulich · 2016-09-08T14:26:50Z

@gregcaporaso I have updated CONTRIBUTING.md to reflect that tabular data files no longer start with a leading # (i.e., the change made in #40). However, the document calls these "classic biom files"; without the leading "#", these are no longer in classic BIOM format, correct?

gregcaporaso · 2016-09-08T14:51:14Z

re: "classic biom files"

Good point - let's just drop that as the description of those files.

nbokulich · 2016-09-08T14:52:03Z

Got it. I am dropping those and will squash/merge this PR.


gregcaporaso · 2016-09-08T14:53:00Z

I can do the final merge (generally someone else merges, not the person who submits).

nbokulich · 2016-09-08T14:58:52Z

CONTRIBUTING.md


 ### Expected taxonomy (``database-name/database-version/expected-taxonomy.tsv``)
 Contains the known composition of the mock community (e.g., taxonomies or KEGG pathways), annotated according to a specific reference database. Compilation of expected composition data is not a trivial task, and requires careful review of database annotations to ensure that accurate annotations are applied to source data. See [Compiling expected taxonomy files](#compiling-expected-taxonomy-files) below for discussion of this topic.

-This file must be in ["classic BIOM" (tab-separated text) format](#classic-biom-formatted-tables).
+In these files, the first line must begin with the text ``Taxonomy``, followed by a tab-separated list of one or more sample identifiers. All sample identifiers provided here must be present in ``sample-metadata.tsv``. Each subsequent line should begin with the taxonomic name, followed by a tab-separated list of the relative abundances in each sample. The relative abundances must sum to 1.000 (to three decimal places) for each sample. See [example expected-taxonomy.tsv](./data/example-1/greengenes/13_8/expected-taxonomy.tsv) for an example file.


Mostly redundant with the text above for source/taxonomy.tsv, but since that file is optional, I want to make sure contributors read this.
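The expected-taxonomy.tsv rules in the diff above are mechanical enough to check in code. A minimal sketch, assuming the header's first tab-separated field must be exactly `Taxonomy` and that the sample IDs from sample-metadata.tsv are supplied as a set; `validate_expected_taxonomy` is a hypothetical name, not the repository's actual integrity test, and the taxon strings and sample IDs in the usage are made up.

```python
def validate_expected_taxonomy(text, known_sample_ids):
    """Check an expected-taxonomy.tsv table against the CONTRIBUTING.md rules.

    Hypothetical helper; ``known_sample_ids`` stands in for the sample IDs
    read from sample-metadata.tsv. Returns a list of error messages
    (empty if the table passes).
    """
    errors = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    if header[0] != "Taxonomy":
        errors.append("first line must begin with 'Taxonomy'")
    sample_ids = header[1:]
    if not sample_ids:
        errors.append("no sample identifiers in header")
    for sid in sample_ids:
        if sid not in known_sample_ids:
            errors.append(f"sample {sid!r} not in sample-metadata.tsv")
    totals = [0.0] * len(sample_ids)
    for line in lines[1:]:
        taxon, *values = line.split("\t")
        if len(values) != len(sample_ids):
            errors.append(f"{taxon!r}: expected {len(sample_ids)} values, got {len(values)}")
            continue
        try:
            abundances = [float(v) for v in values]
        except ValueError:
            errors.append(f"{taxon!r}: non-numeric abundance")
            continue
        for i, value in enumerate(abundances):
            totals[i] += value
    for sid, total in zip(sample_ids, totals):
        # "sum to 1.000 (to three decimal places)": compare after rounding
        if round(total, 3) != 1.0:
            errors.append(f"sample {sid!r}: abundances sum to {total:.3f}, not 1.000")
    return errors


good = (
    "Taxonomy\tS1\tS2\n"
    "k__Bacteria;p__Firmicutes\t0.5\t0.25\n"
    "k__Bacteria;p__Proteobacteria\t0.5\t0.75\n"
)
bad = "Taxonomy\tS1\tS3\nk__Bacteria;p__Firmicutes\t0.5\t0.5\n"
print(validate_expected_taxonomy(good, {"S1", "S2"}))  # []
for message in validate_expected_taxonomy(bad, {"S1", "S2"}):
    print(message)
```

Rounding each per-sample total to three decimals before comparing it with 1.0 is one reading of "sum to 1.000 (to three decimal places)"; it avoids rejecting a table over floating-point noise such as `0.1 + 0.2` not being exactly `0.3`.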

nbokulich · 2016-09-08T15:00:39Z

CONTRIBUTING.md

@@ -75,19 +75,29 @@ This file lists metadata for each individual sample contained in a mock communit
 ### ``source/taxonomy.tsv`` (optional)
 This file lists the taxonomic and (when possible) strain affiliation of each strain added to the mock community, as well as its relative abundance. This file does not need to adhere to a particular taxonomic reference database, but please include as much information as possible (e.g., if this strain is available through a public repository, please list the repository strain ID). This information is usually provided by the developer(s) of the mock community.



Note the changes on the lines below (removed mention of "classic BIOM format").

nbokulich · 2016-09-08T15:05:51Z

Got it, @gregcaporaso. I am done making the requested changes. I have added line notes at the spots in CONTRIBUTING.md where I made new changes that were not specifically mentioned in your line notes (e.g., removing the classic BIOM format details), to make it easier to spot all of the new changes.

nbokulich added 7 commits September 6, 2016 11:11

Update inventory.tsv

17ff826

Update CONTRIBUTING.md

9f08a92

Update README.md

d0c780f

Update dataset-metadata.tsv

b640f65

Update dataset-metadata.tsv

6b85080

Update dataset-metadata.tsv

040026d

Update dataset-metadata.tsv

f375336

gregcaporaso reviewed Sep 8, 2016

nbokulich added 3 commits September 8, 2016 09:13

Update dataset-metadata.tsv

bc408f1

Update CONTRIBUTING.md

f18dba0

Update CONTRIBUTING.md

2d4965d

Update CONTRIBUTING.md

1399764

nbokulich reviewed Sep 8, 2016

gregcaporaso merged commit ced0025 into master Sep 8, 2016

nbokulich deleted the update-docs branch October 3, 2016 18:36
