Update glossary and graphviz for repo/workflows #191

Merged
merged 11 commits on Mar 28, 2024
12 changes: 12 additions & 0 deletions redirects.yml
@@ -159,14 +159,26 @@
from_url: /tutorials/quickstart.html
to_url: /tutorials/running-a-workflow.html

- type: page
from_url: /tutorials/running-a-workflow.html
to_url: /tutorials/running-a-phylogenetic-workflow.html

- type: page
from_url: /tutorials/zika.html
to_url: /tutorials/creating-a-workflow.html

- type: page
from_url: /tutorials/creating-a-workflow.html
to_url: /tutorials/creating-a-phylogenetic-workflow.html

- type: page
from_url: /tutorials/tb_tutorial.html
to_url: /tutorials/creating-a-bacterial-pathogen-workflow.html

- type: page
from_url: /tutorials/creating-a-bacterial-pathogen-workflow.html
to_url: /tutorials/creating-a-bacterial-phylogenetic-workflow.html

- type: page
from_url: /guides/share/nextstrain-groups.html
to_url: /guides/share/groups/index.html
2 changes: 1 addition & 1 deletion src/guides/share/groups/index.rst
@@ -11,7 +11,7 @@ Share via Nextstrain Groups
This how-to guide assumes familiarity with the :doc:`Nextstrain Groups
</learn/groups/index>` feature and the :doc:`Nextstrain dataset files
</reference/data-formats>` produced by :doc:`running a pathogen workflow
</tutorials/running-a-workflow>`. We recommend reading about those first
</tutorials/running-a-phylogenetic-workflow>`. We recommend reading about those first
if you're not familiar with them.

Log in with the Nextstrain CLI
6 changes: 3 additions & 3 deletions src/index.rst
@@ -52,10 +52,10 @@ team and other Nextstrain users provide assistance. For private inquiries,
:hidden:

Installing <install>
tutorials/running-a-workflow
tutorials/creating-a-workflow
tutorials/running-a-phylogenetic-workflow
tutorials/creating-a-phylogenetic-workflow
Exploring SARS-CoV-2 evolution <https://docs.nextstrain.org/projects/ncov/page/index.html>
tutorials/creating-a-bacterial-pathogen-workflow
tutorials/creating-a-bacterial-phylogenetic-workflow
tutorials/narratives-how-to-write
Analyzing genomes with Nextclade <https://docs.nextstrain.org/projects/nextclade/page/user/nextclade-web/index.html>

2 changes: 1 addition & 1 deletion src/install.rst
@@ -325,7 +325,7 @@ Try running Augur and Auspice
Next steps
==========

With Nextstrain installed, try :doc:`tutorials/running-a-workflow` next.
With Nextstrain installed, try :doc:`tutorials/running-a-phylogenetic-workflow` next.


Alternate installation methods
8 changes: 4 additions & 4 deletions src/learn/augur-to-auspice.rst
@@ -24,7 +24,7 @@ Auspice (visualization) components

It's helpful to start in Auspice and then work backwards to Augur.
In this section, we will walk through various components of Auspice and how
they relate to the :term:`dataset JSON <dataset>` (sometimes called an Auspice JSON).
they relate to the :term:`dataset JSON <phylogenetic dataset>` (sometimes called an Auspice JSON).

Phylogeny Tree Panel and Core Controls
--------------------------------------
@@ -226,7 +226,7 @@ various components:
.. image:: ../images/auspice-components-diversity-panel.png
:alt: Annotated screenshot of Auspice's diversity (entropy) panel

The diversity panel is enabled by data in the :term:`dataset JSON <dataset>`.
The diversity panel is enabled by data in the :term:`dataset JSON <phylogenetic dataset>`.
The top-level ``meta.genome_annotations`` provides the genome annotations
displayed and the individual tree nodes provide the mutations
via ``node.branch_attrs.mutations``, which are used to calculate the entropy
@@ -337,9 +337,9 @@ Exporting data via Augur
========================

We now consider how information flows through Augur, specifically
``augur export v2`` which produces the :term:`dataset (Auspice) JSON <dataset>`
``augur export v2`` which produces the :term:`dataset (Auspice) JSON <phylogenetic dataset>`
described above. This process combines data inputs with parameters configuring
aspects of the visualisation and produces :term:`dataset files <dataset>` for
aspects of the visualisation and produces :term:`dataset files <phylogenetic dataset>` for
Auspice to visualise.

.. graphviz::
112 changes: 84 additions & 28 deletions src/learn/parts.rst
@@ -79,7 +79,7 @@ example, you visit `nextstrain.org/mumps/na

Auspice displaying Mumps genomes from North America.

:term:`Datasets<dataset>` are produced by Augur and
:term:`Datasets<phylogenetic dataset>` are produced by Augur and
visualized by Auspice. These files are often referred to as :term:`JSONs`
colloquially because they use a generic data format called JSON.
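
For example (a minimal sketch, not taken from any of our workflows; the file
path is hypothetical), a dataset JSON can be opened like any other JSON file to
see its top-level structure:

.. code-block:: python

   import json

   # Inspect a dataset (Auspice) JSON, e.g. one produced by `augur export v2`.
   # "auspice/zika.json" is a placeholder path; use any dataset file you have.
   with open("auspice/zika.json") as f:
       dataset = json.load(f)

   # Auspice v2 dataset JSONs typically have these top-level keys.
   print(sorted(dataset))                # e.g. ['meta', 'tree', 'version']
   print(dataset["meta"].get("title"))   # display title shown by Auspice, if set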

@@ -118,7 +118,7 @@ colloquially because they use a generic data format called JSON.
Augur -> jsons -> Auspice;
}

:term:`Builds<build>` are recipes of code and data that produce these :term:`datasets<dataset>`.
A :term:`build` is a recipe of several commands and data that produce a single :term:`dataset`.

.. graphviz::
:align: center
@@ -165,9 +165,13 @@ colloquially because they use a generic data format called JSON.
metadata -> filter;
}

Builds run several commands and are often automated by workflow managers such as `Snakemake <https://snakemake.readthedocs.io>`__, `Nextflow <https://nextflow.io>`__ and `WDL <https://openwdl.org>`__. A :term:`workflow` bundles one or more related :term:`builds<build>` which each produce a :term:`dataset` for visualization with :term:`Auspice`.
A :term:`workflow` can bundle one or more related :term:`builds<build>` and is often automated by workflow managers
such as `Snakemake <https://snakemake.readthedocs.io>`__, `Nextflow <https://nextflow.io>`__
and `WDL <https://openwdl.org>`__.
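
For example, a single build might be written as the following Snakefile (a
sketch only, not one of our real workflows: the file names are hypothetical and
the input sequences are assumed to be aligned already):

.. code-block:: python

   # A minimal build: filter sequences, infer a tree, estimate branch lengths,
   # and export a dataset JSON for Auspice. All paths are placeholders.

   rule all:
       input:
           "auspice/example.json"

   rule filter:
       input:
           sequences = "data/aligned.fasta",
           metadata = "data/metadata.tsv"
       output:
           sequences = "results/filtered.fasta"
       shell:
           "augur filter --sequences {input.sequences} --metadata {input.metadata} "
           "--output-sequences {output.sequences}"

   rule tree:
       input:
           alignment = "results/filtered.fasta"
       output:
           tree = "results/tree_raw.nwk"
       shell:
           "augur tree --alignment {input.alignment} --output {output.tree}"

   rule refine:
       input:
           tree = "results/tree_raw.nwk",
           alignment = "results/filtered.fasta",
           metadata = "data/metadata.tsv"
       output:
           tree = "results/tree.nwk",
           node_data = "results/branch_lengths.json"
       shell:
           "augur refine --tree {input.tree} --alignment {input.alignment} "
           "--metadata {input.metadata} --output-tree {output.tree} "
           "--output-node-data {output.node_data}"

   rule export:
       input:
           tree = "results/tree.nwk",
           metadata = "data/metadata.tsv",
           node_data = "results/branch_lengths.json"
       output:
           dataset = "auspice/example.json"
       shell:
           "augur export v2 --tree {input.tree} --metadata {input.metadata} "
           "--node-data {input.node_data} --output {output.dataset}"

Running ``snakemake --cores 1`` against such a Snakefile would produce
``auspice/example.json`` for Auspice to visualize; a :term:`workflow` might
bundle several such builds, for example one per geographic region.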

As an example, our core workflows are organized as `Git repositories <https://git-scm.com>`__ hosted on `GitHub <https://github.com/nextstrain>`__. Each contains a :doc:`Snakemake workflow </guides/bioinformatics/augur_snakemake>` using Augur, configuration, and data.
Our :term:`pathogen repositories<pathogen repository>` are organized as `Git repositories <https://git-scm.com>`__
hosted on `GitHub <https://github.com/nextstrain>`__. Each repository can contain
one or more workflows.

.. graphviz::
:align: center
@@ -176,44 +180,96 @@ As an example, our core workflows are organized as `Git repositories <https://gi
graph [
fontname="Lato, 'Helvetica Neue', sans-serif",
fontsize=12,
]
];
node [
shape=box,
style="rounded, filled",
fontname="Lato, 'Helvetica Neue', sans-serif",
fontsize=12,
height=0.1,
colorscheme=paired10,
pad=0.1,
margin=0.1,
];
rankdir=LR
rankdir=LR;

subgraph cluster_ncov {
label = "SARS-CoV-2 repository";
subgraph cluster_ncov_phylo {
label = "Phylogenetic workflow";
build0 [width=1, label="Global build"];
build1 [width=1, label="Africa build"];
build2 [width=1, label="Europe build"];
output0 [width=1, label="dataset"];
output1 [width=1, label="dataset"];
output2 [width=1, label="dataset"];
ellipses1 [width=1, label="...", penwidth=0, fillcolor="white"];
ellipses2 [width=1, label="...", penwidth=0, fillcolor="white"];
}
}

subgraph cluster_0 {
label = "Zika workflow";
build0 [width=1, label="Zika build"]
dataset0 [width=1, label="dataset"]
subgraph cluster_zika {
label = "Zika repository";
nojustify = true;
subgraph cluster_zika_ingest {
label = "Ingest workflow";
build3 [width=1, label="ingest build"];
output3 [width=1, label="ingest dataset"];
}
subgraph cluster_zika_phylo {
label = "Phylogenetic workflow";
build4 [width=1, label="phylogenetic build"];
output4 [width=1, label="dataset"];
}
}

subgraph cluster_1 {
label = "SARS-CoV-2 workflow";
build1 [width=1, label="Global build"]
build2 [width=1, label="Africa build"]
build3 [width=1, label="Europe build"]
dataset1 [width=1, label="dataset"]
dataset2 [width=1, label="dataset"]
dataset3 [width=1, label="dataset"]
ellipses1 [width=1, label="...", penwidth=0, fillcolor="white"]
ellipses2 [width=1, label="...", penwidth=0, fillcolor="white"]
subgraph cluster_mpox {
label = "Mpox repository";
subgraph cluster_mpox_ingest {
label = "Ingest workflow";
build5 [width=1, label="ingest build"];
output5 [width=1, label="ingest dataset"];
}
subgraph cluster_mpox_phylo {
label = "Phylogenetic workflow";
build6 [width=1, label="mpxv build"];
build7 [width=1, label="hmpxv1 build"];
build8 [width=1, label="hmpxv1_big build"];
output6 [width=1, label="dataset"];
output7 [width=1, label="dataset"];
output8 [width=1, label="dataset"];

}
subgraph cluster_mpox_nextclade {
label = "Nextclade workflow";
build9 [width=1, label="all-clades build"];
build10 [width=1, label="clade-iib build"];
build11 [width=1, label="lineage-b.1 build"];
output9 [width=1, label="Nextclade dataset"];
output10 [width=1, label="Nextclade dataset"];
output11 [width=1, label="Nextclade dataset"];

}
}

build0 -> dataset0
build1 -> dataset1
build2 -> dataset2
build3 -> dataset3
build0 -> output0;
build1 -> output1;
build2 -> output2;
build3 -> output3;
build4 -> output4;
build5 -> output5;
build6 -> output6;
build7 -> output7;
build8 -> output8;
build9 -> output9;
build10 -> output10;
build11 -> output11;

{
edge[style=invis]
dataset0 -> build1 // arrange clusters on same row
ellipses1 -> ellipses2
edge[style=invis];
output0 -> build3; // arrange clusters on same row
output3 -> build5; // arrange clusters on same row
ellipses1 -> ellipses2;
}
}

@@ -242,5 +298,5 @@ quality checks, and phylogenetic placement. Nextclade can be used independently
of other Nextstrain tools as well as integrated into workflows.

With this overview, you'll be better prepared to :doc:`install Nextstrain
</install>` and :doc:`run a workflow </tutorials/running-a-workflow>` or :doc:`contribute
</install>` and :doc:`run a workflow </tutorials/running-a-phylogenetic-workflow>` or :doc:`contribute
to development </guides/contribute/index>`.
6 changes: 3 additions & 3 deletions src/reference/data-files.rst
@@ -26,14 +26,14 @@ Workflow files
Files which correspond to several :term:`builds <build>` visible on nextstrain.org, e.g. all of the builds under <nextstrain.org/ncov/open/…>.
These often include the full metadata table, sequences FASTA, titer matrix, etc.

We often call these "inputs" colloquially because they're often the top-level inputs to a :term:`workflow`, but some of the files are actually workflow-level outputs.
We often call these "inputs" colloquially because they're often the top-level inputs to a :term:`phylogenetic workflow`, but some of the files are actually workflow-level outputs.
(Albeit, outputs that can be used as time-saving inputs in later workflow runs.)

Build files
Files which correspond to a specific single :term:`build` visible on nextstrain.org, e.g. <`nextstrain.org/ncov/open/global/6m <https://nextstrain.org/ncov/open/global/6m>`__>.
These often include the subsampled metadata table, sequences FASTA, and Newick tree as well as the final :term:`dataset` JSONs.
These often include the subsampled metadata table, sequences FASTA, and Newick tree as well as the final :term:`phylogenetic dataset` JSONs.

We often call these "outputs" colloquially because they're produced by running a :term:`workflow`, but some of the files are actually the specific, subsampled inputs that went into the specific build.
We often call these "outputs" colloquially because they're produced by running a :term:`phylogenetic workflow`, but some of the files are actually the specific, subsampled inputs that went into the specific build.

Workflow and build files for public data are available from:
