Commit
Merge branch 'refs/heads/master' into stable
# Conflicts:
#	chemistry_base/src/main/resources/sirius.build.properties
#	sirius_cli/src/main/resources/sirius_frontend.build.properties
mfleisch committed Sep 2, 2024
2 parents 8d99490 + 8f48db2 commit dcff3cd
Showing 51 changed files with 1,187 additions and 841 deletions.
33 changes: 21 additions & 12 deletions README.md
@@ -1,18 +1,19 @@
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blueviolet.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Generic badge](https://img.shields.io/badge/Version-6.0.4-informational.svg)](https://shields.io/)
[![Build and Publish](https://github.com/sirius-ms/sirius/actions/workflows/distribute.yaml/badge.svg?branch=release-4-pre)](https://github.com/sirius-ms/sirius/actions/workflows/distribute.yaml)
[![Join community chat at https://gitter.im/sirius-ms/general](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/sirius-ms/general?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

*<span style="color: #808080;">Our methods are offered to the scientific community as freely available resources. (Re-)distribution of the
methods, in whole or in part, for commercial purposes is prohibited.
CSI:FingerID and CANOPUS web services hosted by the [Böcker group](https://bio.informatik.uni-jena.de/) are for academic research and education use only.
The SIRIUS web services (CSI:FingerID, CANOPUS, MSNovelist and others) hosted by the [Böcker group](https://bio.informatik.uni-jena.de/) are for academic research and education use only.
Please review the [terms of service](https://bio.informatik.uni-jena.de/terms-of-service-fsu-csi) of the academic version for details.
For non-academic users, the [Bright Giant GmbH](https://bright-giant.com) provides licenses and all related services.
We ask that users of our tools cite the corresponding papers in any resulting publications.</span>*

SIRIUS is a java-based software framework for the analysis of LC-MS/MS data of metabolites and other "small molecules of biological interest".
SIRIUS integrates a collection of our tools, including CSI:FingerID (with [COSMIC](https://bio.informatik.uni-jena.de/software/cosmic/)), [ZODIAC](https://bio.informatik.uni-jena.de/software/zodiac/) and
SIRIUS integrates a collection of our tools, including CSI:FingerID (with [COSMIC](https://bio.informatik.uni-jena.de/software/cosmic/)), [ZODIAC](https://bio.informatik.uni-jena.de/software/zodiac/),
[CANOPUS](https://bio.informatik.uni-jena.de/software/canopus/). In particular, both the
graphical user interface and the command line version of SIRIUS seamlessly integrate the CSI:FingerID and CANOPUS web services.
graphical user interface and the command line version of SIRIUS seamlessly integrate the CSI:FingerID, CANOPUS and MSNovelist web services.

Main developers of SIRIUS are the [Böcker group](https://bio.informatik.uni-jena.de/) and the [Bright Giant GmbH](https://bright-giant.com)

@@ -22,7 +23,7 @@ Main developers of SIRIUS are the [Böcker group](https://bio.informatik.uni-jen
- [Online Documentation](https://v6.docs.sirius-ms.io/)
- [Video tutorials](https://www.youtube.com/channel/UCIbW_ZFSADRUQ-T5nmgU4VA/featured)
- [Bookchapter on using SIRIUS 4](https://doi.org/10.1007/978-1-0716-0239-3_11) ([Preprint](https://bio.informatik.uni-jena.de/wp/wp-content/uploads/2020/12/SIRIUS4_book_chapter_preprint-2.pdf)) -- does not cover the new LC-MS/MS processing option
- [Demo data](data/demo.zip)
- [Demo data](data/demo-data.zip)
- [Logos for publications and presentations](https://bio.informatik.uni-jena.de/software/sirius/sirius-logos/)

<!--begin download-->
@@ -72,14 +73,18 @@ may be required.

### [Changelog](https://v6.docs.sirius-ms.io/changelog/)

### Contact
- To get news, help or ask questions please join our [Gitter Community `#sirius-ms:gitter.im`](https://matrix.to/#/#sirius-ms:gitter.im).
- For bug reports or feature request please use the issues on our [GitHub](https://github.com/sirius-ms/sirius). Or check the [documentation](https://v6.docs.sirius-ms.io/bugs/) for further information about this topic.

### Integration of CSI:FingerID, CANOPUS and MSNovelist

Fragmentation trees and spectra can be directly uploaded from SIRIUS to the CSI:FingerID, CANOPUS and MSNovelist web services.
Results are retrieved from the web service and can be displayed in the SIRIUS graphical user interface. This functionality is
also available for the SIRIUS command-line tool. Training structures for CSI:FingerID's predictors are available through the CSI:FingerID web API:
<!--begin training-->

- https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures?predictor=1 (training structures for positive ion mode)
- https://csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures?predictor=1 (training structures for positive ion mode)
- https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures?predictor=2 (training structures for negative ion mode)

<!--end training-->
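The training-structure lists can also be fetched programmatically. A minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11+); the class and method names are hypothetical, the base URL is copied from the list above, and the format of the response body is not described in this README, so it is returned as a plain string:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: builds the training-structure URLs and downloads one of them. */
class TrainingStructuresSketch {
    // Endpoint taken from the README list; it may change between CSI:FingerID versions.
    static final String BASE = "https://www.csi-fingerid.uni-jena.de/v3.0/api/fingerid/trainingstructures";

    /** predictor=1 selects positive ion mode, predictor=2 negative ion mode (per the README). */
    static URI trainingStructuresUri(boolean positiveMode) {
        int predictor = positiveMode ? 1 : 2;
        return URI.create(BASE + "?predictor=" + predictor);
    }

    /** Performs a blocking GET and returns the raw body; requires network access. */
    static String download(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200)
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        return response.body();
    }
}
```

For example, `download(trainingStructuresUri(false))` would request the negative-mode list.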
@@ -112,18 +117,18 @@ command-line tool.
<!--begin cite-->
## Main citations

Kai Dührkop, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker,
Kai Dührkop, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu and Sebastian Böcker.
[SIRIUS 4: Turning tandem mass spectra into metabolite structure information.](https://doi.org/10.1038/s41592-019-0344-8)
*Nature Methods* 16, 299–302, 2019.

---
Stravs, Michael A. and Dührkop, Kai and Böcker, Sebastian and Zamboni, Nicola
[MSNovelist: De novo structure generation from mass spectra](https://doi.org/10.1101/2021.07.06.450875)
bioRxiv, 2021. (Cite if you are using: MSNovelist)
Michael A. Stravs and Kai Dührkop, Sebastian Böcker and Nicola Zamboni.
[MSNovelist: De novo structure generation from mass spectra.](https://doi.org/10.1038/s41592-022-01486-3)
*Nature Methods* 19, 865–870, 2022. (Cite if you are using: MSNovelist)

Martin A. Hoffmann and Louis-Félix Nothias and Marcus Ludwig and Markus Fleischauer and Emily C. Gentry and Michael Witting and Pieter C. Dorrestein and Kai Dührkop and Sebastian Böcker
[Assigning confidence to structural annotations from mass spectra with COSMIC](https://doi.org/10.1101/2021.03.18.435634)
bioRxiv, 2021. (Cite if you are using: *CSI:FingerID*, *COSMIC*)
Martin A. Hoffmann, Louis-Félix Nothias, Marcus Ludwig, Markus Fleischauer, Emily C. Gentry, Michael Witting, Pieter C. Dorrestein, Kai Dührkop and Sebastian Böcker.
[High-confidence structural annotation of metabolites absent from spectral libraries.](https://doi.org/10.1038/s41587-021-01045-9)
*Nature Biotechnology* 40, 411–421, 2022. (Cite if you are using: *CSI:FingerID*, *COSMIC*)

Kai Dührkop, Louis-Félix Nothias, Markus Fleischauer, Raphael Reher, Marcus Ludwig, Martin A. Hoffmann, Daniel Petras, William H. Gerwick, Juho Rousu, Pieter C. Dorrestein and Sebastian Böcker.
[Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra.](https://doi.org/10.1038/s41587-020-0740-8)
@@ -151,6 +156,10 @@ Sebastian Böcker, Matthias C. Letzel, Zsuzsanna Lipták and Anton Pervukhin.

### Additional citations

Shipei Xing, Sam Shen, Banghua Xu, Xiaoxiao Li and Tao Huan.
[BUDDY: molecular formula discovery via bottom-up MS/MS interrogation.](https://doi.org/10.1038/s41592-023-01850-x)
*Nature Methods* 20, 881–890, 2023. (Cite if you are using: Bottom-up molecular formula generation)

Marcus Ludwig, Kai Dührkop and Sebastian and Böcker.
[Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints.](http://doi.org/10.1093/bioinformatics/bty245)
*Bioinformatics*, 34(13): i333-i340. 2018. Proc. of Intelligent Systems for Molecular Biology (ISMB 2018). (Cite for CSI:FingerID Scoring)
@@ -549,7 +549,7 @@ public List<FingerprintCandidate> lookupFingerprintsByInchi(Iterable<CompoundCan
@Override
public void annotateCompounds(List<? extends CompoundCandidate> sublist) throws ChemicalDatabaseException {
try (final PooledConnection<Connection> c = connection.orderConnection()) {
final DataSource[] sources = DataSource.valuesNoALL();
final DataSource[] sources = DataSource.valuesNoMetaSources();
final PreparedStatement[] statements = new PreparedStatement[sources.length];
int k = 0;
for (DataSource source : sources) {
@@ -81,8 +81,8 @@ public void retrieveStructureAndAnnotate() {
assertEquals("InChI does not match. Standardization changed?", inChI, candidate.getInchi().in2D);
assertEquals("SMILES does not match. Standardization changed?", smiles, candidate.getSmiles());
assertEquals("Different name expected", name, candidate.getName());
assertTrue("Structure should be contained in Chebi.", (candidate.getBitset() & DataSource.CHEBI.searchFlag) > 0);
assertTrue("Structure should be contained in PubChem.", (candidate.getBitset() & DataSource.PUBCHEM.searchFlag) > 0);
assertTrue("Structure should be contained in Chebi.", (candidate.getBitset() & DataSource.CHEBI.flag()) > 0);
assertTrue("Structure should be contained in PubChem.", (candidate.getBitset() & DataSource.PUBCHEM.flag()) > 0);

//test annotation
candidate.setBitset(makeCompleteFlag()); //modify to test all available reference tables
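The updated assertions test membership with `(candidate.getBitset() & DataSource.X.flag()) > 0` instead of the removed `searchFlag` field. A minimal sketch of that bitmask check, using a hypothetical `DemoSource` enum whose flag values are invented for illustration and do not match the real `DataSource` entries:

```java
/**
 * Hypothetical stand-in for DataSource: every source owns exactly one bit of a long bitset.
 * The flag values are invented and do not correspond to the real enum.
 */
enum DemoSource {
    PUBCHEM(1L << 1),
    CHEBI(1L << 3);

    private final long flag;

    DemoSource(long flag) {
        this.flag = flag;
    }

    /** Accessor in the style of DataSource.flag(), which the updated test calls. */
    long flag() {
        return flag;
    }

    /** True if this source's bit is set in the given bitset. */
    boolean isContainedIn(long bitset) {
        // "!= 0" instead of "> 0": it stays correct even if a flag ever occupied bit 63 (the sign bit).
        return (bitset & flag) != 0;
    }
}
```

With the flag values used in `DataSource` (all well below bit 63), `> 0` and `!= 0` give the same result.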
@@ -133,6 +133,17 @@ public CompoundCandidate(InChI inchi) {
this.inchikey = (inchi != null) ? inchi.key2D() : null;
}

/**
* checks if the list of links contains all @{@link DataSource}s that are specified via the bitset.
* If incomplete, the list of links is completed by adding links with 'null' IDs for each missing @{@link DataSource}.
*/
public void ensureSelfContainedLinks() {
Arrays.stream(DataSource.valuesNoMetaSources())
.filter(ds -> (ds.flag() & bitset) > 0)
.filter(ds -> links.stream().noneMatch(dbLink -> dbLink.getName().equals(ds.name())))
.forEach(ds -> links.add(new DBLink(ds.name(), null)));
}

public String getInchiKey2D() {
return inchikey;
}
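`ensureSelfContainedLinks()` completes a candidate's link list so that every non-meta source whose bit is set in `bitset` has an entry, adding a link with a `null` ID where none exists. The sketch below reproduces that stream pipeline on hypothetical `Source` and `Link` types (the record needs Java 16+); the flag values are invented:

```java
import java.util.Arrays;
import java.util.List;

class SelfContainedLinksSketch {
    /** Hypothetical sources; ALL and BIO play the role of the meta sources. */
    enum Source {
        ALL(0L), BIO(0L), HMDB(1L << 1), KEGG(1L << 2), CHEBI(1L << 3);

        final long flag;

        Source(long flag) {
            this.flag = flag;
        }

        static Source[] valuesNoMetaSources() {
            return Arrays.stream(values()).filter(s -> s != ALL && s != BIO).toArray(Source[]::new);
        }
    }

    /** Hypothetical link: database name plus an ID that may be null. */
    record Link(String name, String id) {}

    /** Adds a Link with a null ID for each source whose bit is set but which has no link yet. */
    static void ensureSelfContainedLinks(long bitset, List<Link> links) {
        Arrays.stream(Source.valuesNoMetaSources())
                .filter(s -> (s.flag & bitset) != 0)
                .filter(s -> links.stream().noneMatch(l -> l.name().equals(s.name())))
                .forEach(s -> links.add(new Link(s.name(), null)));
    }
}
```

Because the outer stream runs over the enum values rather than over `links`, adding to the list inside `forEach` does not disturb the iteration.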
@@ -55,10 +55,10 @@ public enum DataSource {
BloodExposome("Blood Exposome", 4194304, "pubchem_cid", "bloodexposome", "https://bloodexposome.org/#/description?qcid=%s", new Publication("Barupal DK and Fiehn O, Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach. Environ Health Perspect. 2019", "10.1289/EHP4713")),
TeroMol("TeroMOL", 8388608, "mol_id", "teromol", "http://terokit.qmclab.com/molecule.html?MolId=%s", new Publication("Zeng T et al.,Chemotaxonomic Investigation of Plant Terpenoids with an Established Database (TeroMOL). New Phytol. 2022", "10.1111/nph.18133")),

PUBCHEMANNOTATIONBIO("PubChem class - bio and metabolites", 16777216, null,null,null, 0, false, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")), //2**24; Pubchem Annotations now have a separate flag
PUBCHEMANNOTATIONDRUG("PubChem class - drug", 33554432, null,null,null, 0, false, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),
PUBCHEMANNOTATIONSAFETYANDTOXIC("PubChem class - safety and toxic", 67108864, null,null,null, 0, false, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),
PUBCHEMANNOTATIONFOOD("PubChem class - food", 134217728, null,null,null, 0, false, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),
PUBCHEMANNOTATIONBIO("PubChem class - bio and metabolites", 16777216, null, null, null, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")), //2**24; Pubchem Annotations now have a separate flag
PUBCHEMANNOTATIONDRUG("PubChem class - drug", 33554432, null,null,null, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),
PUBCHEMANNOTATIONSAFETYANDTOXIC("PubChem class - safety and toxic", 67108864, null,null,null, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),
PUBCHEMANNOTATIONFOOD("PubChem class - food", 134217728, null,null,null, new Publication("Kim S et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 2021", "10.1093/nar/gkaa971")),

LOTUS("LOTUS", 268435456, "id", "lotus", "https://lotus.naturalproducts.net/search/simple/%s", new Publication("Rutz A et al., The LOTUS initiative for open knowledge management in natural products research. eLife. 2022", "10.7554/eLife.70780")),
FooDB("FooDB", 536870912, "fooddb_id", "foodDB", "https://foodb.ca/compounds/%s", new Publication("www.foodb.ca", null)),//todo not published yet?
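Each `DataSource` occupies one bit of the candidate bitset; the inline comment on PUBCHEMANNOTATIONBIO notes its value is 2**24. A small sketch (hypothetical helper) that checks the flag values visible in this hunk are single bits and recovers their positions, which run consecutively from bit 22 to bit 29:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlagBitsSketch {
    // Flag values copied from the DataSource entries shown in this hunk, in order.
    static final Map<String, Long> FLAGS = new LinkedHashMap<>();

    static {
        FLAGS.put("BloodExposome", 4194304L);
        FLAGS.put("TeroMol", 8388608L);
        FLAGS.put("PUBCHEMANNOTATIONBIO", 16777216L);
        FLAGS.put("PUBCHEMANNOTATIONDRUG", 33554432L);
        FLAGS.put("PUBCHEMANNOTATIONSAFETYANDTOXIC", 67108864L);
        FLAGS.put("PUBCHEMANNOTATIONFOOD", 134217728L);
        FLAGS.put("LOTUS", 268435456L);
        FLAGS.put("FooDB", 536870912L);
    }

    /** Returns the bit position of a single-bit flag; rejects values that are not powers of two. */
    static int bitIndex(long flag) {
        if (flag <= 0 || Long.bitCount(flag) != 1)
            throw new IllegalArgumentException("not a single-bit flag: " + flag);
        return Long.numberOfTrailingZeros(flag);
    }
}
```

`bitIndex(16777216L)` returns 24, matching the `//2**24` comment.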
@@ -87,24 +87,16 @@ public enum DataSource {
public final String realName;
public final String sqlIdColumn;
public final String sqlRefTable;
public final long searchFlag;
public final String URI;

public final Publication publication;
public final boolean mines;

DataSource(String realName, long flag, String sqlIdColumn, String sqlRefTable, String uri, Publication publication) {
this(realName, flag, sqlIdColumn, sqlRefTable, uri, flag, false, publication);
}

DataSource(String realName, long flag, String sqlIdColumn, String sqlRefTable, String uri, long searchFlag, boolean mines, Publication publication) {
this.realName = realName;
this.flag = flag;
this.sqlIdColumn = sqlIdColumn;
this.sqlRefTable = sqlRefTable;
this.URI = uri;
this.searchFlag = searchFlag;
this.mines = mines;
this.publication = publication;
}

@@ -147,12 +139,16 @@ public static DataSource[] valuesALLBio() {
return Arrays.stream(DataSource.values()).filter(DataSource::isBioOnly).toArray(DataSource[]::new);
}

public static DataSource[] valuesNoALL() {
return Arrays.stream(DataSource.values()).filter(it -> it != ALL).toArray(DataSource[]::new);
/**
*
* @return all actual @{@link DataSource}s excluding the 'meta' sources ALL and BIO which represent a combination of individual sources.
*/
public static DataSource[] valuesNoMetaSources() {
return Arrays.stream(DataSource.values()).filter(it -> (it != ALL) && (it != BIO) ).toArray(DataSource[]::new);
}


private final static DataSource[] BIO_DATABASES = new DataSource[] {MESH, HMDB, KNAPSACK,CHEBI,KEGG,HSDB,MACONDA,METACYC,GNPS,TRAIN,YMDB,PLANTCYC,NORMAN,SUPERNATURAL,COCONUT,BloodExposome,TeroMol,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,LOTUS,FooDB,MiMeDB,LIPIDMAPS,LIPID};
private final static DataSource[] BIO_DATABASES = new DataSource[] {MESH, HMDB, KNAPSACK,CHEBI,KEGG,HSDB,MACONDA,METACYC,GNPS,TRAIN,YMDB,PLANTCYC,NORMAN,SUPERNATURAL,COCONUT,BloodExposome,TeroMol,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,LOTUS,FooDB,MiMeDB,LIPIDMAPS};

// 4294401852
private static long makeBIOFLAG() {
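The javadoc describes ALL and BIO as "meta" sources representing a combination of individual sources, and this hunk also drops LIPID from `BIO_DATABASES`. The body of `makeBIOFLAG()` is not shown here; assuming a combined flag is the bitwise OR of its members' flags, the sketch below (hypothetical types, invented flag values) shows that removing a member clears exactly that member's bit:

```java
import java.util.Arrays;

class MetaFlagSketch {
    /** Hypothetical individual sources with invented single-bit flags. */
    enum Source {
        HMDB(1L << 1), KEGG(1L << 2), CHEBI(1L << 3), LIPID(1L << 32);

        final long flag;

        Source(long flag) {
            this.flag = flag;
        }
    }

    /** ORs the flags of the given member sources into one combined "meta" flag. */
    static long combinedFlag(Source... members) {
        return Arrays.stream(members).mapToLong(s -> s.flag).reduce(0L, (a, b) -> a | b);
    }
}
```

Here `combinedFlag(HMDB, KEGG, CHEBI)` is 14, and adding LIPID sets bit 32 on top of that.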
@@ -166,7 +162,7 @@ private static long makeBIOFLAG() {

public static long makeALLFLAG(){
long allflag=0L;
for(int i = 1; i < 32; i++ ){
for(int i = 1; i <= 32; i++ ){ //<= 32 instead of < 32 to include LIPID flag (32). however, I don't know if this even has some influence, because these are not in the database.
if (i==7 || i==13 || i==15 ||i==19) continue;
allflag |=(1L << i);
}
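Changing the loop bound from `i < 32` to `i <= 32` adds bit 32, which the inline comment attributes to the LIPID flag. The sketch below reproduces the loop with the upper bound as a parameter so both versions can be compared; `AllFlagSketch` is a hypothetical name:

```java
class AllFlagSketch {
    /** Reproduces the makeALLFLAG() loop; upperInclusive = 31 (old bound) or 32 (new bound). */
    static long makeAllFlag(int upperInclusive) {
        long allflag = 0L;
        for (int i = 1; i <= upperInclusive; i++) {
            if (i == 7 || i == 13 || i == 15 || i == 19) continue; // bits skipped by the original loop
            // Must shift a long: for an int, Java masks the shift distance to 5 bits, so 1 << 32 == 1.
            allflag |= (1L << i);
        }
        return allflag;
    }
}
```

`makeAllFlag(31)` returns 4294401918 and `makeAllFlag(32)` returns 8589369214; the difference is exactly `1L << 32`.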
@@ -97,9 +97,7 @@ public static Path getDatabaseDirectory() {
lastEnumBit = bits.cardinality();

NON_SEARCHABLE_LIST = new HashSet<>(getSourcesFromNames(
DataSource.TRAIN.name(), DataSource.LIPID.name(), DataSource.ALL.name(),
DataSource.PUBCHEMANNOTATIONBIO.name(), DataSource.PUBCHEMANNOTATIONDRUG.name(),
DataSource.PUBCHEMANNOTATIONFOOD.name(), DataSource.PUBCHEMANNOTATIONSAFETYANDTOXIC.name()
DataSource.TRAIN.name(), DataSource.LIPID.name(), DataSource.ALL.name()
));
}
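After this hunk, `NON_SEARCHABLE_LIST` contains only TRAIN, LIPID and ALL, so the four PubChem annotation classes are no longer excluded from search. A minimal sketch of deriving the searchable set; it swaps in `EnumSet` for the `HashSet` built from names in the original, and the `Source` enum is a hypothetical subset of the real entries:

```java
import java.util.EnumSet;

class SearchableSketch {
    /** Hypothetical subset of sources; names follow the diff. */
    enum Source { ALL, TRAIN, LIPID, HMDB, PUBCHEMANNOTATIONBIO, PUBCHEMANNOTATIONFOOD }

    // Mirrors the reduced list in this commit: only TRAIN, LIPID and ALL stay non-searchable.
    static final EnumSet<Source> NON_SEARCHABLE = EnumSet.of(Source.TRAIN, Source.LIPID, Source.ALL);

    /** Every source that is not explicitly marked non-searchable (a fresh, mutable set). */
    static EnumSet<Source> searchable() {
        return EnumSet.complementOf(NON_SEARCHABLE);
    }
}
```

`EnumSet` is backed by a bit vector, which fits sources that already map to bits, but either set type gives the same membership answers.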

4 changes: 2 additions & 2 deletions chemistry_base/src/main/resources/sirius.build.properties
@@ -2,6 +2,6 @@ de.unijena.bioinf.sirius.build.cplex_version=12.7.1
de.unijena.bioinf.sirius.build.glpk_version=1.7.0
de.unijena.bioinf.sirius.build.gurobi_version=9.1.1
de.unijena.bioinf.sirius.build.cbc_version=2.10.8.6
de.unijena.bioinf.sirius.version=5.1.2
de.unijena.bioinf.sirius.version=5.1.3
de.unijena.bioinf.sirius.chem.adducts.positive=[M+H]+,[M]+,[M+K]+,[M+Na]+,[M+H-H2O]+,[M+Na2-H]+,[M+2K-H]+,[M+NH4]+,[M+H3O]+,[M+MeOH+H]+,[M+ACN+H]+,[M+2ACN+H]+,[M+IPA+H]+,[M+ACN+Na]+,[M+DMSO+H]+,[M-H4O2+H]+,[2M+H]+,[2M+Na]+,[2M+K]+
de.unijena.bioinf.sirius.chem.adducts.negative=[M-H]-,[M]-,[M+K-2H]-,[M+Cl]-,[M-H2O-H]-,[M+Na-2H]-,M+FA-H]-,[M+Br]-,[M+HAc-H]-,[M+TFA-H]-,[M+ACN-H]-,[M-ClH-H]-,[M-CH2O3-H]-,[M+C2H4O2-H]-,[M+H2O-H]-,[M-CH3-H]-,[M-CO2-H]-,[M+CH2O2-H]-,[M-H3N-H]-,[2M-H]-,[2M+Cl]-,[2M+Br]-
de.unijena.bioinf.sirius.chem.adducts.negative=[M-H]-,[M]-,[M+K-2H]-,[M+Cl]-,[M-H2O-H]-,[M+Na-2H]-,[M+FA-H]-,[M+Br]-,[M+Hac-H]-,[M+TFA-H]-,[M+ACN-H]-,[M-ClH-H]-,[M-CH2O3-H]-,[M+H2O-H]-,[M-CH3-H]-,[M-CO2-H]-,[M-H3N-H]-,[2M-H]-,[2M+Cl]-,[2M+Br]-
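The new negative-mode list repairs the entry `M+FA-H]-`, which had lost its opening bracket. A quick shape check like the following (a hypothetical helper, not part of SIRIUS) would flag such entries when the properties are loaded; it only checks the bracket-and-charge form and does not validate the chemistry inside the brackets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

class AdductListSketch {
    static final String NEGATIVE_KEY = "de.unijena.bioinf.sirius.chem.adducts.negative";

    // "[" + body without brackets + "]" + optional charge count + sign, e.g. "[M-H]-" or "[2M+Na]+".
    private static final Pattern ADDUCT = Pattern.compile("\\[[^\\[\\]]+\\]\\d*[+-]");

    /** Returns the entries of a comma-separated adduct list that lack the bracketed form. */
    static List<String> malformed(String adductList) {
        List<String> bad = new ArrayList<>();
        for (String token : adductList.split(",")) {
            String t = token.trim();
            if (t.isEmpty())
                continue; // tolerate empty values and trailing commas
            if (!ADDUCT.matcher(t).matches())
                bad.add(t);
        }
        return bad;
    }

    /** Convenience wrapper for the negative-mode key of sirius.build.properties. */
    static List<String> malformedNegative(Properties props) {
        return malformed(props.getProperty(NEGATIVE_KEY, ""));
    }
}
```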
@@ -1,6 +1,6 @@
#
de.unijena.bioinf.fingerid.customdb.version=3
de.unijena.bioinf.fingerid.version=3.0.3
de.unijena.bioinf.fingerid.version=3.0.5
#
### Server props
# REST (DB) settings
14 changes: 14 additions & 0 deletions io/src/main/java/de/unijena/bioinf/babelms/MsExperimentParser.java
@@ -41,13 +41,16 @@
import java.lang.reflect.InvocationTargetException;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class MsExperimentParser {

protected static final Map<String, Class<? extends Parser<Ms2Experiment>>> KNOWN_ENDINGS = addKnownEndings();

protected static final Set<String> LCMS_ENDINGS = Set.of(".mzml", ".mzxml");

// there is no good solution without writing the endings here explicitly (otherwise DESCRIPTION can not be used in annotations)
public static final String DESCRIPTION = ".ms, .mgf, .mzxml, .mzml, .cef, .msp, .mat, .mb, .mblib, .txt (MassBank), .json (GNPS, MoNA), .zip";

@@ -104,10 +107,21 @@ public static boolean isSupportedFileName(final @NotNull String fileName) {
return isSupportedEnding(fileName.substring(index));
}

public static boolean isLCMSFile(final @NotNull String fileName) {
int index = fileName.lastIndexOf('.');
if (index < 0)
return false;
return isLCMSEnding(fileName.substring(index));
}

public static boolean isSupportedEnding(final @NotNull String fileEnding) {
return KNOWN_ENDINGS.containsKey(fileEnding.toLowerCase());
}

public static boolean isLCMSEnding(final @NotNull String fileEnding) {
return LCMS_ENDINGS.contains(fileEnding.toLowerCase());
}

private static Map<String, Class<? extends Parser<Ms2Experiment>>> addKnownEndings() {
final Map<String, Class<? extends Parser<Ms2Experiment>>> endings = new ConcurrentHashMap<>(3);
endings.put(".ms", JenaMsParser.class);
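`isLCMSFile` takes the substring from the last `.` and looks it up, lowercased, in `LCMS_ENDINGS`. A standalone sketch of the same logic (hypothetical class name) makes the edge cases easy to check; it uses `Locale.ROOT` for lowercasing, a small deviation from the diff's default-locale `toLowerCase()` that does not matter for these two endings:

```java
import java.util.Locale;
import java.util.Set;

class LcmsEndingSketch {
    static final Set<String> LCMS_ENDINGS = Set.of(".mzml", ".mzxml");

    /** True if the file name ends in .mzML or .mzXML, case-insensitively. */
    static boolean isLCMSFile(String fileName) {
        int index = fileName.lastIndexOf('.');
        if (index < 0)
            return false; // no extension at all
        return LCMS_ENDINGS.contains(fileName.substring(index).toLowerCase(Locale.ROOT));
    }
}
```

Because only the last extension is inspected, a compressed file such as `run.mzML.gz` is not recognized as LC-MS input.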
1 change: 1 addition & 0 deletions sirius_cli/build.gradle
@@ -11,6 +11,7 @@ dependencies {
implementation 'org.apache.commons:commons-collections4:4.4' //todo deprecated, can be removed with old SIRIUS ProjectSpaceManager
implementation "org.apache.commons:commons-configuration2:$commons_configuration_version"
implementation 'com.auth0:java-jwt:3.16.0' //jwt decoder
implementation 'org.apache.poi:poi-ooxml:5.3.0' // for exporting xlsx summaries

// wrong place END
