WebappInstallation

Cunliang Geng edited this page Jul 22, 2022 · 10 revisions

Installing the NPLinker web application

Initial steps

You will need to install Docker on your local machine. This guide assumes that you are using Windows, but most of the details are the same on other platforms such as Linux and macOS.

Docker can be downloaded from the official Docker website.

Once Docker is installed, the next step is to download the NPLinker image from the Docker Hub repository. Start the Docker application, then open a terminal/console window (Windows: press Win+R, type "cmd", and hit return; Mac: open the "Terminal" app) and run the following command:

docker pull nlesc/nplinker:latest

You should see a series of files being downloaded by Docker, followed by "Pull complete" messages as each one finishes. The full download may take a few minutes depending on your internet connection speed as the full image is several gigabytes in size. This download process only has to be done once, and any future releases will typically only require smaller downloads to update existing local files.
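If you want to confirm from a script that the image is now available locally, the check can be sketched in Python as below. This is a minimal sketch, assuming Python 3 and the docker command are both on your PATH; the function name image_is_present and its run parameter (which only exists so the call can be swapped out when testing) are illustrative and not part of NPLinker.

```python
import subprocess


def image_is_present(image: str, run=subprocess.run) -> bool:
    """Return True if Docker already holds a local copy of the given image."""
    # "docker image inspect" exits with a non-zero status when the image
    # is not found locally, so the exit code is all we need to look at.
    result = run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example usage (requires Docker to be installed and running):
# print(image_is_present("nlesc/nplinker:latest"))
```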

Configuration

Before you can run the web application, you need to have a dataset that will be loaded into it. As NPLinker finds links between genomics and metabolomics objects, it requires data from GNPS (metabolomics) and BiG-SCAPE and antiSMASH (genomics). The loading process has been designed to be flexible, so there are several supported options:

  1. Load a dataset consisting of files stored entirely on your local machine
  2. Load a remotely hosted dataset from the Paired Omics platform by supplying a project ID listed there (note: not all projects are currently supported by NPLinker)
  3. Combine data from the Paired Omics platform with local data. Typically this means NPLinker retrieves GNPS data from the platform and combines it with your own locally stored antiSMASH/BiG-SCAPE data.

The simplest way to get started is to load a dataset from the Paired Omics platform. We will use the dataset with ID MSV000079284, whose details are listed on the platform. The project metadata stored on the platform contains links to all the information NPLinker requires, so the only thing you need to provide is a basic configuration file that tells the application which project ID to load.

NOTE: for a detailed guide on loading a local dataset, see LoadingLocalData.

Setting up a shared folder

The Docker version of NPLinker has no direct access to the filesystem on your local machine, so the first step is to choose a folder that you will use for sharing files with the NPLinker application. Create an empty folder called nplinker_shared anywhere you like, and then create an empty text file inside it called nplinker.toml. If you want to load a local dataset, this is also where those files would need to be stored (more on this later).
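The folder and the empty configuration file can also be created with a short script. The following is a minimal sketch, assuming Python 3 is available on your machine; the function name create_shared_folder and the home-directory location in the usage comment are illustrative choices, not something NPLinker requires.

```python
from pathlib import Path


def create_shared_folder(parent: Path) -> Path:
    """Create <parent>/nplinker_shared containing an empty nplinker.toml."""
    shared = parent / "nplinker_shared"
    # exist_ok=True makes the call safe to repeat if the folder already exists
    shared.mkdir(parents=True, exist_ok=True)
    # touch() creates the file if it is missing and leaves existing content alone
    (shared / "nplinker.toml").touch()
    return shared


# Example usage (hypothetical location: your home directory):
# shared = create_shared_folder(Path.home())
# print(shared.resolve())  # full path of the shared folder
```

Printing the resolved path is handy because the full path to the shared folder is needed when starting the container.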

NPLinker configuration file

Next, you need to add a few lines to the nplinker.toml configuration file to tell the web application which dataset it should load. Open the file in any text editor and paste in these lines:

[dataset]
root = "platform:MSV000079284"

Then save and close the file. The root value is all that is required to load a dataset. It can be set to a local folder, or it can combine the "platform:" prefix with a valid project ID, in which case NPLinker connects to the Paired Omics platform, downloads the selected project metadata, and then retrieves the individual dataset files. For more information on the configuration file, see here.
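If you switch between projects often, writing the two configuration lines by hand can be replaced with a small helper. The sketch below assumes Python 3 and an existing shared folder; write_platform_config is a hypothetical name, and the function only produces the two lines shown above, nothing more.

```python
from pathlib import Path


def write_platform_config(shared: Path, project_id: str) -> str:
    """Write a minimal nplinker.toml that loads a Paired Omics project."""
    # The "platform:" prefix marks the value as a Paired Omics project ID
    root = f"platform:{project_id}"
    text = f'[dataset]\nroot = "{root}"\n'
    # Overwrites any existing nplinker.toml in the shared folder
    (shared / "nplinker.toml").write_text(text, encoding="utf-8")
    return text


# Example usage (the folder path is a placeholder for your own):
# write_platform_config(Path("nplinker_shared"), "MSV000079284")
```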

Running the web application

With the configuration file created, you're ready to launch the web application and open it in your browser. To start the app, run this command from your terminal/console window:

docker run --name webapp -p 5006:5006 -v <your shared folder>:/data:rw nlesc/nplinker

where <your shared folder> is the full path to the folder you created above. For example:

docker run --name webapp -p 5006:5006 -v c:\users\myusername\desktop\nplinker_shared:/data:rw nlesc/nplinker

The parameters are all quite simple: --name tells Docker to assign the name webapp to this container, -p 5006:5006 maps port 5006 inside the container to port 5006 on your machine so you can connect to the web app, and -v mounts your shared folder at /data inside the container with read-write access (the :rw suffix).
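To make the role of each parameter explicit, the same command can be assembled programmatically. This is a minimal sketch, assuming Python 3; build_run_command is a hypothetical helper, and its defaults simply mirror the values used in the command above.

```python
from typing import List


def build_run_command(shared_folder: str, name: str = "webapp",
                      port: int = 5006,
                      image: str = "nlesc/nplinker") -> List[str]:
    """Return the docker run command as a list of arguments."""
    return [
        "docker", "run",
        "--name", name,                     # container name
        "-p", f"{port}:{port}",             # host port : container port
        "-v", f"{shared_folder}:/data:rw",  # shared folder mounted at /data
        image,
    ]


# Example usage (requires Docker; the path is the example from above):
# import subprocess
# cmd = build_run_command(r"c:\users\myusername\desktop\nplinker_shared")
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids quoting problems when the shared folder path contains spaces.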

Assuming everything is configured correctly, the NPLinker application will be launched and you will see messages being printed out as it begins to download and extract the various files from the project specified in the configuration file. The downloaded files are cached inside your shared folder, so they won't be downloaded again if you run the web app more than once for the same dataset.

Here is a brief summary of the steps NPLinker performs during this process:

  1. Retrieve the project data from the Paired Omics platform
  2. Extract the URL for the corresponding GNPS job output and download this file from the GNPS website
  3. For each of the genome labels listed in the project data, attempt to resolve them to a RefSeq accession ID, look up the ID on the antiSMASH website, and download the genome data (if available)
  4. Download a copy of the MIBiG database (JSON format). The version retrieved can be controlled using the mibig_version setting in the configuration file, and the current default is "1.4".
  5. Extract all the downloaded files to the following path on your local machine: /nplinker_data/pairedomics/extracted/
  6. (Optionally) Run BiG-SCAPE on the downloaded antiSMASH data. BiG-SCAPE is packaged in the Docker image along with NPLinker, and is run by default if no local BiG-SCAPE data is provided. NOTE: NPLinker includes some BiG-SCAPE parameters by default (e.g. --mibig, --clans-off). If you want to adjust this, you will need to edit the extra_bigscape_parameters setting described in the sample configuration file.
  7. Parse the GNPS/BiG-SCAPE/antiSMASH data to create sets of Molecular Families, Spectra, GCFs, and BGCs
  8. Run Metcalf scoring on the objects with a relatively low threshold (which can be changed in the configuration file if required). This preemptively filters out low-scoring objects that do not appear to have any interesting links, which in turn improves the performance of the application.
  9. Launch the web interface to allow you to explore the dataset and links

Depending on the dataset, the loading process can take several minutes on the first attempt. This is a result of the multi-step process of downloading and extracting the dataset files, running BiG-SCAPE, parsing the files, and loading everything into the web application. However, the results of the most time-consuming steps (downloading files, running BiG-SCAPE) are all cached locally, so subsequent runs will be much faster.

When you see the following text displayed, it indicates the web application has completed all the loading steps and is ready to use:

    ==========================
    NPLinker server loading completed!
    ==========================

Once this appears, move on to using the web application.
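If you start the container from a script rather than watching the terminal, waiting for the completion message can be automated. The sketch below assumes Python 3, a container named webapp as in the command above, and that the message appears in the container's log output; wait_until_ready and READY_MARKER are hypothetical names introduced for illustration.

```python
from typing import Iterable

# Text taken from the completion banner printed by the application
READY_MARKER = "NPLinker server loading completed!"


def wait_until_ready(lines: Iterable[str]) -> bool:
    """Consume lines until one contains the ready marker.

    Returns True as soon as the marker is seen, or False if the
    stream ends without it (for example, if the container stopped).
    """
    for line in lines:
        if READY_MARKER in line:
            return True
    return False


# Example usage (requires Docker and a running container named "webapp"):
# import subprocess
# proc = subprocess.Popen(
#     ["docker", "logs", "--follow", "webapp"],
#     stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
# )
# if wait_until_ready(proc.stdout):
#     print("NPLinker is ready")
# proc.terminate()  # stop following the logs; the container keeps running
```

Merging stderr into stdout in the usage example means the marker is found regardless of which stream the application writes to.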