AndersenLab/CAENDR

CaeNDR

CaeNDR is the code used to run the Caenorhabditis elegans Natural Diversity Resource website.

GCP Credentials

Ask in MS Teams for the DevOps service-account JSON file. Create a folder named ~/.gcp under your home directory and copy the service-account JSON file into it.

$ mkdir ~/.gcp
$ open -a Finder ~/.gcp

The last command should open the ~/.gcp/ folder in the macOS Finder. Drop the service-account .json file there.

MacOS setup - Requirements

Visual Studio Code

Download from https://code.visualstudio.com/ and install the "Python" extension (it should have about 96M downloads).

Docker Mac

Download from https://docs.docker.com/desktop/install/mac-install/

Homebrew

$ cd $HOME
$ mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew

Find out which shell you are using:

echo $SHELL

If you see "bash", update ~/.bash_profile in the next step. If you see "zsh", update ~/.zprofile instead.

Add this line to the bottom of ~/.bash_profile or ~/.zprofile:

export PATH=$HOME/homebrew/bin:$PATH
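The two steps above (pick the profile file for your shell, then add the PATH line) can be sketched as a pair of shell functions. This is only an illustration: profile_for_shell and append_once are hypothetical helper names, not part of the repository, and shells other than bash and zsh are not handled.

```shell
# Hypothetical helper: map a shell path (e.g. the value of $SHELL) to the
# profile file this README tells you to edit.
profile_for_shell() {
  case "$(basename "$1")" in
    bash) printf '%s\n' "$HOME/.bash_profile" ;;
    zsh)  printf '%s\n' "$HOME/.zprofile" ;;
    *)    return 1 ;;   # other shells are not covered here
  esac
}

# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so running it twice does not duplicate the entry.
append_once() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   append_once "$(profile_for_shell "$SHELL")" 'export PATH=$HOME/homebrew/bin:$PATH'
```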

Set terminal to run in x86_64 mode under Rosetta

  • Open Finder on your Mac.
  • Navigate to Applications/Utilities/Terminal.app.
  • Right-click it and choose Get Info.
  • Enable the checkbox "Open using Rosetta".
  • Close and reopen the Terminal app.
  • Inside Terminal, type:
$ arch

Expected result:

i386

Install Dependencies:

arch -x86_64 brew update
arch -x86_64 brew install pyenv openssl readline gettext xz

Edit your ~/.bash_profile and add the lines below to the bottom of the file. If ~/.bash_profile doesn't exist, check whether you are using a different shell (e.g. zsh). In that case you might need to edit ~/.zshrc or ~/.zprofile instead.

# if using bash, do 
nano ~/.bash_profile
# if using zsh then 
nano ~/.zprofile

export PATH=$HOME/.pyenv/bin:$PATH
eval "$(pyenv init --path)"
eval "$(pyenv init -)"

# if using bash, do 
source ~/.bash_profile
# if using zsh then 
source ~/.zprofile

pyenv install 3.7.12
pyenv global 3.7.12
pip install virtualenv

Expected Outputs:

$ python -V
Python 3.7.12

$ virtualenv --version
virtualenv 20.13.0 from /Users/rbv218/.pyenv/versions/3.7.12/lib/python3.7/site-packages/virtualenv/__init__.py
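The Python version check above can also be done from a script. The following is a minimal sketch; python_version_matches is a hypothetical helper name, not something provided by the repository or by pyenv.

```shell
# Hypothetical helper: succeed only if the given `python -V` output
# reports exactly the expected version.
python_version_matches() {
  expected=$1   # e.g. 3.7.12
  actual=$2     # e.g. the output of: python -V 2>&1
  [ "$actual" = "Python $expected" ]
}

# Usage:
#   python_version_matches 3.7.12 "$(python -V 2>&1)" || echo "run: pyenv global 3.7.12"
```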

Running local (work-in-progress)

Open one terminal window and run:

export GOOGLE_APPLICATION_CREDENTIALS=~/.gcp/NAME_OF_THE_SERVICE_ACCOUNT_FILE.json
export ENV=development
cd src/modules/site
make clean
make configure
make cloud-sql-proxy-start

Check that the cloud-sql-proxy docker container is running:

$ docker ps

Expected Result:

CONTAINER ID   IMAGE                                            COMMAND                  CREATED       STATUS        PORTS                    NAMES
9413d4f448f0   gcr.io/cloudsql-docker/gce-proxy:1.28.1-alpine   "/cloud_sql_proxy -i…"   3 weeks ago   Up 23 hours   0.0.0.0:5432->5432/tcp   caendr-cloud-sql-proxy-1

Please note that the CONTAINER_ID will be different on your machine.

Keep this docker container running while running the site below.

To make changes to the NEW site-v2 templates

Open a second terminal window:

export GOOGLE_APPLICATION_CREDENTIALS=~/.gcp/NAME_OF_THE_SERVICE_ACCOUNT_FILE.json
export ENV=development
cd src/modules/site-v2
make configure
make dot-env
make venv
code ../../..

The last command will open Visual Studio Code at the root of the project. From the DEBUG -> List of options, select "Run Site-v2 (requires a local Postgres instance or cloud-sql-proxy)" and click "Play".

Stopping the Database Proxy

Once you are done working on the site and no longer need the database, close the connection:

make cloud-sql-proxy-stop

If this does not stop the container, list the running containers:

docker ps

Expected Result:

CONTAINER ID   IMAGE                                            COMMAND                  CREATED          STATUS          PORTS                    NAMES
75ef941c1e64   gcr.io/cloudsql-docker/gce-proxy:1.28.1-alpine   "/cloud_sql_proxy -i…"   29 minutes ago   Up 29 minutes   0.0.0.0:5432->5432/tcp   caendr-cloud-sql-proxy-1

Find the CONTAINER_ID (first column) and stop the container manually with:

$ docker kill 75ef941c1e64
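Instead of copying the CONTAINER_ID by hand, the lookup can be scripted. The sketch below assumes the default docker ps table layout shown above (ID in the first column, name in the last) and that the proxy container name contains "cloud-sql-proxy". The function name container_ids_by_name is hypothetical.

```shell
# Hypothetical helper: read `docker ps` table output on stdin and print the
# CONTAINER ID (first column) of every row whose NAMES value (last column)
# contains the given text. The header row is skipped.
container_ids_by_name() {
  awk -v pat="$1" 'NR > 1 && index($NF, pat) { print $1 }'
}

# Usage:
#   for id in $(docker ps | container_ids_by_name cloud-sql-proxy); do
#     docker kill "$id"
#   done
```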

To make changes to the Legacy site (currently in production)

Open a second terminal window:

export GOOGLE_APPLICATION_CREDENTIALS=~/.gcp/NAME_OF_THE_SERVICE_ACCOUNT_FILE.json
export ENV=development
cd src/modules/site
make configure
make dot-env
make venv
code ../../..

Linux Setup

Setup requires make, which can be installed with:

sudo apt-get update && sudo apt-get install build-essential

Makefile Help


To list all available Makefile targets and their descriptions in the current directory:

make

or

make help

Requirements


To automatically install system package requirements for development and deployment:

make configure

Setting the Environment and gcloud login


To configure your local environment to use the correct cloud resources, you must set the default project and credentials for the Google Cloud SDK and define the 'ENV' environment variable:

gcloud init
gcloud auth login
gcloud auth application-default login
gcloud auth configure-docker
export ENV={ENV_TO_DEPLOY}

Running modules locally


Set the ENV and GOOGLE_APPLICATION_CREDENTIALS environment variables, along with the module database settings:

export MODULE_DB_OPERATIONS_CONNECTION_TYPE=localhost
export MODULE_DB_TIMEOUT=3
export ENV={ENV_TO_DEPLOY}
export GOOGLE_APPLICATION_CREDENTIALS={PATH_TO_GCP_CREDENTIALS}
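A forgotten export is easy to miss until a module fails at startup. As a rough sketch, a small guard can confirm the variables are set before anything runs. The function name require_vars is hypothetical and not part of the repository's Makefiles.

```shell
# Hypothetical helper: return non-zero if any of the named environment
# variables is unset or empty, and report each missing one on stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      printf 'missing environment variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_vars ENV GOOGLE_APPLICATION_CREDENTIALS \
#     MODULE_DB_OPERATIONS_CONNECTION_TYPE MODULE_DB_TIMEOUT && make run
```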

If the module requires a connection to the Cloud SQL instance, you will need to keep the Google Cloud SQL proxy running in the background:

./cloud_sql_proxy -instances=${GOOGLE_CLOUD_PROJECT_ID}:${GOOGLE_CLOUD_REGION}:${MODULE_DB_OPERATIONS_INSTANCE_NAME} -dir=/cloudsql &

or

make cloud-sql-proxy-start

Then switch to a different terminal, change to the module's src directory, and run:

make run

Deployment


Prerequisites: Ensure that you are logged in to the GCP project through the gcloud CLI, or are using a DevOps service account.

Open a terminal at the root of the project:

  1. Set ENV and GOOGLE_APPLICATION_CREDENTIALS environment variables:

    export ENV={ENV_TO_DEPLOY}
    export GOOGLE_APPLICATION_CREDENTIALS={PATH_TO_GCP_CREDENTIALS}
  2. Increment the versions for each module that is being updated as part of the deployment:

    • Update the version property for the module in /env/{env}/global.env
    • Update the version in src/modules/{module_name}/module.env
  3. Move to each module folder and configure the modules for deployment:

    cd src/modules/{module_name}
    make configure
    • The module root folder should now contain a .env file
    • The module root folder SHOULD NOT contain a venv folder
  4. Publish the module to GCR (src/modules/{module_name}):

    make publish
    • When the command completes, check GCR and confirm that your image appears with the proper version tag
  5. Deploy new app version:

    make cloud-resource-deploy
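The two checks listed under step 3 (a .env file is present, a venv folder is not) can be verified before publishing. This is a minimal sketch under the assumption that both live directly in the module root. check_module_ready is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the module folder is in the state
# described in step 3, i.e. it contains a .env file and no venv folder.
check_module_ready() {
  dir=$1
  if [ ! -f "$dir/.env" ]; then
    echo "$dir: .env file is missing (run make configure)" >&2
    return 1
  fi
  if [ -d "$dir/venv" ]; then
    echo "$dir: venv folder present (remove it before deploying)" >&2
    return 1
  fi
}

# Usage, from the root of the project:
#   check_module_ready src/modules/{module_name} && make -C src/modules/{module_name} publish
```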

Troubleshooting:

  • Even if ENV and GOOGLE_APPLICATION_CREDENTIALS are set correctly, you still need to be logged in to gcloud and to have configured Docker (gcloud auth configure-docker) in order to publish containers to GCR, because the service account does not have permission to publish.
  • Sometimes after deployment of the full application the ext_assets folder will not copy to the GCP static bucket, but terraform state will reflect the correct bucket resources. You'll notice the CeNDR logo and worms video will not show up on the home page. Simply redeploy the full application and the assets should be correctly copied to the GCP static bucket, fixing the issue.
  • Deployment will not work if a virtual environment exists in img_thumb_gen, giving an error like the following:
    ╷
    │ Error: Error while updating cloudfunction configuration: Error waiting for Updating CloudFunctions Function: Error code 14, message: The service has encountered an error during container import. Please try again later
    │ 
    │   with module.img_thumb_gen.google_cloudfunctions_function.generate_thumbnails,
    │   on modules/img_thumb_gen/cloud-function.tf line 1, in resource "google_cloudfunctions_function" "generate_thumbnails":
    │    1: resource "google_cloudfunctions_function" "generate_thumbnails" {
    │ 
    ╵
    make: *** [cloud-resource-deploy] Error 1
    
    Remove the venv directory and try redeploying.
  • Due to a race condition, sometimes Terraform will attempt to access the new site image before it has been built and published to GCP. Manually publishing the image by running make publish in src/modules/site (or src/modules/site-v2), then deploying, should fix this issue.

Deployment of Individual Components


Targeted deployment is under construction until isolated TF states can be established for each module.

Website Requirements


To allow the website to write to the Google Sheet where orders are recorded, you must add the Google Sheets service account as an editor for the sheet {ANDERSEN_LAB_ORDER_SHEET}: {GOOGLE_SHEETS_SERVICE_ACCOUNT_NAME}@{GOOGLE_CLOUD_PROJECT_ID}.iam.gserviceaccount.com

To view the 'about/statistics' page, you must also add the Google Analytics service account user to the Google Analytics account: {GOOGLE_ANALYTICS_SERVICE_ACCOUNT_NAME}@{GOOGLE_CLOUD_PROJECT_ID}.iam.gserviceaccount.com

Initial Setup


Create a new user and log in to the site. Once the account has been created, you can manually promote it to admin by editing the user entity in Google Cloud Datastore.

Containerized Tools (Nemascan, Heritability, Indel Primer)


Before these tools can be used for the first time, the available container versions must be loaded from Docker Hub. Visiting the 'Tool Versions' page in the 'Admin' portal will import this data automatically:

Admin -> Tool Versions

SQL Database


These steps describe how to add data to the strain sheet, then load the strain data, WormBase gene information, and strain variant annotation data into the site's SQL database:

  • Admin -> Strain Sheet: The google sheet linked here must be populated with the strain data that you want to load into the site's internal database.
  • Admin -> ETL Operations: click 'New Operation' then 'Rebuild strain table from Google Sheet'. (No other fields are required)
  • Admin -> ETL Operations: click 'New Operation' then 'Rebuild wormbase gene table from external sources' (Wormbase Version number required)
  • Admin -> ETL Operations: click 'New Operation' then 'Rebuild Strain Annotated Variant table from .csv.gz file' (Strain Variant Annotation Version number required). This operation expects the .csv.gz source file to already exist in the Cloud Bucket location described below.

Strain Variant Annotation


The strain variant annotation data csv should be versioned with the date of the release having the format YYYYMMDD, compressed with gzip, and uploaded to:

${MODULE_DB_OPERATIONS_BUCKET_NAME}/strain_variant_annotation/c_elegans/WI.strain-annotation.bcsq.YYYYMMDD.csv.gz
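Because the ETL operation looks for the file by its release date, a malformed date in the name is worth catching before upload. The sketch below builds the object path from a bucket name and an eight-digit date. The function name strain_annotation_object is hypothetical, and the upload command in the usage comment assumes the bucket variable holds a bare bucket name without the gs:// prefix.

```shell
# Hypothetical helper: print the storage path for a strain variant
# annotation release, rejecting dates that are not exactly eight digits.
strain_annotation_object() {
  bucket=$1
  release_date=$2   # YYYYMMDD
  case "$release_date" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "release date must have the format YYYYMMDD" >&2; return 1 ;;
  esac
  printf '%s/strain_variant_annotation/c_elegans/WI.strain-annotation.bcsq.%s.csv.gz\n' \
    "$bucket" "$release_date"
}

# Usage (assumes the csv has already been compressed with gzip):
#   dest=$(strain_annotation_object "$MODULE_DB_OPERATIONS_BUCKET_NAME" 20220216) &&
#     gsutil cp WI.strain-annotation.bcsq.20220216.csv.gz "gs://$dest"
```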

Release Files


To add a Dataset Release to the site through the Admin panel, you will first have to upload the release files to:

${MODULE_SITE_BUCKET_PUBLIC_NAME}/dataset_release/c_elegans/${RELEASE_VERSION}

using the file and directory structure described in the AndersenLab dry guide.

Strain Photos


Strain photos should be named using the format <strain>.jpg and uploaded to a bucket where the img_thumb_gen module will automatically create thumbnails with the format <strain>.thumb.jpeg:

${MODULE_SITE_BUCKET_PHOTOS_NAME}/c_elegans/<strain>.jpg -> ${MODULE_SITE_BUCKET_PHOTOS_NAME}/c_elegans/<strain>.thumb.jpg

BAM/BAI Files


BAM and BAI files are stored in:

${MODULE_SITE_BUCKET_PRIVATE_NAME}/bam/c_elegans/<strain>.bam
${MODULE_SITE_BUCKET_PRIVATE_NAME}/bam/c_elegans/<strain>.bam.bai

Nemascan


Nemascan requires species data to be manually uploaded to cloud storage to make it accessible to the pipeline:

${MODULE_SITE_BUCKET_PRIVATE_NAME}/NemaScan/input_data

FAQ


Q: Why does it look like the site or db_operations are unable to connect to Cloud SQL (Postgres)?
A: Check whether the server has exhausted its max_connections limit. Google Postgres has a hard limit on connections, and some of them are reserved for super-admin tasks (backups, etc.) and are not available to run-time apps, services, or modules. Consider restarting the server (or stopping and starting it) to close all active connections. In GCP, the connection stats can be viewed in the POSTGRES tab by selecting "Active Connections" from the dropdown.

Q: I'm getting errors installing numpy on macOS running on an M1/M2 chip.
A: Install cython, then build numpy from source:

pip3 install cython
pip3 install --no-binary :all: --no-use-pep517 numpy

Q: Which version of terraform do I need to use?
A: Use terraform version 1.1.8. Optional: use tfenv to manage the terraform version.

Q: Missing pg_config when running on macOS?
A: Install via homebrew:

brew install postgresql

Q: I'm seeing this error when running make venv from the src/modules/site-v2 folder:

    "_libintl_textdomain", referenced from: _PyIntl_textdomain in libpython3.7m.a(_localemodule.o) _PyIntl_textdomain in libpython3.7m.a(_localemodule.o)

A: Install gettext:

$ arch -x86_64 brew install gettext

Q: I'm seeing this error when running make venv from the src/modules/site-v2 folder: "ModuleNotFoundError: No module named 'readline'"
A: Install readline:

$ arch -x86_64 brew install readline

Q: I'm seeing this error when running make venv from the src/modules/site-v2 folder: "ERROR: The Python ssl extension was not compiled. Missing the OpenSSL lib?"
A: Install openssl:

$ arch -x86_64 brew install openssl

Q: I get an ImportError when running the API in VSCode:

    ImportError: cannot import name 'Literal' from 'typing' (/Users/ ... /.pyenv/versions/3.7.12/lib/python3.7/typing.py)

A: The VSCode extension debugpy, version v2024.12.0, appears to break on Python 3.7 -- it's not clear if this is a bug, or intentional dropping of support for older versions. You can get around this by pinning your extension to v2024.10.0 -- see the VSCode docs for instructions.

(If this is a bug, further updates to debugpy might fix this issue. Keep an eye on it, and try the newest version if necessary.)