This repository contains all necessary files for an Introductory Python Bioinformatics Workshop, using BioPython and other related tools. To download, click on the large "Code" button on the top right, or click here.
The workshop introduces multiple basic bioinformatics topics, chiefly using BioPython, including:
- Accessing NCBI databases (e.g., for nucleotide sequences or taxonomy).
- Reading and writing sequence files with BioPython.
- Aligning sequences (e.g., with MAFFT or Clustal-Omega).
- Building phylogenetic trees (with FastTree or IQ-Tree).
- Parsing those trees and creating publication-quality tree images with ETE-3.
If you have never used Python, I have a separate Introductory Python Workshop here.
If you need a Python refresher and/or an introduction to Jupyter, I recommend using the Jupyter & Python refresher that I produced here.
This workshop assumes that you are at least somewhat familiar with conda and Jupyter. If you are not, please take a look at the Jupyter intro linked above, which covers conda.
- Make sure you have conda (through either Anaconda or Miniconda) and are using a conda environment that has all the required packages. For example, if you haven't already, I recommend running the following in a terminal (Mac/Linux) or the Anaconda Prompt (Windows) after installing conda:
conda update --all -y
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n bioinfo jupyter jupyterlab biopython
This will create an environment named bioinfo, containing all the necessary packages to get started with the workshop.
- If you already have such an environment, update it regularly (every couple of weeks):
conda update -y -n bioinfo --all
- Activate the environment (unless you're using the base/root environment):
conda activate bioinfo
- Start Jupyter Notebook:
jupyter notebook
or Jupyter Lab:
jupyter lab
A browser window with Jupyter will open. Lab and Notebook are roughly equivalent. Lab is formally the future, but I would still recommend Notebook because it presently runs better with some tools (most significantly Plotly, which we won't use here). For the purposes of this workshop, both will work equally well.
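Once the environment is active, a quick way to confirm that it contains what the workshop needs is to check that the packages can be found by Python. The following is a minimal sketch, not part of the workshop materials: the function name and the mapping from conda package names to import names are assumptions made for illustration (Biopython is imported as `Bio`; `jupyter_core` is used here as a stand-in for the `jupyter` metapackage).

```python
import importlib.util

# conda package name -> Python import name (assumed mapping, for illustration).
REQUIRED_MODULES = {
    "biopython": "Bio",
    "jupyter": "jupyter_core",
    "jupyterlab": "jupyterlab",
}


def missing_modules(modules):
    """Return the package names whose import name cannot be found."""
    return [
        package
        for package, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

You can run this in a Jupyter cell or as a script from the activated environment. If anything is reported missing, installing it with conda into the bioinfo environment should fix it.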
I recommend that you:
- Open the file ending in .BEG.ipynb for the lesson that you want to follow.
- Open the corresponding .html file.
- Use the HTML file as reference to write the code in the .BEG.ipynb file.
Once you're done, your Notebook should look like the .END.ipynb file. Alternatively, you could just use the .END.ipynb file, which has all the code in it, but you would miss out on learning by writing the code yourself.
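If you want an overview of which files belong to which lesson, the naming convention above can be used to group them. This is only a sketch under an assumption that is not confirmed by the repository itself: that the .BEG.ipynb, .END.ipynb and .html files of a lesson share the same name before the suffix. The function name `lesson_files` is hypothetical.

```python
from pathlib import Path

# Suffixes described above, mapped to a short label for each role.
SUFFIXES = (
    (".BEG.ipynb", "begin"),
    (".END.ipynb", "end"),
    (".html", "reference"),
)


def lesson_files(folder):
    """Group files in `folder` by lesson, assuming a shared name before the suffix."""
    lessons = {}
    for path in sorted(Path(folder).iterdir()):
        for suffix, role in SUFFIXES:
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                lessons.setdefault(stem, {})[role] = path.name
                break  # a file matches at most one role
    return lessons


if __name__ == "__main__":
    # List the lessons found in the current folder and the files each one has.
    for lesson, files in lesson_files(".").items():
        print(lesson, files)
```

A lesson that is missing its "begin" or "reference" entry in the output is a hint that the files are named differently from what this sketch assumes.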
You need to activate the Windows Subsystem for Linux (WSL) and have a valid Linux installation inside your Windows machine. This is a little involved, but much easier than it used to be. You don't need a virtual machine or a dual-boot system: Linux simply runs like an app inside Windows, although, because Linux is so powerful, you need to grant it some special permissions. Take a look here: https://docs.microsoft.com/en-us/windows/wsl/install-win10
A number of packages are not available for Windows via conda. At the time of writing, this includes:
In the workshop documents, I install these packages from inside the Jupyter notebooks using conda. However, at the time of writing you cannot do that on Windows for these packages. I recommend that you first check whether they have become available by the time you read this. If they have not, please download and install at least MAFFT and FastTree*, since we will be using both of them in the workshop. Then, whenever the workshop runs one of those programs inside a Jupyter cell, run it outside Jupyter instead, following that program's own instructions as you would for any other software.
*For your real-world research, I strongly recommend that you look into IQ-Tree over FastTree.
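To see whether MAFFT and FastTree are already reachable from your system before a lesson needs them, you can look them up on your PATH. This is a minimal sketch rather than anything the workshop itself provides. The executable names are assumptions: MAFFT is usually installed as `mafft`, while FastTree may be installed as `FastTree` or `fasttree` depending on how you obtained it. The function name `find_tools` is hypothetical.

```python
import shutil

# Tool -> executable names to try, in order (assumed names, adjust as needed).
TOOL_CANDIDATES = {
    "MAFFT": ["mafft"],
    "FastTree": ["FastTree", "fasttree"],
}


def find_tools(candidates):
    """Map each tool to the first executable path found on PATH, or None."""
    found = {}
    for tool, names in candidates.items():
        paths = (shutil.which(name) for name in names)
        found[tool] = next((p for p in paths if p is not None), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools(TOOL_CANDIDATES).items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

A tool reported as not found may still be installed, just not on your PATH. In that case, run it by its full path following the program's own instructions.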
©️ Copyright: Everything in this repository (i.e. all of this workshop) is released under a CC BY-NC-SA 4.0 license (Creative Commons Attribution-NonCommercial-ShareAlike). Please do read the terms of the license but, in brief, this means that I am happy for you to use any of these documents or code in your own research or teaching. If you redistribute the material, however, whether you change it a little, a lot, or not at all, you must credit the original, and you must never make any money from the documents themselves, even if you make a lot of changes to them. You can of course get paid for teaching or for the work you create using what you learn here, but it must be clear to students that the documents themselves are all free (in both cost and access). Again, this applies both to my originals and to anything you make from them.
📝 I am always very happy to receive comments and suggestions for improvement at [email protected]