Setup and preparations
For this hands-on session, we will use the same setup as before.
The required software is installed in two separate Conda environments. Activate the main one as follows:
conda activate Fa5
After activating this environment, you’ll be able to use most tools.
For one specific tool named ‘gatk’, you’ll need to use this environment:
conda activate gatk
To check if these environments are installed and if you’ve remembered their names correctly, you can list all installed environments by running:
conda env list
If any additional software is needed, the teaching team will assist you.
What is a Conda environment?
A Conda environment is an isolated workspace where you can install and manage specific software packages and their dependencies. This allows you to have different versions of software installed without causing conflicts between them.
In bioinformatics, you often need different tools for various tasks (like genome assembly, variant calling, or phylogenetic analysis). Each tool may require different software libraries or specific versions of those libraries. Conda environments help you keep everything organized by creating separate environments for each set of tools.
How does it work?
Creating an environment: You can create a new environment to install the tools and software packages required for a specific project.
Activating an environment: To work in a specific environment, you need to “activate” it. This makes the tools installed in that environment available to use. To do this, run conda activate env_name, replacing env_name with the name of the environment.
Deactivating and switching environments: When you’re done with an environment, you can deactivate it with the command conda deactivate. To switch to another environment for a different task, simply activate the one you need.
Why is it useful?
Conda environments are useful because different bioinformatics tools may require specific software versions or dependencies to run properly. By using Conda, you can avoid compatibility issues. For example, one project may need Python 3.9, while another might require Python 3.8. Conda lets you handle such cases seamlessly by managing these dependencies within isolated environments.
More info
More information on Conda, including how to set it up on your own machine, is available here: https://cuypers-wim.github.io/FA5-bioinformatics/content/unix/appendix-unix.html#further-reading
About the exercises and storyline
In the upcoming sections, we will be working with amplicon data provided by the Malariology Unit at ITM. These data do not cover the entire Plasmodium genome; instead, they focus on specific regions that have been amplified by PCR. This targeted approach is often more cost-effective than whole-genome sequencing.
Additionally, we have included exercises that follow a storyline relevant to a bioinformatician working in a hospital in Ethiopia. These exercises will be clearly marked in boxes titled “Storyline: Malaria in Ethiopia.” Below, you’ll find the beginning of the storyline, and by the end of these sessions, you’ll be able to solve the mystery!
A patient arrives at your research clinic with a high fever. You suspect malaria, but a rapid diagnostic test (RDT) comes back negative. However, microscopic examination of the blood sample reveals Plasmodium parasites, confirming malaria. Microscopy also allows you to identify the parasite as Plasmodium falciparum, a species known to cause severe malaria. This situation raises several questions: How was the infection acquired? Is it a local strain or one from recent travel? Why did the rapid test fail?
The patient, originally from Ethiopia (where both P. falciparum and P. vivax are prevalent), has recently traveled to Southeast Asia, where both species also circulate. The failure of the rapid test could be linked to deletions in the P. falciparum genes hrp2 and hrp3. These genes encode the histidine-rich proteins that many RDTs detect, so parasites carrying such deletions, which occur in some P. falciparum populations, can go undetected.
Your task is to analyze the parasite’s genetic material to confirm which strain it is, determine its likely geographic origin, and assess potential antimalarial treatments. This analysis will also help clarify why the rapid test failed and whether the strain carries any mutations affecting test performance or drug resistance.
Good luck!