2 Setting up your own Unix shell
To get started, you need access to a Unix environment of your own. For this course, hosted at the AHRI Institute, you will work locally with the environment installed on the AHRI computers.
There are also online playgrounds/simulations available to try out the Unix shell, for example https://sandbox.bio/tutorials/playground. These are great for learning and we highly recommend checking out some of the tutorials there. The downside is that they are not true Unix environments and you cannot interact with your own files. You can use this site to follow along while learning some of the basic Unix commands, but you will eventually need to switch to a different option for the exercises.
git?
GitHub is a place to host code and software via a tool named git, which is a version control system. It allows you to keep track of the history of your code, easily revert changes, and collaborate with multiple people on the same project. We will not go into further detail on using version control, but for now just remember that it can play an important role in scientific reproducibility.
If you want to learn more about git already, you can have a look at the following resources:
Binder is a service that allows people to share a customized compute environment based on a Git repository. It is mainly aimed at sharing Jupyter Notebooks (Python), but it also supports RStudio, Shiny and, fortunately for us, a plain bash terminal too.
You can find more info on the Binder website.
Similar to Binder, Codespaces are development environments that are hosted in the cloud. This is a paid service provided by GitHub/Microsoft, which offers 60 hours of free usage per individual per month. Instead of Jupyter notebooks, Codespaces use code editors, like VS Code and JetBrains IDEs, which come bundled with a bash terminal too.
You can find more info in the GitHub Codespaces docs.
2.1 Local Unix environment
During this course, we will be using the Ubuntu operating system, which includes a Unix shell. By default, Ubuntu uses the bash shell, though you may also encounter zsh. For the tasks in this course, both will behave similarly. To open the terminal, click on the ‘Terminal’ icon as shown in the image below:
If you are using a Windows machine, things are slightly more complex and different methods exist, each with their own pros and cons. You could use a fully-fledged virtual machine like VirtualBox to emulate a Linux machine within Windows. Or you could rely on the minimal bash emulator that comes bundled with Git for Windows. However, nowadays we recommend that you use the Windows Subsystem for Linux (WSL), which was developed by Microsoft itself. In our opinion, it is one of the most polished methods to get access to a (nearly) full-featured Linux environment from within Windows, without the overhead of a full virtual machine or dual boot setup (dual boot means you install two different operating systems on your machine, and you switch between them when booting). For instructions on how to set it up, refer to Section 2.1.2 (WSL installation).
2.1.1 Access the course files
Regardless of what type of local Unix environment you use, you will need to download the files that we will be using in our examples and exercises.
For this course, we have already uploaded the files under ~/Desktop/Fa/FA5-bioinformatics/. When we use relative paths (we’ll explain what this means later), we will work relative to this FA5-bioinformatics directory.
If you want to copy the data to your own workstation, you can either download it directly on the command line or download the files manually and place them in the correct location.
- Open your terminal and cd to a location where you want to place the training files.
- Enter the command git clone https://github.com/Cuypers-Wim/FA5-bioinformatics.git.
- Afterwards, a new directory named FA5-bioinformatics will have been created.
It should look similar to this:
$ git clone https://github.com/Cuypers-Wim/FA5-bioinformatics.git
Cloning into 'FA5-bioinformatics'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 0), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
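If you run the clone command a second time in the same location, git will refuse because the target directory already exists. The following is a minimal sketch of a wrapper that skips the clone in that case. The function name clone_if_missing is hypothetical (it is not part of git), and the sketch assumes git is installed and that you have network access when the clone actually runs.

```shell
#!/usr/bin/env bash
# Sketch: clone a repository only if its directory is not already present.
# clone_if_missing is a hypothetical helper name, not a git command.
clone_if_missing() {
    local url="$1"
    # Default target directory: last URL component without the .git suffix
    local dir="${2:-$(basename "$url" .git)}"

    if [ -d "$dir" ]; then
        echo "Directory '$dir' already exists, skipping clone."
    else
        git clone "$url" "$dir"
    fi
}

# Usage (requires network access and git):
# clone_if_missing https://github.com/Cuypers-Wim/FA5-bioinformatics.git
# ls FA5-bioinformatics
```

This makes the download step safe to repeat, which can be handy if you keep your setup commands in a script.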
Alternatively,
- Browse to https://github.com/Cuypers-Wim/FA5-bioinformatics/archive/refs/heads/main.zip
- Save the .zip file in a directory accessible by your Unix environment. For Windows/WSL, the easiest option is to choose the Linux file system (e.g., \\wsl.localhost\Ubuntu\home\<yourName>), which is accessible by clicking the Linux/WSL entry in your explorer.
- Extract/unzip the file.
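The same zip download can also be done from the terminal. The sketch below assumes that curl and unzip are installed (they may not be on a fresh system) and that you have network access; download_and_unzip is a hypothetical helper name. The download itself is left commented out so that nothing is fetched until you choose to run it.

```shell
#!/usr/bin/env bash
# Sketch: fetch and unpack the course archive from the command line.
zip_url="https://github.com/Cuypers-Wim/FA5-bioinformatics/archive/refs/heads/main.zip"
# Local file name: the last component of the URL
zip_file="$(basename "$zip_url")"

# Hypothetical helper: download the archive, then extract it here.
download_and_unzip() {
    # -L follows redirects, -o sets the output file name
    curl -L -o "$zip_file" "$zip_url" && unzip "$zip_file"
}

echo "Archive URL: $zip_url"
echo "Saved as:    $zip_file"

# Run this once you are in the directory where the files should go:
# download_and_unzip
```

Note that GitHub branch archives usually unpack into a directory named after the repository and the branch (here most likely FA5-bioinformatics-main), so you may want to rename it to FA5-bioinformatics to match the paths used in this course.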
2.1.2 WSL installation
If you are using an updated version of Windows 10 (or 11), you should meet all the requirements and can simply follow the installation instructions listed here: https://learn.microsoft.com/en-us/windows/wsl/install. We recommend that you follow the instructions for WSL 2 (default), rather than the older WSL 1, and use the default Ubuntu 22.04 distribution (Linux comes in many different flavours, Ubuntu being one of the more popular ones).
Briefly:
- Open Windows PowerShell as administrator by right-clicking your Windows Start Menu or searching for it in your list of applications.
- Type wsl --install and press enter.
- Afterwards, restart your PC.
- You can then launch WSL by searching for wsl or Ubuntu in your start menu.
- The first time you launch WSL, you will need to configure it.
If you use software like RStudio or VSCode, you can tell these programs to use WSL as their built-in terminal from now on, instead of Command Prompt.
WSL 2 is the successor to WSL 1. For most tasks, WSL 2 tends to be much faster, which is why we (and Microsoft) recommend using it over the previous version. However, WSL 2 is only faster when you interact with files that are stored directly on the WSL file system, rather than on the Windows file system. More info on the distinction between these file systems can be found further below and in Microsoft’s WSL documentation.
You can switch between WSL 1 and WSL 2 on the fly by calling wsl --set-version <distro_name> 2 (or 1) in PowerShell, so feel free to experiment for yourself.
For a full overview of the differences, check out: https://docs.microsoft.com/en-us/windows/wsl/compare-versions.
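In PowerShell, wsl --list --verbose shows which WSL version each installed distribution uses. From inside the Linux shell, you can make a rough guess by looking at the kernel release string. The sketch below is a heuristic only: wsl_flavour is a hypothetical helper, and the patterns it matches are assumptions based on commonly seen kernel names rather than a documented interface.

```shell
#!/usr/bin/env bash
# Heuristic sketch: guess the WSL version from a kernel release string.
# wsl_flavour is a hypothetical helper; the matched patterns are assumptions
# (WSL 2 kernels often end in "-microsoft-standard-WSL2", WSL 1 in "-Microsoft").
wsl_flavour() {
    local release
    # Lower-case the input so that the match is case-insensitive
    release="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$release" in
        *wsl2*)      echo "WSL 2" ;;
        *microsoft*) echo "WSL 1" ;;
        *)           echo "not WSL" ;;
    esac
}

# Classify the kernel of the machine you are currently on
wsl_flavour "$(uname -r)"
```

On a regular Ubuntu installation, such as the course computers, this should report that you are not running under WSL.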
2.1.2.1 Configuring WSL
Microsoft also provides an excellent tutorial on setting up your WSL environment, which you can find here.
After installing WSL and a Linux distribution, you will have access to it via its own built-in terminal emulator. It should be located in your Windows Start Menu with a name corresponding to the distribution that you installed, e.g. Ubuntu 20.04 LTS, or simply wsl.
The first time you run WSL, you will need to set up a Linux username and password. Note that nothing will appear on the screen while you are entering a password; this is intended (blind typing). The username will determine, among other things, the name of your home folder, whereas the password will grant you administrator rights (referred to as super users or admins in Linux land; the sudo command is used to invoke these rights, see Section A.3).
You will also need to upgrade the packages by running the following command and entering your password when prompted: sudo apt update && sudo apt upgrade
For more information, check the docs.
2.1.2.2 Accessing files across the Windows and WSL file systems
Some of the information below might be a bit confusing at this point, but things should become clearer after working your way through the Unix section of this course.
Newer versions of WSL will automatically add a shortcut to the WSL file system in your Windows File Explorer (look for Tux, the Linux penguin mascot). The file path will look similar to \\wsl$\Ubuntu\home\<user name>\Project, indicating that Windows treats the WSL file system as a sort of network drive. You can also open a file location in Windows File Explorer from within a WSL terminal (e.g. after you browse to a particular directory with cd ~/my-project) by simply using the command explorer.exe . (don’t forget the dot!).
Vice versa, you can also access the Windows file system from within WSL because it is mounted under /mnt/c. So, you could for example do something like cp /mnt/c/Users/<user name>/Downloads/file-downloaded-via-webbrowser ~/projects/filename.
More information can be found in the WSL documentation.
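To make the /mnt/c mapping described above more concrete, here is a minimal sketch that rewrites a Windows-style path into the location where WSL mounts it. The function name win_to_wsl_path and the example path are hypothetical, and the sketch only handles simple paths that start with a drive letter. On an actual WSL installation, the bundled wslpath utility performs this kind of conversion for you.

```shell
#!/usr/bin/env bash
# Sketch: translate a Windows path such as C:\Users\me\Downloads
# into the corresponding WSL mount location under /mnt.
# win_to_wsl_path is a hypothetical helper name.
win_to_wsl_path() {
    local win_path="$1"
    local drive rest
    # First character is the drive letter; WSL mounts it in lower case (C -> c)
    drive="$(printf '%s' "$win_path" | cut -c1 | tr '[:upper:]' '[:lower:]')"
    # Everything after "C:" with backslashes turned into forward slashes
    rest="$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')"
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Single quotes keep the backslashes literal
win_to_wsl_path 'C:\Users\me\Downloads'   # prints /mnt/c/Users/me/Downloads
```

The resulting path can then be used with ordinary commands such as ls or cp from within the WSL terminal.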
2.1.2.3 Windows Terminal
Even though WSL comes with its own terminal application, it is rather bare-bones and can make some operations like copying and pasting via CTRL+C/CTRL+V a bit tricky (you will need to use CTRL+SHIFT+C to copy and right mouse click to paste). Fortunately, Microsoft has also been working on a new terminal emulator that is much nicer to work with. Meet the Windows Terminal.