14  Downloading Genomic Data

14.1 Downloading Reference Genomes

A reliable place to find a high-quality reference genome for mapping your reads is the NCBI Genomes section.

You can search this site for the reference genome of interest. Note that genomes with a RefSeq accession tend to be of higher quality than those with only a GenBank accession. On the download page, you can create a ZIP file containing the genome in FASTA format, along with related files such as annotations.
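NCBI assembly accessions encode their origin in the prefix: GCF_ marks a RefSeq assembly and GCA_ marks a GenBank assembly. The minimal sketch below shows how that prefix could be checked on the command line; the function name assembly_source and the two accession strings are illustrative only and are not part of any NCBI tool.

```shell
# Sketch: tell RefSeq and GenBank assembly accessions apart by their prefix.
# assembly_source is a hypothetical helper written for this example.
assembly_source() {
    case "$1" in
        GCF_*) echo "RefSeq" ;;    # RefSeq assemblies start with GCF_
        GCA_*) echo "GenBank" ;;   # GenBank assemblies start with GCA_
        *)     echo "unknown" ;;   # anything else is not an assembly accession
    esac
}

# Example accession strings, used here only to show the two prefixes.
assembly_source "GCF_000006945.2"
assembly_source "GCA_000006945.2"
```

A check like this can be handy when a project mixes assemblies from several sources and you want to prefer the RefSeq version where one exists.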

14.2 Downloading Sequencing Reads

After identifying the sequencing reads you need for your analysis, you can download them in several ways. One command-line option is the fasterq-dump program from NCBI, which retrieves sequencing reads directly when given their SRA accession number. These accessions can be found either in an existing publication or directly in the Sequence Read Archive (SRA).
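As a rough illustration, a fasterq-dump call for a single paired-end run might look like the sketch below. The output directory name reads and the RUN_DOWNLOAD switch are assumptions made for this example, and the accession is one of the runs used later in this section. The script prints the command first and only downloads when the SRA Toolkit is installed and RUN_DOWNLOAD=1 is set.

```shell
# Sketch: fetch paired-end reads for one run with fasterq-dump (SRA Toolkit).
# OUTDIR, ACCESSION and RUN_DOWNLOAD are illustrative choices for this example.
OUTDIR="reads"
ACCESSION="ERR9516346"

# Build the command first so it can be inspected before anything is downloaded.
# --split-files writes one FASTQ file per read of a pair.
CMD="fasterq-dump --split-files --outdir $OUTDIR $ACCESSION"
echo "$CMD"

# Run it only if explicitly requested and the tool is available.
if [ "${RUN_DOWNLOAD:-0}" = "1" ] && command -v fasterq-dump >/dev/null 2>&1; then
    mkdir -p "$OUTDIR"
    $CMD
    # fasterq-dump writes uncompressed FASTQ, so compress the files afterwards.
    gzip "$OUTDIR"/"$ACCESSION"_*.fastq
fi
```

Because fasterq-dump produces uncompressed files, expect the download to need considerably more disk space than the final gzip-compressed reads.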

For this course, we will use the SRA Explorer tool. Here, you can search for the accessions of interest and generate either download links or a script that downloads the reads with wget.

In the example below, we searched for ‘Salmonella Concord’, selected six random samples, and downloaded them by prepending ‘wget’ to each download link.

SRA explorer get links

The download script for the selected samples looks as follows:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/006/ERR9516346/ERR9516346_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/006/ERR9516346/ERR9516346_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/004/ERR9516344/ERR9516344_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/004/ERR9516344/ERR9516344_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/003/ERR9516343/ERR9516343_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/003/ERR9516343/ERR9516343_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/005/ERR9516345/ERR9516345_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/005/ERR9516345/ERR9516345_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/007/ERR9516347/ERR9516347_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/007/ERR9516347/ERR9516347_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/008/ERR9516348/ERR9516348_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR951/008/ERR9516348/ERR9516348_2.fastq.gz
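The links above follow a regular pattern, so the same list could also be generated from the run accessions alone. The sketch below assumes the layout visible in these links (first six characters of the accession, then 00 plus its last digit, then the full accession) and has only been checked against 10-character accessions such as the ones used here. The function name ena_fastq_url and the file name urls.txt are hypothetical.

```shell
# Sketch: rebuild the ENA FASTQ links shown above from the run accessions.
# Assumption: 10-character accessions (e.g. ERR9516346) are stored under
# vol1/fastq/<first 6 characters>/00<last digit>/<accession>/.
ena_fastq_url() {
    acc="$1"                         # run accession, e.g. ERR9516346
    mate="$2"                        # 1 or 2 for the two reads of a pair
    prefix=$(printf '%.6s' "$acc")   # first six characters, e.g. ERR951
    head="${acc%?}"                  # accession without its last character
    last="${acc#$head}"              # last character, e.g. 6
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s_%s.fastq.gz\n' \
        "$prefix" "$last" "$acc" "$acc" "$mate"
}

# Write one link per line for the six runs used in this section.
: > urls.txt
for acc in ERR9516346 ERR9516344 ERR9516343 ERR9516345 ERR9516347 ERR9516348; do
    for mate in 1 2; do
        ena_fastq_url "$acc" "$mate" >> urls.txt
    done
done
```

The resulting file can then be passed to wget with `wget --continue --input-file=urls.txt`, which resumes partially downloaded files after an interruption. Running `gzip -t` on each downloaded .fastq.gz file afterwards is a quick way to detect truncated downloads.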