# Genome Assembly using Velvet

## What the heck is Genome Assembly?

Genome assembly is the process of constructing long contiguous sequences (contigs) from shorter sequences (reads) by finding where the reads overlap. Now think of this problem at a genomic scale: same approach, just a lot more data.
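To make the idea concrete, here is a toy sketch that merges two short reads by their longest overlap. It is a naive illustration only, not how Velvet works; the function name `merge_reads` and the example reads are made up for this sketch.

```shell
# Merge two reads by the longest suffix of the first that is also a prefix
# of the second. Naive toy example; merge_reads is a hypothetical helper.
merge_reads() {
    a=$1
    b=$2
    # The overlap cannot be longer than the shorter read.
    n=${#a}
    if [ "${#b}" -lt "$n" ]; then
        n=${#b}
    fi
    while [ "$n" -gt 0 ]; do
        suffix=$(printf '%s\n' "$a" | cut -c "$(( ${#a} - n + 1 ))-")
        prefix=$(printf '%s\n' "$b" | cut -c "1-$n")
        if [ "$suffix" = "$prefix" ]; then
            # Keep the first read whole, then append the rest of the second.
            rest=$(printf '%s\n' "$b" | cut -c "$(( n + 1 ))-")
            printf '%s%s\n' "$a" "$rest"
            return 0
        fi
        n=$(( n - 1 ))
    done
    # No overlap found: the reads cannot be joined.
    return 1
}

merge_reads ATGGCGT GCGTGCA   # the reads share GCGT, giving ATGGCGTGCA
```

A real assembler has to do this for millions of reads, with sequencing errors and repeats in the mix, which is why smarter data structures are needed.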

## What the heck is Velvet?

Velvet is a genome assembler that uses a de Bruijn graph to generate contigs. If you are interested in how Velvet works, feel free to read the paper that describes it, *Velvet: Algorithms for de novo short read assembly using de Bruijn graphs*.
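A de Bruijn graph is built from k-mers, the overlapping substrings of length k found in each read. The sketch below only lists the k-mers of one made-up sequence; it does not build a graph, and the function name `kmers` is an assumption of this example rather than anything Velvet provides.

```shell
# Print every k-mer (substring of length k) of a sequence, one per line.
# kmers is a hypothetical helper; the sequence below is invented.
kmers() {
    seq=$1
    k=$2
    len=${#seq}
    i=1
    # Slide a window of width k along the sequence, one base at a time.
    while [ $(( i + k - 1 )) -le "$len" ]; do
        printf '%s\n' "$seq" | cut -c "$i-$(( i + k - 1 ))"
        i=$(( i + 1 ))
    done
}

kmers ATGGCGTGCA 4
```

Each k-mer overlaps the next one by k-1 bases, and those overlaps are what the graph records.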

### Installing Velvet

I say that the most important part of using software is figuring out how to install it. Sometimes it can be harder than you think.

Here is how you install Velvet:

• Optional: Check out the sweet Velvet website complete with web 2.0 design. Al Gore would be proud.
2. Go to the directory in which you downloaded the file with `$ cd ~/Downloads` and unzip the file with `$ tar zxvf velvet_1.2.10.tgz`
3. Go to the just unzipped directory with `$ cd velvet_1.2.10` and compile Velvet by issuing the command `$ make`
• Error warning!! If you get an error along the lines of `fatal error: zlib.h: No such file or directory`, try installing the package `zlib1g-dev` (its name on Debian and Ubuntu), then run `$ make` again.
4. If you didn’t get any errors, it looks like you have installed Velvet! If you got errors, Google the error and figure out how to fix it.

### Running Velvet

To execute the Velvet program, make sure that you are in the `velvet_1.2.10` directory, then type `$ ./velveth`. It should return a short help message. If it didn’t, check that you are in the correct directory by issuing the command `$ pwd`.

# Assembling the Zika virus genome

## Prepping the files for assembly

We have some reads from the Zika virus, fresh from Florida. We want to assemble the Zika virus genome to help find a cure. Download the reads zika.read1.fastq and zika.read2.fastq, then run this command:

`$ ./velveth zika_genome 20 -fastq -shortPaired ~/Downloads/zika.read1.fastq ~/Downloads/zika.read2.fastq`

This is a preprocessing command: it hashes your reads and builds the dataset that the assembly step will work on. Here is what the parameters mean:

• `./velveth`: the program that we use
• `zika_genome`: the output directory for all the files
• `20`: the hash (in other words, k-mer) size that we use. You will want to play around with this. Velvet expects an odd value, so an even number such as 20 is decremented to 19.
• `-fastq`: the format of the input files that we have
• `-shortPaired`: the type of input reads that we have (short paired-end reads)
• `~/Downloads/zika.read1.fastq`: the first file of reads
• `~/Downloads/zika.read2.fastq`: the second file of reads

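Note: You can pass as many input files as you like. By default, velveth assumes that paired reads are interleaved in a single file. When the two mates of each pair live in two separate files, as they do here, Velvet 1.2.x offers the `-separate` option to say so.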
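Since the k-mer size is the parameter you will want to play with, here is a small sketch that prints one velveth command per k-mer size. It is a dry run that executes nothing. The odd k values and the `zika_genome_k<k>` output directory names are assumptions chosen for illustration, as is the helper name `velveth_cmd`.

```shell
# Dry run: print one velveth command per k-mer size without running any.
# The k values and output directory names are illustrative assumptions.
reads1="$HOME/Downloads/zika.read1.fastq"
reads2="$HOME/Downloads/zika.read2.fastq"

velveth_cmd() {
    # $1 = k-mer size, $2 and $3 = the two read files
    printf './velveth zika_genome_k%s %s -fastq -shortPaired %s %s\n' \
        "$1" "$1" "$2" "$3"
}

# Odd sizes only, since velveth decrements even ones.
for k in 19 21 23 25; do
    velveth_cmd "$k" "$reads1" "$reads2"
done
```

Once the printed commands look right, you can copy them into your terminal from inside the `velvet_1.2.10` directory. Giving each k-mer size its own output directory keeps the runs from overwriting each other.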