All posts by ken

A first look at NGS data

Let’s say you have run the DNA of your study species on a next generation sequencing machine, such as Illumina HiSeq. You now have a huge collection of short sequences to make sense of. The data is a text file that is too big to open in a regular text editor.

To have a look at the raw data, type:
less [name_of_data_file]

The format (fastq) tends to evolve, but lately it looks something like this:

@HWI-962:55:C0A4UACXX:4:1101:1375:2341 1:N:0:TCTCAG

The first part of this header line gives the machine name and the run, followed by the position of the cluster on the flowcell (lane, tile and x/y coordinates). The line ends with the barcode sequence. The nucleotide sequence and the quality scores follow on the next lines of each record. The quality scores can be encoded in various formats.
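To make the header layout a bit more concrete, here is a minimal Python sketch that splits a header like the one above into named fields and converts a quality string into numbers. It assumes the colon-separated layout shown in the example and the Phred+33 encoding; the names FastqHeader, parse_header and phred33_scores are made up for illustration, and the quality string "II5#" is an invented example rather than real data.

```python
from typing import List, NamedTuple


class FastqHeader(NamedTuple):
    """Fields of a header line in the layout shown above (assumed layout)."""
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    filtered: bool
    control: int
    barcode: str


def parse_header(line: str) -> FastqHeader:
    """Split one header line into its cluster part and its flag part."""
    cluster, _, flags = line.strip().lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = cluster.split(":")
    read, filtered, control, barcode = flags.split(":")
    return FastqHeader(instrument, int(run), flowcell, int(lane), int(tile),
                       int(x), int(y), int(read), filtered == "Y",
                       int(control), barcode)


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming the Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


header = parse_header("@HWI-962:55:C0A4UACXX:4:1101:1375:2341 1:N:0:TCTCAG")
print(header.flowcell, header.lane, header.barcode)  # C0A4UACXX 4 TCTCAG
print(phred33_scores("II5#"))  # [40, 40, 20, 2]
```

If your data uses an older header style or a Phred+64 quality encoding, the split and the offset of 33 would need to change accordingly.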
You will want to have a closer look at the data quality before you proceed. The FastQC toolkit is very useful for this.
It generates quality plots like this:


The quality of the first few bases is slightly lower, because the machine starts sequencing at lower intensity, which allows it to locate the clusters more unambiguously. You’ll also notice that the quality drops off towards the end of the read, which is due to the reagents degrading over the course of the run. This is normal, and in this case it stays within an acceptable range. However, some trimming or clipping is often necessary. The FASTX-Toolkit can be used for this. Alternatively, you can use a package that does all the trimming, clipping, adapter removal etc. automatically. Many of these tend to be quite harsh. I tend to use Trimmomatic, which seems to do a good job. The command line looks a bit messy. Here is an example:

java -classpath /usr/local/Trimmomatic/Trimmomatic-0.17/trimmomatic-0.17.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 read1.fastq read2.fastq read1_forward_paired.fq read1_forward_unpaired.fq read2_reverse_paired.fq read2_reverse_unpaired.fq ILLUMINACLIP:illuminaClipping.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
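To give an idea of what the quality steps at the end of that command do, here is a rough Python sketch of the LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps, applied in that order to a single read. This is a simplified illustration under my own assumptions, not Trimmomatic's actual implementation: it leaves out adapter clipping and paired-end bookkeeping, and the function name trim_read is made up. Qualities are passed in as a list of Phred scores.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, min_avg: float = 15, min_len: int = 36
              ) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if it ends up shorter than min_len."""
    # LEADING: skip bases at the start whose quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    # TRAILING: skip bases at the end whose quality is below the threshold.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]
    # SLIDINGWINDOW: scan from the start and cut at the first window whose
    # mean quality falls below min_avg (simplified version of the idea).
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            seq, quals = seq[:i], quals[:i]
            break
    # MINLEN: discard reads that have become too short.
    if len(seq) < min_len:
        return None
    return seq, quals


# Invented example: 2 poor bases, 40 good ones, then a tail of declining quality.
result = trim_read("A" * 50, [2, 2] + [30] * 40 + [10] * 6 + [2, 2])
print(len(result[0]) if result else "dropped")  # prints 40
```

The defaults mirror the values in the command above (3, 3, 4:15 and 36), so a read that is left with fewer than 36 bases after trimming is dropped.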

Turkey 2012

Better known for its holiday potential, the Antalya region of Turkey is an excellent place for orchids. Spending a few hours in a plane full of jobbo’s on their way to the compounds along the Turkish riviera was a bit of a culture shock. However, a short drive took us up into the hills where the people are friendly, the tea is sweet and the orchids are too.

Cephalanthera kurdica

Finding orchids in Turkey requires a bit of effort. We visited lots of graveyards (where the people don’t collect orchid root bulbs for salep and the goats don’t eat orchids), few of which were actually any good. But then, some of them were very, very good.

Ophrys antalyensis

The other place to look for orchids is high up in the hills, far away from everybody else.

Orchis spitzelii

After a week of orchid hunting, we couldn’t resist checking out the brown fish owls at Oymapinar. For those who don’t know the story: a few years ago, a population of brown fish owls was discovered in Turkey, a species that was thought to be extinct west of India. The birds were found along streams in remote regions of Antalya province. Meanwhile, over at Oymapinar, tourists had been shown large owls on daily boat trips across the reservoir for years. In 2011, a birder happened to be on one of these boats together with his family. When he saw the owl, he thought “what the f***!!!” Since then, the tourists have been joined by birders from all over Europe. So, we joined some 30 Russian holidaymakers and enjoyed the sight of three brown fish owls, accompanied by the sound of loud disco music. Surely one of the more surreal birding experiences: having one of the rarest birds of Europe stare at you with “I’m sexy and I know it” pounding your eardrums.