Human Trickery: Consensus sequence from forest maps

In a world far, far away from the forest, humans started sequencing DNA with something called DNA sequencers[1]. If you don't know what DNA is, this is not the blog for you. I can't explain every damn thing! This is just too much pressure. Sorry, my therapist says I come off as aggressive. I am working on it. Here is a link for you. I promise I am a nice person.

Ok. Back to DNA sequencers. DNA sequencers....drum roll....sequence DNA.

Sequencers sequence DNA, but they don't do it in a single stretch. Instead they do it in small chunks...just like the little maps made by forest animals. If you wanted to know what your sequence really looked like, you would have to create something called a "consensus sequence"[2].

Humans also have something similar to "the Legendary Map". They have the reference sequence.

Humans, ever the thieving scoundrels, stole the idea of the consensus map and used it to make Consensus Sequences. Then they started talking fancy. Things like Read Alignment, Genome Assembly, Variant Calling...

In this blog, I shall reveal their trickery to you.

Little Maps

DNA sequencers spit out something called "reads" in FASTQ format. "Read" is fancy talk for a DNA chunk. The size of the chunk varies between sequencers, but for the common short-read machines it's on the order of 100 base pairs[3].

The animals stood in line as a Jungle Government official took down their name and their map and made a note about the reliability of the information. Then the official put everything in a thick binder. If anyone had to guess, the colour of this binder was beige[4].

A FASTQ file looks eerily similar.
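For the curious, here is a minimal sketch of one FASTQ record and how you might pull it apart in Python. The read name and the bases are made up by me, and `parse_fastq_record` is a hypothetical helper, not something from a real library. Each record has four lines: the name (the animal), the bases (the map), a `+` separator, and one quality character per base (the official's note on reliability).

```python
# A made-up FASTQ record: four lines per read.
record = """@badger_map_001
GATTACAGATTACA
+
IIIIIIIIII##II"""


def parse_fastq_record(text):
    """Split one 4-line FASTQ record into name, bases, and quality string."""
    header, bases, plus, quality = text.strip().split("\n")
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("this does not look like a FASTQ record")
    if len(bases) != len(quality):
        raise ValueError("every base needs exactly one quality character")
    return {"name": header[1:], "bases": bases, "quality": quality}


read = parse_fastq_record(record)
print(read["name"], read["bases"], read["quality"])
```

Real files hold millions of these records back to back, so in practice you would stream them rather than paste them into a string.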

The Legendary Map

A reference genome is like the Legendary Map. There is one for each species the humans have gotten around to sequencing. And it looks a lot like a consensus sequence, only it is used just for reference.

It looks like this..
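Reference genomes are usually handed around as FASTA files: a line starting with `>` that names the sequence, followed by the bases wrapped over many lines. Here is a tiny sketch, assuming a made-up sequence called `legendary_map_chr1` and a hypothetical `parse_fasta` helper that glues the wrapped lines back together.

```python
# A made-up FASTA file: a ">" header line, then the bases wrapped over lines.
fasta_text = """>legendary_map_chr1 a made-up reference
GATTACAGAT
TACAGATTAC
"""


def parse_fasta(text):
    """Return {sequence name: full sequence} for a FASTA-formatted string."""
    chunks_by_name = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">"; the rest is description.
            name = line[1:].split()[0]
            chunks_by_name[name] = []
        elif name is not None:
            chunks_by_name[name].append(line)
    return {n: "".join(chunks) for n, chunks in chunks_by_name.items()}


reference = parse_fasta(fasta_text)
print(reference)
```

A real reference is the same idea, only a few million (or billion) letters longer.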

Comparing the Maps

The animals laid out their maps side by side and compared them against the legendary map. The humans do the same. Only they call it alignment. This produces a SAM file. Or a BAM file. The SAM file is human readable. The BAM file, its compressed binary twin, is not. Humans are funny creatures. Only they can make up a language they can’t read themselves.

A SAM file is not particularly readable either.
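To show what I mean, here is a sketch of a single SAM alignment line and a hypothetical `parse_sam_line` helper that labels its columns. The line itself is invented for illustration. The eleven column names are the mandatory fields of the SAM format; real lines often carry extra optional tags after them, which this sketch simply ignores.

```python
# The eleven mandatory, tab-separated columns of a SAM alignment line.
SAM_FIELDS = [
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
]

# A made-up alignment: the read sits at position 3 of the reference
# (positions are 1-based) and all 8 of its bases line up ("8M").
sam_line = (
    "badger_map_001\t0\tlegendary_map_chr1\t3\t60\t8M\t*\t0\t0\t"
    "TTACAGAT\tIIIIIIII"
)


def parse_sam_line(line):
    """Label the mandatory columns of one SAM alignment line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = dict(zip(SAM_FIELDS, parts))  # optional tags are dropped
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    return record


alignment = parse_sam_line(sam_line)
print(alignment["qname"], "sits at position", alignment["pos"])
```

So the "human readable" part mostly means you can open it in a text editor and squint.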

Difference of Opinion

When the animals fought with each other over whether there was a banyan tree at location 9.9312° N, 76.2673° E, we called it a difference of opinion or, more aptly, a reason for bickering. When such things happen in the human realm between the reads and the reference, they call it a variant. Then they write those down in something called a “VCF” file.
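Here is a toy sketch of the bickering. It assumes a read that has already been lined up against the reference with no gaps, walks along it, and writes every disagreement as a VCF-style line. The sequences, `find_mismatches`, and `to_vcf_line` are all made up by me. The eight tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) are the fixed columns of a VCF line, with `.` standing in for "no value". A real variant caller weighs many reads and their quality before writing anything down; one read disagreeing is just one animal grumbling.

```python
# Made-up reference and one read, already aligned with no gaps.
reference = "GATTACAGATTACA"
read_start = 3          # 1-based position where the read begins
read_bases = "TTAGAGAT"


def find_mismatches(reference, read_bases, read_start):
    """Return (position, reference base, read base) for each disagreement."""
    mismatches = []
    for offset, base in enumerate(read_bases):
        pos = read_start + offset
        ref_base = reference[pos - 1]  # convert 1-based position to index
        if base != ref_base:
            mismatches.append((pos, ref_base, base))
    return mismatches


def to_vcf_line(chrom, pos, ref, alt):
    """Format one disagreement as CHROM POS ID REF ALT QUAL FILTER INFO."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])


mismatches = find_mismatches(reference, read_bases, read_start)
vcf_lines = [to_vcf_line("legendary_map_chr1", *m) for m in mismatches]
for line in vcf_lines:
    print(line)
```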

Rejected Votes

Sometimes the animals in the map-making committee get dodgy reports. A distracted kitten or two who say they found a new path but aren’t too sure. As responsible members, they have to reject such reports. In human terms, this would be called a sequencing error.

It's really hard to tell a sequencing error from a rare variant. Maybe the kitten strayed too far away from the path and really did discover a new place, but since no one else saw the place and the kitten seemed distracted by a butterfly, the animals thought she couldn’t be trusted.

Then there is the map from the compulsive liar fox. The animals unanimously decided that the fox couldn’t be trusted. You can have untrustworthy reads too. The humans filter them out by literally giving each one a “quality score”[5].
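The quality score is less mysterious than it sounds. In the common Phred+33 encoding, each quality character in a FASTQ record stands for a number Q (its ASCII code minus 33), and Q translates to an error probability of 10^(-Q/10). The sketch below decodes a quality string and throws out reads whose average falls below a cutoff. The function names and the cutoff of 20 are my own assumptions for illustration, not a rule any particular tool enforces.

```python
def phred_scores(quality, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Chance that a base with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def passes_filter(quality, min_mean_q=20):
    """Keep a read only if its mean Phred score reaches the cutoff."""
    scores = phred_scores(quality)
    if not scores:
        return False  # an empty read has nothing to vouch for it
    return sum(scores) / len(scores) >= min_mean_q


trusted = passes_filter("IIIIIIII")   # every base scores 40
fox = passes_filter("########")       # every base scores 2
print(trusted, fox)
```

A score of 40 means roughly one wrong base in ten thousand. A score of 2 means the fox drew the map.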

Consensus Sequence

A consensus sequence is used for all sorts of things, like telling a patient the exact strain of SARS-CoV-2 they have, or finding a protein binding site. This is really important stuff to humans.
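To make the idea concrete, here is a toy majority vote. It assumes the reads are already aligned to the reference with no gaps, lets every read vote on the positions it covers, and takes the most popular base at each one. The sequences and the `consensus` function are made up, and falling back to the reference base where no read votes is my own simplification. Real tools are considerably fussier.

```python
from collections import Counter


def consensus(reference, aligned_reads):
    """Majority vote per position.

    aligned_reads is a list of (start, bases) pairs, where start is the
    1-based reference position of the first base and there are no gaps.
    """
    votes = [Counter() for _ in reference]
    for start, bases in aligned_reads:
        for offset, base in enumerate(bases):
            index = start - 1 + offset
            if 0 <= index < len(reference):
                votes[index][base] += 1
    result = []
    for ref_base, counter in zip(reference, votes):
        if counter:
            result.append(counter.most_common(1)[0][0])
        else:
            # Assumption: with no votes at all, keep the reference base.
            result.append(ref_base)
    return "".join(result)


reference = "GATTACAGAT"
aligned_reads = [
    (1, "GATTGC"),
    (3, "TTGCAG"),
    (4, "TGCAGAT"),
    (5, "ACAG"),
]
result = consensus(reference, aligned_reads)
print(result)
```

At position 5, three reads say G and one says A, so the consensus goes with G even though the Legendary Map says A.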

In the next blog, I will show you how humans use computers to make them.

FOOTNOTES

1. Rather unimaginative name. Just like consensus map. If it was me, I would have named it Confusorium-machinificus. But since it wasn't me who came up with the machine, we have to make do with the name DNA sequencers.

2. Maybe a Zoom meeting was in order, to come up with a better name. Maybe they didn't have one because a "zoom" meeting never zooms by. It slows time and at the same time, ages the humans in it.

3. Imagine a helicopter rescuing a man shipwrecked in open sea. Now imagine a rope ladder dangling from the helicopter. Do you see it? Now think of this rope ladder as a DNA sequence. If DNA is the rope ladder, each rung of the ladder is a base pair.

If this ladder was the length of the SARS-CoV-2 virus genome (roughly 30,000 rungs), the man would have climbed up into the stratosphere and would now be in range of being hit by passing airplanes, dangling from a massively wobbly helicopter with no air to breathe.

4. In fact, the official colour of the Jungle Government is beige. It is a nice non-committal colour and goes very nicely with all the dust accumulated over the years.

5. Are people TRYING to be boring? This has to be deliberate.