
Chapter 4
Methodology of Sequencing the Human Genome
The techniques that were used to read the whole Human Genome
(3 billion letters of code) are very sophisticated and
time-consuming. The main tools adopted by these techniques are:
1- Genetically modified bacteria (GM bacteria) that
harbour a piece of the Human Genome.
2- Huge computers that can perform millions of calculations.
3- Robots that organise and collate the information generated
by the computers.
The concept of sequencing Human DNA is the same as that
pioneered by the double Nobel Laureate Fred Sanger, who in
1977 developed a method to read the 5,375 letters of the
genetic code of a simple virus. What has changed is the
degree of automation. As Sanger remarked, “there is a
lot of robotics now - we had to measure out things with
pipettes and test-tubes”.
The DNA molecule is too big to be read in one step, and the
scanning tunnelling microscope, which can take pictures of
DNA, is not yet good enough for the task. The method adopted
to read the whole Genome is therefore to break it down into
manageable pieces, each a limited number of letters
(“base pairs”) long. Once these small pieces, about 500
letters long, have been sequenced, they are put back
together in the right positions by matching their
overlapping sequences.
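The idea of joining two pieces through their overlapping letters can be pictured with a small sketch. It is only an illustration: the function names, the toy reads and the minimum overlap of 3 letters are assumptions made here, and real reads are about 500 letters long and contain errors that this sketch ignores.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads through their overlap; None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and add only the letters of `right` not yet seen.
    return left + right[size:]


# Two hypothetical reads that share the letters "GTTAC".
read_a = "ATGCCGTTAC"
read_b = "GTTACGGATA"
print(merge_reads(read_a, read_b))  # ATGCCGTTACGGATA
```

The shared letters appear only once in the merged result, which is how the overlap fixes the relative position of the two pieces.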
There are two approaches to sequencing the whole Genome:
A- The approach first adopted by the Human Genome Project
consortium, which is funded by public money. Their strategy
was based on two steps, and they call it the two-step
shotgun process. The rationale behind the strategy is that
the Human genetic code is so huge that an intermediate step
is needed to get a rough map of the Genome, as illustrated
in Fig (18) and explained below:
* The 3-billion-letter Genome is broken down by shotgun into
pieces of DNA whose length varies between 40,000 and 200,000
letters. Each of these fragments is given a unique
identification tag, which helps to establish the order of
the fragments.
* The tagged fragment is ligated into a bacterial artificial
chromosome (BAC).
* The BAC is then cloned into a bacterium to make more copies
of that fragment. Whenever the bacterium divides, it copies
not only its own genetic message but also the foreign piece
of DNA that has been inserted. As a result, millions of
copies of the BAC can be made and studied in the detail
needed for a genetic map.
* Landmarks on these big pieces are identified, so that
overlapping pieces can be recognised and the Genome put back
together, resulting in what they called a map. As Dr.
Sulston, the director of the Sanger Centre at Cambridge and
a major contributor to the Genome project, said, “you
construct the Genome as a jigsaw puzzle, at the level of
40000-200000 bases”. Another analogy is drawn by Steve Jones
(1993): “The positions of the cuts (like those of the words
‘and’, ‘but’ and ‘banana’) provide a set of landmarks along
the DNA. Once we
know where they are we have made a first step to making a
physical map of the book itself based on the order of the
letters and words it contains. The process is close to that
carried out by the students who stormed the American Embassy
in Tehran after the fall of the Shah. With extraordinary
labour they pieced together secret documents which had been
put through a shredding machine. By seeing how the
individual fragments fitted together the students
reconstituted a long, complicated and compromising
message”.
* This preliminary chromosomal map is used to locate the
smaller pieces, which are sequenced (“read”) from end to
end of the segment.
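The map-building idea in the first step, ordering large pieces by the landmarks they share, can be pictured with a small sketch. It is a toy under stated assumptions: the clone names and landmark names are invented, two clones are taken to overlap when they share at least one landmark, and each clone is assumed to overlap only its immediate neighbours. The mapping actually used by the project is far more involved.

```python
from itertools import combinations


def order_clones(clones: dict) -> list:
    """Order clones into a chain using shared landmarks.

    `clones` maps a clone name to the set of landmarks found on it.
    Assumes the clones form one simple chain (no branches, no gaps).
    """
    # Record which clones share at least one landmark.
    neighbours = {name: set() for name in clones}
    for a, b in combinations(clones, 2):
        if clones[a] & clones[b]:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # The two ends of the chain are the clones with a single neighbour.
    ends = sorted(name for name, adj in neighbours.items() if len(adj) == 1)
    if len(ends) != 2:
        raise ValueError("clones do not form a single simple chain")

    # Walk from one end to the other, never stepping backwards.
    order = [ends[0]]
    previous = None
    while len(order) < len(clones):
        current = order[-1]
        candidates = neighbours[current] - {previous}
        if len(candidates) != 1:
            raise ValueError("clones do not form a single simple chain")
        previous = current
        order.append(candidates.pop())
    return order


# Hypothetical clones and landmarks, given in scrambled order.
clones = {
    "clone_C": {"m5", "m6", "m7"},
    "clone_A": {"m1", "m2", "m3"},
    "clone_B": {"m3", "m4", "m5"},
}
print(order_clones(clones))  # ['clone_A', 'clone_B', 'clone_C']
```

The sketch recovers the order of the clones but not the direction of the chain; starting from the other end would give the same map read backwards.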
The second step of this strategy is as follows:
* Each fragment that was ligated into a BAC in the previous
steps is further broken down randomly into smaller pieces.
* Each of these smaller pieces is ligated again, this time
into a ring of DNA called a plasmid, or gene taxi, which is
capable of transferring that piece of DNA into a bacterium,
where it is replicated into millions of copies (refer to the
outline of the genetic engineering technique).
* The sequencing technique pioneered by Sanger in 1975 can
then be used to sequence these pieces from both ends of the
fragment. The principles of the sequencing technique are:
- The technique depends on the ability of the DNA molecule
to copy itself when a special enzyme is provided along with
a mixture of the A, T, C and G bases.
- The reaction grows gradually lengthening, radio-labelled
copies of a DNA strand, extended from a primer, from one end
to the other.
- Four separate experiments (each using a different base)
are started at the same time. Each begins the process at the
same place in the DNA. By chemical trickery, some of the
growing strands are stopped each time that experiment's base
is added.
- This produces a set of DNA pieces of different lengths,
each stopped at a specific base.
- Electrophoresis of the four mixtures on the same gel gives
four parallel lanes of DNA fragments in order of increasing
length.
- Reading across and down the gel gives the order of the
bases. Refer to the illustration in Fig (19) (Harms and
Damen, 1998). In the Human Genome Project, however, the
reading process is computerised: the labelled letters glow
in a laser beam and a great deal of robotics is involved,
the details of which are beyond the scope of this book.
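The logic of the four experiments and the gel can be pictured with a small sketch. It is a simplification made for illustration: the function names and the toy strand are assumptions, the sketch works directly on the newly grown strand rather than on the template it is copied from, and it takes for granted that a stopped piece exists for every position.

```python
def chain_termination_lanes(strand: str) -> dict:
    """For each base, list the lengths of the pieces that stop at that base.

    Each of the four experiments gives one lane of the gel.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        # A piece stopped at this position is `length` letters long
        # and shows up in the lane of the base it ends with.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Recover the order of the bases by reading the shortest piece first."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# A hypothetical 7-letter strand.
lanes = chain_termination_lanes("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Because each length occurs in exactly one lane, sorting the bands by length across the four lanes gives back the sequence, which is what reading across and down the gel amounts to.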
B- The second approach, adopted by Celera Genomics, is
faster. It uses a huge supercomputer that performs millions
of calculations, reducing the time significantly compared
with the publicly funded project. The man behind this
private company is Dr. Craig Venter, who devised a way to
blow the whole Genome apart with the “whole Genome shotgun”.
* The Genome ends up in many small pieces ranging from
2,000 to 10,000 base pairs (letters) in length.
* These small pieces are then sequenced using large
computerised sequencing machines, regardless of their
position on the chromosomes.
* A supercomputer and clever computer programmes are used to
compare the 3 billion letters of sequenced code and to find
the overlapping regions. Once these regions are found, the
whole Genome is reassembled.
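The reassembly step can be pictured with a small sketch that repeatedly joins the two pieces with the longest overlap. This greedy rule is a textbook simplification chosen here for illustration, not a description of Celera's programmes. The function names, the toy reads and the minimum overlap of 3 letters are assumptions, and the sketch supposes error-free reads, no repeated stretches and no read lying wholly inside another.

```python
def suffix_prefix_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Merge the best-overlapping pair of pieces until no overlaps remain."""
    pieces = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        # Compare every piece with every other piece, in both directions.
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i != j:
                    size = suffix_prefix_overlap(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: the result has gaps
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Three hypothetical reads from a 15-letter "genome", in scrambled order.
reads = ["CCTTAGA", "ATGGCGT", "CGTACCT"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAGA']
```

Comparing every piece with every other is what makes the task so demanding at the scale of millions of reads, and is one reason a supercomputer was needed.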
Celera admitted, though, that they rely on the map produced
by the publicly funded project, which is accessible through
the Internet.
Dr. Sulston stated that the public project's data are there
to help everybody in this field.
The combination of these two complementary Genome sequencing
and assembly approaches has reduced the time necessary to
finish the sequence of the whole Human Genome by 5 years.
The time originally proposed for the whole project was 15
years, but the new techniques pioneered by Celera (capillary
electrophoresis and supercomputers) reduced that time
significantly. Details of the methodology of the Human
Genome Project can be found in the web sites mentioned in
the references section.
