The First Cellular Genome, Part 1
1995 marks the beginning of the true genomics era as the first genome of a cellular organism was published. The shotgun sequencing approach proved to be successful. This episode goes over the technical details of this historic accomplishment.
Welcome back. I am Brad Goodner, Professor of Biology at Hiram College. We have reached the point of the first genome sequence from a cellular organism, published in July of 1995 in the journal SCIENCE.
As we discussed earlier in Episode 4, Craig Venter and his colleagues at TIGR, The Institute for Genomic Research, came up with a probability-based approach, a shotgun approach, to sequencing a genome. Break the genome up into pieces and sequence enough pieces to cover it at least 5 times, in the hope of obtaining 99% of the genome sequence.
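To see why about 5X coverage goes with the 99% figure, here is a minimal sketch. It assumes reads land uniformly at random so that the number of reads covering any one base follows a Poisson distribution. That model is an assumption brought in for illustration, not something stated in this episode, and it ignores cloning bias and edge effects. The function name is hypothetical.

```python
import math

def expected_fraction_covered(coverage: float) -> float:
    """Chance that a given base is in at least one read.

    Assumes a Poisson model: P(base hit by zero reads) = e^(-coverage).
    """
    return 1.0 - math.exp(-coverage)

# Compare a few coverage depths, including the 5X target from the text.
for c in (1, 3, 5, 6):
    fraction = expected_fraction_covered(c)
    print(f"{c}X coverage: about {fraction:.2%} of bases expected in a read")
```

Under this idealized model, 5X coverage leaves well under 1% of bases unsequenced, which fits the goal described above.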
The 1995 SCIENCE article by Robert Fleischmann and 39 coauthors, including Craig Venter, focused on the genome of Haemophilus influenzae strain Rd. This strain is a nonpathogenic sister of strains that can cause inner ear infections, respiratory infections and even bacterial meningitis. Many of you have been vaccinated against several pathogenic strains of Haemophilus influenzae. The genome of H. influenzae strain Rd is 1.83 million base pairs present as a single circular chromosome. This genome was chosen for its small size and because its G+C content of 38% was very close to that of humans.
Fleischmann and coworkers grew up a culture of the bacterial strain and isolated DNA. They then randomly sheared the DNA into fragments using sonication and separated the fragments using gel electrophoresis. DNA in two size ranges was purified from the gel – 1500 to 2000 bp and 15,000 to 20,000 bp. The purified DNA fragments were then treated with DNA polymerases and exonucleases to generate blunt ends with phosphorylated 5’ ends. The blunt-ended fragments were ligated into a plasmid vector to make two libraries – small insert and large insert.
From the small insert library, the researchers sequenced both ends of over 7000 plasmid clones and one end of over 9000 more clones. The average size of the sequence reads was around 450 bases. Overall, this resulted in over 11.6 million bases of sequence, just over 6X the size of the genome. The shotgun approach in action!
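The coverage figure can be checked with a little arithmetic. This sketch uses only the rounded numbers quoted in this episode (1.83 million bp genome, 11.6 million bases sequenced, reads of about 450 bases), so its results are approximations. The variable and function names are made up for illustration.

```python
# Rounded figures quoted in this episode (actual totals were "over" these).
GENOME_SIZE_BP = 1_830_000
TOTAL_SEQUENCED_BASES = 11_600_000
AVG_READ_LENGTH = 450

def fold_coverage(total_bases: int, genome_size: int) -> float:
    """How many times over the genome was sequenced, on average."""
    return total_bases / genome_size

def approx_read_count(total_bases: int, avg_read_length: int) -> int:
    """Rough number of reads implied by the total and the average read length."""
    return round(total_bases / avg_read_length)

coverage = fold_coverage(TOTAL_SEQUENCED_BASES, GENOME_SIZE_BP)
reads = approx_read_count(TOTAL_SEQUENCED_BASES, AVG_READ_LENGTH)
print(f"Average coverage: {coverage:.2f}X from roughly {reads:,} reads")
```

The implied read count is in line with two reads from each of over 7000 clones plus one read from each of over 9000 more.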
Now the work was turned over to computer algorithms that looked for overlaps between the sequence reads that met a set sequence identity criterion. In this way, the initial sequence reads were assembled into 140 larger fragments called contigs. The researchers estimated that the remaining gaps between the contigs averaged about 100 bases in size. Some of the gaps were due to the randomness of the shotgun cloning methods while other gaps were due to the fact that certain genome fragments were somehow lethal to the E. coli host cells carrying the library plasmid clones.
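To give a feel for overlap-based assembly, here is a toy greedy assembler. It is a simplified sketch and not the software the researchers used. It assumes exact matches only, a single DNA strand, and no sequencing errors, whereas the real work used a sequence identity criterion. The reads and function names are invented for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` that exactly matches a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_overlap: int = 4) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge; remaining breaks are gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

# Hypothetical reads: the first three overlap, the last two overlap,
# but nothing bridges the two groups, so a gap remains.
reads = ["ATGCGTACG", "GTACGTTAGC", "TTAGCCGA", "GGATCCTTAAC", "CTTAACGGT"]
contigs = greedy_assemble(reads)
print(f"{len(reads)} reads assembled into {len(contigs)} contigs")
```

Because no read spans the break between the two groups, the toy data ends up as two contigs with a gap between them, a small-scale version of the situation the researchers faced with 140 contigs.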
Closing the gaps required human ingenuity. For example, the researchers examined the ends of each contig to see if any of them encoded parts of the same protein. If so, they designed PCR primers from each potential adjoining end and used those primers with H. influenzae genomic DNA as the PCR template. In addition, the researchers used the contig ends as hybridization probes on Southern blots of DNA from the large insert library clones. If the ends of two different contigs hybridized to the same large insert library clone, then the same PCR strategy could be used, and the two ends of the large insert were sequenced as well. Using these strategies and a few others, the researchers were able to close all of the gaps.
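The hybridization logic boils down to bookkeeping: if probes from the ends of two different contigs light up the same large insert clone, those ends are candidate neighbors. Here is a minimal sketch of that bookkeeping. The clone names, contig names, and data layout are all hypothetical and are not taken from the original study.

```python
from itertools import combinations

def candidate_joins(hybridization_hits: dict[str, set[tuple[str, str]]]):
    """Find pairs of contig ends that hybridized to the same clone.

    `hybridization_hits` maps a clone name to the set of
    (contig, end) probes that hybridized to it.
    """
    joins = set()
    for clone, probes in hybridization_hits.items():
        for (c1, e1), (c2, e2) in combinations(sorted(probes), 2):
            if c1 != c2:  # the two ends must come from different contigs
                joins.add(((c1, e1), (c2, e2), clone))
    return sorted(joins)

# Invented example data for illustration only.
hits = {
    "clone_A": {("contig_12", "right"), ("contig_47", "left")},
    "clone_B": {("contig_03", "left")},
    "clone_C": {("contig_47", "right"), ("contig_88", "left")},
}
joins = candidate_joins(hits)
for end_1, end_2, clone in joins:
    print(f"{end_1} and {end_2} both hit {clone}: design PCR primers across this gap")
```

Each candidate pair would still need to be confirmed at the bench, for example by PCR across the suspected gap as described above.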
In this way, Fleischmann and coworkers figured out the first complete genome sequence of a cellular organism. The shotgun strategy was proven a success and became the model for virtually all subsequent genome projects. The cost of this project turned out to be 48 cents per finished base pair, or just under $900K. Since then, the cost of genome projects has dropped precipitously, to the point that today the same size genome could be sequenced for about $500.
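The cost figures are easy to check. This short sketch uses only numbers quoted in this episode: 48 cents per finished base pair, a 1.83 million bp genome, and a present-day estimate of about $500. The variable names are made up for illustration.

```python
GENOME_SIZE_BP = 1_830_000
COST_PER_BASE_CENTS_1995 = 48   # cents per finished base pair
COST_TODAY_DOLLARS = 500        # rough present-day figure from the text

# Work in cents first to keep the arithmetic exact, then convert to dollars.
project_cost_1995 = COST_PER_BASE_CENTS_1995 * GENOME_SIZE_BP / 100
fold_decrease = project_cost_1995 / COST_TODAY_DOLLARS

print(f"1995 project cost: ${project_cost_1995:,.0f}")
print(f"Cost has dropped roughly {fold_decrease:,.0f}-fold")
```

So "just under $900K" checks out, and the drop to about $500 is a decrease of well over a thousandfold.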
The genome era truly came alive with this publication, but there are biological implications beyond the technological ones. We will deal with the biological implications next time.