The First Individual Genome: One Is the Loneliest Number
When Craig Venter published the complete sequence of his genome in PLOS Biology in 2007, it was, in some ways, old news. Though Venter didn’t admit it until he left his position as Celera’s president in 2002, many scientists suspected that the human genome his company sequenced far ahead of schedule, thanks to his groundbreaking shotgun sequencing approach, was his own. (Venter and his colleague Ham Smith supplied the male portion of Celera’s genome.)
But the real news when Venter published his genome (highlighted here as part of PLOS Biology’s 10th anniversary) wasn’t whose genome had been sequenced, but the fact that it had been done.
Seven years earlier, when Venter and Francis Collins of the government-funded Human Genome Project announced the completion of the draft human genome amid great fanfare at the White House, President Bill Clinton called it “the most important, most wondrous map ever produced by humankind.” As wondrous as the map was, it actually conflated several related maps into an unnatural chimera. The Celera and HGP genomes are both composite haploid assemblies: they represent a single set of chromosomes derived from several individuals. The assemblies provide a map of all of our genes and where they occur on the chromosomes, a herculean task, given the 3 billion base pairs that make up the human genome. But as diploid organisms, we inherit two sets of chromosomes, 23 from mom and 23 from dad. They carry (mostly) the same genes, of course, but the genes aren’t exactly the same (between the two copies and between individuals). And it’s these variations that spell the difference between short and tall, lean and stout—and most importantly for biomedical researchers, sickness and health.
To figure out how variant forms of the same gene, or alleles, contribute to disease, you need to sequence diploid genomes—both sets of chromosomes—to see first how the gene pairs differ and then how the differences might contribute to disease. But the prospect of sequencing a known individual’s genome was, and still is, a highly sensitive affair. What if someone carries alleles associated with a high risk of breast cancer, Alzheimer’s or another potentially fatal disease? How should that information be communicated? Might it affect a person’s ability to get insurance at a fair price?
Venter thinks such concerns have impeded the field’s progress. “That’s the reason we turned to my genome in the first place,” he told me. “I decided it’s not fair to ask other people to do something I’m unwilling to do.” (It’s also why Venter helped pass The Genetic Information Nondiscrimination Act of 2008, which prohibits discrimination in employment or health coverage based on genetic data.)
He blames “hysteria and misinformation” for spreading fears about posting an individual’s genome data online. When some still thought the genome had 100,000 to 300,000 genes, rather than the roughly 22,000 we’ve come to discover, “you had this genetic determinism point of view that there was going to be a gene for everything,” Venter says. People thought once you knew which alleles someone had, you’d know their destiny. Though there are some alleles that accurately foretell a person’s risk of disease, he says, “other than Huntington’s disease I’m not sure there are any other examples.”
When James Watson (one of the main instigators of the Human Genome Project) had his genome sequenced, he withheld the data about his ApoE gene, which has an allele long associated with increased risk of late-onset Alzheimer’s disease. Venter withheld nothing about his genome. He learned that he’s a heterozygote for ApoE4: he inherited a single copy of the allele, tripling his risk of getting the disease. Having two copies of ApoE4 is associated with an even greater risk of disease than having one, but predicting what this means for a given individual is incredibly difficult. Releasing his genome has produced “nothing but benefit,” Venter says, by giving researchers an ability to match gene variants to phenotype. But at this point, “the reality is nobody knows how to interpret a human genome. Nobody can tell accurately what it means to be a heterozygote for ApoE4.”
Although Venter’s genome data places him at risk of developing Alzheimer’s, a recent brain scan looking for the amyloid plaques characteristic of the disease came out “100 percent negative,” he says. “What works statistically for a population with genomics does not work statistically for individuals. Either you have something or you don’t. You don’t have 30 percent of Alzheimer’s.”
Venter published his genome both to set an example, allaying fears about the potential risks of doing so, and to show the research community the type of data you can produce. Venter’s team, led by Samuel Levy, did what’s called haplotype phasing: they mapped the gene variants at the same loci on matched chromosome pairs. To understand how our genes contribute to our biology, researchers have to separate out the linear arrays of genes we get from each of our parents, then figure out how the genes differ and how the differences sort with phenotypes, like male pattern baldness or Alzheimer’s disease. The PLOS Biology paper “had more haplotype phasing than anything before,” Venter says. “Then we’ve added to that by sequencing from sperm cells to get a high percentage of it separated into parental chromosomes.”
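For readers who think in code, here is a toy sketch of the idea behind haplotype phasing. It is not the method Levy and colleagues used. It assumes a made-up input format in which each sequenced fragment reports the allele (0 or 1) it saw at a few heterozygous sites, and the function name `phase_fragments` is hypothetical. Because each fragment comes from a single chromosome, fragments that overlap at a site can be stitched together into two parental haplotypes.

```python
def phase_fragments(fragments, n_sites):
    """Greedy toy phasing (illustrative assumption, not a published algorithm).

    fragments: list of dicts mapping site index -> observed allele (0 or 1).
    Returns (hap_a, hap_b); sites never linked to the first fragment stay None.
    """
    hap_a = [None] * n_sites
    remaining = list(fragments)
    progress = True
    while remaining and progress:
        progress = False
        for frag in list(remaining):
            # Sites in this fragment that already have a phased allele.
            phased = [s for s in frag if hap_a[s] is not None]
            started = any(a is not None for a in hap_a)
            if started and not phased:
                continue  # no overlap yet; retry on a later pass
            # Vote: does the fragment match haplotype A at the shared sites?
            agree = sum(1 for s in phased if frag[s] == hap_a[s])
            on_a = agree * 2 >= len(phased)
            # Extend haplotype A with the fragment's new sites.
            for site, allele in frag.items():
                if hap_a[site] is None:
                    hap_a[site] = allele if on_a else 1 - allele
            remaining.remove(frag)
            progress = True
    # At a heterozygous site the other chromosome carries the other allele.
    hap_b = [None if a is None else 1 - a for a in hap_a]
    return hap_a, hap_b


# Three overlapping fragments covering four heterozygous sites.
example_fragments = [{0: 0, 1: 1}, {1: 0, 2: 0}, {2: 1, 3: 0}]
hap_a, hap_b = phase_fragments(example_fragments, 4)
print(hap_a, hap_b)
```

Real phasing has to cope with sequencing errors, gaps between fragments, and millions of sites, which is part of why extra data such as the sperm-cell sequencing Venter mentions helps.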
But in the end, one genome doesn’t tell you a lot, he says. Real progress in personal genomics depends on the law of large numbers. “The goal is to take not one or two or ten people, but tens of thousands.”
There’s no denying that sequencing your genome can come with risks—like finding out the man you grew up calling Dad is not your biological father. And there’s no denying that the genomics community has a long way to go to build the necessary infrastructure so that people who share their genomes fully understand what their genetic information does–and does not–mean. But Venter thinks that amassing more and more genome sequences without phenotypic information is largely a waste of time. “It’s not just a nice thing or curious thing to link the genome back to an individual,” he says. “It’s absolutely essential to understand the genome.”
See the Tenth Anniversary PLOS Biology Collection or read the Biologue blog posts highlighting the rest of our selected articles.
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, & Venter JC (2007). The Diploid Genome Sequence of an Individual Human. PLoS Biology, 5(10). DOI: 10.1371/journal.pbio.0050254