You won’t find CpG islands on any ocean chart, but they do feature in a different sort of map – the map of the mammalian genome. And in genome maps, CpG islands act as navigation points for gene hunters and for those charting how genes are turned on and off at the right place and time. But how do you actually locate these elusive islands in an ocean of DNA? Until a few years ago, this was a challenging, time-consuming venture. But not these days, thanks in part to Adrian Bird and colleagues, who in 2008 discovered a new way to locate CpG islands in the human genome, revealing crucial new insights about their biology along the way.
But before we start talking about this island hunting approach in more detail, you may well first want to know more about what a CpG island actually is. (And if you’re already familiar with these genomic landmarks, feel free to skip the next couple of paragraphs.)
In seminal work carried out by Adrian Bird and others during the 1980s and 1990s, it was discovered that the mouse genome contains short stretches of DNA in which ‘CpG dinucleotides’ – a C nucleotide followed immediately by a G nucleotide in the sequence – occur with an unusually high frequency relative to the rest of the genome. These so-called ‘CpG islands’ were notable for two other traits: unlike most CpGs in the genome, their C nucleotides weren’t methylated (methylation converts a cytosine nucleotide into 5-methylcytosine), and many occurred near the start sites of genes, in what are called promoter regions.
Methylation is strongly associated with the silencing of genes. And the subsequent discovery of these unmethylated CpG islands near gene promoters – in both the mouse and human genome – hinted at their involvement in the turning off and on of those genes. But to investigate their functions, researchers first had to find and map CpG islands, and this proved to be a pretty challenging task. This is because – at the time – CpG island identification depended heavily on bioinformatic tools that flagged genomic regions rich in CpG sites. The problem was that these tools were not really fit for purpose: they didn’t factor in the methylation status of DNA and would include transposons and exons in the analysis, which – while rich in CpGs – are heavily methylated. The end results would either include lots of false positives or miss out bona fide CpG islands.
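To make the limitation concrete, here is a minimal sketch of a sequence-only test of the kind such tools relied on. It is an illustration, not the algorithm used by any particular program: the thresholds (at least 200 bases, GC content above 50%, observed/expected CpG ratio above 0.6) are the widely cited Gardiner-Garden and Frommer criteria rather than anything taken from the study discussed here, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # a C immediately followed by a G
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Sequence-only test; thresholds are assumed defaults, see lead-in."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (
        len(seq) >= min_length
        and gc_fraction > min_gc
        and obs_exp > min_obs_exp
    )
```

Note that the only input is the base sequence. A heavily methylated, CpG-rich transposon and a genuine unmethylated island with the same sequence composition would get exactly the same answer, which is one way a sequence-only approach produces false positives.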
So questions remained as to how CpG islands could be identified, and their methylation status assessed, on a genome-wide scale – an advance that would allow researchers to better chart how DNA methylation was involved in regulating the silencing or activation of genes at the right time and place across the genome.
In their research article published in PLOS Biology in 2008, entitled ‘A Novel CpG Island Set Identifies Tissue-Specific Methylation at Developmental Gene Loci’, Adrian Bird and colleagues reported their adaptation of a method to purify and identify CpG islands from the human genome. As Bird explained to me, ‘Until our paper [was published], CpG islands were seen as bioinformatically defined entities, ignoring one of their most consistent features, namely absence of DNA methylation. As a result, the most commonly used algorithm detected >100,000 in the human genome, most of which subsequently turned out to be spurious.’
The method Bird and colleagues reported in their PLOS Biology study made use of a cysteine-rich protein domain – called the CXXC3 domain – that can strongly bind to clustered, non-methylated CpG sites. In effect, they used this domain as a trap for non-methylated CpG clusters in DNA obtained from human blood. The trapped DNA was then purified and sequenced, and the sequences were searched against the Ensembl database to find out what they were.
This analysis unexpectedly revealed that only about 50% of CpG islands are located near the promoters of genes – the rest are located within or between genes. The team also used their library of ~17,000 human CpG islands to create a microarray enriched for CpG island-containing DNA. They then used this array to compare patterns of CpG island methylation across four different human tissues – brain, muscle, spleen and sperm. To do this, they used a different protein domain – the MBD domain from the MECP2 protein – to see whether any of the CpG islands in these tissues had become methylated.
So what did these experiments reveal? Importantly, they found that methylated CpG islands were often present in genes that have essential roles in normal embryonic development (for example the HOX and PAX gene families), and that their patterns of methylation varied between tissues. Such a pattern of CpG island methylation strongly supported the idea that the methylation of CpG islands helps to regulate gene expression during development by ensuring that genes are switched off (CpG island methylated) or on (CpG island unmethylated) at the right place and time.
Their findings also raised questions for future research. For example, when developmental genes were removed from the analysis, the relationship between CpG island methylation and gene expression became much less clear. These and other findings added new layers of complexity as to how CpG islands might function in the genome and in regulating gene expression.
So this work not only provided a new CpG island-mapping tool to the epigenetics community, it also revealed novel and intriguing insights into the biology of CpG islands.
PLOS Biology’s Editorial Board Member Eric Nestler sums up the impact this work had on the field: ‘This was one of the first studies to demonstrate the genome-wide pattern of CpG islands and their methylation throughout the genome of several diverse tissues. It provided a critical foundation for subsequent work which has continued to define the role of CpG methylation in controlling chromatin structure and gene expression.’
Robert Illingworth, Alastair Kerr, Dina DeSousa, Helle Jørgensen, Peter Ellis, Jim Stalker, David Jackson, Chris Clee, Robert Plumb, Jane Rogers, Sean Humphray, Tony Cox, Cordelia Langford, & Adrian Bird (2008). A Novel CpG Island Set Identifies Tissue-Specific Methylation at Developmental Gene Loci. PLOS Biology, 6 (1). DOI: 10.1371/journal.pbio.0060022