Over the past couple of decades we’ve grown used to the idea of the Human Genome – the genetic blueprint that all of us carry within our cells, containing all the instructions for life.
The information within our DNA constructs all the parts of the body as we grow from a single fertilised egg into a baby and then an adult. Our genome also tells us something (but not everything) about all kinds of traits and conditions, from skin or eye colour to height and weight, or our risk of developing certain diseases.
Increasingly, healthcare services and pharmaceutical companies are using genetic and genomic data to develop tests and treatments – an approach known as precision medicine.
All of us share a human genome that has evolved over hundreds of thousands of years, picking up distinctive regional variations as our species has spread across the globe. As we’ve written about before, some of these seemingly small differences can have a big impact on health and the potential effectiveness of precision medicine.
Yet the vast majority of the genomic data used to develop precision diagnostics and therapies, including the standard ‘Human Genome’ reference sequence, comes from people with white, European ancestry.
Global Gene Corp was founded to address this imbalance, by gathering and analysing data from populations all over the world.
Much of the work looking at genetic differences between populations has focused on small changes to individual ‘letters’ of DNA, which are known as single nucleotide polymorphisms or SNPs. So we were very excited to see a new paper in Nature Communications from UCSF’s Pui-Yan Kwok and his collaborators in the US and China, revealing that there are typical patterns of much larger ‘cut and paste’ rearrangements within the genomes of particular groups of people.
These large structural variants are hard to spot using conventional DNA sequencing techniques, but they could have a big impact on the function and activity of nearby genes. Among other findings, Kwok’s team showed that Africans and East Asians tend to have the highest number of characteristic rearrangements within their genomes.
They also discovered population-specific large structural variations that affect the levels of a protein called pepsinogen in the blood. Measuring these levels is sometimes used as a screening test for stomach cancer, suggesting that regional genomic data will be essential for developing more effective diagnostic tests.
Even more intriguingly, the researchers uncovered around 60 million ‘letters’ of DNA that aren’t currently included in the standard reference human genome.
This paper is further powerful evidence that we need to move away from the concept of the ‘Human Genome’, based on a single reference genome derived from primarily white, European DNA data.
As the authors point out in their paper, “The large number of samples from many populations in this study allows us to define the major haplotypes in these regions, and it is clear that the reference genome assembly is but one haplotype of many.
“Without a comprehensive set of alternate haplotypes representing the variation within and across populations, aligning short-read sequences to the complex regions will lead to errors in analysis.”
We need to build the ‘Global Genome’ – a collection of datasets that incorporate genetic and other kinds of health information from diverse populations all over the world, in order to bring the benefits of future advances in precision healthcare to everyone, wherever they live and wherever they come from.
- Genome maps across 26 human populations reveal population-specific patterns of structural variation. Levy-Sakin et al. Nature Communications 10: 1025 (2019) https://www.nature.com/articles/s41467-019-08992-7