Why we must work to increase diversity in genomic datasets
May 23, 2019

The release of the first draft Human Genome sequence in 2001 was rightly hailed as a momentous occasion in human history. Doctors and scientists gazed into an exciting future promising new, better ways of diagnosing, preventing and treating disease based on each patient’s genetic makeup – even if the precise technology was yet to be developed.

Over the past couple of decades, doctors and pharmaceutical companies have been moving towards a model of precision medicine, developing tests and treatments based on the presence of specific genetic or molecular targets.

There is also growing awareness of the importance of pharmacogenomics: the ways in which individual genetic variations influence how the body uses and breaks down drugs. These variations can have a significant impact on the suitability or dosage of many therapies.

But precision medicine only works if the underlying genomic information used to develop the treatment or test is relevant to that patient.

If we want to create drugs and diagnostics that are applicable across the whole global population, it’s vital that we capture the full range of human genetic diversity in the datasets that are used to underpin their development.

Scientists are finally waking up to the fact that most genetic research has focused on populations with European ancestry. In fact, as much as 78 per cent of the data in current genomic datasets comes from people with European ancestry. Asian populations account for just 10 per cent, and African populations just 2 per cent.

Figure: Ancestry categories in the GWAS catalogue. Distribution of ancestry across studies in the catalogue (left) and across individuals within studies (right), adapted from Sirugo et al. 2019.

This means we’re missing huge chunks of genomic information that simply may not be present in European populations. In turn, this means that billions of people around the world stand to miss out on the promise of precision medicine.

The mantra of “Right Drug, Right Patient, Right Dose” only works if we have the Right Genome too.

Professor Sarah Tishkoff, a human evolutionary genomics expert at the University of Pennsylvania, has been studying this issue and its impacts, and published a commentary in the journal Cell earlier this year. She and her colleagues present a compelling case for everyone involved in genomics research to push harder for diversity and the inclusion of underrepresented populations.

As an example, she highlights the case of G6PD deficiency, which affects 400 million people worldwide. It is caused by a particular genetic variation and can lead to serious health problems under certain conditions (for example, triggering the breakdown of red blood cells after eating fava beans).

The variant is rare in some populations but fairly common among people with African, Asian, Middle Eastern or Mediterranean ancestry. However, the condition can be easily missed if the connection isn’t made between a patient’s symptoms and their genetic ancestry, leading to a worsening of their health while they wait for a correct diagnosis and treatment.

There is also huge variation in the incidence of different types of cancer around the world. Some of this is connected to environmental influences – including diet, lifestyle and exposure to cancer-causing agents such as pollution or viruses – but there is a significant genetic component too.

Carrying a mutation in the BRCA1 or BRCA2 genes raises the risk of developing breast, ovarian or prostate cancer. However, the scale of the increase depends on the underlying alteration. Different ethnic populations carry different variants, which may be missed if the catalogue of mutations is based mainly on European-derived data.

As yet another example, Tishkoff highlights cystic fibrosis (CF), which is caused by faults in a single gene. The condition affects around one in every 2,000-3,000 European babies, with the same genetic fault (ΔF508) turning up in around 70 per cent of cases.

Only one in around 17,000 African American children is diagnosed with CF, and fewer than a third of CF patients with African ancestry have the ΔF508 mutation. A different, rarer mutation (3120+1G>A) accounts for up to two thirds of South African CF patients with African ancestry.

Overall, there are more than 2,000 rare CF mutations found in various populations around the world. Given the recent advances in precision CF therapies targeted at specific mutations, it is essential that these rarer variations are taken into account to prevent many thousands of children missing out on potentially beneficial treatment.
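
To make the arithmetic behind this concrete, here is a minimal sketch in Python. It assumes a hypothetical test that only looks for the single most common CF mutation, and it uses the shares quoted above (around 70 per cent of European cases, and less than a third of cases in people with African ancestry, treated here as an upper bound of one third). The function and variable names are illustrative only.

```python
# Share of CF cases carrying the most common mutation, taken from the
# figures quoted in the text. The African-ancestry value is an upper
# bound ("less than a third"), so the true miss rate is at least this high.
common_variant_share = {
    "European ancestry": 0.70,
    "African ancestry (upper bound)": 1 / 3,
}


def missed_fraction(detected_share: float) -> float:
    """Fraction of cases missed by a test that only detects `detected_share`."""
    return 1.0 - detected_share


for group, share in common_variant_share.items():
    missed = missed_fraction(share)
    print(f"{group}: a single-variant test misses about {missed:.0%} of cases")
```

Under these assumptions, a test designed around the most common European mutation would miss roughly three in ten European cases, but at least two in three cases among patients with African ancestry.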

The solution is to make a bigger effort to include understudied populations in genomic research – something we’re working hard to address here at Global Gene Corp. But there are a number of issues to be aware of.

Firstly, there are understandable concerns within ethnic communities about data privacy and informed consent that must be taken seriously. There is a grim legacy of exploitation of certain groups by biomedical research organisations and companies, which must never be allowed to happen again.

Secondly, many genomic studies rely on gathering data from large groups of people to ensure that any results are statistically valid. This can mean smaller populations are at a disadvantage and may be excluded from studies as the strength of the findings can’t be guaranteed.
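
The link between group size and statistical strength can be illustrated with a rough sketch. The Python below approximates the power of a simple two-sided test comparing how often a variant appears in cases versus controls, using a normal approximation. The carrier frequencies (12 per cent and 10 per cent), the group sizes and the significance threshold are made-up values for illustration, not figures from any study; real genome-wide analyses apply far stricter thresholds and more sophisticated models.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_cases: float, p_controls: float,
                 n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with equal group sizes. All inputs
    are illustrative assumptions, not values from a real study.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed proportions.
    se = sqrt(p_cases * (1 - p_cases) / n_per_group
              + p_controls * (1 - p_controls) / n_per_group)
    shift = abs(p_cases - p_controls) / se
    # Chance that the test statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical carrier frequencies: 12% in cases, 10% in controls.
for n in (200, 2_000, 20_000):
    print(f"{n:>6} people per group: power ~ {approx_power(0.12, 0.10, n):.2f}")
```

The same underlying difference becomes far more likely to be detected as the groups grow, which is why a population represented by only a few hundred participants can end up without a statistically robust result.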

European populations also have much lower levels of genetic diversity than African populations, meaning that Europeans’ genomes are more similar to each other. If researchers are trying to find specific variations that are linked to a particular condition, the underlying changes will be easier to spot in samples that are more similar to start with – this is analogous to trying to find a spot of colour on a simple striped blanket, compared with a multicoloured patchwork comforter.

As a research community, we cannot afford to be complacent about the issue of genomic diversity.

Everyone in the world has the right to benefit from the medical advances that are being brought by the genetic revolution. Increasing diversity in genomics research will also lead to new methods and targets with which to tackle diseases across the global population.

At Global Gene Corp, we’re working hard to ensure that genomic datasets are fully representative of the populations they serve, with the aim of bringing precision healthcare to everyone, everywhere.
