Dr Andrew Peterson, Scientific Chairman, GenomeAsia 100K, led the scientific efforts of the consortium that produced a comprehensive catalogue of genetic variation across Asia as part of GenomeAsia 100K. He is also the lead author of the pilot paper. He shared his insights in an exclusive conversation with Editor CH Unnikrishnan. Excerpts:
The first stage of the project had anticipated sequencing 10,000 Asians. Although this pilot dataset represents wide genetic diversity, it describes only 1,739 individuals, and that includes 596 publicly available human genome sequences from previous studies. Does this mean that the task you have in hand is hugely challenging?
The task is a very challenging one for many reasons. It is important to point out that publication trails actual results by at least a year or two and the consortium members have sequenced more than 10,000 genomes at this point. Without belaboring the point, the publication process involves extensive review by outside experts, revision of the presentation and alignment with the publication schedules of a journal that, in this case, is publishing the top scientific findings from across the world. So there is much competition for space. We chose to publish the findings from a relatively small number of genomes as soon as we had finished that phase of the project, because they represent a very different type of population compared to what we are sequencing as we move forward. The groups represented in the first publication are tribal groups and other well-defined and isolated peoples. These results were a good way to launch the project.
We are working to accomplish shared goals with a consortium that is not tied together by a common funding source. The consortium members all want to improve the situation with respect to deficiencies in understanding genetics in Asian populations, but we are working across national boundaries, and the very significant logistics of managing a project of this size are daunting.
As the head of the project, how excited are you about the data that the pilot paper has described?
Very excited. The samples that we started with were focused on unique population groups, but did not come with information about the health status of the individuals. This meant that we were limited in what we could say about medical relevance, but even with that limitation, we were able to bring insights on cancer predisposition genes and drug response genes, findings that are of great human relevance.
In addition to the unique populations, we also had a few ordinary people from India. That allowed us to focus on the question of what makes population groups in Asia different from populations in Europe. One obvious answer is that Asians and Europeans have lived mostly distinct lives, apart from each other, for most of their history without intermarriage. That is something that we can intuitively recognize, but we could see the evidence of it in the genomes of the people we characterized. What was more surprising is that what we call founder effects are present throughout Asia. Founder effects are produced when the people within a particular group all have very few shared ancestors in the past one or two thousand years. This can happen if people are all isolated on an island, as in the case of Iceland, but can also happen because of cultural habits of marriage. This characteristic of Asian populations is important because it makes it easier to make new discoveries about the role of genes in disease.
Your (GA100K) mission statement says that you are committed to open information and to make the data available to the public. Do you think it is a sustainable model without participation from governments and public funds?
It is sustainable as long as people of goodwill continue to work together as we have to date. National interests are often attached to public funding and that can lead to restrictions on data access within national borders. Commercial interests can mean that proprietary control of data is necessary to ensure that there is a return on investment to sustain continuing data generation. We can work within the reality of these constraints and still make sure that the data goes towards the public good. That is really the central principle behind our efforts, and we strive for as much access and transparency as is possible while working toward that underlying goal. At the same time, we need to ensure that the interests of those who generate the data are protected and so we are flexible about the timing, mode and manner of data release to help the overall efforts move forward.
The genome database from Asia is very important and critical as it covers 40% of the world’s population and this continent has many unique genetic diversities, which can provide very valuable clinical insights. Since the pilot paper describes 598 individuals belonging to 55 ethnic groups from India, how useful are the insights from this dataset to enable cures for rare and inherited diseases and other complex diseases such as cancer, diabetes and heart ailments which are prevalent in this subcontinent?
It is absolutely true that the actual number of people from India whose genomes we have characterised is small compared to the total population of 1.3 billion. [Therefore] strategies for making sure that the data we generate have the highest impact possible are essential. One strategy was to focus on a broad diversity of different groups (55 from India as you point out) to provide broad relevance. So, 598 people from 55 groups have much more impact than 598 from one group. A second strategy was to make the data public and therefore multiply the impact because it allows many researchers to use the data. Finally, we considered how to make the data available to maximize its utility for the broadest number of researchers. Based on that, we are providing the data in three different forms to catalyse additional studies. By doing all of these things, the impact of our data toward improving human health is not limited to the findings that we were able to glean from it ourselves; we have designed our studies to launch new studies. Many, many studies by thousands of researchers will be needed to solve the problems of human health. Catalysing science provides the highest value.