Editor’s note: This essay responds to an invitation (issued here and here) to submit commentaries on the ethical implications of partnerships between social media companies and biomedical researchers. The invitation is ongoing.
Last June, the direct-to-consumer genetic testing company 23andMe announced that it had reached the milestone of 1,000,000 genotyped customers. While the company celebrated this milestone as a “turning point” in genetic testing, we believe it is in fact cause for concern. Our concern is that the growing importance of 23andMe’s database as a resource for research – and recently also a recipient of public funding – will aggravate existing biases in disease research, leading to impoverished knowledge and exacerbated inequalities.
For over a decade, studies have drawn attention to the stark underrepresentation of people of non-European descent in genomic research. Some authors have also criticized the lack of a systematic effort to remedy this. Although several initiatives have attempted to increase population diversity in research-oriented genomic databases, the number of genotyped participants resulting from these programs is still relatively small. The Exome Aggregation Consortium (ExAC), which some geneticists consider to be one of the most useful resources ever for variant assessment thanks to the wide range of ethnic groups represented, currently holds genetic data from around 90,000 people. Moreover, attempts to aggregate datasets of different provenance, as ExAC has done, come with their share of thorny challenges. One is the amount of labor needed to make different datasets compatible with a common data model before they can be combined. ExAC has struggled with this, and sometimes the obstacles faced by projects that seek to “scale up” and integrate existing datasets are so high that it seems more effective to build new databases from scratch.
Against this backdrop, the idea that the underrepresentation of certain ethnic groups in large datasets can be fixed by integrating dispersed small datasets appears unrealistic. The fact that 23andMe owns the world's largest health DNA database (aside from forensic DNA collections) therefore makes the issue of representation of genotyped participants especially critical.
How diverse is the population in the 23andMe database? Little information is available on the demographics of users of DTC personal genomic testing. We are aware of four surveys, all U.S.-based, conducted from 2011 to 2015: see here, here, here, and here. Although the later studies better represent ethnic minorities and socio-economic background, the findings are overall consistent: white, educated, and affluent groups seem to be starkly overrepresented among the users of DTC personal genomic tests.
Expanding privately owned datasets is likely to exacerbate health and health-research disparities. This, in turn, may skew research towards those conditions that affect educated, wealthy, white people, either because they can be more easily studied or because the company and its investors are most interested in developing therapies for people who can afford to pay for them.
Beyond exacerbating health and health-research disparities, expanding privately owned datasets may also represent an impoverishment of data and knowledge that contrasts starkly with the promises of data abundance and unbiased research that accompany data-driven genomics. Every instance of data gathering, coding, exchange, and analysis requires researchers to make methodological decisions that depend on their interests, resources, and access to relevant groups. If the datasets required to pursue a specific research question are not there and no resources are available to gather them, then this question will not be answered. This is a moral issue as well as a scientific one.
Issues of representation are not limited to problems of external validity and social utility of research, but include disempowering effects in research agenda-setting for disadvantaged groups. Paradoxically, even some participatory features of genomics and ancestry companies could exacerbate the issue of representation. For example, 23andMe encourages customers to suggest new research questions and projects, and upload information on phenotypes and lifestyles. However, if such customers disproportionately represent certain segments of the population that already have “voices” in research agenda-setting, even promising developments of medical research towards more participatory models can go astray.
It might be objected that these considerations of, broadly speaking, social justice in research do not apply to the private sector. There is indeed a difference in the extent of the obligations the public and private sectors have with respect to issues of social justice. It is the former that is in charge of providing social goods. However, the public sector often fulfills its duties with regard to social justice by guiding and regulating the behavior of the private sector, and especially so when it has been decided that a particular good – in this case genomic datasets for research – is to be produced by private companies. This is the case here, as public money is directly channeled into the expansion and use of privately held databases. Though a $1.4 million grant by the National Institutes of Health to 23andMe may not be high in comparison to the total NIH funding for genomic research, it has a symbolic value. It legitimizes this type of approach to building genomic databases, even as the NIH openly complains about the cost of supporting biology databases and the NIH genome institute (NHGRI) announces its plans to scale back support of genomic (and other “omic”) databases.
The expansion of datasets collected by private companies poses other moral and organizational questions. In contrast to public biobanks where information on demographic features of samples is not proprietary and can be accessed by researchers and sometimes also by the public, at the moment private companies have no accountability in this regard. If these private enterprises are to receive public funding, they should submit themselves to public governance with regard to key strategic questions.
Today, through the sheer weight of numbers, private companies dominate the sector of genomic datasets and personalized medicine. Given their remit and their commercial strategy, we should think about ways to improve the recruiting strategies of DTC companies, nudge their R&D departments towards studying these issues, and make public funding conditional on the disclosure of demographic information and the enactment of mechanisms of oversight. We should strive to prevent data abundance from impoverishing genomic research and exacerbating current inequalities in the allocation of research resources.
Less than two decades ago, a publicly vocal, and at times bitter, race took place over sequencing the whole human genome. At stake was the disputed question of whether the human genome sequence should be public knowledge or could be privatized for commercial ends. Should the fact that, today, the largest DNA database is held by a private company not make us pause and reflect?
The authors are with the Data and Information Technologies in Health and Medicine Lab, Department of Social Science, Health and Medicine, King’s College London.