This ongoing collaborative project with Catherine Lee (Rutgers Sociology) uses computational text analysis to study the use of diversity and population terms in ~2.6M biomedical abstracts from 1990-2020.
While existing research on race and affirmative action has detailed the growth of diversity projects in education and employment that obfuscate the language of race and racism, the field of science & technology studies has documented scientists’ persistent attempts to molecularize race in biomedical research. In this paper, we employ computational text analysis to examine quantitative trends in the use of diversity terms, OMB/Census terms, and other population labels in a sample of 2.6+ million biomedical abstracts spanning the last 30 years. These analyses demonstrate that while racial, ethnic, and OMB/Census terms have mostly stagnated during the past 15 years, the use of “diversity” continues to rise, suggesting that biomedical researchers increasingly use a different grammar to construct population differences. We also observe that some population labels have grown dramatically, with specific kinds of population terms outpacing others. Most notably, we find that the use of national and continental terms rose exponentially, while the use of terms like “race” and OMB/Census labels has actually dropped over the past decade. Together, our work points to a potential shift in the way that biomedical researchers enact population distinctions, relying more on biogeographical labels than on racial or ethnic classifiers. Overall, this work not only informs STS and bioethics scholars’ research on the politics of diversity in biomedical research, but also shares a novel set of methodological tools, including open-source software and a new public dataset, to research this topic.
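
To make the kind of trend analysis described above concrete, here is a minimal Python sketch of one way to measure term use over time: for each publication year, compute the share of abstracts that mention at least one term from each category. It is an illustration under stated assumptions, not the project's actual software. The term lists, the `yearly_share` function, and the toy records are all hypothetical, and the real dictionaries and matching rules may differ.

```python
import re
from collections import defaultdict

# Hypothetical term lists for illustration only; not the project's dictionaries.
TERM_SETS = {
    "diversity": ["diversity", "diverse"],
    "race_ethnicity": ["race", "racial", "ethnicity", "ethnic"],
    "continental": ["african", "asian", "european"],
}


def compile_patterns(term_sets):
    """Build one case-insensitive, whole-word regex per term category."""
    return {
        name: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for name, terms in term_sets.items()
    }


def yearly_share(records, term_sets=TERM_SETS):
    """Return {year: {category: share of that year's abstracts with >= 1 match}}.

    `records` is an iterable of (year, abstract_text) pairs.
    """
    patterns = compile_patterns(term_sets)
    totals = defaultdict(int)                     # abstracts per year
    hits = defaultdict(lambda: defaultdict(int))  # matching abstracts per year and category
    for year, text in records:
        totals[year] += 1
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[year][name] += 1
    return {
        year: {name: hits[year][name] / totals[year] for name in term_sets}
        for year in sorted(totals)
    }


# Toy records (made up for this example), not drawn from the real corpus.
toy = [
    (1995, "We examined racial differences in hypertension outcomes."),
    (1995, "A study of protein folding kinetics."),
    (2015, "Genetic diversity among European and Asian cohorts."),
    (2015, "Diverse populations were recruited for the trial."),
]
shares = yearly_share(toy)
```

Normalizing by the number of abstracts per year matters here because the volume of biomedical publications grows over time, so raw counts alone would overstate any rise in term use.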