Research

I am broadly interested in the areas of population health and social networks. My current research agenda focuses on:

Investigating health and mortality disparities in the United States using large-scale linked administrative datasets, such as UC Berkeley’s CenSoc project.
Developing social-network-based methods for sampling and enumerating hard-to-reach populations.

Working Papers

Casey F. Breen. “The Black-White Mortality Crossover: New Evidence from Social Security Mortality Records.” Revise and Resubmit, Demography.

The Black-White mortality crossover is a well-studied demographic paradox. Black Americans experience higher age-specific mortality rates than White Americans throughout most of the life course, but this pattern puzzlingly reverses at advanced ages. The leading explanation for the Black-White mortality crossover centers on selective mortality over the life course: Black Americans who survive higher age-specific mortality risks throughout the life course are highly selected for robustness and have lower mortality than White Americans in late life. However, skeptics argue that the Black-White mortality crossover is simply a data artifact arising from age misreporting or related data quality issues. We use large-scale linked administrative data (N = 2.3 million) to document the Black-White mortality crossover for cohorts born in the early 20th century. We find evidence that the crossover is not a data artifact and cannot be uncrossed using sociodemographic characteristics alone.
Casey F. Breen, Till Koebe, and Ridhi Kashyap. “Digital Diffusion: Mobile Internet and the Spread of Cultural Scripts about Family Life.”

Global cultural diffusion theories have long emphasized how institutional actors such as intergovernmental organizations, international non-governmental organizations, and state agencies spread cultural scripts. We theorize that mobile internet constitutes a qualitatively distinct pathway for the diffusion of these scripts, shifting exposure from institutionally mediated and episodic to more individualized, continuous, and algorithmically curated. This reconfiguration changes both which cultural models individuals encounter and the frequency with which individuals are exposed to them. Using the staggered rollout of mobile internet across Nigeria as an empirical case, we find that expanding coverage reduces fertility, corresponding to a 9% relative decline in the annual probability of birth, consistent with shifts in contraceptive use, ideal family size, and women’s autonomy in financial and healthcare decision-making. These findings suggest that digital infrastructure does not simply extend existing diffusion processes but can reshape them, with measurable consequences for family life.

Publications

Casey F. Breen and Nathan Seltzer. “Structured Inequality, Uncertain Lifespans: Demographic Perspectives on Predicting Individual-level Longevity.” 2026. Population and Development Review. padr, 70065.

There are striking disparities in longevity across sociodemographic groups in the United States. Yet, can sociodemographic characteristics meaningfully explain individual-level variation in longevity? Here, we leverage machine-learning algorithms and large-scale administrative data to predict individual-level mortality using an array of social, economic, and demographic predictors measured in early adulthood. We conduct two distinct analyses: a cohort analysis, which predicts the exact age of death for individuals in the same birth cohort, and a period analysis, which predicts whether individuals aged 54–95 will die within the next 10 years. We are not able to make accurate predictions in either our cohort analysis (R² = 0.014) or our period analysis (R² = 0.166). Together, these analyses demonstrate that later-life longevity is unpredictable using sociodemographic characteristics alone and underscore the crucial need to account for stochastic processes in demographic theory.
Casey F. Breen, Masoomali Fatehkia, Jiani Yan, Douglas Leasure, Ingmar Weber, and Ridhi Kashyap. “Mapping Subnational Gender Gaps in Internet and Mobile Adoption Using Social Media Data.” 2025. Proceedings of the National Academy of Sciences. 122(42):e2416624122.

The digital revolution has ushered in many societal and economic benefits. Yet access to digital technologies such as mobile phones and the internet remains highly unequal, especially by gender in low- and middle-income countries (LMICs). While national-level estimates are increasingly available for many countries, reliable, quantitative estimates of digital gender inequalities at the subnational level are lacking. These estimates, however, are essential for monitoring gaps within countries and implementing targeted interventions under the global Sustainable Development Goals, which emphasize the need to close inequalities both between and within countries. We develop estimates of internet and mobile adoption by gender, and of digital gender gaps, at the subnational level for 2,075 regions in 117 LMICs from 2015 through 2025, a context where digital penetration is low and national-level gender gaps disfavoring women are large. We construct these estimates by applying machine-learning algorithms to Facebook user counts, geospatial data, development indicators, and population composition data. We calibrate and assess the performance of these algorithms using ground-truth data from subnationally representative household surveys in 33 LMICs. Our results reveal striking disparities in access to mobile and internet technologies between and within LMICs. These disparities imply that as of 2025, women are 19% less likely to use the internet and 8% less likely to own a mobile phone in LMICs, corresponding to over 190 million fewer women owning a mobile phone and over 320 million fewer women using the internet.
Casey F. Breen, Saeed Rahman, Christina Kay, Joeri Smits, Steve Ahuka, and Dennis M. Feehan. “Estimating Death Rates in Complex Humanitarian Emergencies Using the Network Survival Method.” 2025. American Journal of Epidemiology. 101-00, 1–11.

Reliable estimates of death rates in complex humanitarian emergencies are critical for assessing the severity of a crisis and for effectively allocating resources. However, in many humanitarian settings, logistical and security concerns make conventional methods for estimating death rates infeasible. In this study, we develop and test a new method for estimating death rates in humanitarian emergencies. Our method is based on the idea that reports about deaths in survey respondents’ social networks can be used to estimate death rates. To test our method, we collected original data in a setting where reliable estimates of death rates are in high demand: Tanganyika Province of the Democratic Republic of the Congo. Qualitative work suggested testing two different types of personal networks as the basis for death rate estimates: deaths among immediate neighbors and deaths among extended kin. We evaluate our new method by benchmarking it against a contemporaneous retrospective household mortality survey. Our empirical results illustrate the settings and assumptions under which we would expect our new method to produce reliable estimates of crude death rates.
Casey F. Breen. “The Longevity Benefits of Homeownership: Evidence from Early 20th-Century U.S. Male Birth Cohorts.” 2024. Demography. 61(6):1731–1757.

Owning a home has long been touted as a key component of the idealized "American Dream." Homeownership is associated with greater wealth and better health, but the causal impact of homeownership on health remains unclear. Using linked complete-count census and Social Security mortality records, we document Black-White disparities in homeownership rates and produce the first U.S.-based estimates of the association between homeownership in early adulthood and longevity. We then use a sibling-based identification strategy to estimate the causal effect of homeownership on longevity for cohorts born in the first two decades of the 20th century. Our results indicate homeownership has a significant positive impact on longevity, which we estimate at approximately 4 months.
Casey F. Breen and Dennis M. Feehan. “New Data Sources for Demographic Research.” 2024. Population and Development Review. 51(1):539–73.

We are in the early stages of a new, uncertain era of demographic research, driven by the scientific possibilities that are opened up by new sources of data, such as the digital traces that arise from ubiquitous social computing, massive longitudinal datasets produced by the digitization of historical records, and information about previously inaccessible populations reached through innovations in classic modes of data collection. Such new data sources seemingly offer exciting opportunities to quantify demographic phenomena at a scale and resolution once unimaginable. In this commentary, we describe five promising new sources of demographic data and their potential appeal: detailed patterns of human mobility and migration, population-scale measurements of social connectedness, minute-by-minute records of cultural change, detailed timelines of partnership formation, and more. Yet, realizing the full potential of these data sources will demand innovative methodological developments and continued investment in high-quality, traditional surveys and censuses. Together, such advances will lead demographers to develop new theories and revisit and sharpen old ones.
Casey F. Breen, Maria Osborne, and Joshua R. Goldstein. 2023. “CenSoc: Public Linked Administrative Mortality Records for Individual-level Research.” Scientific Data. 42(3):36.

In the United States, much has been learned about the determinants of longevity from survey data and aggregated tabulations. However, the lack of large-scale, individual-level administrative mortality records has proven to be a barrier to further progress. We introduce the CenSoc datasets, which link the complete-count 1940 U.S. Census to Social Security mortality records. These datasets, CenSoc-DMF (N = 4.7 million) and CenSoc-Numident (N = 7.0 million), primarily cover deaths among individuals aged 65 and older. The size and richness of CenSoc allow investigators to make new discoveries about geographic, racial, and class-based disparities in old-age mortality in the United States. This article gives an overview of the technical steps taken to construct these datasets, validates them using external aggregate mortality data, and discusses best practices for working with these datasets. The CenSoc datasets are publicly available, enabling new avenues of research into the determinants of mortality disparities in the United States.
Joshua R. Goldstein, Casey F. Breen, Maria Osborne, and Serge Atherwood. 2023. “Mortality Modeling of Partially Observed Cohorts Using Administrative Death Records.” Population Research and Policy Review. 42(3):36.

New advances in data linkage provide mortality researchers with access to administrative datasets with millions of mortality records and rich demographic covariates. Although these new datasets allow for high-resolution mortality research, administrative mortality records often have technical limitations, such as limited mortality coverage windows and incomplete observation of survivors. We describe a method for fitting truncated distributions that can be used for estimating mortality differentials in administrative data. We apply this method to the CenSoc datasets, which link 1940 U.S. Census records to Social Security administrative mortality records. Our approach may be useful in other contexts where administrative data on deaths are available. As a companion to the paper, we release the R package gompertztrunc, which implements the methods introduced in this paper.
Casey F. Breen. 2022. “Late-Life Changes in Ethnoracial Self-identification: Evidence from Social Security Administrative Data.” Population Research and Policy Review. 42(1):10.

Researchers generally recognize that ethnoracial identification may shift over the life course. However, the prevalence of these shifts across cohorts and among older adults remains an open question. Using administrative data from Social Security applications from 1984 to 2007, we quantify the magnitude and direction of later-life shifts in ethnoracial self-identification among Black, White, Asian, American Indian, and Hispanic members of the “Greatest Generation,” those born between 1901 and 1927. Overall, 2.3% of persons in these data changed their ethnoracial identity after the age of 57, with distinct patterns of change for ethnoracial subgroups. By linking to the 1940 Census, we find a positive and significant association between socioeconomic status in early life and a shift from non-White to non-Hispanic White identity in later life. We conclude that ethnoracial self-identification remains fluid even among older adults, varying in response to social position, ethnoracial climate, and events in greater society.
Casey F. Breen, Dennis M. Feehan, and Ayesha S. Mahmud. 2022. “Using Multilevel Regression with Poststratification to Estimate Subnational Contact Patterns.” PLOS Computational Biology. e1010742.

The spread and transmission dynamics of directly transmitted airborne pathogens, such as SARS-CoV-2, are fundamentally determined by in-person contact patterns. Reliable quantitative estimates of contact patterns are critical to modeling and reducing the spread of directly transmitted infectious diseases. While national-level contact data are available in many countries, including the United States, local-level estimates of age-specific contact patterns are key since disease dynamics and public health policy vary by geography. However, collecting contact data for each state would require a very large sample and be prohibitively expensive. To overcome this challenge, we develop a flexible model to estimate age-specific contact patterns at the subnational level using national-level interpersonal contact data. Our model is based on dynamic multilevel regression with poststratification. We apply this approach to a national sample of interpersonal contact data collected by the Berkeley Interpersonal Contact Study (BICS). Results illustrate important state-level variation in levels and trends of contacts across the US.
Casey F. Breen and Joshua R. Goldstein. 2022. “Berkeley Unified Numident Mortality Database: Public Administrative Records for Individual-Level Mortality Research.” Demographic Research. 47:111–142.

While much progress has been made in understanding the demographic determinants of mortality in the United States using individual survey data and aggregate tabulations, the lack of population-level register data is a barrier to further advances in mortality research. With the release of Social Security application (SS-5), claim, and death records, the National Archives and Records Administration (NARA) has created a new administrative data resource for researchers studying mortality. We introduce the Berkeley Unified Numident Mortality Database (BUNMD), a cleaned and harmonized version of these records. This publicly available dataset provides researchers access to over 49 million individual-level mortality records with demographic covariates and fine geographic detail, allowing for high-resolution mortality research.
Casey F. Breen, Elissa M. Redmiles, and Cormac Herley. 2022. “A Large-Scale Measurement of Cybercrime against Individuals.” The 2022 ACM Conference on Human Factors in Computing Systems (CHI). pp. 1–41.

We know surprisingly little about the prevalence and severity of cybercrime in the U.S. Yet, in order to prioritize the development and distribution of advice and technology to protect end users, we require empirical evidence regarding cybercrime. Measuring crime, including cybercrime, is a challenging problem that relies on a combination of direct crime reports to the government, which have known issues of under-reporting, and assessment via carefully designed self-report surveys. We report on the first large-scale, nationally representative academic survey (n = 11,953) of consumer cybercrime experiences in the U.S. Our analysis answers four research questions: (1) What is the prevalence, and (2) what is the monetary impact, of the cybercrimes we measure in the U.S.? (3) Do inequities exist in victimization? (4) Can we improve cybercrime measurement by leveraging social-reporting techniques used to measure physical crime? Our analysis also offers insight toward improving future measurement of cybercrime and protecting users.