Research

I am broadly interested in the areas of population health and social networks. My current research agenda focuses on:

  • Developing social network-based methods for sampling and enumerating hard-to-reach populations.

  • Investigating health and mortality disparities in the United States using large-scale linked administrative datasets, such as UC Berkeley’s CenSoc project.

Working Papers


  1. Casey F. Breen and Nathan Seltzer. “The Unpredictability of Individual-level Longevity.”

    How accurately can age of death be predicted using basic sociodemographic characteristics? We test this question using a large-scale administrative dataset combining the complete count 1940 Census with Social Security death records. We fit eight machine learning algorithms using 35 sociodemographic predictors to generate individual-level predictions of age of death for birth cohorts born at the beginning of the 20th century. We find that none of these algorithms are able to explain more than 1.5% of the variation in age of death. Our results suggest mortality is inherently unpredictable and underscore the challenges of using algorithms to predict major life outcomes.
  2. Casey F. Breen. “The Longevity Benefits of Homeownership.”

    Owning a home has long been touted as a key component of the idealized “American Dream.” Homeownership is associated with greater wealth and better health, but the causal impact of homeownership on health remains unclear. Using linked complete-count census and Social Security mortality records, we document Black-White disparities in homeownership rates and produce the first U.S.-based estimates of the association between homeownership in early adulthood and longevity. We then use a sibling-based identification strategy to estimate the causal effect of homeownership on longevity for cohorts born in the first two decades of the 20th century. Our results indicate homeownership has a significant positive impact on longevity, which we estimate at approximately 0.4 years.

Publications


  1. Casey F. Breen, Maria Osborne, and Joshua R. Goldstein. 2023. “CenSoc: Public Linked Administrative Mortality Records for Individual-level Research.” Scientific Data. 42(3):36.

    In the United States, much has been learned about the determinants of longevity from survey data and aggregated tabulations. However, the lack of large-scale, individual-level administrative mortality records has proven to be a barrier to further progress. We introduce the CenSoc datasets, which link the complete-count 1940 U.S. Census to Social Security mortality records. These datasets, CenSoc-DMF (N = 4.7 million) and CenSoc-Numident (N = 7.0 million), primarily cover deaths among individuals aged 65 and older. The size and richness of CenSoc allow investigators to make new discoveries into geographic, racial, and class-based disparities in old-age mortality in the United States. This article gives an overview of the technical steps taken to construct these datasets, validates them using external aggregate mortality data, and discusses best practices for working with these datasets. The CenSoc datasets are publicly available, enabling new avenues of research into the determinants of mortality disparities in the United States.
  2. Joshua R. Goldstein, Casey F. Breen, Maria Osborne, and Serge Atherwood. 2023. “Mortality Modeling of Partially Observed Cohorts Using Administrative Death Records.” Population Research and Policy Review. 42(3):36.

    New advances in data linkage provide mortality researchers with access to administrative datasets with millions of mortality records and rich demographic covariates. Although these new datasets allow for high-resolution mortality research, administrative mortality records often have technical limitations, such as limited mortality coverage windows and incomplete observation of survivors. We describe a method for fitting truncated distributions that can be used for estimating mortality differentials in administrative data. We apply this method to the CenSoc datasets, which link U.S. 1940 Census records to Social Security administrative mortality records. Our approach may be useful in other contexts where administrative data on deaths are available. As a companion to the paper, we release the R package gompertztrunc, which implements the methods introduced in this paper.
  3. Casey F. Breen. 2022. “Changes in Racial Self-Identification for the Greatest Generation: Evidence from Social Security Administrative Data.” Population Research and Policy Review. 42(1):10.

    Researchers generally recognize that ethnoracial identification may shift over the life course. However, the prevalence of these shifts across cohorts and among older adults remain open questions. Using administrative data from Social Security applications from 1984 to 2007, we quantify the magnitude and direction of later life shifts in ethnoracial self-identification among Black, White, Asian, American Indian, and Hispanic members of the “Greatest Generation,” those born between 1901 and 1927. Overall, 2.3% of persons in these data changed their ethnoracial identity after the age of 57, with distinct patterns of change for ethnoracial subgroups. By linking to the 1940 Census, we find a positive and significant association between socioeconomic status in early life and a shift from non-White to non-Hispanic White identity in later life. We conclude that ethnoracial self-identification fluidity continues even among older adults, varying in response to social position, ethnoracial climate, and events in greater society.
  4. Casey F. Breen, Dennis M. Feehan, and Ayesha S. Mahmud. 2022. “Using Multilevel Regression with Poststratification to Estimate Subnational Contact Patterns.” PLOS Computational Biology. e1010742.

    The spread and transmission dynamics of directly transmitted airborne pathogens, such as SARS-CoV-2, are fundamentally determined by in-person contact patterns. Reliable quantitative estimates of contact patterns are critical to modeling and reducing the spread of directly transmitted infectious diseases. While national-level contact data are available in many countries, including the United States, local-level estimates of age-specific contact patterns are key since disease dynamics and public health policy vary by geography. However, collecting contact data for each state would require a very large sample and be prohibitively expensive. To overcome this challenge, we develop a flexible model to estimate age-specific contact patterns at the subnational level using national-level interpersonal contact data. Our model is based on dynamic multilevel regression with poststratification. We apply this approach to a national sample of interpersonal contact data collected by the Berkeley Interpersonal Contact Study (BICS). Results illustrate important state-level variation in levels and trends of contacts across the US.
  5. Casey F. Breen and Joshua R. Goldstein. 2022. “Berkeley Unified Numident Mortality Database: Public Administrative Records for Individual-Level Mortality Research.” Demographic Research. Volume 47, Page 111-142.

    While much progress has been made in understanding the demographic determinants of mortality in the United States using individual survey data and aggregate tabulations, the lack of population-level register data is a barrier to further advances in mortality research. With the release of Social Security application (SS-5), claim, and death records, the National Archives and Records Administration (NARA) has created a new administrative data resource for researchers studying mortality. We introduce the Berkeley Unified Numident Mortality Database (BUNMD), a cleaned and harmonized version of these records. This publicly available dataset provides researchers access to over 49 million individual-level mortality records with demographic covariates and fine geographic detail, allowing for high-resolution mortality research.
  6. Casey F. Breen, Elissa M. Redmiles, and Cormac Herley. 2022. “A Large-Scale Measurement of Cybercrime against Individuals.” The 2022 ACM Conference on Human Factors in Computing Systems (CHI). Pp. 1–41.

    We know surprisingly little about the prevalence and severity of cybercrime in the U.S. Yet, in order to prioritize the development and distribution of advice and technology to protect end users, we require empirical evidence regarding cybercrime. Measuring crime, including cybercrime, is a challenging problem that relies on a combination of direct crime reports to the government – which have known issues of under-reporting – and assessment via carefully-designed self-report surveys. We report on the first large-scale, nationally representative academic survey (n=11,953) of consumer cybercrime experiences in the U.S. Our analysis answers four research questions: (1) What is the prevalence and (2) the monetary impact of these cybercrimes we measure in the U.S.?, (3) Do inequities exist in victimization?, and (4) Can we improve cybercrime measurement by leveraging social-reporting techniques used to measure physical crime? Our analysis also offers insight toward improving future measurement of cybercrime and protecting users.