Research

I am interested broadly in the areas of population health and social networks. My current research agenda focuses on:

  • Developing social network-based methods for sampling and enumerating hard-to-reach populations.

  • Investigating health and mortality disparities in the United States using large-scale linked administrative datasets, such as UC Berkeley’s CenSoc project.


Current Projects


1. Casey F. Breen, Dennis M. Feehan, and Ayesha S. Mahmud. “Using Multilevel Regression with Poststratification to Estimate Subnational Contact Patterns.” [working paper]

Abstract The spread and transmission dynamics of directly transmitted airborne pathogens, such as SARS-CoV-2, are fundamentally determined by in-person contact patterns. Reliable quantitative estimates of contact patterns are critical to modeling and reducing the spread of directly transmitted infectious diseases. While national-level contact data are available in many countries, including the United States, local-level estimates of age-specific contact patterns are key since disease dynamics and public health policy vary by geography. However, collecting contact data for each state would require a very large sample and be prohibitively expensive. To overcome this challenge, we develop a flexible model to estimate age-specific contact patterns at the subnational level using national-level interpersonal contact data. Our model is based on dynamic multilevel regression with poststratification. We apply this approach to a national sample of interpersonal contact data collected by the Berkeley Interpersonal Contact Study (BICS). Results illustrate important state-level variation in levels and trends of contacts across the US.


2. Casey F. Breen. “Changes in Racial Self-Identification for the Greatest Generation: Evidence from Social Security Administrative Data.” [working paper]

Abstract Researchers generally recognize that racial identification may shift over the life course. However, there is less consensus about the prevalence of these shifts. Previous estimates suggest as many as 6% of Americans shift their racial identity. Using administrative data on Social Security applications from 1984 to 2007, we quantify the magnitude and direction of shifts in racial and ethnic self-identification among Black, White, Asian, American Indian, and Hispanic members of the “Greatest Generation,” those born between 1901 and 1927 (N = 410,388). Approximately 2.3% of persons in this dataset (9,274) changed their racial or Hispanic identity, with distinct patterns of change for racial-ethnic subgroups. Overall, the most common shift was from a non-White identity to a non-Hispanic White identity. We then link to the 1940 Census to investigate whether social status in youth and young adulthood predicts a shift in identity in later life, and we find a positive and significant association between socioeconomic status in early life and a shift from non-White to non-Hispanic White identity. These systematic patterns would be unlikely if these shifts were due entirely to measurement error. We conclude that the prevalence of racial fluidity is itself contingent, varying across time and cohort in response to racial climate, events in greater society, and social position.


3. Casey F. Breen, Elissa M. Redmiles, and Cormac Herley. “A Large-Scale Measurement of Cybercrime against Individuals.” [draft available upon request]

Abstract We know surprisingly little about the prevalence and severity of cybercrime in the U.S. Yet, in order to prioritize the development and distribution of advice and technology to protect end users, we require empirical evidence regarding cybercrime. Measuring crime, including cybercrime, is a challenging problem that relies on a combination of direct crime reports to the government -- which have known issues of under-reporting -- and assessment via carefully designed self-report surveys. We report on the first large-scale, nationally representative academic survey (n=11,953) of consumer cybercrime experiences in the U.S. Our analysis answers four research questions -- (1) What is the prevalence of the cybercrimes we measure in the U.S.? (2) What is their monetary impact? (3) Do inequities exist in victimization? (4) Can we improve cybercrime measurement by leveraging social-reporting techniques used to measure physical crime? -- and offers insight toward improving future measurement of cybercrime and protecting users.


4. Casey F. Breen and Dennis M. Feehan. “New Approaches to Collecting Data From a Respondent-Driven Sample.” [draft available upon request]

Abstract One of the most pressing problems in population research is sampling hard-to-reach populations. Respondent-driven sampling (RDS) is the dominant method to sample such understudied and underserved populations. The key insight behind RDS is that individuals in these hidden populations are connected through an underlying social network. Conventional RDS begins with a convenience sample of “seed” individuals, who are interviewed and then refer their peers to the study. In turn, these peers refer their peers until the recruitment chains die out or the desired sample size is obtained. RDS has been endorsed by the Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and the Joint United Nations Programme on HIV/AIDS (UNAIDS) and is the dominant method for sampling people who inject drugs, sex workers, and other populations that are hard to reach due to their illegal or stigmatized behaviors. RDS represents a powerful tool for data collection, but its efficacy is limited if the underlying network structure has low connectivity or high clustering. These properties can lead to inaccurate RDS estimates or cause recruitment chains to die out before the desired sample size is obtained. We introduce a new approach to collecting RDS data by modifying who is asked to refer whom. Specifically, the new approach allows members of the hidden population to recruit highly connected social referents to improve the underlying network structure. Here, we describe our new approach to RDS data collection and illustrate it with empirical results.