Department of Sociology | University of Texas at Austin
2026-01-22
Course website (if you want to learn more):
R is a powerful tool for social science research
R syntax, data types, and data structures
R, RStudio, and code formats
R programming fundamentals (syntax, operators, data types, data structures, sequencing)
R, RStudio, and code formats
Learning objectives:
Installing R and RStudio
Why R?
Understanding R Scripts, R notebooks, Quarto documents
R and RStudio
R is a statistical programming language
RStudio is an integrated development environment (IDE) for R programming
Why R?
Free, open source: great for reproducibility and open science
Powerful language for data manipulation, statistical analysis, and publication-ready data visualizations
Excellent community, lots of free resources
RStudio panes
Why RStudio?
All-in-one development environment: streamlines coding, data visualization, and workflow
Extensible: supports R, but also Python, SQL, and Git
Rich community: eases learning and problem-solving
R Scripts vs. R Notebooks
R Scripts
Simple: just code
Best for simple tasks (and multi-script pipelines)
R Notebooks (Quarto, R Notebook)
Integrated: Mix of code, text, and outputs for easy documentation
Interactive: real-time code execution and output display
“Notebook” Style: supports interactive code and text
Code cells: segments for code execution
Text chunks: annotations or explanations in Markdown format.

Run all code in a Quarto document (or R script, or R notebook)
To run a single line of code in a code cell
Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).
To run a full code cell (or script)
Ctrl + Shift + Enter (Windows/Linux) or Cmd + Shift + Enter (Mac).
Create a new Quarto document
File -> New File -> Quarto Document -> Create
Create a new code cell
Insert -> Executable cell -> R
Practice running code below
R programming fundamentals
Learning objectives:
Comprehend R objects and functions
Master basic syntax, including comments, assignment, and operators
Understand data structures and types in R
Vectors: Ordered collection of same type
Data Frames: Table of columns and rows
Function: Reusable code block
List: Ordered collection of objects
[1] 7
Use <- or = for assignment
<- is preferred and advised for readability
Formally, assignment means “assign the result of the operation on the right to the object on the left”
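As a minimal sketch of assignment (the object names `x` and `y` are chosen only for illustration), the right-hand side is evaluated first and the result is stored under the name on the left:

```r
# Evaluate 3 + 4, then store the result in an object called x
x <- 3 + 4

# Typing the object's name prints its value
x
#> [1] 7

# `=` also assigns, but `<-` is the conventional choice in R code
y = 10
```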
| Operator | Symbol |
|---|---|
| AND | & |
| OR | \| |
| NOT | ! |
| Equal | == |
| Not Equal | != |
| Greater/Less Than | > or < |
| Greater/Less Than or Equal | >= or <= |
| Element-wise In | %in% |
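The operators in the table can be tried out on a couple of values. This is an illustrative sketch; `x` and `y` are hypothetical objects, not part of the original slides:

```r
x <- 5
y <- 12

x == 5             # Equal: TRUE
x != y             # Not equal: TRUE
x > 3 & y > 20     # AND: FALSE, because y > 20 is FALSE
x > 3 | y > 20     # OR: TRUE, because x > 3 is TRUE
!(x > 3)           # NOT: FALSE
y >= 12            # Greater than or equal: TRUE
x %in% c(1, 5, 9)  # Is x one of these values? TRUE
```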
There are lots of data structures; we’ll focus on vectors and data frames.
Vectors: One-dimensional arrays that hold elements of a single data type (e.g., all numeric or all character).
Data frames: Two-dimensional tables where each column can have a different data type; essentially a list of vectors of equal length.
Vectors and data frames
Vector example
[1] 1 2 3 4 5
Data frame example
Each vector or data frame column can only contain one data type:
Numeric: Used for numerical values like integers or decimals.
Character: Holds text and alphanumeric characters.
Logical: Represents binary values - TRUE or FALSE.
Factor: Categorical data, either ordered or unordered, stored as levels.
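A short sketch of the four types listed above, using `class()` to check each one. The vector names and values are made up for illustration:

```r
num_vec <- c(1.5, 2, 3)          # numeric
chr_vec <- c("a", "b", "c")      # character
lgl_vec <- c(TRUE, FALSE, TRUE)  # logical
fct_vec <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(num_vec)   # "numeric"
class(chr_vec)   # "character"
class(lgl_vec)   # "logical"
class(fct_vec)   # "factor"
levels(fct_vec)  # "low" "high"

# Mixing types in one vector coerces every element to a common type
mixed <- c(1, "two", TRUE)
class(mixed)     # "character"
```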
NA (missing) values in R
NA represents missing or undefined data.
NA has type-specific variants (e.g., NA_character_ and NA_integer_).
NA values can affect summary statistics and data visualization.
What happens when you run the code below?
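One way to explore this question, as an illustrative sketch (the vector `ages` is hypothetical): a single `NA` makes the mean `NA` unless missing values are explicitly removed.

```r
ages <- c(25, NA, 22)

mean(ages)                # NA: the missing value propagates
mean(ages, na.rm = TRUE)  # 23.5: NA is dropped before averaging
is.na(ages)               # FALSE TRUE FALSE

# Type-specific missing values
class(NA_character_)      # "character"
class(NA_integer_)        # "integer"
```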
c(): combines values into a vector.
Colon operator (:): creates sequences with increments of 1.
seq() function: more flexible; lets you specify the start, end, and by parameters.
Function: takes input arguments, performs operations on them, and returns a result.
For each of the below functions, what are the:
Input arguments?
Operations performed?
Results?
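A few base R calls that can be read through these three questions. This is a sketch with arbitrarily chosen functions and values, not necessarily the ones used in class:

```r
# Input: three numbers; operation: addition; result: their total
total <- sum(1, 2, 3)                   # 6

# Input: a number and the digits to keep; operation: rounding
rounded <- round(3.14159, digits = 2)   # 3.14

# Input: start, end, and step; operation: build a sequence
evens <- seq(from = 2, to = 10, by = 2) # 2 4 6 8 10

# The colon operator is a shortcut for steps of 1
one_to_five <- 1:5                      # 1 2 3 4 5
```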
Insert new code cell
macOS: Cmd + Option + I
Windows/Linux: Ctrl + Alt + I
Run full code cell or script
macOS: Cmd + Shift + Enter
Windows/Linux: Ctrl + Shift + Enter
Assignment operator (creates <-)
macOS: Option + -
Windows/Linux: Alt + -
Assignment (e.g., x <- 4)
Logical expressions (e.g., x > 10)
Creating a basic sequence
Your turn next…
Assign x and y to take values 3 and 4.
Assign z as the product of x and y.
Calculate three squared and assign the result to three_squared.
Check whether three_squared is greater than 10.
Check whether three_squared is not greater than 10. Use the negate symbol (!).
Vectors and data frames
Learning objectives
Select elements from vectors and columns from data frames
Subset data frames
Investigate characteristics of data frames
[1] 1
[1] 3
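Outputs like these are what selecting single elements of a vector by position produces. A minimal sketch, assuming a hypothetical vector `vec`, of the square-bracket syntax:

```r
vec <- c(1, 2, 3, 4, 5)

vec[1]        # first element: 1
vec[3]        # third element: 3
vec[c(2, 4)]  # several positions at once: 2 4
vec[vec > 3]  # elements meeting a condition: 4 5
```

Note that R counts positions from 1, not 0.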
Data frames
Data frames are the most common and versatile data structure in R
Structured as rows (observations) and columns (variables)
| id | name | age | gender | score |
|---|---|---|---|---|
| 1 | Alice | 25 | F | 90 |
| 2 | Bob | 30 | M | 85 |
| 3 | Carol | 22 | F | 88 |
| 4 | Dave | 28 | M | 92 |
| 5 | Emily | 24 | F | 89 |
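The table above could be built with `data.frame()`, one vector per column. This is a sketch; the object name `df` follows the name used in the subsetting notation later in these slides:

```r
# Each argument becomes a column; all vectors must have the same length
df <- data.frame(
  id     = 1:5,
  name   = c("Alice", "Bob", "Carol", "Dave", "Emily"),
  age    = c(25, 30, 22, 28, 24),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(90, 85, 88, 92, 89)
)

# Print the data frame
df
```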
Data frames
head(): looks at the top rows of the data frame
$ operator - access a column as a vector
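A brief sketch of both tools, using a smaller version of the example table (rebuilt here so the block runs on its own):

```r
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "Dave", "Emily"),
  age   = c(25, 30, 22, 28, 24),
  score = c(90, 85, 88, 92, 89)
)

# Show only the first three rows (head() defaults to six)
head(df, n = 3)

# Pull one column out as a vector
ages <- df$age             # 25 30 22 28 24

# A column vector can be passed straight to other functions
mean_score <- mean(df$score)  # 88.8
```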
Data frames
Methods:
$: Single column by name.
df[i, j]: Row i and column j.
df[i:j, k:l]: Rows i to j and columns k to l.
Conditional Subsetting: df[df$age > 25, ].
Which rows and columns will this return?
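The subsetting forms above can be checked against the example table. A sketch, with the data frame rebuilt so the block is self-contained:

```r
df <- data.frame(
  id     = 1:5,
  name   = c("Alice", "Bob", "Carol", "Dave", "Emily"),
  age    = c(25, 30, 22, 28, 24),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(90, 85, 88, 92, 89)
)

df[2, 3]        # row 2, column 3: Bob's age, 30
df[1:2, 2:3]    # rows 1 to 2, columns 2 to 3 (name and age)

# Leaving the column slot empty keeps every column
older <- df[df$age > 25, ]
older$name      # "Bob" "Dave"
```

Alice is excluded because the condition is strictly greater than 25.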
Data frame characteristics
Check number of rows
Check number of columns
Check column names
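In base R, these three checks correspond to `nrow()`, `ncol()`, and `colnames()`. A minimal sketch on a small hypothetical data frame:

```r
df <- data.frame(
  id   = 1:5,
  name = c("Alice", "Bob", "Carol", "Dave", "Emily"),
  age  = c(25, 30, 22, 28, 24)
)

nrow(df)      # number of rows: 5
ncol(df)      # number of columns: 3
colnames(df)  # "id" "name" "age"
```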
Learning objectives
Overview of tidyverse suite of packages
Fundamentals of data manipulation with dplyr
Data visualization with ggplot2
Core tidyverse packages used here: dplyr and ggplot2.
dplyr
filter: Select rows based on conditions.
select: Choose specific columns
mutate: Add or modify columns
summarize or summarise: Aggregate or summarize data based on some criteria
group_by: Group data by variables. Often used with summarise().
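A sketch of each verb applied to the small example table from earlier. It assumes the dplyr package is installed; the new column name `score_prop` and the age cutoff are arbitrary choices for illustration:

```r
library(dplyr)

df <- data.frame(
  name   = c("Alice", "Bob", "Carol", "Dave", "Emily"),
  age    = c(25, 30, 22, 28, 24),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(90, 85, 88, 92, 89)
)

# filter: keep rows where age is above 23 (drops Carol)
older <- filter(df, age > 23)

# select: keep only the name and score columns
name_score <- select(df, name, score)

# mutate: add a column with the score as a proportion of 100
with_prop <- mutate(df, score_prop = score / 100)

# group_by + summarize: mean score within each gender
by_gender <- summarize(group_by(df, gender), mean_score = mean(score))
by_gender
```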
%>% (or |>) in R
Takes the output of one function and passes it as the first argument to another function
What’s the below code doing?
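As an illustrative sketch of what a pipe does (the vector `scores` is hypothetical, and the native `|>` pipe assumes R 4.1 or later), the nested and piped versions below compute the same thing:

```r
scores <- c(90, 85, 88, 92, 89)

# Nested call: read from the inside out
nested <- round(mean(scores), 1)

# Piped call: read left to right
# scores goes into mean(), and that result goes into round()
piped <- scores |> mean() |> round(1)

nested  # 88.8
piped   # 88.8
```

The `%>%` pipe behaves the same way in this example once dplyr (or magrittr) is loaded.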
Sometimes you want to recode a variable to take different values (e.g., recoding exact income into a binary high/low income variable)
The case_when() function in R is part of the dplyr package and is used for creating new variables based on multiple conditions:
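A sketch of the income recode mentioned above. It assumes dplyr is installed; the data frame `incomes`, its values, and the 60,000 cutoff are invented for illustration:

```r
library(dplyr)

incomes <- data.frame(
  person = c("A", "B", "C", "D"),
  income = c(32000, 58000, 75000, 120000)
)

# Conditions are checked in order; the first one that is TRUE wins.
# The final `TRUE ~ "low"` acts as a catch-all for remaining rows.
incomes <- incomes %>%
  mutate(
    income_group = case_when(
      income >= 60000 ~ "high",
      TRUE            ~ "low"
    )
  )

incomes$income_group  # "low" "low" "high" "high"
```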
Filter data
Selecting data
Calculating summary statistics by group
Creating and recoding variables
# A tibble: 6 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
1. Make a histogram of the price of diamonds.
2. For diamonds greater than 1 carat (hint: filter()), what is the average price by cut (hint: group_by() + summarize())?
3. Assign your answer from (2) to a data frame called price_by_cut. Now use ggplot() + geom_col() to visualize this.
R for data science (https://r4ds.hadley.nz/)
Data visualization: a practical introduction (https://socviz.co/)
Please turn in your Qmd file (whatever you have completed) on Canvas so you can get credit.
Comments
Use # to start a single-line comment
Comments are an important way to document code
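A minimal sketch of both comment placements (the object name `total` is arbitrary):

```r
# A comment on its own line: R ignores everything after the #
total <- 3 + 4  # a comment can also follow code on the same line

# Comments are a good place to explain why a step is done
total
```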