A data frame is a way to organize data into rows and columns, similar to a spreadsheet or table. Data frames are one of the most common ways to store and work with data in statistics, data science, and R.
- Each row represents one observation (such as a person, event, or item).
- Each column represents one variable (such as age, height, or score).
For example, in a dataset about students, each row is one student and each column holds one piece of information about the students, such as hours studied or exam score.
This structure makes it easier to explore data and build models.
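A minimal sketch of that row/column structure, building the same students data used later in this section with data.frame(). Indexing with `[row, ]` returns one observation, `$` returns one variable as a vector, and nrow() and ncol() count observations and variables.

```r
# One row per student (observation), one column per variable
students <- data.frame(
  student = c("Alex", "Jordan", "Casey"),
  hours_studied = c(5, 8, 2),
  exam_score = c(78, 92, 61)
)

nrow(students)  # number of observations (rows): 3
ncol(students)  # number of variables (columns): 3

# One row = one observation: every variable recorded for Alex
students[1, ]

# One column = one variable: every student's exam score, as a vector
students$exam_score
```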
In R, a dataset is usually stored as a data frame, which you can create with the data.frame() function:
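# create a data frame called "students" with 3 variables
# each vector holds one column, with one value per student
student <- c("Alex", "Jordan", "Casey")
hours_studied <- c(5, 8, 2)
exam_score <- c(78, 92, 61)
# data.frame() uses the vector names as the column names
students <- data.frame(student, hours_studied, exam_score)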
# alternate method: name each column directly inside data.frame()
students <- data.frame(
  student = c("Alex", "Jordan", "Casey"),
  hours_studied = c(5, 8, 2),
  exam_score = c(78, 92, 61)
)
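Both approaches produce the same data frame. One way to check this is identical(); the sketch below stores the two results under hypothetical names (students_a and students_b) so they can be compared side by side.

```r
# Method 1: build the column vectors first, then combine them
student <- c("Alex", "Jordan", "Casey")
hours_studied <- c(5, 8, 2)
exam_score <- c(78, 92, 61)
students_a <- data.frame(student, hours_studied, exam_score)

# Method 2: name each column inside data.frame()
students_b <- data.frame(
  student = c("Alex", "Jordan", "Casey"),
  hours_studied = c(5, 8, 2),
  exam_score = c(78, 92, 61)
)

# Same column names, values, types, and row names
identical(students_a, students_b)  # TRUE
```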
You can then pass the data frame to a modeling function. For example, lm() fits a linear regression model:
lm(exam_score ~ hours_studied, data = students)
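The formula exam_score ~ hours_studied models exam score as a linear function of hours studied, and data = students tells lm() which data frame contains those columns. The result describes how exam score changes with each additional hour studied.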
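lm() returns a fitted model object that you can store and inspect. A minimal sketch: coef() gives the intercept and slope, summary() reports standard errors and R-squared, and predict() estimates a score for a new value of hours_studied. The 6-hour input is a hypothetical value chosen for illustration, and with only three students these estimates are a demonstration, not a reliable result.

```r
students <- data.frame(
  student = c("Alex", "Jordan", "Casey"),
  hours_studied = c(5, 8, 2),
  exam_score = c(78, 92, 61)
)

# Fit exam_score as a linear function of hours_studied and keep the result
model <- lm(exam_score ~ hours_studied, data = students)

# Intercept (about 51.17) and slope (about 5.17 points per extra hour)
coef(model)

# Estimates with standard errors, t-statistics, and R-squared
summary(model)

# Predicted exam score for a hypothetical student who studies 6 hours
predict(model, newdata = data.frame(hours_studied = 6))
```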
Data frames help you:
- Organize data clearly
- Explore relationships between variables
- Build statistical models
- Analyze and visualize data
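A short sketch of some of these tasks on the same students data, using base R only: summary() describes each column, cor() measures how strongly two variables move together, logical indexing keeps only the rows that meet a condition, assigning to a new `$` name adds a column, and plot() draws a scatter plot. The 70-point filter and the 65-point pass mark are hypothetical thresholds chosen for illustration.

```r
students <- data.frame(
  student = c("Alex", "Jordan", "Casey"),
  hours_studied = c(5, 8, 2),
  exam_score = c(78, 92, 61)
)

# Explore: per-column summaries (min, quartiles, mean, max for numeric columns)
summary(students)

# Explore relationships: correlation between hours studied and exam score
cor(students$hours_studied, students$exam_score)  # about 0.998

# Organize: keep only rows where the score is above 70 (Alex and Jordan)
high_scorers <- students[students$exam_score > 70, ]

# Organize: add a column derived from an existing one
# (65 is a hypothetical pass mark)
students$passed <- students$exam_score >= 65

# Visualize: scatter plot of exam score against hours studied
plot(exam_score ~ hours_studied, data = students)
```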
Most data analysis in R starts with a data frame.