Data
Data are recorded observations, measurements, or information collected to answer questions, test hypotheses, or make decisions. In statistics and data science, data are the raw material we analyze to discover patterns, relationships, and insights.
Data can come in many forms, including:
Numbers (e.g., exam scores, temperatures, income)
Categories or labels (e.g., gender, species, product type)
Text (e.g., survey responses, reviews)
Dates and times (e.g., timestamps, durations)
Logical values (TRUE/FALSE)
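As a rough illustration of the forms listed above, the sketch below stores one value of each kind in R and asks R how it classifies them. The variable names and values are made up for this example, and a factor is only one common way to hold categories.

```r
# One illustrative value for each form of data (all values are hypothetical)
score    <- 88.5                                      # number
species  <- factor("cat", levels = c("cat", "dog"))   # category or label
review   <- "Great product"                           # text
taken_on <- as.Date("2024-01-15")                     # date
passed   <- TRUE                                      # logical value

# class() reports how R stores each value
classes <- c(
  score    = class(score),
  species  = class(species),
  review   = class(review),
  taken_on = class(taken_on),
  passed   = class(passed)
)
print(classes)
```

Knowing the class of a value matters because it determines which operations make sense: a mean is meaningful for numbers but not for text.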
In R, data are commonly stored in structured objects, especially:
Vectors – one-dimensional data of a single type
Data frames – tabular data where rows are observations and columns are variables
Tibbles – a modern version of data frames (from the tibble package, part of the tidyverse)
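To make "a single type" concrete, here is a minimal sketch of how a vector behaves when its elements agree in type and when they do not. The names `scores` and `mixed` are chosen only for this illustration.

```r
# A vector holds values of one type
scores <- c(88, 92, 79)
scores_class <- class(scores)     # "numeric"

# Mixing types makes R convert every element to a common type
mixed <- c(88, "Alex", TRUE)
mixed_class <- class(mixed)       # "character"

print(scores_class)
print(mixed_class)
print(mixed)                      # all three elements are now text
```

This is why tabular data are usually kept in a data frame or tibble rather than a single vector: each column can keep its own type.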
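For example, the following code creates a small data frame of three students:
students <- data.frame(
  name   = c("Alex", "Jordan", "Sam"),   # text
  score  = c(88, 92, 79),                # numbers
  passed = c(TRUE, TRUE, FALSE)          # logical values
)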
The result is a data frame with three rows and three columns, where:
Each row represents one observation (a student)
Each column represents a variable (name, score, passed)
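The row and column structure described above can be checked directly. The sketch below rebuilds the same `students` data frame so that it runs on its own, then inspects it with a few base R functions.

```r
# Rebuild the example data frame so this block is self-contained
students <- data.frame(
  name   = c("Alex", "Jordan", "Sam"),
  score  = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE)
)

n_obs  <- nrow(students)     # number of observations (rows): 3
n_vars <- ncol(students)     # number of variables (columns): 3
vars   <- names(students)    # variable names

str(students)                # compact summary of each column and its type
first_student <- students[1, ]   # one row = one observation
all_scores <- students$score     # one column = one variable, as a vector
print(first_student)
print(all_scores)
```

Selecting a row returns a one-row data frame, while `$` returns the column as a plain vector.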
Data allow us to:
Describe what is happening (descriptive statistics)
Make comparisons and detect patterns
Test claims and hypotheses about the data-generating process (DGP), the mechanism that produced the data
Make predictions and informed decisions
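The first three uses can be sketched with the small `students` data frame from earlier, rebuilt here so that the block runs on its own. The benchmark of 75 in the test is a hypothetical value chosen only for illustration, and three observations are far too few for a meaningful test. Prediction is left out because it would need a model and more data.

```r
students <- data.frame(
  name   = c("Alex", "Jordan", "Sam"),
  score  = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE)
)

# Describe: average score and share of students who passed
mean_score <- mean(students$score)
pass_rate  <- mean(students$passed)    # TRUE counts as 1, FALSE as 0

# Compare: average score among those who failed vs. passed
mean_by_pass <- tapply(students$score, students$passed, mean)

# Test a claim: is the underlying mean score different from 75?
# (75 is an assumed benchmark, not something from the data)
test <- t.test(students$score, mu = 75)

print(mean_score)
print(pass_rate)
print(mean_by_pass)
print(test$p.value)
```

Each step uses the same data but answers a different kind of question, which is the point of the list above.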