A variable is a measurable characteristic or attribute that can take on different values across cases (i.e., observational units such as people, companies, or time points).
Variables can be classified by the type of values they take:
Quantitative (Numerical) – Numeric values on which arithmetic is meaningful (e.g., height, age, income).
Categorical (Qualitative) – Group labels or categories (e.g., major, treatment group, species).
In R:
Quantitative variables are usually stored as numeric vectors (class numeric or integer).
Categorical variables are usually stored as factors, or sometimes as character vectors.
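As a minimal sketch of the two storage types, the snippet below builds one quantitative and one categorical vector and inspects them. The object names `height` and `major` and all of their values are invented for illustration; they do not come from any real dataset.

```r
# Hypothetical values, invented for illustration only
height <- c(62.0, 70.5, 66.0, 68.5)                    # quantitative
major  <- factor(c("Biology", "Math", "Math", "Art"))  # categorical

class(height)    # "numeric"
class(major)     # "factor"
levels(major)    # the distinct categories the factor can take
mean(height)     # arithmetic is meaningful for a quantitative variable
table(major)     # counts per category are the usual summary for a factor
```

Calling `mean(major)` would return NA with a warning, which is one practical reason to store categories as factors rather than as numeric codes.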
In a modeling framework, variables take on specific roles:
Outcome variable – the variable we are trying to explain or predict.
Explanatory (Predictor) variable(s) – the variable(s) used to explain or predict the outcome.
# Word equation: Thumb = Height + Error
# Assumes the Fingers data frame is already loaded in the session
lm(Thumb ~ Height, data = Fingers)
Thumb is the outcome variable.
Height is the explanatory variable.
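The roles can also be read directly off the model formula: the outcome sits to the left of `~` and the explanatory variable(s) to the right. The sketch below assumes a small stand-in data frame called `toy`, with invented values, so that it runs without the Fingers data; `model_formula`, `outcome`, and `explanatory` are likewise illustrative names.

```r
# Hypothetical stand-in for Fingers; these values are invented for illustration
toy <- data.frame(
  Height = c(62, 65, 68, 70, 73),
  Thumb  = c(55, 58, 60, 63, 66)
)

model_formula <- Thumb ~ Height
fit <- lm(model_formula, data = toy)

# all.vars() lists the variables in the formula, left-hand side first.
# With a single outcome, the first name is the outcome and the rest are
# the explanatory variables.
outcome     <- all.vars(model_formula)[1]
explanatory <- all.vars(model_formula)[-1]

outcome       # "Thumb"
explanatory   # "Height"
coef(fit)     # estimated intercept and slope for Height
```

Swapping the two sides of the formula (`Height ~ Thumb`) would swap the roles, even though the data frame itself is unchanged.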
In R, datasets are commonly stored as data frames, which are easiest to work with when organized in tidy format:
Rows represent cases (e.g., individual people in a study).
Columns represent variables (measured attributes of those cases).
For example, the Fingers data frame includes these variables:
Height – the person's height
Thumb – the length of the person's thumb
Year – the person's year in school
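To make the row and column layout concrete, here is a minimal sketch using a three-person data frame named `people`. The column names mirror the variables listed above, but the values are invented, and storing `Year` as a factor is an assumption made for illustration rather than a statement about how the real data is coded.

```r
# Hypothetical three-person data frame; values are invented for illustration
people <- data.frame(
  Height = c(64.0, 70.5, 67.0),
  Thumb  = c(56, 64, 60),
  Year   = factor(c("First", "Third", "Second"))  # assumed to be categorical
)

nrow(people)     # 3 cases (rows)
ncol(people)     # 3 variables (columns)
names(people)    # "Height" "Thumb" "Year"
people[2, ]      # every recorded value for the second case
people$Thumb     # one variable's values across all cases
str(people)      # storage type of each column
```

Indexing by row returns a case, while `$` or column indexing returns a variable, which matches the rows-as-cases and columns-as-variables convention.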
In summary, a variable is:
A column in a data frame
A measurable characteristic of the cases (observational units)
A component of a statistical model
Something that can vary