Variables

A variable is a measurable characteristic or attribute that can take on different values across cases (i.e., observational units such as people, companies, or time points).

Types of Variables

Variables can be classified by the type of values they take:

  • Quantitative (Numerical) – Numeric values with meaningful arithmetic operations (e.g., height, age, income).

  • Categorical (Qualitative) – Group labels or categories (e.g., major, treatment group, species).

In R:

  • Quantitative variables are often stored as numeric vectors.

  • Categorical variables are often stored as factors.
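
As a minimal sketch of how these storage types look in practice, the snippet below builds one quantitative and one categorical variable and checks how R stores each. The names height and year, and all of the values, are made up for illustration and do not come from any real dataset.

```r
# Hypothetical values, invented for illustration only
height <- c(62.5, 70.0, 66.0, 71.5)        # quantitative: stored as numeric
year   <- factor(c("1", "2", "2", "4"))    # categorical: stored as a factor

class(height)    # "numeric"
class(year)      # "factor"
levels(year)     # the distinct categories: "1" "2" "4"

mean(height)     # arithmetic is meaningful for a numeric variable
table(year)      # counting cases per category suits a factor
```

Note that year is written with digits but is still categorical: wrapping it in factor() tells R to treat the values as group labels rather than as quantities.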

Variables in Statistical Models

In a modeling framework, variables take on specific roles:

  • Outcome variable – the variable we are trying to explain or predict.

  • Explanatory (Predictor) variable(s) – the variable(s) used to explain or predict the outcome.


For example, in the R code below:

# Thumb = Height + Error

lm(Thumb ~ Height, data = Fingers)


  • Thumb is the outcome variable.

  • Height is the explanatory variable.
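
The roles above can be checked on a small example. The sketch below assumes a hypothetical data frame, thumb_demo, with invented values standing in for the Fingers data; only the column names Thumb and Height are taken from the text.

```r
# Hypothetical stand-in for the Fingers data; all values are invented
thumb_demo <- data.frame(
  Height = c(62, 65, 68, 70, 73),
  Thumb  = c(55, 58, 61, 62, 66)
)

# Outcome on the left of ~, explanatory variable on the right
model <- lm(Thumb ~ Height, data = thumb_demo)

coef(model)                    # estimated intercept and slope for Height
all.vars(formula(model))[1]    # name of the outcome variable: "Thumb"
```

Swapping the two sides of the formula (Height ~ Thumb) would reverse the roles, so the position of a variable relative to ~ is what assigns its role in the model.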

Variables in a Data Frame

In R, datasets are commonly stored as data frames organized in tidy format, in which:

  • Rows represent cases (e.g., individual people in a study).

  • Columns represent variables (measured attributes of those cases).


For example, suppose we are working with the Fingers dataset, which contains measurements of people’s hands along with other attributes:

  • Height – the person’s height

  • Thumb – the length of the person’s thumb

  • Year – the person’s year in school


Here, Height is a variable because it is an attribute that varies from person to person. In the data frame, Height appears as a column.
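
To make the row and column layout concrete, here is a minimal sketch using a hypothetical three-case data frame, fingers_demo. The column names follow the description above, but the values are invented and the real Fingers dataset may store these variables differently.

```r
# Hypothetical miniature version of the data; all values are invented
fingers_demo <- data.frame(
  Height = c(62, 65, 68),
  Thumb  = c(55, 58, 61),
  Year   = factor(c("1", "3", "2"))
)

nrow(fingers_demo)     # number of cases (rows)
names(fingers_demo)    # names of the variables (columns)
fingers_demo$Height    # a single variable: the Height column
str(fingers_demo)      # how each variable is stored
```

Each call looks at the same object from a different angle: nrow() counts cases, names() lists variables, and the $ operator pulls out one variable as a vector.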

Key Ideas

A variable is:

  • A column in a data frame

  • A measurable characteristic of the cases (observational units)

  • A component of a statistical model

  • Something that can vary
