Statistical Model

A statistical model is a simplified way of describing how data are generated. It helps us separate what we can explain using known information from what we cannot explain perfectly. We use statistical models for three main purposes:


(1) to understand patterns in data,
(2) to predict future or unknown outcomes, and
(3) to improve decisions in complex systems.

At a basic level, many statistical models follow the same idea, which can be written as a word equation:

Data = Model + Error

  • Data are the values we actually observe.

  • Model represents the systematic part we are trying to explain or predict using variables we measure.

  • Error captures everything the model does not explain, such as randomness, measurement noise, or missing information.
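The word equation can be checked with a few lines of Python. This is only a sketch: the scores are made up for illustration, and the "model" is assumed to be the simplest one available, a single number (the mean) used as the prediction for every observation.

```python
# Hypothetical exam scores, invented for this illustration.
scores = [72.0, 85.0, 90.0, 64.0, 79.0]

# Assumed model: predict every score with the overall mean.
model = sum(scores) / len(scores)

# Error is whatever is left over once the model's prediction is removed.
errors = [y - model for y in scores]

# Data = Model + Error: adding the parts back recovers each observation.
rebuilt = [model + e for e in errors]

print("Model:", model)
print("Errors:", errors)
print("Rebuilt data:", rebuilt)
```

Each error is simply the gap between an observed value and the model's prediction, so a better model is one that leaves smaller errors behind.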


Example 1: Exam scores

Suppose we want to understand students’ exam scores.

Exam score = Effect of study time + Error

Here, the model explains exam scores using study time, while the error includes factors like test anxiety, question difficulty, or luck.


Example 2: House prices

If we want to predict house prices:

House price = Effect of size and location + Error

The model uses measurable features (such as square footage and neighborhood), and the error accounts for things like buyer preferences or unmeasured features of the house.

In statistics and data science, this idea is often written more formally using General Linear Model (GLM) notation:

Yᵢ = β₀ + β₁Xᵢ + εᵢ

Here:

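  • Y is the outcome we want to explain or predict (for example, exam scores or house prices). The subscript i identifies an individual observation, such as one student or one house.

  • X is the input variable used to explain the outcome (such as study time or house size). A model can include several input variables, each with its own β.

  • β represents the model’s parameters: β0 is the intercept (the predicted outcome when X is 0), and β1 is the slope (how much the predicted outcome changes for each one-unit increase in X).

  • ε (epsilon) is the error term, representing the variation the model leaves unexplained.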
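In practice, the β values are estimated from data. The text does not name an estimation method, so the sketch below assumes ordinary least squares, a common choice for a model with one input variable. The function name `fit_line`, the variable names `b0` and `b1`, and the study-time data are all hypothetical.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for y = b0 + b1 * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, relative to how much x varies.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: makes the fitted line pass through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: hours studied and exam scores for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 61.0, 68.0, 80.0, 84.0]

b0, b1 = fit_line(hours, scores)

# Model part: the predicted score for each student.
predicted = [b0 + b1 * h for h in hours]
# Error part: the observed score minus the predicted score.
residuals = [s - p for s, p in zip(scores, predicted)]

print("Intercept (b0):", round(b0, 2))
print("Slope (b1):", round(b1, 2))
print("Residuals:", [round(r, 2) for r in residuals])
```

With these made-up numbers the slope comes out to about 8.3, meaning each extra hour of study is associated with a predicted score roughly 8.3 points higher. The residuals are the estimated errors, one per student.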


In this course, we focus on statistical models that use this structure to generate a predicted value (or score) for each observation in a dataset. For example, a model might predict:
  • a student’s exam score,

  • the expected price of a house,

  • or the probability that an email is spam.

By comparing these predicted values to the actual data, we can evaluate how well a model performs, and we can use data to make informed, evidence-based decisions.
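One way to make this comparison concrete is to add up the squared differences between the actual and predicted values, then see how much smaller that total is than the total for a baseline that predicts the mean for everyone. The sketch below assumes this sum-of-squares approach; the data, the predictions, and the helper name `sum_squared_error` are hypothetical.

```python
def sum_squared_error(actual, predicted):
    """Total squared distance between actual values and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Hypothetical actual scores and predictions from some fitted model.
actual = [52.0, 61.0, 68.0, 80.0, 84.0]
model_pred = [52.4, 60.7, 69.0, 77.3, 85.6]

# Baseline: a model with no input variables that always predicts the mean.
mean_score = sum(actual) / len(actual)
mean_pred = [mean_score] * len(actual)

sse_model = sum_squared_error(actual, model_pred)
sse_mean = sum_squared_error(actual, mean_pred)

# Proportion of the baseline's error that the model removes.
reduction = 1 - sse_model / sse_mean

print("Error with model:", round(sse_model, 2))
print("Error with mean only:", round(sse_mean, 2))
print("Proportional reduction in error:", round(reduction, 3))
```

A reduction close to 1 means the model's predictions sit much closer to the data than the mean alone does; a reduction near 0 means the input variable adds little.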

