Mistake

In statistics and data science, a mistake is an error that occurs when a person or system does something incorrectly while collecting, recording, entering, or processing data. Mistakes are one possible source of variation in data.

Unlike ordinary measurement error, which is an expected part of measuring anything, mistakes are accidents that can usually be prevented or corrected.

Examples of Mistakes

Typing 500 instead of 50
Recording the wrong student’s test score
Forgetting to include a decimal point
Selecting the wrong answer on a survey form
Copying data into the wrong row or column

These mistakes can create unusual or misleading values in a dataset.

Mistakes vs. Measurement Error

Source of Variation	Description
Measurement Error	Small differences caused by imperfect measurement
Mistakes	Incorrect values caused by accidents or human error

For example:

A ruler giving slightly different readings each time → measurement error
Typing the wrong number into a spreadsheet → mistake

Measurement error is expected in real-world data and is usually small. Mistakes are unintended, can be much larger than ordinary measurement error, and can in principle be avoided.
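The difference in scale can be illustrated with a small simulation. This is only a sketch: the ruler length, the assumption that measurement error stays within 0.2 cm, and the dropped-decimal typo (30.1 recorded as 301) are all made up for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
TRUE_LENGTH_CM = 30.0  # hypothetical true length of the object

# Measurement error: each reading is off by a small random amount
# (assumed here to be at most 0.2 cm in either direction).
readings = [TRUE_LENGTH_CM + random.uniform(-0.2, 0.2) for _ in range(5)]

# Mistake: the last reading is recorded without its decimal point
# (hypothetically, 30.1 typed as 301).
recorded = readings[:4] + [301.0]

largest_measurement_error = max(abs(r - TRUE_LENGTH_CM) for r in readings)
mistake_size = abs(recorded[-1] - TRUE_LENGTH_CM)

print("largest measurement error (cm):", round(largest_measurement_error, 3))
print("size of the mistake (cm):", mistake_size)
```

Under these assumptions the measurement errors stay within a fraction of a centimeter, while the single mistake is off by hundreds of centimeters.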

Why Mistakes Matter

Mistakes can:

Distort patterns in data
Create outliers
Reduce model accuracy
Lead to incorrect conclusions

A single mistake can sometimes have a large effect on an analysis.
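As a rough illustration, the sketch below uses made-up quiz scores in which one value of 50 is typed as 500 (the first example listed earlier). The data are hypothetical; the point is only how far one wrong value can move a summary statistic.

```python
from statistics import mean, median

# Hypothetical quiz scores; every value is recorded correctly.
correct = [48, 52, 50, 47, 53, 50]

# The same scores, except the last 50 was typed as 500.
with_mistake = [48, 52, 50, 47, 53, 500]

mean_correct = mean(correct)
mean_mistake = mean(with_mistake)
median_correct = median(correct)
median_mistake = median(with_mistake)

print("mean:  ", mean_correct, "->", mean_mistake)
print("median:", median_correct, "->", median_mistake)
```

In this example the mean jumps from 50 to 125 because of one value, while the median only moves from 50 to 51. Comparing the two is one informal way to notice that a single extreme value may be driving a result.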

Detecting Mistakes

Data scientists often look for values that seem unusual or impossible, such as:

Negative ages
Test scores above 100%
Duplicate observations
Extremely large or small values

Finding and correcting mistakes is part of data cleaning.
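Checks like these can be sketched in a few lines of code. The records, field layout, and the function name `find_suspect_rows` below are hypothetical, and the sketch covers only the impossible-value and duplicate checks; deciding what counts as "extremely large or small" depends on the data and is left out.

```python
# Hypothetical records: (student_id, age, score_percent).
records = [
    ("s01", 19, 88.0),
    ("s02", -20, 91.5),   # negative age: impossible
    ("s03", 21, 840.0),   # score above 100%: impossible
    ("s04", 20, 76.0),
    ("s04", 20, 76.0),    # exact duplicate of the previous row
]

def find_suspect_rows(rows):
    """Return (row_index, reason) pairs for rows that fail simple checks."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        _, age, score = row
        if age < 0:
            problems.append((i, "negative age"))
        if not 0 <= score <= 100:
            problems.append((i, "score outside 0-100"))
        if row in seen:
            problems.append((i, "duplicate observation"))
        seen.add(row)
    return problems

suspects = find_suspect_rows(records)
for index, reason in suspects:
    print(index, reason)
```

A flagged row is only a candidate mistake. Whether to correct it, remove it, or keep it is a judgment that usually requires going back to the original source.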

In a Modeling Context

Models try to explain meaningful variation in data. Mistakes add variation that usually does not reflect real patterns in the data-generating process.

Because of this, mistakes can make models:

less accurate
less reliable
harder to interpret
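To make this concrete, the sketch below fits a straight line by ordinary least squares to a tiny made-up dataset, once with correct values and once with a single score typed as 600 instead of 60. The data and the helper `fit_line` are hypothetical, and a simple linear model is assumed only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]        # follows 50 + 2 * hours exactly
scores_bad = [52, 54, 56, 58, 600]   # last score typed as 600 instead of 60

intercept_ok, slope_ok = fit_line(hours, scores)
intercept_bad, slope_bad = fit_line(hours, scores_bad)

print("correct data: intercept", intercept_ok, "slope", slope_ok)
print("with mistake: intercept", intercept_bad, "slope", slope_bad)
```

With the correct data the fitted slope is 2 points per hour. With the one mistyped score it becomes 110 points per hour and the intercept turns negative, so the model describes the typo more than the relationship in the data.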

Reducing Mistakes

Researchers and analysts reduce mistakes by:

Double-checking data entry
Using automated data collection when possible
Writing clear instructions
Validating data values
Cleaning datasets before analysis
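One way to double-check data entry is to have the same values entered twice, independently, and compare the results. The sketch below assumes each pass is stored as a dictionary keyed by a student ID; the data and the function name `compare_entries` are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return the keys whose values differ between two independent entries."""
    mismatches = {}
    for key in first_pass.keys() | second_pass.keys():
        a = first_pass.get(key)   # None if the key is missing from this pass
        b = second_pass.get(key)
        if a != b:
            mismatches[key] = (a, b)
    return mismatches

# Hypothetical scores entered by two different people from the same forms.
entry_a = {"s01": 50, "s02": 72, "s03": 6.5}
entry_b = {"s01": 500, "s02": 72, "s03": 65}

disagreements = compare_entries(entry_a, entry_b)
for student in sorted(disagreements):
    print(student, disagreements[student])
```

The comparison shows where the two passes disagree, not which one is right. Each disagreement still has to be resolved against the original record.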

