Mistake

In statistics and data science, mistakes are errors that happen because a person or system does something incorrectly when collecting, recording, entering, or processing data. Mistakes are one possible source of variation in data.

Unlike ordinary measurement error, which comes from the limited precision of measuring tools and methods, mistakes are accidental errors that can often be prevented or corrected.

Examples of Mistakes

  • Typing 500 instead of 50

  • Recording the wrong student’s test score

  • Forgetting to include a decimal point

  • Selecting the wrong answer on a survey form

  • Copying data into the wrong row or column

These mistakes can create unusual or misleading values in a dataset.

Mistakes vs. Measurement Error

Source of Variation    Description
Measurement Error      Small differences caused by imperfect measurement
Mistakes               Incorrect values caused by accidents or human error

For example:

  • Measuring the same object with a ruler and getting slightly different readings each time → measurement error

  • Typing the wrong number into a spreadsheet → mistake

Some measurement error is expected in almost any real-world data. Mistakes are different: they are not a built-in part of the measurement process, so they can often be found and fixed.

Why Mistakes Matter

Mistakes can:

  • Distort patterns in data

  • Create outliers

  • Reduce model accuracy

  • Lead to incorrect conclusions

A single mistake can sometimes have a large effect on an analysis.
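As a small illustration, the Python sketch below uses five made-up values in which one 50 was typed as 500, the first mistake in the examples above. The values are hypothetical. The point is only to compare how the mean and the median react to a single bad entry.

```python
from statistics import mean, median

# Hypothetical values. The true fourth value is 50,
# but in the second list it was typed as 500 (a data-entry mistake).
correct = [48, 52, 47, 50, 53]
with_mistake = [48, 52, 47, 500, 53]

print("mean without mistake:  ", mean(correct))        # 50
print("mean with mistake:     ", mean(with_mistake))   # 140
print("median without mistake:", median(correct))      # 50
print("median with mistake:   ", median(with_mistake)) # 52
```

One wrong keystroke moves the mean from 50 to 140, while the median only moves from 50 to 52. A summary statistic that changes this much is a good reason to check the data for mistakes.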

Detecting Mistakes

Data scientists often look for values that seem unusual or impossible, such as:

  • Negative ages

  • Test scores above 100%

  • Duplicate observations

  • Extremely large or small values

Finding and correcting mistakes is part of data cleaning.
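The checks listed above can be written as simple rules. Below is a minimal Python sketch. It assumes each record is a hypothetical (student_id, age, test_score_percent) tuple. The function name find_suspicious and the max_age cutoff of 120 are illustrative choices, not a standard.

```python
# Hypothetical records: (student_id, age, test_score_percent)
records = [
    ("s01", 15, 88),
    ("s02", -14, 92),   # negative age
    ("s03", 16, 105),   # score above 100%
    ("s04", 15, 74),
    ("s04", 15, 74),    # duplicate observation
    ("s05", 150, 81),   # implausibly large age
]

def find_suspicious(rows, max_age=120):
    """Return (row_index, reason) pairs for values that look like mistakes.

    max_age is an assumed plausibility cutoff, chosen only for illustration.
    """
    problems = []
    seen = set()
    for i, (sid, age, score) in enumerate(rows):
        # Impossible or implausible ages
        if age < 0:
            problems.append((i, "negative age"))
        elif age > max_age:
            problems.append((i, "age too large"))
        # Percent scores must lie between 0 and 100
        if not 0 <= score <= 100:
            problems.append((i, "score outside 0-100%"))
        # An identical record seen earlier is a possible duplicate
        if (sid, age, score) in seen:
            problems.append((i, "duplicate observation"))
        seen.add((sid, age, score))
    return problems

for index, reason in find_suspicious(records):
    print(index, records[index], reason)
```

A flagged row is only a candidate mistake. Before changing it, compare it with the original source, because an unusual value can also be real.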

In a Modeling Context

Models try to explain meaningful variation in data. Mistakes add variation that usually does not reflect real patterns in the data-generating process.

Because of this, mistakes can make models:

  • less accurate

  • less reliable

  • harder to interpret
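The sketch below shows how one mistake can distort a model. It fits a straight line by ordinary least squares to hypothetical data on hours studied and test scores. The line is fitted twice: once with correct data, and once with a misplaced decimal point (75 recorded as 7.5). The helper least_squares is written out by hand so the example needs no libraries. For a single predictor it gives the same line as R's lm().

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: each extra hour of study adds exactly 5 points.
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 70, 75]
# Same data, but the last score was recorded as 7.5 instead of 75.
scores_mistake = [55, 60, 65, 70, 7.5]

for label, ys in [("correct data", scores), ("with mistake", scores_mistake)]:
    intercept, slope = least_squares(hours, ys)
    print(f"{label}: intercept = {intercept:.1f}, slope = {slope:.1f}")
# correct data: intercept = 50.0, slope = 5.0
# with mistake: intercept = 77.0, slope = -8.5
```

With the correct data the slope is 5 points per hour. The single mistake turns it into a negative slope of -8.5, which would suggest that studying more lowers scores. The real relationship did not change; only one recorded value did.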

Reducing Mistakes

Researchers and analysts reduce mistakes by:

  • Double-checking data entry

  • Using automated data collection when possible

  • Writing clear instructions

  • Validating data values

  • Cleaning datasets before analysis
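One common way to double-check data entry is double entry. Two people (or the same person twice) enter the same source data independently, and every value where the two versions disagree is checked against the original. The sketch below assumes both entries are lists of numbers in the same order. The function compare_entries and the sample values are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (position, first_value, second_value) wherever two entries disagree."""
    if len(first_pass) != len(second_pass):
        raise ValueError("the two entries have different numbers of values")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first_pass, second_pass))
        if a != b
    ]

# Hypothetical: the same paper forms typed in twice, independently.
entry_a = [72, 85, 91, 500, 66]
entry_b = [72, 85, 19, 50, 66]

for position, a, b in compare_entries(entry_a, entry_b):
    print(f"value {position}: first entry {a}, second entry {b} -> check the original form")
```

A disagreement shows that at least one entry is wrong, but not which one, so the original form decides. Double entry also cannot catch a mistake that both people make in the same way.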
