In statistics and data science, mistakes are errors introduced when a person or system collects, records, enters, or processes data incorrectly. They are one possible source of variation in data.
Unlike ordinary measurement error, mistakes are usually accidental problems that could potentially be prevented or corrected. Common examples include:
Typing 500 instead of 50
Recording the wrong student’s test score
Forgetting to include a decimal point
Selecting the wrong answer on a survey form
Copying data into the wrong row or column
These mistakes can create unusual or misleading values in a dataset.
Mistakes are not the same as measurement error. For example:
A ruler measuring slightly differently each time → measurement error
Typing the wrong number into a spreadsheet → mistake
Measurement error is often expected in real-world data and cannot be fully eliminated. Mistakes, by contrast, are unintended problems that can often be prevented or corrected.
Mistakes can:
Distort patterns in data
Create outliers
Reduce model accuracy
Lead to incorrect conclusions
A single mistake can sometimes have a large effect on an analysis.
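As a minimal sketch of this effect, the Python snippet below compares summary statistics for a small set of test scores with and without one data-entry mistake. The scores are invented for illustration; the typo (500 instead of 50) mirrors the example given earlier.

```python
from statistics import mean, median

# Hypothetical test scores, invented for illustration.
correct = [45, 50, 52, 48, 55]

# The same data with one data-entry mistake: 500 typed instead of 50.
with_typo = [45, 500, 52, 48, 55]

mean_correct, median_correct = mean(correct), median(correct)
mean_typo, median_typo = mean(with_typo), median(with_typo)

print("correct:   mean", mean_correct, "median", median_correct)  # mean 50, median 50
print("with typo: mean", mean_typo, "median", median_typo)        # mean 140, median 52
```

In this made-up case, one wrong value out of five nearly triples the mean, while the median moves only from 50 to 52. How strongly a mistake affects an analysis depends on which statistic or model is used.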
Data scientists often look for values or records that seem unusual or impossible, such as:
Negative ages
Test scores above 100%
Duplicate observations
Extremely large or small values
Finding and correcting mistakes is part of data cleaning.
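Some of the checks listed above can be sketched in a few lines of Python. The records, the field names (id, age, score), and the helper find_suspect_records are all hypothetical and exist only for illustration. The sketch covers negative ages, scores outside 0 to 100, and exact duplicates. It does not try to flag "extremely large or small values", because that check usually needs a threshold chosen for the specific dataset.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 17, "score": 88.0},
    {"id": 2, "age": -4, "score": 72.5},   # negative age: impossible
    {"id": 3, "age": 16, "score": 840.0},  # score above 100%: impossible
    {"id": 1, "age": 17, "score": 88.0},   # exact duplicate of the first record
]

def find_suspect_records(rows):
    """Return (row index, reason) pairs for rows that look like mistakes."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        if row["age"] < 0:
            problems.append((i, "negative age"))
        if not 0 <= row["score"] <= 100:
            problems.append((i, "score outside 0-100"))
        # A row counts as a duplicate if every field matches an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            problems.append((i, "duplicate observation"))
        seen.add(key)
    return problems

suspects = find_suspect_records(records)
for index, reason in suspects:
    print(index, reason)
```

A check like this only flags candidates. Deciding whether a flagged value is truly a mistake, and what the correct value should be, usually requires going back to the original source.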
Models try to explain meaningful variation in data. Mistakes add variation that usually does not reflect real patterns in the data-generating process.
Because of this, mistakes can make models:
Less accurate
Less reliable
Harder to interpret
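The following sketch illustrates how a single mistake can change a fitted model. It fits a straight line by ordinary least squares to invented data (hours studied versus test score) and then refits after one score is mistyped as 600 instead of 60. The data, the variable names, and the helper fit_line are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: each score is exactly 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

# The same data with one mistake: 600 typed instead of 60.
scores_typo = [52, 54, 56, 58, 600]

clean_slope, clean_intercept = fit_line(hours, scores)
typo_slope, typo_intercept = fit_line(hours, scores_typo)

print("clean fit:", clean_slope, clean_intercept)  # slope 2, intercept 50
print("with typo:", typo_slope, typo_intercept)    # slope 110, intercept -166
```

With the mistyped value, the fitted line no longer describes the relationship that holds for the other four observations. Its slope and intercept reflect the typo, not the data-generating process.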
Researchers and analysts reduce mistakes by:
Double-checking data entry
Using automated data collection when possible
Writing clear instructions
Validating data values
Cleaning datasets before analysis
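Validating data values can also happen at the moment of entry, before a mistake reaches the dataset. The sketch below assumes scores are typed in as text and must be percentages between 0 and 100. The function name parse_score and the sample inputs are hypothetical.

```python
def parse_score(raw):
    """Convert typed input to a percentage score, rejecting impossible values."""
    value = float(raw)  # raises ValueError for non-numeric input such as "5o"
    if not 0 <= value <= 100:
        raise ValueError(f"score {value} is outside the range 0-100")
    return value

accepted, rejected = [], []
for typed in ["50", "87.5", "500", "5o"]:
    try:
        accepted.append(parse_score(typed))
    except ValueError:
        # Set the entry aside for a person to review instead of storing it.
        rejected.append(typed)

print("accepted:", accepted)  # [50.0, 87.5]
print("rejected:", rejected)  # ['500', '5o']
```

Range checks like this catch impossible values only. A plausible but wrong entry, such as 58 typed instead of 85, would still pass, which is why double-checking data entry remains useful.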