Sampling

Sampling is the process of selecting which individuals, objects, or observations to study from a larger data generating process (DGP). Sampling determines what your data represent and how far your conclusions can generalize.

Why Sampling Matters

Sampling affects:

What part of the data generating process your data reflect
How accurate your estimates are
Whether patterns reflect reality or bias
How confident you can be in your model

Even the best statistical model cannot fix poor sampling.

Data Generating Process vs. Sample

Understanding sampling begins with two key ideas:

Term	Definition	Example
Data Generating Process (DGP)	The broader process that produces the data you care about	All students who take Intro Statistics
Sample	The subset of observations you actually measure	300 students surveyed

Goal of Sampling:
Choose a sample that represents the data generating process.

Common Sampling Methods

Random Sampling

In simple random sampling, each observation in the data generating process has an equal chance of being selected.

Example:

Randomly selecting student IDs

Why it's useful:

Reduces bias
Supports generalization to the DGP
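As a minimal sketch of the student ID example, the snippet below draws a simple random sample using Python's standard library. The roster, the ID format, and the helper name draw_simple_random_sample are hypothetical and exist only for illustration.

```python
import random

def draw_simple_random_sample(student_ids, n, seed=None):
    """Return n distinct IDs, with every ID equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(student_ids, n)

# Hypothetical roster: 1,000 made-up student IDs (S0001 ... S1000).
roster = [f"S{i:04d}" for i in range(1, 1001)]

sample = draw_simple_random_sample(roster, n=300, seed=42)
print(len(sample), len(set(sample)))  # 300 IDs, none repeated
```

Because the selection depends only on the random number generator, no student characteristic (class time, major, willingness to respond) can influence who ends up in the sample.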

Convenience Sampling

Selecting individuals who are easiest to access.

Examples:

Students in your class
People near you

Limitations:

Often biased
May not represent the DGP well

Volunteer Sampling

Participants select themselves by choosing whether to take part.

Example:

Online surveys

Limitations:

Volunteers may differ from non-volunteers in the DGP
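A small simulation can illustrate how self-selection shifts an estimate. Everything here is an assumption made for the sketch: the study-hours data are simulated, and the rule that students who study more are more likely to volunteer is invented to show the mechanism.

```python
import random

rng = random.Random(0)

# Hypothetical DGP: weekly study hours for 10,000 students.
hours = [max(0.0, rng.gauss(10, 4)) for _ in range(10_000)]

# Assumed self-selection rule: the more a student studies, the more likely
# they are to answer a voluntary survey about study habits.
volunteers = [h for h in hours if rng.random() < min(1.0, h / 25)]

mean_all = sum(hours) / len(hours)
mean_volunteers = sum(volunteers) / len(volunteers)

print(f"all students: {mean_all:.2f} hours")
print(f"volunteers:   {mean_volunteers:.2f} hours")
```

Under this assumed rule the volunteers' average is higher than the average for all students, even though nobody was deliberately excluded.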

Sampling Bias

Sampling bias occurs when the way a sample is selected makes it systematically different from the data generating process it is meant to represent.

Examples:

Surveying only morning classes
Polling only social media users
Studying only one school

Sampling bias can lead to misleading models.
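The "morning classes" example can be sketched with simulated data. The numbers below are hypothetical: the sketch assumes morning-class students sleep about one hour less on average than afternoon-class students, then compares a random sample of everyone with a sample drawn only from morning classes.

```python
import random

rng = random.Random(1)

# Hypothetical DGP: hours of sleep for 2,000 morning-class and
# 3,000 afternoon-class students (morning assumed to sleep less).
morning = [rng.gauss(6.5, 1.0) for _ in range(2_000)]
afternoon = [rng.gauss(7.5, 1.0) for _ in range(3_000)]
everyone = morning + afternoon

def mean(values):
    return sum(values) / len(values)

random_sample = rng.sample(everyone, 300)  # every student equally likely
morning_only = rng.sample(morning, 300)    # afternoon students can never be chosen

print(f"true mean:           {mean(everyone):.2f}")
print(f"random sample:       {mean(random_sample):.2f}")
print(f"morning-only sample: {mean(morning_only):.2f}")
```

Both samples have 300 students, but only the random sample lands near the true mean. The morning-only estimate is off in a consistent direction, which is what "systematically different" means.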

Sample Size

Larger samples generally:

Reduce sample-to-sample variability in estimates
Improve precision (narrower margins of error)
Make model estimates more stable

However, large biased samples are still biased.

Example:

10,000 responses from students at a single school may still not represent the broader DGP.
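Both points in this section can be checked with a short simulation. This is a sketch under stated assumptions: the exam scores are simulated, the helper spread_of_sample_means is hypothetical, and the "biased" sample is built by allowing only the top-scoring half of students to be selected.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical DGP: exam scores for 20,000 students (mean about 70, SD about 10).
scores = [rng.gauss(70, 10) for _ in range(20_000)]

def spread_of_sample_means(population, n, repeats=500):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(scores, n=25)
spread_large = spread_of_sample_means(scores, n=400)
print(f"n=25:  sample means vary by about {spread_small:.2f} points")
print(f"n=400: sample means vary by about {spread_large:.2f} points")

# A large but biased sample: only the top-scoring half can be selected.
top_half = sorted(scores)[len(scores) // 2:]
biased_mean = statistics.fmean(rng.sample(top_half, 5_000))
true_mean = statistics.fmean(scores)
print(f"true mean: {true_mean:.2f}, biased n=5000 mean: {biased_mean:.2f}")
```

The larger random samples give estimates that vary less from sample to sample. The biased sample of 5,000 is very precise, but it is precise about the wrong value, because size does not correct for who was left out.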

Sampling and Generalization

We use samples to make inferences about the data generating process.

Better sampling → Better generalization to the DGP
Poor sampling → Limited conclusions

Example:

If you sample only psychology majors, your conclusions apply mainly to psychology majors within the broader data generating process.

Key Takeaway

Sampling determines what part of the data generating process your data represent.
Good sampling allows you to generalize your models. Poor sampling limits what conclusions you can make.

