Knowledge Base | Glossary | Knowledge Base

Glossary
empirical rule
Approximately 68 percent of the scores in a normal distribution are within one standard deviation, plus or minus, of the mean; approximately 95 percent of the scores are within two standard deviations; and approximately 99.7 percent of scores are ...
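As a quick check of these percentages, the sketch below computes the proportion of a normal distribution that falls within k standard deviations of the mean. It is written in Python for illustration, and the function name `share_within` is our own rather than anything from the course materials. It relies on the standard identity that this proportion equals erf(k / sqrt(2)).

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the proportion for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed proportions round to roughly 0.68, 0.95, and 0.997.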
asymptotic
Asymptotic describes a probability distribution whose tail (or tails) extends indefinitely, getting ever closer to zero without reaching it.
theoretical probability distribution
Theoretical probability distribution shows us the pattern of variation in the probabilities of every possible event and thus allows us to estimate the probability of particular kinds of events; also called probability distribution.
z score
A z score represents the number of standard deviations a score is above (if positive) or below (if negative) the mean.
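The definition can be turned into a one-line calculation. The Python sketch below is illustrative only; the function name `z_score` and the numbers are made up for the example.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical distribution with mean 60 and standard deviation 10.
above = z_score(65.0, 60.0, 10.0)   # half a standard deviation above the mean
below = z_score(50.0, 60.0, 10.0)   # one standard deviation below the mean
print(above, below)
```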
standard deviation
Standard deviation is the square root of the variance; we generally prefer thinking about error in terms of standard deviation because it yields a number that makes sense using the original scale of measurement.
variance
Variance is also called MS, Mean Square; approximated by the sum of squares (SS) divided by the degrees of freedom (i.e., n-1); the MSE from the empty model can be thought of as roughly the average squared deviation.
Mean Square (MS)
Mean Square (MS) is also called variance; approximated by the sum of squares (SS) divided by the degrees of freedom (i.e., n-1); the MSE from the empty model can be thought of as roughly the average squared deviation.
Sum of Squares (SS)
Sum of Squares (SS) is the total area of the squared residuals; a way to quantify error, which gets around the problem of the sum of residuals adding up to 0; an important feature of SS is that the one number model that uniquely minimizes it is the ...
Sum of Absolute Deviations (SAD)
Sum of Absolute Deviations (SAD) is the total of the absolute values of each deviation from the mean; a way to quantify error, which gets around the problem of the sum of deviations adding up to 0; also called SAE, Sum of Absolute Error.
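To see how SS, SAD, variance (MS), and standard deviation relate, here is a minimal Python sketch on a small made-up data set. The variable names are our own, and the deviations are measured from the mean, as in the definitions above.

```python
import math

scores = [2.0, 4.0, 4.0, 6.0, 9.0]      # made-up data for illustration
n = len(scores)
mean = sum(scores) / n

# Deviations from the mean sum to 0, which is why we square them
# or take absolute values before adding them up.
deviations = [y - mean for y in scores]

ss = sum(d ** 2 for d in deviations)     # Sum of Squares
sad = sum(abs(d) for d in deviations)    # Sum of Absolute Deviations
variance = ss / (n - 1)                  # Mean Square: SS divided by degrees of freedom
sd = math.sqrt(variance)                 # back on the original scale of measurement

print(sum(deviations), ss, sad, variance, round(sd, 3))
```

For these five scores the mean is 5, so SS is 28, SAD is 10, the variance is 7, and the standard deviation is about 2.646.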
ANOVA
ANOVA means ANalysis Of VAriance; it partitions variation.
parameters
Parameters are numbers that summarize something about a population; parameters are estimated from data; in GLM notation, the parameters (e.g., $\beta_0$ or $\beta_1$) summarize something about the population.
General Linear Model
General Linear Model is a mathematical notation that represents DATA = MODEL + ERROR; for example, the GLM form of the null model is written as $Y_i = \beta_0 + \epsilon_i$, and the GLM form of a more complex model with an additional parameter is written as $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$.
residual
Residual is the difference between our model prediction and an actual observed score.
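The DATA = MODEL + ERROR idea can be illustrated with a model that predicts the mean for every score. The Python sketch below uses made-up data and our own variable names, and it assumes the common convention that a residual is the observed score minus the prediction.

```python
scores = [2.0, 4.0, 4.0, 6.0, 9.0]       # made-up data for illustration

# MODEL: predict the mean for every observation.
b0 = sum(scores) / len(scores)
predictions = [b0 for _ in scores]

# ERROR: residual = observed score minus model prediction (assumed convention).
residuals = [y - p for y, p in zip(scores, predictions)]

# DATA = MODEL + ERROR: adding each residual back to its prediction recovers the score.
rebuilt = [p + e for p, e in zip(predictions, residuals)]

print(b0, residuals, rebuilt == scores)
```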
unbiased estimate
Unbiased estimate is an estimate that is just as likely to be too high as it is too low.
simple model
A simple model is any model that is relatively simpler than another; in this course we typically compare relatively more complex models with one explanatory variable to a simple model that does not have any explanatory variables.
null model
Null model uses the mean to model the distribution of a quantitative variable; called "null" because it does not have any explanatory variables; also called an empty model; sometimes referred to as a simple model because it is simpler than models ...
Cohen’s d
Cohen’s d is a measure of effect size; it indicates the size of a group difference in standard deviations.
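As a rough illustration, the Python sketch below computes Cohen’s d for two made-up groups. It assumes the pooled standard deviation of the two groups as the yardstick, which is one common choice; the glossary entry does not say which standard deviation is used. The function names are our own.

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations from the group's own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation (assumed standardizer)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    pooled_var = (sum_of_squares(group_a) + sum_of_squares(group_b)) / (
        len(group_a) + len(group_b) - 2
    )
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Made-up groups: means 6 and 3, pooled standard deviation 2.
d = cohens_d([4.0, 6.0, 8.0], [1.0, 3.0, 5.0])
print(d)
```

Here the group means differ by 3 and the pooled standard deviation is 2, so the groups are 1.5 standard deviations apart.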
balancing point
Balancing point is the point at which all things below it are equal to all things above it; for example, the mean is the number that balances the deviations above and below it, yielding the same amount of error above it as below it.
random assignment
Random assignment is a method of assigning subjects to one experimental condition or another; it helps us rule out confounding variables by ensuring that any variables that affect our outcome variable, whether positively or negatively, will balance ...
observational study
Observational study is a research design that involves taking a random sample from a population and then measuring some variables; also known as a correlational study.
experimental design
Experimental design is a research design that involves randomly assigning members of a sample to either an experimental group or a comparison group.
correlational study
Correlational study is a research design that involves taking a random sample from a population and then measuring some variables; also known as an observational study.
confounding variable
Confounding variable is a variable that, though not measured, causes variation in both the explanatory variable and the outcome variable; also known as a "lurking variable" or a "third variable".
probability distribution
Probability distribution shows us the pattern of variation in the probabilities of every possible event and thus allows us to estimate the probability of particular kinds of events.
unexplained variation
Unexplained variation is everything included in the "other stuff" part of a word equation.
explained variation
Explained variation is the portion of the total variation we were able to attribute to a variable.
explain variation
To explain variation is to make a better prediction of the outcome variable by knowing something about the explanatory variable; accounting for variation in the outcome variable by using variation in another variable.
word equation
Word equation is a statistical model expressed in words rather than numbers or mathematical notation (e.g., Thumb = Gender + other stuff); provides a way of representing relationships, real or hypothesized, in our data or in the DGP.
path diagram
Path diagram is a diagram of the relationship between variables in which, by convention, the explanatory variable "points to" the outcome variable.
outcome variable
Outcome variable is a variable whose variation we are trying to explain.
explanatory variable
Explanatory variable is a variable we use to explain variation in an outcome variable.
within-group variation
Within-group variation is variation among members of the same group.
leftover variation
Leftover variation is variation in an outcome variable that remains after having accounted for variation due to at least one explanatory variable.
mode
Mode is the most frequent score.
outliers
An outlier is a value more than 1.5 IQRs above Q3 or below Q1.
Boxplots
range
Range is the difference between the min and the max.
interquartile range (IQR)
Interquartile range (IQR) is the range for just the middle 50% of values (i.e., Q3 - Q1).
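The range, the IQR, and the 1.5 IQR outlier rule can be worked through on a small made-up data set. The Python sketch below is illustrative only; it uses `statistics.quantiles` with the "inclusive" method, and since quartile conventions differ between tools, other software may give slightly different values for Q1 and Q3.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 30]    # made-up data with one extreme value

data_range = max(values) - min(values)   # range: max minus min

# Quartiles under one particular convention (assumed here for illustration).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50% of values

# Outlier rule: more than 1.5 IQRs below Q1 or above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print(data_range, q1, q3, iqr, outliers)
```

With this convention Q1 is 3 and Q3 is 7, so the IQR is 4, the fences sit at -3 and 13, and only the value 30 is flagged.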
Five-Number Summary
The five-number summary is a handy way to describe the spread of a distribution around its median. To calculate the five-number summary, we first sort the data points from smallest to largest along the scale on which the variable is measured. Next we ...
sampling without replacement
Sampling without replacement takes a sample from a population, removes the object sampled, then samples again from the remaining objects in the population; the R function sample() does this by default.
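For readers who want to see the idea outside of R, here is a minimal Python sketch. Python's `random.sample` also draws without replacement, so no object can be picked twice. The population and the seed are made up for the example.

```python
import random

population = list(range(1, 11))   # made-up population: the numbers 1 through 10
rng = random.Random(42)           # seeded generator so the draw is repeatable

# Draw 4 objects; each object drawn is unavailable for later draws.
sample = rng.sample(population, k=4)

print(sample)
print(len(set(sample)) == len(sample))   # True: no object appears twice
```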

Popular Articles
tally()
The tally() function will count, or tally, the number of cases that are observed in each category of a variable. Example 1: Use tally() to count the number of observations in each category of a categorical variable. # Use tally() to count the number ...
desc()
The desc() function can be used with the arrange() function to arrange a variable in a data frame in descending order. Example 1: For instance, when we use the arrange() function to sort the Fingers data frame by Thumb, it will sort the values for ...
favstats()
The favstats() function will compute a set of common summary statistics ("favorite stats") for a given variable, including the five-number summary (minimum, Q1, median/Q2, Q3, maximum), the mean, the standard deviation, the sample size (n), and the ...
arrange()
The arrange() function will arrange a data frame by a specific variable, in ascending order. You can use the desc() function with the arrange() function to arrange the data frame in descending order. NOTE: The arrange() function is similar to the ...
Statement on Sex and Gender
Many people use sex and gender interchangeably, but in truth, they’re distinct concepts. Sex is a classification based on biological characteristics, including DNA and anatomy. Gender refers to the socially constructed roles, behaviors, ...
