Glossary
Histogram Bins
What they are
To understand the bins of a histogram, we must first understand what a histogram is. A histogram displays the distribution of a quantitative, and often continuous, variable. A histogram divides the observations of the variable into ...
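The binning step of a histogram can be sketched in Python. This is a minimal sketch that assumes equal-width bins, each including its left edge and excluding its right edge; the function name `bin_counts` and the sample values are hypothetical, and real plotting functions may treat edge values or the number of bins differently.

```python
def bin_counts(values, start, width, num_bins):
    """Count observations falling in each equal-width bin (hypothetical helper).

    Bin i covers [start + i*width, start + (i+1)*width); values outside
    the overall range are skipped in this sketch.
    """
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // width)  # which bin the value falls in
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts


# Hypothetical measurements grouped into three bins: [0, 2), [2, 4), [4, 6)
values = [1.2, 2.5, 2.7, 3.1, 4.9, 5.0]
print(bin_counts(values, start=0, width=2, num_bins=3))  # [1, 3, 2]
```

The height of each histogram bar is simply one of these counts.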
Bar Graphs vs Histograms
What is the difference between a bar graph and a histogram? Bar graphs and histograms are both visualizations of data, but they visualize different types of data. Bar graphs are used for categorical variables, while histograms are used for ...
The Pipe Operator %>%
What it is
In R, %>% is called the "pipe" operator. It takes the output of one statement and uses it as the input for the following statement. Another way of saying this is that it "pipes" or "chains" together a string of ...
Faceted Plots
Faceted plots are useful when you have a categorical variable. You can use the categorical variable to partition the data into groups, and each group is then plotted in its own panel, called a facet. Thus, you can use a facet grid ...
Bonferroni Adjustment
A Bonferroni adjustment (also known as a Bonferroni correction) is used when making several simultaneous comparisons, to keep the overall Type I error rate across all of the comparisons at or below your chosen alpha. You simply divide your acceptable alpha (e.g., .05) by the number of simultaneous ...
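The division step can be sketched in Python. This is a minimal sketch: the function names and the three p-values are hypothetical, and it assumes a comparison counts as significant when its p-value is at or below the adjusted alpha.

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Return the per-comparison alpha after a Bonferroni adjustment."""
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    return alpha / num_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that is at or below the Bonferroni-adjusted alpha."""
    adjusted = bonferroni_alpha(alpha, len(p_values))
    return [p <= adjusted for p in p_values]


# Three pairwise comparisons among three groups (hypothetical p-values)
p_values = [0.010, 0.030, 0.200]
print(round(bonferroni_alpha(0.05, 3), 4))      # 0.0167
print(significant_after_bonferroni(p_values))   # [True, False, False]
```

Note that 0.030 would pass an unadjusted .05 cutoff but fails the adjusted one.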
Type I and Type II Error
Type I and Type II errors describe the possible errors we might make when drawing conclusions about the DGP based on our data. A Type I error occurs when we should adopt the empty model but mistakenly adopt the complex model. A Type II error occurs when we should ...
t-test
The t-test uses the t-distribution (a sampling distribution of t) as a method of model comparison when you are evaluating a complex model against the empty model. It is very similar to using the F-distribution for model comparison, and, actually, ...
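The close link between t and F can be checked numerically. This Python sketch assumes a two-group model with a pooled variance estimate; the function name and the scores are hypothetical. For that model, t squared equals the F ratio computed from the sums of squares.

```python
from statistics import mean


def two_group_t_and_f(group_a, group_b):
    """Return (t, F) comparing a two-group model with the empty model."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    grand = mean(group_a + group_b)  # empty-model prediction

    # SS Error: squared residuals from each score to its own group mean
    ss_error = (sum((y - mean_a) ** 2 for y in group_a)
                + sum((y - mean_b) ** 2 for y in group_b))
    # SS Model: squared distances between group-mean and grand-mean predictions
    ss_model = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2

    df_model, df_error = 1, n_a + n_b - 2
    f = (ss_model / df_model) / (ss_error / df_error)

    # Pooled-variance t statistic for the difference between the two means
    pooled_var = ss_error / df_error
    se_diff = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (mean_a - mean_b) / se_diff
    return t, f


t, f = two_group_t_and_f([5, 7, 6, 9], [3, 4, 6, 2])  # hypothetical scores
print(round(t ** 2, 3), round(f, 3))  # both about 6.171
```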
Alpha Level
The alpha level (α) is the criterion for how low a probability needs to be to count as "unlikely"; it is also the probability of Type I error that we are willing to accept. A common alpha level is α = .05 or less, meaning we are willing to accept a 5% or less ...
F-Distribution
The F-Distribution is a probability distribution that models the sampling distribution of F under the empty model (the null hypothesis that there is no effect of the explanatory variable); this theoretical distribution takes into account both model ...
randomization
Randomization is a computational technique that breaks whatever relationship exists between two variables; also known as the permutation approach.
permutation approach
The permutation approach is a computational technique that breaks whatever relationship exists between two variables; also known as randomization.
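Randomization can be sketched in Python by shuffling group labels so they no longer line up with the outcome. This is a minimal sketch assuming two groups labeled "A" and "B"; the function name, labels, and scores are hypothetical. Shuffling keeps the group sizes but breaks any real relationship, so the resulting differences show what the empty model tends to produce.

```python
import random
from statistics import mean


def shuffled_mean_differences(outcome, groups, reps=1000, seed=None):
    """Build a randomized (permutation) distribution of mean differences."""
    rng = random.Random(seed)
    labels = list(groups)  # copy so the caller's labels are not reordered
    diffs = []
    for _ in range(reps):
        rng.shuffle(labels)  # reassign group labels at random
        a = [y for y, g in zip(outcome, labels) if g == "A"]
        b = [y for y, g in zip(outcome, labels) if g == "B"]
        diffs.append(mean(a) - mean(b))
    return diffs


scores = [5, 7, 6, 9, 3, 4, 6, 2]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
diffs = shuffled_mean_differences(scores, labels, reps=1000, seed=1)
print(round(mean(diffs), 2))  # typically close to 0
```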
p-value
The p-value is the probability that a value as extreme as, or more extreme than, our sample statistic would be generated by the empty model; it is a number between 0 and 1. A small p-value (e.g., ≤ 0.05) indicates our sample would be "unlikely" given the null hypothesis, so you ...
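Given a simulated sampling distribution from the empty model, a p-value is just a proportion. This Python sketch assumes the simulated statistics are centered on 0 and uses a two-tailed comparison; the function name and the simulated values are hypothetical.

```python
def two_tailed_p_value(simulated_stats, observed):
    """Proportion of simulated statistics at least as extreme as the observed one."""
    extreme = [s for s in simulated_stats if abs(s) >= abs(observed)]
    return len(extreme) / len(simulated_stats)


# Hypothetical statistics simulated under the empty model
simulated = [-3, -2, -1, 0, 1, 2, 3, 0.5, -0.5, 2.5]
print(two_tailed_p_value(simulated, observed=2.5))  # 0.3
```

In practice the simulated list would hold hundreds or thousands of values, for example from shuffling.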
F test
F test is a method for using the F ratio and F distribution to compare statistical models.
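The F ratio itself is the model's mean square divided by the error's mean square. This Python sketch assumes the empty model has one parameter and that degrees of freedom are counted as parameters spent (model) and observations left over (error); the function name and the numbers are hypothetical.

```python
def f_ratio(ss_model, ss_error, n, params_complex, params_empty=1):
    """F = MS Model / MS Error, comparing a complex model with the empty model."""
    df_model = params_complex - params_empty  # parameters spent by the model
    df_error = n - params_complex             # df left over for estimating error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model / ms_error


# Hypothetical three-group model (3 parameters) fit to 23 observations
print(f_ratio(ss_model=30, ss_error=60, n=23, params_complex=3))  # 5.0
```

The F test then asks how unlikely an F this large would be under the F distribution for those degrees of freedom.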
critical t score
Critical t score is the distance between the mean of a t distribution and the boundaries of its "unlikely" area, expressed as a number of standard deviations; for example, if we assume a sampling distribution is t-shaped, the ...
critical z score
Critical z score is the distance between the mean of a normal distribution and the boundaries of its "unlikely" area, expressed as a number of standard deviations; for example, if we assume a sampling distribution is normal, the ...
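A critical z score can be computed from the inverse of the normal cumulative distribution. This Python sketch uses the standard library's `statistics.NormalDist` and assumes a two-tailed "unlikely" area split equally between the tails; the function name is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha=0.05):
    """Two-tailed critical z: the cutoff leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(critical_z(0.05), 2))  # 1.96
print(round(critical_z(0.01), 2))  # 2.58
```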
margin of error
Margin of error, also called critical distance, is the distance between the mean of a sampling distribution and the boundaries of its "unlikely" area, expressed in the same unit as the variable (e.g., mm, lbs, dollars).
t distribution
T distribution is a probability distribution that is very similar to, but slightly more variable than, the normal distribution. The t distribution has a slightly different shape depending on the degrees of freedom used to estimate the population standard deviation. And for very large ...
z distribution
Z distribution is also called the normal distribution or the bell-shaped distribution; it is unimodal and symmetrical, with most scores clumped in the center and few scores far away from the center; this is a frequently used probability distribution; the shape of the z ...
bootstrapped distribution
Bootstrapped distribution is a distribution made by resampling (with replacement) from the data collected.
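Bootstrapping can be sketched in Python with `random.Random.choices`, which draws with replacement. This is a minimal sketch assuming the statistic of interest is the mean and each resample has the same size as the original data; the function name and the sample are hypothetical.

```python
import random
from statistics import mean


def bootstrap_means(data, reps=1000, seed=None):
    """Resample the data with replacement, same size each time, keeping each mean."""
    rng = random.Random(seed)
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(reps)]


sample = [3, 5, 4, 6, 7, 5, 4, 6]  # hypothetical data collected
boot = bootstrap_means(sample, reps=2000, seed=7)
print(round(mean(boot), 2), min(boot), max(boot))
```

Because values can be drawn more than once, each resample differs slightly, and the spread of `boot` approximates the sampling distribution of the mean.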
confidence intervals
A confidence interval is defined by a lower and an upper bound: population means below the lower bound or above the upper bound would be unlikely to have generated the sample mean that we observed.
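A confidence interval can be sketched as the sample mean plus or minus a margin of error. This Python sketch swaps in a normal approximation (critical z times the estimated standard error); for small samples a t-based critical value, which gives a slightly wider interval, is usually preferred. The function name and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def approx_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(data)
    se = stdev(data) / n ** 0.5  # estimated standard error of the mean
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z
    margin = z_star * se  # margin of error, in the units of the data
    center = mean(data)
    return center - margin, center + margin


data = [3, 5, 4, 6, 7, 5, 4, 6]  # hypothetical sample
lower, upper = approx_confidence_interval(data)
print(round(lower, 2), round(upper, 2))
```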
Central Limit Theorem (CLT)
Central Limit Theorem (CLT) describes the shape, center, and spread of the distribution of means of equal-size samples, when each sample is randomly chosen from some population.
standard error
Standard error is the standard deviation of a sampling distribution.
sampling distribution of means (SDoM)
Sampling distribution of means (SDoM) is a distribution made up of the means of many samples.
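A sampling distribution of means can be simulated directly, which also shows the standard error as its standard deviation. This Python sketch assumes a hypothetical population (the integers 1 to 100) and samples drawn with replacement, so the Central Limit Theorem's prediction of sigma / sqrt(n) for the standard error applies without a finite-population correction; the function name is hypothetical.

```python
import random
from statistics import mean, pstdev


def sampling_distribution_of_means(population, sample_size, reps=2000, seed=None):
    """Draw many random samples of equal size and keep each sample's mean."""
    rng = random.Random(seed)
    # Sampling with replacement, so each draw is independent of the others
    return [mean(rng.choices(population, k=sample_size)) for _ in range(reps)]


population = list(range(1, 101))  # hypothetical population: the integers 1..100
sdom = sampling_distribution_of_means(population, sample_size=25, seed=42)

standard_error = pstdev(sdom)                   # SD of the sampling distribution
predicted_se = pstdev(population) / 25 ** 0.5   # CLT prediction: sigma / sqrt(n)
print(round(mean(sdom), 1), round(standard_error, 2), round(predicted_se, 2))
```

The center of `sdom` lands near the population mean (50.5), and its spread lands near the predicted standard error.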
sampling distribution
Sampling distribution is the distribution of an estimate across many possible samples.
unstandardized slope
Unstandardized slope measures the steepness of the best-fitting line; because it depends on the units of measurement of the variables included, it must be interpreted in context.
Pearson’s r
Pearson’s r is a special case of slope in which both the outcome and explanatory variables are transformed into z scores prior to analysis; also known as the correlation coefficient.
correlation coefficient
Correlation coefficient is a special case of slope in which both the outcome and explanatory variables are transformed into z scores prior to analysis; also known as Pearson's r.
univariate distribution
Univariate distribution is the pattern of variation in the values of a single variable.
regression line
A regression line is a kind of model used when both the outcome and explanatory variables are quantitative; it is a two-parameter model, the parameters being the slope and the y-intercept.
bivariate distribution
Bivariate distribution is the pattern of variation in the values of two variables.
complex model
A complex model is a model with at least one explanatory variable.
degrees of freedom (df)
Degrees of freedom (df) is the number of independent pieces of information that went into calculating the estimate. We find it’s helpful to think about degrees of freedom as a budget. The more data (represented by n) you have, the more degrees of ...
effect size
Effect size is a measure of the size of the effect of the explanatory variable on the outcome variable.
SS Total
SS Total is the amount of error revealed by the empty model (the mean); it is the total area of the squared residuals based on the distance of each score from the mean.
SS Model
SS Model is the reduction in error (measured in sums of squares) due to the model; the area of all the squared deviations based on the distance between the complex model predictions and the empty model predictions.
SS Error
SS Error is the amount of error left unexplained by the model; the area of all the squared residuals based on the distance of each score from the model prediction.
proportion reduction in error (PRE)
Proportion reduction in error (PRE) is the proportion of error that has been reduced by a more complex model compared with a simpler model, which in our course is always the empty model. When comparing to the empty model, PRE is calculated as SS ...
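The three sums of squares and PRE can be computed together for a group-means model. This Python sketch assumes the complex model predicts each score with its group mean and the empty model predicts every score with the grand mean; the function name and the scores are hypothetical. It uses PRE = SS Model / SS Total, and SS Total = SS Model + SS Error holds for this kind of model.

```python
from statistics import mean


def ss_decomposition(groups):
    """Return SS Total, SS Model, SS Error, and PRE for a group-means model.

    groups: dict mapping a group label to a list of outcome scores.
    """
    all_scores = [y for scores in groups.values() for y in scores]
    grand = mean(all_scores)  # empty-model prediction
    group_means = {label: mean(scores) for label, scores in groups.items()}

    ss_total = sum((y - grand) ** 2 for y in all_scores)
    ss_error = sum((y - group_means[label]) ** 2
                   for label, scores in groups.items() for y in scores)
    ss_model = sum(len(scores) * (group_means[label] - grand) ** 2
                   for label, scores in groups.items())
    pre = ss_model / ss_total
    return ss_total, ss_model, ss_error, pre


groups = {"A": [5, 7, 6, 9], "B": [3, 4, 6, 2]}  # hypothetical scores
ss_total, ss_model, ss_error, pre = ss_decomposition(groups)
print(ss_total, ss_model, ss_error, round(pre, 3))  # 35.5 18.0 17.5 0.507
```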
intercept
Intercept is the value where a line intersects the y-axis; the value of y when x is 0; for example, in the equation for a line, y = mx + b, the y-intercept is represented by b.
grand mean
Grand mean is the mean for everyone in the sample.
unlikely
By convention, statisticians as a community count probabilities of .05 or lower as unlikely.
Popular Articles
tally()
The tally() function will count, or tally, the number of cases that are observed in each category of a variable. Example 1: Use tally() to count the number of observations in each category of a categorical variable. # Use tally() to count the number ...
desc()
The desc() function can be used with the arrange() function to arrange a variable in a data frame in descending order. Example 1: For instance, when we use the arrange() function to sort the Fingers data frame by Thumb, it will sort the values for ...
favstats()
The favstats() function will compute a set of common summary statistics ("favorite stats") for a given variable, including the five-number summary (minimum, Q1, median/Q2, Q3, maximum), the mean, the standard deviation, the sample size (n), and the ...
arrange()
The arrange() function will arrange a data frame by a specific variable, in ascending order. You can use the desc() function inside the arrange() function to arrange the data frame in descending order. NOTE: The arrange() function is similar to the ...
Statement on Sex and Gender
Many people use sex and gender interchangeably, but in truth, they’re distinct concepts. Sex is a classification based on biological characteristics, including DNA and anatomy. Gender refers to the socially constructed roles, behaviors, ...