R Functions
desc()
The desc() function can be used with the arrange() function to arrange a variable in a data frame in descending order. Example 1: For instance, when we use the arrange() function to sort the Fingers data frame by Thumb, it will sort the values for ...
do()*
The do()* function runs one or more lines of code the number of times specified inside the parentheses and returns the results as a data frame. Example 1: # take a random sample (n=10) of a variable, with replacement, # and calculate the standard ...
abs()
The abs() function will produce the absolute value for a number. Example: The code below will take the residuals from the empty model for Thumb in the Fingers data frame, and give back their absolute value. empty_model <- lm(Thumb ~ NULL, data = ...
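As a minimal sketch of the idea above: the block below uses a small made-up data frame (`toy`, a hypothetical stand-in for Fingers, which is not reproduced here) to show that abs() keeps the size of each residual and drops its sign.

```r
# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(Thumb = c(56, 60, 64, 58, 66, 62))

# Fit the empty model; its single estimate is the mean of Thumb
empty_model <- lm(Thumb ~ NULL, data = toy)

# Residuals are signed distances from the mean
res <- resid(empty_model)

# abs() drops the sign, leaving only the size of each residual
abs_res <- abs(res)
print(abs_res)
```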
lm()
The lm() function fits a linear model to data. Example 1: # fits an empty model lm(Thumb ~ NULL, data = Fingers) Example 2: # fits a model with one explanatory variable lm(Thumb ~ Gender, data = Fingers) ...
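To make the two kinds of model concrete, here is a short sketch on a made-up data frame (`toy` is hypothetical and only stands in for Fingers). It checks that the empty model's estimate is the overall mean, and that the coefficient for a two-group explanatory variable is the difference between the group means.

```r
# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

# Empty model: one parameter estimate, the mean of Thumb
empty_model <- lm(Thumb ~ NULL, data = toy)

# One explanatory variable: intercept is the mean of the reference
# group ("female"), the second coefficient is the male - female difference
gender_model <- lm(Thumb ~ Gender, data = toy)

group_means <- tapply(toy$Thumb, toy$Gender, mean)
print(coef(empty_model))
print(coef(gender_model))
print(group_means)
```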
gf_jitter()
The gf_jitter() function will generate a jitter plot. A jitter plot is a point plot (similar to a scatter plot, such as gf_point()) but the points are moved slightly ("jittered") so that they do not overlap as much. This can help make it easier to ...
gf_lm()
The gf_lm() function overlays the best-fitting regression line on a scatter plot when chained onto gf_point(). Example: # adds a regression line gf_point(Thumb ~ Height, data = Fingers) %>% gf_lm( color = "orange", size = 2 )
gf_point()
The gf_point() function can be used to create a scatterplot, or to plot a specified point, such as a specific value along the x-axis. Example 1: gf_point( Thumb ~ Height , data = Fingers , size = 2 ) Example of output from running the code above: ...
gf_bar()
The gf_bar() function creates a bar graph. It can be used to visualize the distribution of a categorical variable by counting the number of observations for each group of the category. Bar graphs can also be used with the gf_facet_grid() function. ...
gf_boxplot()
The gf_boxplot() function will generate a boxplot (also known as a box-and-whisker plot). A boxplot splits the data into quartiles, where each whisker and each half of the box contains 25% of all the observations. Boxplots are helpful for visualizing ...
gf_dist()
The gf_dist() function can graphically overlay a number of different mathematical probability distributions. If we want to overlay the normal distribution, we’ll have to specify that with the argument “norm”, and then enter in the mean and standard ...
gf_facet_grid()
The gf_facet_grid() function will create separate plots for each group of a categorical variable. It can be chained onto plots such as gf_histogram(), gf_jitter(), and gf_bar(). Example 1: # Density histogram of Thumb faceted by Gender gf_dhistogram( ...
gf_dhistogram()
The gf_dhistogram() function is very similar to the gf_histogram() function. The difference is that gf_dhistogram() will create a density histogram for a quantitative variable. This means it will show the percentage of cases for each value ...
gf_vline()
The gf_vline() function will add a vertical line onto a plot. You can plot the line by referencing a value in a data frame (Example 1), or by specifying the point along the axis where the line should run through (Example 2). Example 1: # Save the ...
gf_density()
The gf_density() function will overlay a density plot onto a density histogram (i.e., gf_dhistogram(), not gf_histogram()). The density plot is a smoothed-out version of the distribution. Density plots can be helpful when you want to get a better idea of the ...
gf_labs()
The gf_labs() function can be used to modify the labels of your plots. You can add a title for your plot, or modify the label for the x- or y-axis. Example: # Add a title and change the label for the x-axis gf_histogram(~Thumb, data = Fingers) %>% ...
gf_histogram()
The gf_histogram() function will create a frequency histogram for a quantitative variable. This means it will show the number of cases observed in the data for each value of the variable. (See gf_dhistogram() for information on density histograms). ...
ntile()
The ntile() function can be used to create equal sized groups (n-tiles) out of a quantitative variable. It can be modified to make any number of groups. For instance, if you take a quantitative variable such as Height from the Fingers data frame, you ...
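A small sketch of the grouping described above, assuming ntile() is loaded from the dplyr package and using a made-up vector of heights rather than the Fingers data. With six values and three groups, each group receives two values.

```r
library(dplyr)  # assumption: ntile() is taken from dplyr

# Hypothetical heights (made up for illustration)
heights <- c(62, 66, 70, 64, 72, 68)

# Split into 3 equal-sized groups: 1 = shortest third, 3 = tallest third
height_group <- ntile(heights, 3)

print(data.frame(heights, height_group))
print(table(height_group))
```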
recode()
The recode() function can be used to rename any of the values of a variable. Example: # recode the variable `Year` and save it as a new column in the data frame Fingers$Year_recode <- recode(Fingers$Year, "1" = "First", "2" = "Second", "3" = "Third", ...
factor()
The factor() function will convert a quantitative variable into a factor (a categorical variable). This is often needed when categorical variables are dummy coded as numeric values so R treats them as a quantitative variable. For example, when a ...
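The dummy-coding situation described above can be sketched as follows. The vector `coded` and its labels are hypothetical; the point is only that factor() turns numeric codes into labeled categories.

```r
# Hypothetical dummy-coded variable: 1 and 2 stand for two groups
coded <- c(1, 2, 2, 1, 2)

# R treats the raw codes as quantitative
print(is.numeric(coded))

# Convert to a factor and attach readable labels to each code
gender <- factor(coded, levels = c(1, 2), labels = c("female", "male"))

print(is.factor(gender))
print(levels(gender))
print(gender)
```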
as.numeric()
The as.numeric() function will convert a factor or character value into a numeric value. For instance, if you need to change a factor such as `Gender` into numeric values, the as.numeric() function will assign each group a number and convert it from ...
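A brief sketch of the conversion, using made-up vectors. It also shows one caution worth knowing: when a factor's labels are themselves digits, as.numeric() returns the internal group numbers, not the digits, unless the factor is first converted to character.

```r
# Hypothetical factor; levels are ordered alphabetically by default
gender <- factor(c("female", "male", "male", "female"))

# Each group is assigned a number: female = 1, male = 2
codes <- as.numeric(gender)
print(codes)

# Caution: a factor whose labels look like numbers
year <- factor(c("10", "20", "10"))
level_codes <- as.numeric(year)                # group numbers
true_values <- as.numeric(as.character(year))  # the labels as numbers
print(level_codes)
print(true_values)
```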
filter()
The filter() function will find rows/cases where the conditions indicated are true. It is often used with operators such as the following: > (greater than) < (less than) >= (greater than or equal to) <= (less than or equal to) == (equal to) != (not ...
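The operators listed above can be tried out on a small made-up data frame. This sketch assumes filter() comes from the dplyr package; `toy` is a hypothetical stand-in for Fingers.

```r
library(dplyr)  # assumption: filter() is taken from dplyr

# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Height = c(62, 67, 69, 65, 72, 66),
  Gender = c("female", "male", "male", "female", "male", "female")
)

# Keep rows where Thumb is at least 62
long_thumbs <- filter(toy, Thumb >= 62)

# Several conditions separated by commas must all be true
tall_males <- filter(toy, Gender == "male", Height > 67)

# != keeps the rows that do not match
not_male <- filter(toy, Gender != "male")

print(long_thumbs)
print(tall_males)
print(not_male)
```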
select()
The select() function will select specific columns (variables) in a data frame. This may be useful when a data frame has many variables and you only want to take a look at a few of them together, or save a subset of variables as a new data frame. ...
arrange()
The arrange() function will arrange a data frame by a specific variable, in ascending order. You can wrap the variable in the desc() function inside arrange() to arrange the data frame in descending order. NOTE: The arrange() function is similar to the ...
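A short sketch of both orderings, assuming arrange() and desc() come from the dplyr package and using a made-up data frame (`toy`) in place of Fingers. Note that whole rows move together, so each Thumb value keeps its Gender.

```r
library(dplyr)  # assumption: arrange() and desc() are taken from dplyr

# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

# Ascending order (the default)
asc <- arrange(toy, Thumb)

# Descending order: wrap the variable in desc()
dsc <- arrange(toy, desc(Thumb))

print(asc)
print(dsc)
```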
sort()
The sort() function will sort a vector (or a single column in a data frame) in ascending order. You can use the decreasing = TRUE argument with the sort() function to sort the vector in descending order (be careful not to ...
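The following sketch, on a made-up data frame, shows that sort() returns only the sorted values of the one column it is given and leaves the data frame itself unchanged.

```r
# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

# Ascending order (the default)
ascending <- sort(toy$Thumb)

# Descending order
descending <- sort(toy$Thumb, decreasing = TRUE)

print(ascending)
print(descending)

# The original column is still in its original order
print(toy$Thumb)
```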
tail()
The tail() function is used to get the last rows of a vector or data frame. By default, it will print out the last 6 rows of the object, but you can also specify the number of rows you would like to print out. The tail() function is similar to the ...
head()
The head() function is used to get the first rows of a vector or data frame. By default, it will print out the first 6 rows of the object, but you can also specify the number of rows you would like to print out. The head() function is similar to ...
str()
The str() function will display the internal data structure of an R object (such as a data frame). It will return information about the rows (observations) and columns (variables) along with extra information like the names of the columns, class of ...
shuffle()
The shuffle() function will mix up, or "shuffle", the values in a column into a randomized order. It is one possible method for simulating a random data generating process (DGP). Example 1: One way to see how the shuffle() function works is by ...
resample()
The resample() function will take a random sample of rows from the data frame, with replacement (of the specified sample size). It can be useful if you want to bootstrap a sampling distribution of statistics (e.g., b1, PRE, F) (Bootstrapping ...
sample()
The sample() function will take a random sample of rows from the data frame, without replacement (of the specified sample size). For random sampling with replacement, see the resample() function. Example 1: You can take a closer look at how the ...
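The difference between sampling without and with replacement can be sketched in base R. This is an illustration under assumptions, not the data-frame version of sample() and resample() described on this page: it draws row numbers with base R's sample() and then picks those rows from a made-up data frame.

```r
set.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

# Without replacement: each row can be chosen at most once
rows_without <- sample(nrow(toy), size = 4)
sample_without <- toy[rows_without, ]

# With replacement: the same row may appear more than once
rows_with <- sample(nrow(toy), size = 6, replace = TRUE)
sample_with <- toy[rows_with, ]

print(sample_without)
print(sample_with)
```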
confint()
The confint() function computes confidence intervals (based on the t-distribution) for one or more parameters in a fitted model. The default confidence level is 95%. To adjust the confidence level, use the "level = " argument. Example 1: You can use ...
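As a sketch of the "level = " argument, the block below fits a model to a made-up data frame (`toy`, hypothetical) and compares the default 95% interval with a 90% interval, which should be narrower.

```r
# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

gender_model <- lm(Thumb ~ Gender, data = toy)

# Default: 95% confidence intervals, one row per parameter
ci95 <- confint(gender_model)

# Adjust the confidence level with the level argument
ci90 <- confint(gender_model, level = 0.90)

print(ci95)
print(ci90)

# Interval widths (upper bound minus lower bound)
width95 <- ci95[, 2] - ci95[, 1]
width90 <- ci90[, 2] - ci90[, 1]
```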
zscore()
The function zscore() will standardize a variable by converting all of its values to z-scores (i.e., the number of standard deviations the score is above or below the mean). Example 1: Using the zscore() function alone will produce a list of z-scores ...
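The definition in parentheses above can be computed by hand in base R. This sketch does not call zscore() itself; it assumes the usual formula (value minus mean, divided by the standard deviation) on a made-up vector.

```r
# Hypothetical thumb lengths (made up for illustration)
thumb <- c(56, 60, 64, 58, 66, 62)

# z-score: how many standard deviations each value is from the mean
z <- (thumb - mean(thumb)) / sd(thumb)
print(z)

# Standardized values have mean 0 and standard deviation 1
print(mean(z))
print(sd(z))
```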
xpnorm()
The xpnorm() function will generate the probability and z-score for a value (X) based on the mathematical function for a normal distribution. It can do this with just three pieces of information: the border you are interested in, and the mean and ...
f() or fVal()
The f() function (and, similarly, the fVal() function) will calculate the F value for a model. Example 1: Below are various methods for indexing the model in the argument of the f() function (they will all produce the same output). For any of these, ...
pre() and PRE()
The pre() function (and, similarly, the PRE() function) will calculate the PRE (Proportional Reduction in Error) value for a model. Example 1: Below are various methods for indexing the model in the argument of the pre() function (they will all ...
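For a sense of what PRE measures, here is a by-hand sketch in base R that does not call pre() itself. It assumes the standard definition: the proportion of the empty model's sum of squared error that is removed by the more complex model. The data frame `toy` is made up.

```r
# Hypothetical stand-in for the Fingers data frame (values are made up)
toy <- data.frame(
  Thumb  = c(56, 60, 64, 58, 66, 62),
  Gender = c("female", "male", "male", "female", "male", "female")
)

empty_model  <- lm(Thumb ~ NULL, data = toy)
gender_model <- lm(Thumb ~ Gender, data = toy)

# Sum of squared residuals for each model
ss_empty  <- sum(resid(empty_model)^2)
ss_gender <- sum(resid(gender_model)^2)

# Proportional reduction in error
pre_by_hand <- (ss_empty - ss_gender) / ss_empty
print(pre_by_hand)

# For a model compared with the empty model, this matches R-squared
print(summary(gender_model)$r.squared)
```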
b1()
The b1() function will calculate the b1 value for a model. Example 1: Below are various methods for indexing the model in the argument of the b1() function (they will all produce the same output). # Method 1: Find the b1 value for the Gender model of ...
cor()
The cor() function will compute the correlation of x and y if these are vectors. If x and y are matrices, then the correlations between the columns of x and the columns of y are computed. The default method will compute ...
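A short base R sketch on made-up vectors: it computes a correlation with cor(), checks it against the covariance divided by the product of the two standard deviations, and shows the matrix form.

```r
# Hypothetical paired measurements (made up for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Correlation of two vectors
r <- cor(x, y)

# Same value by hand: covariance scaled by both standard deviations
r_by_hand <- cov(x, y) / (sd(x) * sd(y))

# A different method can be requested by name
rho <- cor(x, y, method = "spearman")

# With a matrix, cor() returns correlations between all pairs of columns
r_matrix <- cor(cbind(x, y))

print(r)
print(rho)
print(r_matrix)
```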
cohensD()
The cohensD() function calculates Cohen's d, a measure of effect size. Example: cohensD(Thumb ~ Gender, data = Fingers)
favstats()
The favstats() function will compute a set of common summary statistics ("favorite stats") for a given variable, including the five-number summary (minimum, Q1, median/Q2, Q3, maximum), the mean, the standard deviation, the sample size (n), and the ...
sd()
The sd() function computes the standard deviation of a variable. Example: # Calculate the standard deviation of Thumb sd(Fingers$Thumb) # Alt: use 'data =' argument instead of '$' (produces same output) sd(~Thumb, data = Fingers)
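To show what sd() computes, this sketch checks it against the formula by hand on a made-up vector: the square root of the sum of squared deviations from the mean, divided by n - 1.

```r
# Hypothetical thumb lengths (made up for illustration)
thumb <- c(56, 60, 64, 58, 66, 62)

# Built-in standard deviation
sd_builtin <- sd(thumb)

# By hand: squared deviations from the mean, divided by n - 1
n <- length(thumb)
sd_by_hand <- sqrt(sum((thumb - mean(thumb))^2) / (n - 1))

print(sd_builtin)
print(sd_by_hand)
```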