Histogram Bins

What they are

To understand the bins of a histogram, we must first understand a histogram. 


A histogram displays the distribution of a quantitative, and often continuous, variable. It divides the observations of the variable into intervals and shows visually how many observations fall into each one.


A quantitative variable is one that is measured numerically, where the numbers are meaningful (e.g. height). A continuous variable is one that can take on infinitely many possible values within its range. Examples of continuous variables include miles per gallon, the weights of baby elephants in pounds, and average life expectancies in years.


Because histograms usually display continuous quantitative variables, and a continuous variable can take on almost any value (e.g. 204.2 pounds), the distribution is easier to see if we group similar values into bins. To create bins, R divides the range of the variable’s values into intervals, usually of equal width (e.g. the range of observed weights is divided into 20 smaller intervals). Each interval is called a bin.
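A minimal base-R sketch of the equal-width idea, assuming a small set of made-up baby elephant weights and an arbitrary choice of 5 bins. It splits the observed range (smallest to largest value) into equal intervals; plotting functions such as gf_histogram() choose their own defaults and may extend the edges slightly beyond the data.

```r
# Hypothetical baby elephant weights in pounds (made-up values for illustration)
weights <- c(198.4, 201.2, 203.7, 204.2, 205.0, 207.9, 211.3, 212.6, 214.0, 219.5)

n_bins <- 5  # arbitrary choice for this sketch

# Divide the observed range (smallest to largest value) into n_bins equal intervals
lo <- min(weights)
hi <- max(weights)
edges <- seq(lo, hi, length.out = n_bins + 1)
bin_width <- (hi - lo) / n_bins

print(edges)      # n_bins + 1 edges define n_bins bins
print(bin_width)  # every bin has this width
```

Note that n_bins bins always need n_bins + 1 edges, because each bin is bounded by an edge on both sides.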


And what goes into each bin? The observations whose values fall within that bin's interval. The bar above the bin shows the frequency, or count, of those observations; when all bins have the same width, the area of each bar is also proportional to that frequency.
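A short sketch of the counting step in base R, assuming hypothetical weights and 5-pound bins starting at 195 (both made up for illustration). cut() assigns each observation to a bin and table() counts them; the last line checks that, with equal widths, bar areas are in the same proportions as the counts.

```r
# Hypothetical baby elephant weights in pounds (made-up values for illustration)
weights <- c(198.4, 201.2, 203.7, 204.2, 205.0, 207.9, 211.3, 212.6, 214.0, 219.5)
width <- 5
edges <- seq(195, 220, by = width)  # bins [195,200), [200,205), ..., [215,220)

# Assign each observation to the bin whose interval contains it
bins <- cut(weights, breaks = edges, right = FALSE)

counts <- table(bins)     # bar heights in a frequency histogram
areas  <- counts * width  # bar area = height * width

print(counts)
print(areas / sum(areas))  # same proportions as counts / sum(counts)
```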


For example, to find what goes into a bin that ranges from 200 to 204.9 pounds, count how many baby elephants in the data weigh between 200 and 204.9 pounds. All of these observations go into the same bin, even though their values are not identical. If a baby elephant weighs 205 pounds, it goes into the next bin.
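In this example each bin includes its lower edge (200) but not its upper edge (205), which is why 205 lands in the next bin. A small base-R sketch, using made-up weights at the boundaries, shows how cut()'s `right` argument controls which edge a bin includes; the bin edges here are assumptions chosen to match the example.

```r
# Bin edges 5 pounds apart, and weights at or near the 200/205 boundaries
edges <- c(195, 200, 205, 210)
x <- c(200, 202.5, 204.9, 205)

# right = FALSE: each bin includes its lower edge, so 205 moves to the next bin
left_closed <- cut(x, breaks = edges, right = FALSE)

# right = TRUE (cut()'s default): each bin includes its upper edge instead
right_closed <- cut(x, breaks = edges, right = TRUE)

print(data.frame(weight = x, left_closed, right_closed))
```

Either convention works, as long as every value belongs to exactly one bin.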


Together, the bin width and the number of observations in each bin create the basic visual features of the histogram.
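To see how bin width changes the picture, this sketch counts the same made-up weights with two arbitrary widths, 5 and 10 pounds. `count_bins` is a hypothetical helper written for this example, not part of any package.

```r
# Hypothetical baby elephant weights in pounds (made-up values for illustration)
weights <- c(198.4, 201.2, 203.7, 204.2, 205.0, 207.9, 211.3, 212.6, 214.0, 219.5)

# Hypothetical helper: count observations per bin for a given bin width.
# (end - start) should be a multiple of width so the last edge reaches end.
count_bins <- function(x, width, start = 190, end = 220) {
  edges <- seq(start, end, by = width)
  table(cut(x, breaks = edges, right = FALSE))  # bins include their lower edge
}

narrow <- count_bins(weights, width = 5)   # 6 bins, each 5 lb wide
wide   <- count_bins(weights, width = 10)  # 3 bins, each 10 lb wide

print(narrow)
print(wide)
```

The total count is the same either way; only how the observations are grouped changes. Narrow bins show more detail, while wide bins smooth the shape.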

Example  

For instance, we used this code to create a histogram of grams of fiber in a cereal data set.


library(ggformula)  # provides gf_histogram()
gf_histogram(~ Fiber, data = Cereal, binwidth = .50, color = "blue", fill = "yellow")


The argument binwidth = .50 in the code sets each bin in this histogram to be 0.50 grams wide on the x-axis.
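To inspect the bins themselves rather than the plot, here is a base-R sketch using hist() with plot = FALSE. The Cereal data set is not reproduced here, so the fiber values below are made up, and the bin edges are assumed to start at 0; gf_histogram() may position its bin boundaries differently, but the 0.5-gram width is the same idea.

```r
# Hypothetical fiber values in grams (made up; not the Cereal data)
fiber <- c(0, 0.5, 1, 1, 1.5, 2, 2.5, 3, 3, 4, 5, 9, 14)

# Bin edges spaced 0.5 g apart, mirroring binwidth = .50
edges <- seq(0, ceiling(max(fiber)), by = 0.5)

# Compute the histogram without drawing it, then look at its bins
h <- hist(fiber, breaks = edges, plot = FALSE)

print(diff(h$breaks))  # every bin is 0.5 g wide
print(sum(h$counts))   # every observation lands in exactly one bin
```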

Review 

In summary, a histogram bin is an interval within the range of a variable's values. Each observation is grouped into the bin whose interval contains its value, and the bar above the bin shows how many observations fall into it.


