Data

Data are recorded observations, measurements, or information collected to answer questions, test hypotheses, or make decisions. In statistics and data science, data are the raw material we analyze to discover patterns, relationships, and insights.

What counts as data?

Data can come in many forms, including:

  • Numbers (e.g., exam scores, temperatures, income)

  • Categories or labels (e.g., gender, species, product type)

  • Text (e.g., survey responses, reviews)

  • Dates and times (e.g., timestamps, durations)

  • Logical values (TRUE/FALSE)
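As a rough illustration, each of the forms listed above has a natural counterpart in base R. The variable names and values below are made up for this sketch and are not from any course data set.

```r
# Hypothetical values, one for each form of data listed above
score     <- 88.5                                      # a number
species   <- factor("cat", levels = c("cat", "dog"))   # a category or label
review    <- "Great course!"                           # text
submitted <- as.Date("2024-03-15")                     # a date
passed    <- TRUE                                      # a logical value

# class() reports how R stores each value
class(score)      # "numeric"
class(species)    # "factor"
class(review)     # "character"
class(submitted)  # "Date"
class(passed)     # "logical"
```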

Types of data

Quantitative (numeric)

  • Description: Data that represent measurable or countable values and are expressed using numbers. Arithmetic operations (such as averages or differences) are meaningful for this type of data. These data typically answer questions like "how much" or "how many".

  • Examples: Exam scores, age, height, number of courses taken, daily temperature

Categorical (qualitative)

  • Description: Data that represent groups, labels, or categories rather than numeric measurements. Even when coded with numbers, the values identify categories and should not be used in arithmetic calculations. These data typically answer questions like "what type" or "which group".

  • Examples: Major field of study, eye color, species, political affiliation, pass/fail outcome, logical values (TRUE/FALSE)
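The difference between the two types can be sketched in R. The coding scheme below (1 = Biology, 2 = History, 3 = Math) is a hypothetical example: R will happily average the codes, but the result has no meaning, which is why categorical values are usually stored as a factor and summarized with counts.

```r
# Quantitative: arithmetic on the values is meaningful
scores <- c(88, 92, 79)
mean(scores)        # 86.33333

# Categorical data coded with numbers (hypothetical coding:
# 1 = Biology, 2 = History, 3 = Math)
major_code <- c(1, 3, 3)
mean(major_code)    # runs, but there is no such thing as an "average major"

# Storing the codes as a factor makes the categories explicit
major <- factor(major_code, levels = 1:3,
                labels = c("Biology", "History", "Math"))

# Counting how many observations fall in each category is meaningful
table(major)
```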

Data in R

In R, data are commonly stored in structured objects, especially:

  • Vectors – one-dimensional data of a single type

  • Data frames – tabular data where rows are observations and columns are variables

  • Tibbles – a modern variant of the data frame, provided by the tibble package in the tidyverse
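A minimal sketch of these three structures follows. The object names are illustrative, and the tibble step assumes the tibble package is installed, so it is wrapped in a check and skipped otherwise.

```r
# A vector holds values of a single type
scores <- c(88, 92, 79)
class(scores)    # "numeric"

# Mixing types in one vector coerces every value to a common type
mixed <- c(88, "A", TRUE)
class(mixed)     # "character"

# A data frame combines equal-length vectors as columns
df <- data.frame(id = 1:3, score = scores)
dim(df)          # 3 rows, 2 columns

# A tibble (assumes the tibble package is available)
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(inherits(tb, "data.frame"))  # a tibble is still a data frame
}
```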

Example:

# Build a small data frame: one row per student, one column per variable
students <- data.frame(
  name = c("Alex", "Jordan", "Sam"),
  score = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE)
)


The result, students, is a data frame in which:

  • Each row represents one observation (a student)

  • Each column represents a variable (name, score, passed)
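One way to check this row-and-column structure is with a few base R functions. The sketch below rebuilds the same students data frame so that it runs on its own.

```r
# Same example data as above, repeated so this block is self-contained
students <- data.frame(
  name = c("Alex", "Jordan", "Sam"),
  score = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE)
)

nrow(students)    # 3 observations (rows)
ncol(students)    # 3 variables (columns)
names(students)   # "name" "score" "passed"

students$score    # one variable, returned as a vector: 88 92 79
students[2, ]     # one observation: the second row (Jordan)

str(students)     # compact overview of each column and its type
```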

Why data matter

Data allow us to:

  • Describe what is happening (descriptive statistics)

  • Make comparisons and detect patterns

  • Test claims and hypotheses about the Data Generating Process (DGP)

  • Make predictions and informed decisions
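The first two uses can be sketched with the small students example. This is only an illustration on three made-up rows: it covers describing and comparing, not hypothesis testing or prediction, which need more data and a model.

```r
# Same example data as earlier, repeated so this block is self-contained
students <- data.frame(
  name = c("Alex", "Jordan", "Sam"),
  score = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE)
)

# Describe: average score and proportion of students who passed
mean(students$score)     # 86.33333
mean(students$passed)    # 0.6666667 (TRUE counts as 1, FALSE as 0)

# Compare: average score within each pass/fail group
by_pass <- tapply(students$score, students$passed, mean)
by_pass                  # FALSE: 79, TRUE: 90
```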

