Converting between data frames and contingency tables
Problem
You want to convert between a data frame of cases, a data frame of counts of each type of case, and a contingency table.
Solution
These three data structures represent the same information, but in different formats:
cases: A data frame where each row represents one case.
ctable: A contingency table.
counts: A data frame of counts, where each row gives the count of one combination.
# Each row represents one case
cases <- data.frame(
  Sex=c("M", "M", "F", "F", "F"),
  Color=c("brown", "blue", "brown", "brown", "brown")
)
cases
#> Sex Color
#> 1 M brown
#> 2 M blue
#> 3 F brown
#> 4 F brown
#> 5 F brown
# A contingency table
ctable <- table(cases)
ctable
#> Color
#> Sex blue brown
#> F 0 3
#> M 1 1
# A data frame with counts of each combination
counts <- data.frame(
  Sex=c("F", "M", "F", "M"),
  Color=c("blue", "blue", "brown", "brown"),
  Freq=c(0, 1, 3, 1)
)
counts
#> Sex Color Freq
#> 1 F blue 0
#> 2 M blue 1
#> 3 F brown 3
#> 4 M brown 1
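As a quick sanity check (a sketch, not part of the original recipe), the three objects can be compared directly. They account for the same total number of cases, and because a two-way table is stored in column-major order, its cells come out in the same order as the rows of `counts`. The variable names `total_*` and `same_cells` are just for illustration.

```r
# Rebuild the three example objects from above
cases <- data.frame(
  Sex = c("M", "M", "F", "F", "F"),
  Color = c("brown", "blue", "brown", "brown", "brown")
)
ctable <- table(cases)
counts <- data.frame(
  Sex = c("F", "M", "F", "M"),
  Color = c("blue", "blue", "brown", "brown"),
  Freq = c(0, 1, 3, 1)
)

# Every representation accounts for the same five cases
total_cases <- nrow(cases)
total_table <- sum(ctable)
total_counts <- sum(counts$Freq)

# as.vector() reads the table column by column (Sex varies fastest),
# which matches the row order of `counts`
same_cells <- all(as.vector(ctable) == counts$Freq)
print(c(total_cases, total_table, total_counts))
print(same_cells)
```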
Cases to contingency table
To convert from cases to a contingency table (as already shown above):
# Cases to Table
ctable <- table(cases)
ctable
#> Color
#> Sex blue brown
#> F 0 3
#> M 1 1
# If you call table using two vectors, it will not add names (Sex and Color) to
# the dimensions.
table(cases$Sex, cases$Color)
#>
#> blue brown
#> F 0 3
#> M 1 1
# The dimension names can be specified manually with `dnn`, or by using a subset
# of the data frame that contains only the desired columns.
table(cases$Sex, cases$Color, dnn=c("Sex","Color"))
#> Color
#> Sex blue brown
#> F 0 3
#> M 1 1
table(cases[,c("Sex","Color")])
#> Color
#> Sex blue brown
#> F 0 3
#> M 1 1
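Another base-R option is the formula interface of `xtabs()`. With nothing on the left-hand side of `~`, it counts the rows for each combination of the listed columns and names the dimensions after them, so `dnn` is not needed. A sketch:

```r
cases <- data.frame(
  Sex = c("M", "M", "F", "F", "F"),
  Color = c("brown", "blue", "brown", "brown", "brown")
)

# No response variable: each row of `cases` counts as one
xt <- xtabs(~ Sex + Color, data = cases)
print(xt)
```

Listing only some columns in the formula (for example `~ Sex`) tabulates just those, much like subsetting the data frame before calling `table()`.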
Cases to counts
The same information can also be represented as a data frame of counts of each combination. Here it is converted and stored in countdf:
# Cases to Counts
countdf <- as.data.frame(table(cases))
countdf
#> Sex Color Freq
#> 1 F blue 0
#> 2 M blue 1
#> 3 F brown 3
#> 4 M brown 1
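Converting a table with `as.data.frame()` keeps every combination, including those with a count of zero (here, `F`/`blue`). A sketch of two common adjustments: renaming the count column with the `responseName` argument of `as.data.frame.table()`, and dropping the zero rows. The names `countn` and `observed` are illustrative.

```r
cases <- data.frame(
  Sex = c("M", "M", "F", "F", "F"),
  Color = c("brown", "blue", "brown", "brown", "brown")
)

# Name the count column "n" instead of the default "Freq"
countn <- as.data.frame(table(cases), responseName = "n")
# Keep only combinations that actually occur in the data
observed <- countn[countn$n > 0, ]
print(observed)
```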
Contingency table to cases
countsToCases(as.data.frame(ctable))
#> Sex Color
#> 2 M blue
#> 3 F brown
#> 3.1 F brown
#> 3.2 F brown
#> 4 M brown
Note that the countsToCases function is defined below.
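The same conversion can be written inline without a helper: repeat each row index of the counts data frame `Freq` times, then keep the classifying columns. This sketch (variable names `df` and `back` are illustrative) also shows that a round trip from cases to table and back preserves the counts, though not the original row order.

```r
cases <- data.frame(
  Sex = c("M", "M", "F", "F", "F"),
  Color = c("brown", "blue", "brown", "brown", "brown")
)
ctable <- table(cases)

# One row per combination, with its count in Freq
df <- as.data.frame(ctable)
# Repeat each row index as many times as its count (zero-count rows vanish)
back <- df[rep(seq_len(nrow(df)), df$Freq), c("Sex", "Color")]
rownames(back) <- NULL
print(back)
```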
Contingency table to counts
as.data.frame(ctable)
#> Sex Color Freq
#> 1 F blue 0
#> 2 M blue 1
#> 3 F brown 3
#> 4 M brown 1
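By default, `as.data.frame()` on a table returns the classifying columns as factors. If character columns are preferred, `as.data.frame.table()` accepts `stringsAsFactors = FALSE`. A sketch (the name `countchr` is illustrative):

```r
cases <- data.frame(
  Sex = c("M", "M", "F", "F", "F"),
  Color = c("brown", "blue", "brown", "brown", "brown")
)
ctable <- table(cases)

# Same rows as before, but Sex and Color are character vectors
countchr <- as.data.frame(ctable, stringsAsFactors = FALSE)
str(countchr)
```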
Counts to cases
countsToCases(countdf)
#> Sex Color
#> 2 M blue
#> 3 F brown
#> 3.1 F brown
#> 3.2 F brown
#> 4 M brown
Note that the countsToCases function is defined below.
Counts to contingency table
xtabs(Freq ~ Sex+Color, data=countdf)
#> Color
#> Sex blue brown
#> F 0 3
#> M 1 1
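`xtabs()` sums the left-hand-side variable within each combination, so a counts data frame does not need exactly one row per combination: duplicated rows are added together and absent combinations become zero. A sketch with a hypothetical `counts2` data frame:

```r
counts2 <- data.frame(
  Sex = c("F", "F", "M"),
  Color = c("brown", "brown", "blue"),
  Freq = c(1, 2, 1)
)

# The two F/brown rows are summed; F/blue and M/brown never appear, so they are 0
xt <- xtabs(Freq ~ Sex + Color, data = counts2)
print(xt)
```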
countsToCases() function
This function is used in the examples above:
# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
countsToCases <- function(x, countcol = "Freq") {
  # Get the row indices to pull from x
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  # Drop count column
  x[[countcol]] <- NULL
  # Get the rows from x (drop = FALSE keeps a data frame even if only one column remains)
  x[idx, , drop = FALSE]
}
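A usage sketch for the `countcol` argument, with a hypothetical data frame `pets` whose count column is named `n`. With only one non-count column left, `x[idx, ]` would simplify to a plain vector, so this copy of the function passes `drop = FALSE` to keep a data frame.

```r
countsToCases <- function(x, countcol = "Freq") {
  # Row indices of x, each repeated according to its count
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  # Drop the count column
  x[[countcol]] <- NULL
  # drop = FALSE keeps a data frame even if only one column remains
  x[idx, , drop = FALSE]
}

pets <- data.frame(Pet = c("cat", "dog"), n = c(2, 1))
pet_cases <- countsToCases(pets, countcol = "n")
print(pet_cases)
```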