Getting a subset of a data structure
Problem
You want to get a subset of the elements of a vector, matrix, or data frame.
Solution
To get a subset based on a condition, use the subset() function or index with square brackets. The examples here show both approaches.
# A sample vector
v <- c(1,4,4,3,2,2,3)
subset(v, v<3)
#> [1] 1 2 2
v[v<3]
#> [1] 1 2 2
# Another vector
t <- c("small", "small", "large", "medium")
# Remove "small" entries
subset(t, t!="small")
#> [1] "large" "medium"
t[t!="small"]
#> [1] "large" "medium"
One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().
v[v<3] <- 9
subset(v, v<3) <- 9
#> Error in subset(v, v < 3) <- 9: could not find function "subset<-"
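To make the difference concrete, the following minimal sketch shows what the bracket assignment does to v. It also shows base R's replace(), which this recipe does not otherwise use and which appears here only as one possible alternative when you want a modified copy instead of changing v itself.

```r
v <- c(1,4,4,3,2,2,3)

# replace() returns a modified copy and leaves v untouched
w <- replace(v, v < 3, 9)
w
#> [1] 9 4 4 3 9 9 3

# Bracket assignment changes v itself
v[v < 3] <- 9
v
#> [1] 9 4 4 3 9 9 3
```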
With data frames:
# A sample data frame
data <- read.table(header=TRUE, text='
subject sex size
1 M 7
2 F 6
3 F 9
4 M 11
')
subset(data, subject < 3)
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
data[data$subject < 3, ]
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
# Subset of particular rows and columns
subset(data, subject < 3, select = -subject)
#>   sex size
#> 1   M    7
#> 2   F    6
subset(data, subject < 3, select = c(sex,size))
#>   sex size
#> 1   M    7
#> 2   F    6
subset(data, subject < 3, select = sex:size)
#>   sex size
#> 1   M    7
#> 2   F    6
data[data$subject < 3, c("sex","size")]
#>   sex size
#> 1   M    7
#> 2   F    6
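When only a single column is selected, the two approaches can return different types. The sketch below rebuilds the sample data frame so that it runs on its own. The variable names x, y, and z are only illustrative.

```r
data <- read.table(header=TRUE, text='
 subject sex size
       1   M    7
       2   F    6
       3   F    9
       4   M   11
')

# Selecting one column with brackets simplifies the result to a vector
x <- data[data$subject < 3, "size"]
x
#> [1] 7 6

# drop=FALSE keeps the result as a one-column data frame
y <- data[data$subject < 3, "size", drop=FALSE]

# subset() returns a data frame here without any extra argument
z <- subset(data, subject < 3, select = size)
```

If later code expects a data frame, either subset() or drop=FALSE avoids the simplification to a vector.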
# Logical AND of two conditions
subset(data, subject < 3 & sex=="M")
#>   subject sex size
#> 1       1   M    7
data[data$subject < 3 & data$sex=="M", ]
#>   subject sex size
#> 1       1   M    7
# Logical OR of two conditions
subset(data, subject < 3 | sex=="M")
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
#> 4       4   M   11
data[data$subject < 3 | data$sex=="M", ]
#>   subject sex size
#> 1       1   M    7
#> 2       2   F    6
#> 4       4   M   11
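The two forms give the same rows in the examples above, but they can differ when a condition evaluates to NA. subset() drops such rows, while bracket indexing returns a row filled with NA. The sketch below uses a hypothetical data frame d with a missing value, which is not part of this recipe's sample data.

```r
# Hypothetical data with a missing value in the column being tested
d <- data.frame(subject = 1:3, size = c(7, NA, 9))

# subset() treats an NA condition as FALSE and drops that row
s <- subset(d, size > 8)
nrow(s)
#> [1] 1

# Bracket indexing returns an all-NA row where the condition is NA
b <- d[d$size > 8, ]
nrow(b)
#> [1] 2

# Wrapping the condition in which() drops the NA case with brackets too
k <- d[which(d$size > 8), ]
nrow(k)
#> [1] 1
```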
# Condition based on transformed data
subset(data, log2(size) > 3 )
#>   subject sex size
#> 3       3   F    9
#> 4       4   M   11
data[log2(data$size) > 3, ]
#>   subject sex size
#> 3       3   F    9
#> 4       4   M   11
# Subset if elements are in another vector
subset(data, subject %in% c(1,3))
#>   subject sex size
#> 1       1   M    7
#> 3       3   F    9
data[data$subject %in% c(1,3), ]
#>   subject sex size
#> 1       1   M    7
#> 3       3   F    9
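The problem statement also mentions matrices, which the examples above do not cover. Here is a minimal sketch using a hypothetical matrix m with column names a and b. Unlike the data frame case, the condition passed to subset() for a matrix has to refer to the matrix explicitly, because column names are not looked up as variables.

```r
# A hypothetical 3x2 matrix with named columns
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2,
            dimnames = list(NULL, c("a", "b")))

# Rows where column "a" is greater than 1
r1 <- subset(m, m[, "a"] > 1)
r2 <- m[m[, "a"] > 1, ]

# When a single row matches, brackets simplify to a vector
# unless drop=FALSE is given
r3 <- m[m[, "a"] > 2, , drop = FALSE]
```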
Notes
Also see Indexing into a data structure.