Working with NULL, NA, and NaN
Problem
You want to properly handle `NULL`, `NA`, or `NaN` values.
Solution
Sometimes your data will include `NULL`, `NA`, or `NaN`. These behave differently from “normal” values in comparisons and arithmetic, so you may need to test for them explicitly.
Here are some examples of comparisons with these values:
```r
x <- NULL
x > 5
# logical(0)
y <- NA
y > 5
# NA
z <- NaN
z > 5
# NA
```
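These results explain why explicit tests are needed. An ordinary `==` comparison cannot detect `NA`, because the comparison itself returns `NA`. A condition that is `NA` or zero-length also cannot drive an `if()`. The short sketch below reuses the `x` and `y` values from above. The `tryCatch()` wrapper is only there so the errors can be shown without stopping the script, and the quoted messages are those of an English-locale R session.

```r
y <- NA
x <- NULL

# == propagates NA instead of answering TRUE or FALSE
y == NA
# NA

# if() raises an error when its condition is NA; capture it to inspect
cond_na <- tryCatch(if (y > 5) "big" else "small",
                    error = function(e) e)
conditionMessage(cond_na)
# "missing value where TRUE/FALSE needed"

# A zero-length condition (the result of comparing NULL) also fails in if()
cond_null <- tryCatch(if (x > 5) "big" else "small",
                      error = function(e) e)
conditionMessage(cond_null)
# "argument is of length zero"

# The explicit test always yields a usable TRUE/FALSE
if (is.na(y)) "y is missing" else "y is present"
# "y is missing"
```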
Here’s how to test whether a variable has one of these values:
```r
is.null(x)
# TRUE
is.na(y)
# TRUE
is.nan(z)
# TRUE
```
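The two tests for "bad" values are not interchangeable. `is.na()` is `TRUE` for both `NA` and `NaN`, while `is.nan()` is `TRUE` only for `NaN`. The sketch below shows this on single values and on a small mixed vector; the name `v` is just an illustrative choice.

```r
y <- NA
z <- NaN

is.na(y)
# TRUE
is.na(z)
# TRUE
is.nan(y)
# FALSE
is.nan(z)
# TRUE

# Both functions are vectorised
v <- c(1, NA, NaN)
is.na(v)
# FALSE  TRUE  TRUE
is.nan(v)
# FALSE FALSE  TRUE
```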
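Note that `NULL` is different from the other two. `NULL` means that there is no value at all. `NA` and `NaN` mean that there is a value, although not a usable one: `NA` marks a missing value, and `NaN` (“not a number”) is the result of an undefined numeric operation such as `0/0`. Here’s an illustration of the difference: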
```r
# Is y null?
is.null(y)
# FALSE
# Is x NA?
is.na(x)
# logical(0)
# Warning message:
# In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
```
In the first case, `is.null()` checks whether `y` is `NULL`, and the answer is no. In the second case, `is.na()` tries to check whether `x` is `NA`, but there is no value to check. It therefore returns a zero-length result and issues a warning.
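One concrete way to see the difference is through `length()`. `NULL` occupies no element, whereas `NA` and `NaN` each occupy one. As a result, combining values with `c()` silently drops `NULL` but keeps `NA`. The sketch also shows a common source of `NaN`.

```r
length(NULL)
# 0
length(NA)
# 1
length(NaN)
# 1

# c() drops NULL but keeps NA as an element
length(c(1, NULL, 3))
# 2
length(c(1, NA, 3))
# 3

# Undefined arithmetic produces NaN
0 / 0
# NaN
```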
Ignoring “bad” values in vector summary functions
If you run functions like `mean()` or `sum()` on a vector containing `NA` or `NaN`, they will return `NA` or `NaN`. This is generally unhelpful, although it does alert you to the presence of the bad value. Many of these functions take the argument `na.rm`, which, when set to `TRUE`, tells them to ignore these values.
```r
vy <- c(1, 2, 3, NA, 5)
# 1 2 3 NA 5
mean(vy)
# NA
mean(vy, na.rm = TRUE)
# 2.75
vz <- c(1, 2, 3, NaN, 5)
# 1 2 3 NaN 5
sum(vz)
# NaN
sum(vz, na.rm = TRUE)
# 11
# NULL isn't a problem, because it doesn't exist
vx <- c(1, 2, 3, NULL, 5)
# 1 2 3 5
sum(vx)
# 11
```
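`na.rm = TRUE` drops `NA` and `NaN` together, so one flag handles a vector that contains both. The same argument is accepted by other base summary functions such as `max()` and `median()`. Before discarding bad values, it can also be worth counting them with `sum(is.na(...))`. The vector name `vw` below is illustrative only.

```r
vw <- c(1, NA, 3, NaN, 5)

# Count the bad values (is.na() matches both NA and NaN)
sum(is.na(vw))
# 2

# Summaries over the remaining values 1, 3, 5
mean(vw, na.rm = TRUE)
# 3
max(vw, na.rm = TRUE)
# 5
median(vw, na.rm = TRUE)
# 3
```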
Removing bad values from a vector
These values can be removed from a vector by filtering with `is.na()` or `is.nan()`. Keep in mind that `is.na()` matches both `NA` and `NaN`, while `is.nan()` matches only `NaN`.
```r
vy
# 1 2 3 NA 5
vy[!is.na(vy)]
# 1 2 3 5
vz
# 1 2 3 NaN 5
vz[!is.nan(vz)]
# 1 2 3 5
```
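The choice of filter matters when a vector contains both kinds of bad value. In the sketch below, using the illustrative vector `w`, filtering with `is.nan()` leaves the `NA` in place, whereas filtering with `is.na()` removes both.

```r
w <- c(1, NA, 3, NaN, 5)

# is.nan() removes only the NaN
w[!is.nan(w)]
# 1 NA 3 5

# is.na() removes NA and NaN together
w[!is.na(w)]
# 1 3 5
```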
Notes
There are also the infinite numerical values `Inf` and `-Inf`, and the associated functions `is.finite()` and `is.infinite()`.
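A quick illustration of how these functions treat each special value follows. `is.infinite()` is `TRUE` only for `Inf` and `-Inf`. `is.finite()` is `FALSE` for `NA` and `NaN` as well as the infinities, so filtering with it removes all of them in one step. The vector name `vi` is illustrative only.

```r
# Dividing a nonzero number by zero gives an infinite value
1 / 0
# Inf
-1 / 0
# -Inf

vi <- c(1, Inf, -Inf, NA, NaN)
is.infinite(vi)
# FALSE  TRUE  TRUE FALSE FALSE
is.finite(vi)
# TRUE FALSE FALSE FALSE FALSE

# Keep only ordinary finite numbers
vi[is.finite(vi)]
# 1
```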
Also see /Manipulating data/Comparing vectors or factors with NA