Writing data to a file
Problem
You want to write data to a file.
Solution
Writing to a delimited text file
The easiest way to do this is to use write.csv()
. By default, write.csv()
includes row names, but these are usually unnecessary and may cause confusion.
# A sample data frame
data <- read.table(header=TRUE, text='
subject sex size
1 M 7
2 F NA
3 F 9
4 M 11
')
# Write to a file, suppress row names
write.csv(data, "data.csv", row.names=FALSE)
# Same, except that instead of "NA", output blank cells
write.csv(data, "data.csv", row.names=FALSE, na="")
# Use tabs, suppress row names and column names
write.table(data, "data.csv", sep="\t", row.names=FALSE, col.names=FALSE)
Saving in R data format
write.csv()
and write.table()
are best for interoperability with other data analysis programs. They will not, however, preserve special attributes of the data structures, such as whether a column is a character type or factor, or the order of levels in factors. In order to do that, it should be written out in a special format for R.
Below are are three primary ways of doing this:
The first method is to output R source code which, when run, will re-create the object. This should work for most data objects, but it may not be able to faithfully re-create some more complicated data objects.
# Save in a text format that can be easily loaded in R
dump("data", "data.Rdmpd")
# Can save multiple objects:
dump(c("data", "data1"), "data.Rdmpd")
# To load the data again:
source("data.Rdmpd")
# When loaded, the original data names will automatically be used.
The next method is to write out individual data objects in RDS format. This format can be binary or ASCII. Binary is more compact, while ASCII will be more efficient with version control systems like Git.
# Save a single object in binary RDS format
saveRDS(data, "data.rds")
# Or, using ASCII format
saveRDS(data, "data.rds", ascii=TRUE)
# To load the data again:
data <- readRDS("data.rds")
It’s also possible to save multiple objects into an single file, using the RData format.
# Saving multiple objects in binary RData format
save(data, file="data.RData")
# Or, using ASCII format
save(data, file="data.RData", ascii=TRUE)
# Can save multiple objects
save(data, data1, file="data.RData")
# To load the data again:
load("data.RData")
An important difference between saveRDS()
and save()
is that, with the former, when you readRDS()
the data, you specify the name of the object, and with the latter, when you load()
the data, the original object names are automatically used. Automatically using the original object names can sometimes simplify a workflow, but it can also be a drawback if the data object is meant to be distributed to others for use in a different environment.