Loading data from a file
Problem
You want to load data from a file.
Solution
Delimited text files
The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).
data <- read.csv("datafile.csv")
# Load a CSV file that doesn't have headers
data <- read.csv("datafile-noheader.csv", header=FALSE)
The function read.table() is more general: it lets you set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more. See ?read.table for the details.
data <- read.table("datafile-noheader.csv",
                   header=FALSE,
                   sep=","         # use "\t" for tab-delimited files
)
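As a minimal sketch of the tab-delimited case mentioned in the comment above, the following writes a small file to a temporary location and reads it back. The file contents and the name tsv_path are made up for illustration; only read.table() and its header and sep arguments come from the text.

```r
# Write a small tab-delimited file to a temporary location (illustrative data)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("Currer\tBell\tF\t2",
             "Dr.\tSeuss\tM\t49"), tsv_path)

# No header row, tab as the field separator
data <- read.table(tsv_path, header = FALSE, sep = "\t")
print(data)

# Clean up the temporary file
unlink(tsv_path)
```

Because the file has no header, the columns get the default names V1 to V4.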
Loading a file with a file chooser
On some platforms, file.choose() opens a file chooser dialog window. On others, it simply prompts the user to type in a filename.
data <- read.csv(file.choose())
Treating strings as factors or characters
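In versions of R before 4.0.0, strings in the data are converted to factors by default; since R 4.0.0 the default is stringsAsFactors=FALSE and strings stay as character vectors. If you load the data below with read.csv in an older version, all the text columns will be treated as factors, even though it might make more sense to treat some of them as strings. To keep them as strings regardless of version, use stringsAsFactors=FALSE: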
data <- read.csv("datafile.csv", stringsAsFactors=FALSE)
# You might have to convert some columns to factors
data$Sex <- factor(data$Sex)
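Another alternative is to load them as factors and convert some columns to characters. In R 4.0.0 and later, factors must be requested explicitly with stringsAsFactors=TRUE:
data <- read.csv("datafile.csv", stringsAsFactors=TRUE)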
data$First <- as.character(data$First)
data$Last <- as.character(data$Last)
# Another method: convert columns named "First" and "Last"
stringcols <- c("First","Last")
data[stringcols] <- lapply(data[stringcols], as.character)
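To check which columns ended up as factors and which as characters, you can inspect the column classes. The sketch below uses an inline copy of the example data through the text= argument so it runs without a file on disk. The second half uses colClasses, which is not part of the original recipe; it is shown only as one possible way to fix the types at read time.

```r
# Inline copy of the example data, so no file is needed
csv_text <- '"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Read strings as characters, then convert only Sex to a factor
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
data$Sex <- factor(data$Sex)
col_classes <- sapply(data, class)
print(col_classes)

# Alternative (an addition, not from the recipe): declare the types up front
data2 <- read.csv(text = csv_text,
                  colClasses = c("character", "character", "factor", "integer"))
col_classes2 <- sapply(data2, class)
print(col_classes2)
```

Both approaches should leave First and Last as character columns and Sex as a factor.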
Loading a file from the Internet
Data can also be loaded from a URL. These (very long) URLs will load the files linked to below.
data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile.csv")
# Read in a CSV file without headers
data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile-noheader.csv", header=FALSE)
# Manually assign the header names
names(data) <- c("First","Last","Sex","Number")
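As a variation on assigning the names afterwards, the names can also be supplied while reading, through the col.names argument of read.csv(). This sketch uses an inline copy of the headerless data via text= instead of the URL, so that it runs offline; the variable name noheader_text is made up for illustration.

```r
# Inline copy of the headerless example data
noheader_text <- '"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Supply the column names at read time instead of assigning them afterwards
data <- read.csv(text = noheader_text, header = FALSE,
                 col.names = c("First", "Last", "Sex", "Number"))
print(names(data))
```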
The data files used above:
datafile.csv:
"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21
datafile-noheader.csv:
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21
Fixed-width text files
Suppose your data has fixed-width columns, like this:
  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
     ""  Student   NA     21
One way to read it in is to use read.table() with strip.white=TRUE, which removes extra spaces.
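# "clipboard" reads text copied to the system clipboard where R supports it (e.g. on Windows);
# pass a file name instead to read from a file
read.table("clipboard", header=TRUE, strip.white=TRUE)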
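For a version that does not depend on the clipboard, the same call can be tried on an inline copy of the data through the text= argument. This is only a sketch: the column spacing in fixed_text is an assumption about how the original file is laid out.

```r
# Inline copy of the whitespace-aligned example data (spacing is assumed)
fixed_text <- '  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
     ""  Student   NA     21'

# The default separator is any run of whitespace, so the padding is ignored
data <- read.table(text = fixed_text, header = TRUE, strip.white = TRUE)
print(data)
```

This works here because no field contains a space; the next example shows a layout where that assumption breaks down.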
However, your data file may have columns containing spaces, or columns with no spaces separating them, like this, where the scores column represents six different measurements, each from 0 to 3.
subject  sex  scores
   N  1    M  113311
   NE 2    F  112231
   S  3    F  111221
   W  4    M  011002
In this case, you may need to use the read.fwf() function. If you read the column names from the file with header=TRUE, they must be separated by the single delimiter character given by sep (a tab by default; a space or comma also works if you set sep accordingly). If they are separated with multiple spaces, as in this example, you will have to assign the column names directly.
# Assign the column names manually
read.fwf("myfile.txt",
         c(7,5,-2,1,1,1,1,1,1), # Width of the columns. -2 means skip two characters
         skip=1,                # Skip the first line (contains header here)
         col.names=c("subject","sex","s1","s2","s3","s4","s5","s6"),
         strip.white=TRUE)      # Strip leading and trailing whitespace when reading each field
#>   subject sex s1 s2 s3 s4 s5 s6
#> 1    N  1   M  1  1  3  3  1  1
#> 2    NE 2   F  1  1  2  2  3  1
#> 3    S  3   F  1  1  1  2  2  1
#> 4    W  4   M  0  1  1  0  0  2
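# header=TRUE works only if the first row holds one name per column that is read,
# delimited by the sep character (a tab by default). For example, with sep=",":
# subject,sex,s1,s2,s3,s4,s5,s6
# With the example file above, whose header is separated by multiple spaces, it fails:
read.fwf("myfile.txt", c(7,5,-2,1,1,1,1,1,1), header=TRUE, strip.white=TRUE)
#> Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names, : more columns than column names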
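To try the manual column-name approach without creating myfile.txt by hand, the sketch below writes the example data to a temporary file first. The exact spacing of the lines is an assumption chosen to match the widths c(7,5,-2,1,1,1,1,1,1), and the total column is added purely for illustration.

```r
# Recreate the example file in a temporary location (spacing assumed to match the widths)
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("subject  sex  scores",
             "   N  1    M  113311",
             "   NE 2    F  112231",
             "   S  3    F  111221",
             "   W  4    M  011002"), fwf_path)

score_cols <- paste0("s", 1:6)

# 7 characters for subject, 5 for sex, skip 2, then six 1-character scores
scores <- read.fwf(fwf_path,
                   widths = c(7, 5, -2, 1, 1, 1, 1, 1, 1),
                   skip = 1,
                   col.names = c("subject", "sex", score_cols),
                   strip.white = TRUE)

# Illustrative follow-up: sum the six measurements for each subject
scores$total <- rowSums(scores[score_cols])
print(scores)

# Clean up the temporary file
unlink(fwf_path)
```

Once the scores are separate numeric columns, ordinary column operations such as rowSums() apply directly.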
Excel files
The read.xls function in the gdata package can read in Excel files.
library(gdata)
data <- read.xls("data.xls")
See http://cran.r-project.org/doc/manuals/R-data.html#Reading-Excel-spreadsheets.
SPSS data files
The read.spss function in the foreign package can read in SPSS files.
library(foreign)
data <- read.spss("data.sav", to.data.frame=TRUE)