Importing data from the clipboard in R

Importing data from the clipboard

Copy the data from Google Sheets or Excel, then import it from the clipboard with a small helper function, read.excel():

> importeddata = read.excel()
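read.excel() is not part of base R. Judging from the call echoed in the error messages below, it is a small user-defined wrapper around read.table() that reads from "clipboard". A minimal reconstruction under that assumption (the definition is inferred, not taken from the original post):

```r
# Hypothetical reconstruction of the read.excel() helper, inferred from the
# read.table() call shown in the error messages. It is not part of base R.
read.excel <- function(header = TRUE) {
  # Cells copied from a spreadsheet arrive as tab-separated text;
  # "clipboard" tells read.table() to read from the system clipboard.
  read.table("clipboard", sep = "\t", header = header)
}
```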

By default this function assumes that the header row was copied along with the data. However, it does not work reliably: sometimes it runs fine, and other times it fails with messages such as:

> importeddata = read.excel()
Warning message:
In read.table("clipboard", sep = "\t", header = header) :
incomplete final line found by readTableHeader on 'clipboard'
> importeddata = read.excel()
Error in read.table("clipboard", sep = "\t", header = header) : 
 first five rows are empty: giving up
In addition: Warning message:
In read.table("clipboard", sep = "\t", header = header) :


So I gave up on it and switched to a workaround: I paste the data into a text editor (Gedit), save it as a plain-text file called data-share, and then import that file with the read.table() function.
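Before giving up on the clipboard, it may be worth checking which clipboard R is actually reading. According to the R help page for file(), on Unix-alike systems the description "clipboard" reads the X11 primary selection (the most recently highlighted text), whereas the clipboard filled by Edit > Copy or Ctrl+C is available as "X11_clipboard". On a Linux desktop like the one running Gedit, that mismatch is one plausible, unconfirmed explanation for the intermittent failures. The "incomplete final line" warning on its own is harmless; it only means the copied text did not end with a newline. The sketch below picks a connection per operating system; the function names are hypothetical, and pipe("pbpaste") assumes the standard macOS pbpaste tool is available.

```r
# Hypothetical helpers (not from the original post): choose a clipboard
# connection for the current OS, then parse it as tab-separated text.
clipboard_connection <- function() {
  os <- Sys.info()[["sysname"]]
  if (os == "Windows") {
    file("clipboard")          # the Windows clipboard
  } else if (os == "Darwin") {
    pipe("pbpaste")            # macOS command-line clipboard tool
  } else {
    file("X11_clipboard")      # X11 Edit > Copy selection, not the primary one
  }
}

read_clipboard_table <- function(header = TRUE, ...) {
  # read.table() opens the connection and closes it when it is done;
  # extra arguments such as na.strings are passed through via ...
  read.table(clipboard_connection(), sep = "\t", header = header, ...)
}
```

With the data already copied, the call would be importeddata = read_clipboard_table().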

Importing data into R using read.table()

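read.table() is a versatile function that can import most delimited text tables. Here sep = "\t" says the columns are separated by tabs (which is how spreadsheet cells are copied), na.strings = "NA" marks the text that stands for a missing value, and header = T says the first line holds the column names: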

> importedata = read.table(file="data-share", sep = "\t", 
+ na.strings = "NA", header = T)
> importedata
   x1 y1 x2 y2
1   2  2  6  5
2   2  5  7  4
3   6  5  8  7
4   7  3  5  6
5   4  7  5  4
6   6  4 NA NA
7   5  3 NA NA
8   4  6 NA NA
9   2  5 NA NA
10  1  3 NA NA
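To reproduce this example without the data-share file, the same table can be passed to read.table() as a string through its text argument. The values below are copied from the printed output; the other arguments match the file-based call.

```r
# Rebuild the example table from an inline string instead of the
# data-share file; the values are copied from the printed output above.
tsv <- paste(
  "x1\ty1\tx2\ty2",
  "2\t2\t6\t5",
  "2\t5\t7\t4",
  "6\t5\t8\t7",
  "7\t3\t5\t6",
  "4\t7\t5\t4",
  "6\t4\tNA\tNA",
  "5\t3\tNA\tNA",
  "4\t6\tNA\tNA",
  "2\t5\tNA\tNA",
  "1\t3\tNA\tNA",
  sep = "\n"
)

# Same arguments as the file-based call; text = reads from the string.
importedata <- read.table(text = tsv, sep = "\t", na.strings = "NA",
                          header = TRUE)
print(importedata)
```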

We can extract individual columns by position using single-bracket indexing:

> importedata[1]
  x1
1   2
2   2
3   6
4   7
5   4
6   6
7   5
8   4
9   2
10  1
> importedata[3]
  x2
1   6
2   7
3   8
4   5
5   5
6  NA
7  NA
8  NA
9  NA
10 NA

However, we cannot find the mean of an individual column just by applying mean() to what these expressions return:

> mean(importedata[1])
[1] NA
Warning message:
In mean.default(importedata[1]) : argument is not numeric or logical: returning NA

The reason is that single-bracket indexing such as importedata[1] returns a one-column data frame, and a data frame is a list of columns rather than a numeric vector, so mean() refuses it. The easiest solution is to apply mean() to each column with the sapply() function, which works much like Python's map(): it calls the same function on every element of a list, here every column.

> sapply(importedata, mean, na.rm=T)
 x1  y1  x2  y2
3.9 4.3 6.2 5.2

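Usage: sapply(X, FUN, ...), where X is the list or data frame, FUN is the function to apply, and any further arguments (such as na.rm = TRUE) are passed on to FUN.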
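The comparison with Python's map() is closest for lapply(), which applies a function to every column and returns a list; sapply() does the same and then simplifies the result to a named vector where it can. A small sketch, rebuilding the example data with data.frame() from the values printed above:

```r
# Rebuild the example data (values copied from the printed output).
importedata <- data.frame(
  x1 = c(2, 2, 6, 7, 4, 6, 5, 4, 2, 1),
  y1 = c(2, 5, 5, 3, 7, 4, 3, 6, 5, 3),
  x2 = c(6, 7, 8, 5, 5, NA, NA, NA, NA, NA),
  y2 = c(5, 4, 7, 6, 4, NA, NA, NA, NA, NA)
)

# lapply(): one result per column, returned as a list (like map()).
means_list <- lapply(importedata, mean, na.rm = TRUE)

# sapply(): the same work, simplified to a named numeric vector.
means_vec <- sapply(importedata, mean, na.rm = TRUE)

class(means_list)  # "list"
class(means_vec)   # "numeric"
```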

We need the argument na.rm = TRUE for mean(); otherwise it returns NA for the last two columns. Those columns hold less data than the first two, so their missing rows are filled with NA, and mean() returns NA for any vector that contains an NA unless na.rm = TRUE tells it to drop those values first:

> sapply(importedata,mean)
 x1  y1  x2  y2
3.9 4.3  NA  NA
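For a table whose columns are all numeric, as here, base R's colMeans() is a vectorised alternative to sapply(). A sketch, again rebuilding the data from the printed values; it would not work as-is on a data frame that also has text columns:

```r
# Rebuild the example data (values copied from the printed output).
importedata <- data.frame(
  x1 = c(2, 2, 6, 7, 4, 6, 5, 4, 2, 1),
  y1 = c(2, 5, 5, 3, 7, 4, 3, 6, 5, 3),
  x2 = c(6, 7, 8, 5, 5, NA, NA, NA, NA, NA),
  y2 = c(5, 4, 7, 6, 4, NA, NA, NA, NA, NA)
)

# colMeans() averages every column in one call; all columns must be numeric.
col_means <- colMeans(importedata, na.rm = TRUE)
print(col_means)

# Without na.rm = TRUE, columns containing NA give NA, just as with sapply().
col_means_raw <- colMeans(importedata)
print(col_means_raw)
```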

An alternative to sapply() is to extract a column, flatten it into a plain vector with the unlist() function, and take the mean of that vector:

> data1 = unlist(importedata[1], use.names = F)
> mean(data1)
[1] 3.9

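The first line turns the one-column data frame into a plain numeric vector; use.names = F drops the element names (x11, x12 and so on) that unlist() would otherwise generate. The second line finds the mean. This works without na.rm because x1 has no missing values.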
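A shorter route to a plain vector is double-bracket or $ indexing, which returns the column itself instead of a one-column data frame, so no unlist() step is needed. A sketch on the example data, rebuilt from the printed values:

```r
# Rebuild the example data (values copied from the printed output).
importedata <- data.frame(
  x1 = c(2, 2, 6, 7, 4, 6, 5, 4, 2, 1),
  y1 = c(2, 5, 5, 3, 7, 4, 3, 6, 5, 3),
  x2 = c(6, 7, 8, 5, 5, NA, NA, NA, NA, NA),
  y2 = c(5, 4, 7, 6, 4, NA, NA, NA, NA, NA)
)

# Single brackets keep the data-frame wrapper; [[ ]] and $ return the vector.
class(importedata[1])    # "data.frame"
class(importedata[[1]])  # "numeric"

mean(importedata[[1]])               # 3.9
mean(importedata$x1)                 # 3.9
mean(importedata$x2, na.rm = TRUE)   # 6.2
```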
