I would like to read a text file in R line by line, using a for loop that runs over the number of lines in the file. The problem is that it only prints character(0). This is the code:

    fileName="up_down.txt"
    con=file(fileName,open="r")
    line=readLines(con)
    long=length(line)
    for (i in 1:long){
        linn=readLines(con,1)
        print(linn)
    }
    close(con)

Best Answer


You should take care with readLines(...) and big files. Reading all lines into memory can be risky. Below is an example of how to read a file and process just one line at a time:

    processFile = function(filepath) {
        con = file(filepath, "r")
        while ( TRUE ) {
            line = readLines(con, n = 1)
            if ( length(line) == 0 ) {
                break
            }
            print(line)
        }
        close(con)
    }

Be aware that reading even a single line into memory carries a risk too: a big file without line breaks can still fill your memory.
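A middle ground between reading everything and reading one line per call is to read a fixed number of lines at a time. The following is only a sketch under that assumption; process_in_chunks, chunk_size and handle_line are hypothetical names, not part of the answer above.

```r
# Hypothetical helper: read a file a few lines at a time and hand each
# line to handle_line(). Returns (invisibly) the number of lines seen.
process_in_chunks <- function(filepath, chunk_size = 1000, handle_line = print) {
  con <- file(filepath, open = "r")
  on.exit(close(con))  # close the connection even if handle_line() fails
  total <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)  # at most chunk_size lines per call
    if (length(lines) == 0) break            # end of file reached
    for (line in lines) handle_line(line)
    total <- total + length(lines)
  }
  invisible(total)
}
```

Only chunk_size lines are held in memory at once, so the caveat about very long lines still applies to each chunk.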

Just use readLines on your file:

    R> res <- readLines(system.file("DESCRIPTION", package="MASS"))
    R> length(res)
    [1] 27
    R> res
     [1] "Package: MASS"
     [2] "Priority: recommended"
     [3] "Version: 7.3-18"
     [4] "Date: 2012-05-28"
     [5] "Revision: $Rev: 3167 $"
     [6] "Depends: R (>= 2.14.0), grDevices, graphics, stats, utils"
     [7] "Suggests: lattice, nlme, nnet, survival"
     [8] "Authors@R: c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"),"
     [9] " email = \"[email protected]\"), person(\"Kurt\", \"Hornik\", role"
    [10] " = \"trl\", comment = \"partial port ca 1998\"), person(\"Albrecht\","
    [11] " \"Gebhardt\", role = \"trl\", comment = \"partial port ca 1998\"),"
    [12] " person(\"David\", \"Firth\", role = \"ctb\"))"
    [13] "Description: Functions and datasets to support Venables and Ripley,"
    [14] " 'Modern Applied Statistics with S' (4th edition, 2002)."
    [15] "Title: Support Functions and Datasets for Venables and Ripley's MASS"
    [16] "License: GPL-2 | GPL-3"
    [17] "URL: http://www.stats.ox.ac.uk/pub/MASS4/"
    [18] "LazyData: yes"
    [19] "Packaged: 2012-05-28 08:47:38 UTC; ripley"
    [20] "Author: Brian Ripley [aut, cre, cph], Kurt Hornik [trl] (partial port"
    [21] " ca 1998), Albrecht Gebhardt [trl] (partial port ca 1998), David"
    [22] " Firth [ctb]"
    [23] "Maintainer: Brian Ripley <[email protected]>"
    [24] "Repository: CRAN"
    [25] "Date/Publication: 2012-05-28 08:53:03"
    [26] "Built: R 2.15.1; x86_64-pc-mingw32; 2012-06-22 14:16:09 UTC; windows"
    [27] "Archs: i386, x64"
    R>

There is an entire manual devoted to this: R Data Import/Export.

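Here is a solution with a for loop. In the question's code, the first readLines(con) call already reads the whole file, so every later readLines(con, 1) inside the loop finds nothing left to read and returns character(0). The fix is to call readLines once, outside the loop, and index into the result: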

    fileName <- "up_down.txt"
    conn <- file(fileName, open = "r")
    linn <- readLines(conn)
    for (i in seq_along(linn)) {  # seq_along() also copes with an empty file
        print(linn[i])
    }
    close(conn)
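To see where the character(0) in the question comes from, here is a small self-contained check. It is a sketch that uses a temporary file instead of up_down.txt; the file contents are made up for illustration.

```r
# Write a small throwaway file (stand-in for up_down.txt).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), tmp)

con <- file(tmp, open = "r")
all_lines <- readLines(con)         # reads every line; the connection is now at end of file
leftover  <- readLines(con, n = 1)  # nothing is left, so this is an empty character vector
close(con)

print(length(all_lines))
print(leftover)
```

An open connection keeps its position between calls, which is why the second call starts where the first one stopped rather than at the top of the file.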

I wrote code that reads a file line by line for my own use case, in which different lines hold different data types. It follows the articles read-line-by-line-of-a-file-in-r and determining-number-of-linesrecords. I think it should also be a better solution for big files. My R version is 3.3.2.

    con = file("pathtotargetfile", "r")
    readsizeof <- 2  # number of lines read per step while counting the lines in the file
    nooflines <- 0   # number of lines
    # count the lines in small steps; this also suits big files
    while ((linesread <- length(readLines(con, readsizeof))) > 0)
        nooflines <- nooflines + linesread
    close(con)  # the cursor is now at the end of the file, so close this connection
    con = file("pathtotargetfile", "r")  # and open the file again to read from the start
    # the data type of each line: the first line has the same type as 0 (e.g. numeric),
    # the second line the same type as 'c' (e.g. character), and so on. This meets my needs.
    typelist = list(0, 'c', 0, 'c', 0, 0, 'c', 0)
    for (i in seq_len(nooflines)) {
        tmp <- scan(file = con, nlines = 1, what = typelist[[i]], quiet = TRUE)
        print(is.vector(tmp))
        print(tmp)
    }
    close(con)
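If the number of typed lines is known up front, the counting pass may not be needed. The sketch below reads each line once and converts it with a per-line converter function. read_typed_lines and converters are hypothetical names, and splitting fields on whitespace is an assumption about the file layout.

```r
# Hypothetical helper: read one line per converter and convert its
# whitespace-separated fields with that converter.
read_typed_lines <- function(path, converters) {
  con <- file(path, open = "r")
  on.exit(close(con))
  out <- vector("list", length(converters))
  for (i in seq_along(converters)) {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break  # file ended before all converters were used
    fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    out[[i]] <- converters[[i]](fields)
  }
  out
}

# Example with a throwaway file: a numeric line followed by a character line.
tmp <- tempfile()
writeLines(c("1 2 3", "a b"), tmp)
typed <- read_typed_lines(tmp, list(as.numeric, as.character))
print(typed)
```

Passing functions rather than prototype values such as 0 and 'c' is a design choice of this sketch, not of the answer above.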

I suggest you check out chunked and disk.frame. They both have functions for reading in CSVs chunk-by-chunk.

In particular, disk.frame::csv_to_disk.frame may be the function you are after?
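For readers who want to see the chunk-by-chunk idea without installing either package, here is a base-R sketch. It is not the API of chunked or disk.frame. read_csv_in_chunks and handle_chunk are hypothetical names, and the sketch assumes a comma-separated file with a header row and no fields that contain line breaks.

```r
# Hypothetical helper: pass successive data-frame chunks of a CSV file
# to handle_chunk(). Returns (invisibly) the number of chunks processed.
read_csv_in_chunks <- function(path, chunk_rows, handle_chunk) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header_line <- readLines(con, n = 1)
  if (length(header_line) == 0) return(invisible(0L))  # empty file
  # scan() splits the header on commas and strips surrounding quotes
  col_names <- scan(text = header_line, what = "", sep = ",", quiet = TRUE)
  n_chunks <- 0L
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    # column types are inferred separately for every chunk
    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = col_names, stringsAsFactors = FALSE)
    handle_chunk(chunk)
    n_chunks <- n_chunks + 1L
  }
  invisible(n_chunks)
}
```

A call such as read_csv_in_chunks("data.csv", 10000, function(chunk) print(nrow(chunk))) would keep at most 10000 rows in memory at a time; "data.csv" is a placeholder file name.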

    fileName = "up_down.txt"
    ### code to get the line count of the file
    # "cat fileName | wc -l" is used because it returns just the line count, and NOT the name of the file with it
    length_connection = pipe(paste("cat ", fileName, " | wc -l", sep = ""))
    long = as.numeric(trimws(readLines(con = length_connection, n = 1)))
    close(length_connection) # make sure to close the connection
    ###
    for (i in seq_len(long)) {
        ### code to extract a single line at row i from the file
        # extracts one line from fileName at the desired line number (i)
        linn_connection_cmd = paste("head -n", format(x = i, scientific = FALSE, big.mark = ""), fileName, "| tail -n 1", sep = " ")
        linn_connection = pipe(linn_connection_cmd)
        linn = readLines(con = linn_connection, n = 1)
        close(linn_connection) # make sure to close the connection
        ###
        # the line is now loaded into R and anything can be done with it
        print(linn)
    }

By using R's pipe() command, and using shell commands to extract what we want, the full file is never loaded into R, and is read in line by line.

paste("head -n", format(x = i, scientific = FALSE, big.mark = ""), fileName, "| tail -n 1", sep = " ")

It is this command that does all the work; it extracts one line from the desired file.
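One thing the command does not handle is a file name that contains spaces or other shell-special characters. A possible variant, shown only as a sketch, wraps the name with shQuote(). build_line_command is a hypothetical helper, and type = "sh" assumes a POSIX shell.

```r
# Hypothetical helper: build the shell command that extracts line i of a file.
build_line_command <- function(path, i) {
  line_no <- format(i, scientific = FALSE, big.mark = "")  # never scientific notation
  paste("head -n", line_no, shQuote(path, type = "sh"), "| tail -n 1")
}

cmd <- build_line_command("my data.txt", 100000)
print(cmd)
```

The resulting string can be passed to pipe() in the same way as the original command.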

Edit: By default, R writes i as an ordinary number when it is less than 100,000, but can switch to scientific notation once i reaches 100,000 (1e+05). Thus, format(x = i, scientific = FALSE, big.mark = "") is used in the pipe command to make sure that head always receives the number in ordinary form, which is the only form it understands. If head is given a number like 1e+05, it cannot parse it, and the command fails with the following error:

head: 1e+05: invalid number of lines
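The difference between the conversions can be checked in isolation. This is a small sketch; the sprintf() line is an alternative that is not used in the answer above.

```r
i <- 100000

default_text <- as.character(i)  # the conversion paste() applies; may come out in scientific notation
fixed_text   <- format(x = i, scientific = FALSE, big.mark = "")  # always plain digits
sprintf_text <- sprintf("%.0f", i)  # alternative: fixed-point with no decimals

print(default_text)
print(fixed_text)
print(sprintf_text)
```

Either of the last two forms gives head a line count it can parse.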