Coyote's Guide to IDL Programming

Reading ASCII Data Files

QUESTION: I have an ASCII data file named exp2b9c.dat with a three-line header and three columns of data. I want each column to be a separate IDL vector. The number of rows in the data file varies from file to file, but it is written in the second line of the header. How can I read this kind of data file in IDL?

The first few lines of the data file look like this:

   Experiment 01-14-97-2b9c
   No. of Data Rows: 247
   Temperature         Pressure          Relative Humidity
      20.43             0.1654                 0.243
      16.48             0.2398                 0.254
      17.21             0.3985                 0.265
      18.40             0.1852                 0.236
      21.39             0.2998                 0.293

ANSWER: The short answer to this question is to go download the NASA IDL Astronomy Library and learn to use the ReadCol procedure you find there. Or, you can read a longer answer here.

There are any number of ways to read this type of data file in IDL, but here is one way. First, you must attach the file to a logical unit number, since all file input and output occurs over logical unit numbers in IDL. This is done with the OPENR (OPEN for Read) command. You will use the GET_LUN keyword to get an available logical unit number out of a pool of numbers IDL manages. The logical unit number will be assigned to the variable lun in the command below.

   OPENR, lun, 'exp2b9c.dat', /GET_LUN

The next step is to read the header. Since in this case you know that the header is the first three lines of the file, you can read the header into a three-element string array. When reading a string variable, IDL reads until the end of the current line in the file. To read the header, you use the READF command, like this:

   header = STRARR(3)
   READF, lun, header

If you like, you can print the header out now, like this:

   PRINT, header

The next step is to read the number of rows in the data file from the second line of the header. Here is one way to do that using the READS (READ from String) command. The method shown here requires that you know where the integer specifying the number of data rows starts in the line: here, in the 19th column, after the 17-character label and one space (see the FORMAT keyword, below).

   junk = "" ; A string variable
   rows = 0L ; A long integer variable
   READS, header(1), junk, rows, FORMAT='(A17, x, I0)'

The "I0" format in the READS command above allows you to read an integer value with a variable number of digits. This will be useful if you don't know if the number of rows will be 100 or 10000.
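
If you would rather not depend on the column position at all, one alternative (a sketch, not the method used above) is to split the header line at the colon and convert what follows to a long integer. It assumes the header array from the READF call above and that the line contains exactly one colon. The variable name parts is just for illustration.

   parts = STRSPLIT(header(1), ':', /EXTRACT) ; Text before and after the colon
   rows = LONG(parts(1))                       ; Leading blanks are ignored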

Now you are ready to read the data itself. With the number of rows known, it might occur to you (especially if you are a FORTRAN programmer) that you can initialize the three vectors you are after and read them directly, like this:

   temperature = FLTARR(rows)
   pressure = FLTARR(rows)
   humidity = FLTARR(rows)
   FOR j=0,rows-1 DO READF, lun, temperature(j), pressure(j), humidity(j)

Although this is exactly what you would do in a FORTRAN program, it will absolutely not work in IDL! The reason is that IDL passes a subscripted variable such as pressure(j) to a procedure by value, so READF has nowhere to store what it reads. The rule to remember is that you cannot read into a subscripted variable. You can, however, read the values into temporary scalar variables and store them in the arrays inside the loop. Your code might look like this:

   pressure = FLTARR(rows)
   temperature = FLTARR(rows)
   humidity = FLTARR(rows)
   p = 0.0
   t = 0.0
   h = 0.0
   FOR j=0,rows-1 DO BEGIN
      READF, lun, t, p, h   ; File column order: temperature, pressure, humidity
      pressure(j) = p
      temperature(j) = t
      humidity(j) = h
   ENDFOR

Although this code will certainly work in IDL, it will not be very fast. IDL was designed as an array processing language, and it excels at handling data arrays all at once rather than in loops. It would be better to read the data all at once and use array processing techniques to pull the vectors out of the larger array. The preferred way to read this data is like this:

   data = FLTARR(3, rows)  
   READF, lun, data

Finally, you are ready to pull your three vectors out of the larger data array. You can use array subscripting for this task, like this:

   temperature = data(0,*)
   pressure = data(1,*)
   humidity = data(2,*)

If you use the HELP command to look at your three vectors, you will notice that they are column vectors (i.e., they are two-dimensional arrays dimensioned 1-by-rows). Normally, when you think of a vector in IDL you think of a row vector, a one-dimensional array with rows elements. You can turn these three vectors into row vectors with the REFORM command. Called with no dimension arguments, REFORM removes every dimension of size 1 from an array, so each 1-by-rows array becomes a normal one-dimensional vector. The code looks like this:

   pressure = REFORM(pressure)
   temperature = REFORM(temperature)
   humidity = REFORM(humidity)
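
Putting the steps above together, the whole read might be wrapped in a small procedure like the sketch below. The procedure name read_exp_file and its argument list are hypothetical, chosen only for illustration. The sketch assumes the three-line header format shown at the top of this article, takes the columns in the order the header lists them (temperature, pressure, relative humidity), and adds a FREE_LUN call to close the file when the read is finished.

   PRO read_exp_file, filename, temperature, pressure, humidity
      OPENR, lun, filename, /GET_LUN

      header = STRARR(3)              ; The three header lines
      READF, lun, header

      junk = ""
      rows = 0L
      READS, header(1), junk, rows, FORMAT='(A17, x, I0)'

      data = FLTARR(3, rows)          ; Three values for each row of the file
      READF, lun, data
      FREE_LUN, lun                   ; Close the file and release the unit

      temperature = REFORM(data(0,*)) ; Columns in the order of the header
      pressure = REFORM(data(1,*))
      humidity = REFORM(data(2,*))
   END

Saved in a file named read_exp_file.pro somewhere on your IDL path, it could be called like this:

   read_exp_file, 'exp2b9c.dat', temperature, pressure, humidity
   HELP, temperature, pressure, humidity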

Sometimes you will not know how many rows of data you have in the file. Or the number of rows will be variable. What do you do then?

If I didn't know how many rows were in the data file in the example above, I might have used the EOF function to read the data until I got to the end of the file. For example, suppose I don't know how many data rows I have, but I know for sure I have fewer than 10,000. With the file open and the header already read, I might have set my code up like this:

   pressure = FLTARR(10000)   
   temperature = FLTARR(10000)   
   humidity = FLTARR(10000)
   p = 0.0   
   t = 0.0   
   h = 0.0
   count = 0   
   WHILE (NOT EOF(lun)) DO BEGIN
      READF, lun, t, p, h   ; File column order: temperature, pressure, humidity
      pressure(count) = p      
      temperature(count) = t
      humidity(count) = h
      count = count + 1   
   ENDWHILE
   pressure = pressure(0:count-1)
   temperature = temperature(0:count-1)
   humidity = humidity(0:count-1)

Note that you can now use FILE_LINES to find out how many lines there are in an ASCII file; subtract the number of header lines to get the number of data rows. There are also ways to tell how many columns an ASCII file has. Another alternative is to use the READCOL program from the IDL Astronomy Library.
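
As a rough sketch of the FILE_LINES approach applied to the example file, you can size the data array before you open the file and then read everything at once, with no loop and no oversized arrays. This assumes the file has exactly three header lines and no blank lines after the last data row; a trailing blank line would make the count one too large and the read would run off the end of the file.

   file = 'exp2b9c.dat'
   rows = FILE_LINES(file) - 3   ; Total lines minus the three header lines
   OPENR, lun, file, /GET_LUN
   header = STRARR(3)
   READF, lun, header
   data = FLTARR(3, rows)
   READF, lun, data
   FREE_LUN, lun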

Reading Into a Structure

Another way to read an ASCII data set is to read it all at once into a structure variable. Consider a file, named test.dat, with data looking like this.

   20050206     0.386216      1286.08      1.34860     0.133876    -0.695917
   20050207     0.391236      1080.36      1.19682     0.285981    -0.127826
   20050402     0.685645      1137.82      1.20061     0.112449    -0.716363
   20050411     0.727419      1096.23      1.21388     0.134878    -0.556709
   20050421     0.768319      1110.19      1.19086     0.136384    -0.636537
   20050422     0.772067      1133.38      1.26480     0.214946    -0.528211

The first column in this data set is obviously a date, but it is written in such a way that it can be read as a long integer. I'll show you how to extract the date part later. It is not important what physical quantity each column in the data set represents, so let's just call each column a "parameter" and list them as "p1", "p2", etc. Our first task is to create a structure to represent each row of data.

   dataStruct = { date:0L, p1:0.0, p2:0.0, p3:0.0, p4:0.0, p5:0.0 }

In this case, the first column, date, is a long integer, and the other five columns are floats. You would set your structure up as appropriate for your data file. Next, we have to create an array of these structures, one for each row of data. We can use the Replicate command to do this.

   file = 'test.dat'
   nrows = File_Lines(file)
   data = Replicate(dataStruct, nrows)

The last step is simple. Just read the data.

   OpenR, lun, file, /GET_LUN
   ReadF, lun, data
   Free_Lun, lun

Now each column of the data set is in its own field of the structure. For example, if you want to print out all the dates in the file, you would do this.

   IDL> Print, data.date
       20050206    20050207    20050402    20050411    20050421    20050422

Or, any of the column parameters.

   IDL> Print, data.p1
      0.386216     0.391236     0.685645     0.727419     0.768319     0.772067
   IDL> Print, data.p4
      0.133876     0.285981     0.112449     0.134878     0.136384     0.214946
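
Because each field comes back as an ordinary vector, you can use the fields together in array operations. As a small illustration (the 1100.0 threshold and the variable names high and count are arbitrary choices, not anything from the data set), this sketch prints the dates of the rows whose p2 value exceeds the threshold.

   high = Where(data.p2 GT 1100.0, count)   ; Indices of rows with p2 above the threshold
   IF count GT 0 THEN Print, data(high).date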

Pulling Apart the Date

There are probably numerous ways to tease out the date from the data column. But if I wanted the date in separate year, month, and day variables (as integers), I think I would do it like this.

   year =  Fix( StrMid( StrTrim(data.date, 2), 0, 4) )
   month = Fix( StrMid( StrTrim(data.date, 2), 4, 2) )
   day =   Fix( StrMid( StrTrim(data.date, 2), 6, 2) )

So that, ...

   IDL> Print, year
    2005    2005    2005    2005    2005    2005
   IDL> Print, month
       2       2       4       4       4       4
   IDL> Print, day
       6       7       2      11      21      22
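
If you prefer to avoid the string conversions, the same pieces can be pulled out with integer arithmetic. This is an alternative sketch, not the method shown above; it assumes every date is an eight-digit long integer in YYYYMMDD form, and it returns long integers rather than short integers. The last line passes the pieces to JULDAY to get a Julian day number for each row, which is convenient for plotting against time.

   year = data.date / 10000L             ; Integer division drops the MMDD digits
   month = (data.date / 100L) MOD 100L   ; Drop DD, then keep the last two digits
   day = data.date MOD 100L              ; Keep the last two digits
   jd = JulDay(month, day, year)         ; Julian day number for each date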
