Reading/Writing Strings in netCDF Files

QUESTION: Yikes! I am having all kinds of problems reading and writing string arrays to netCDF files in IDL. Can you give me some help with this?

ANSWER: String variables in netCDF files have a CHAR variable type. The problem is that IDL has no equivalent type, so all CHAR data are converted to BYTE type in IDL. Thus, to work with strings in netCDF files, you write byte arrays when you write strings to a netCDF file, and you convert byte arrays back to strings when you read strings from a netCDF file. IDL handles the conversion of strings to bytes for you when you write a string variable, but you have to do the conversion of bytes back to strings yourself when you read a string from a netCDF file.
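The conversion itself can be tried at the IDL command line without any netCDF file. Here is a minimal sketch that uses only the built-in Byte and String functions (the variable names are just for illustration).

```idl
; Convert a scalar string to its byte (ASCII) values.
bytes = Byte('IDL')          ; a 3-element byte array
; Convert the byte array back to a string.
str = String(bytes)
Help, bytes, str
```

Help should report that bytes is a 3-element byte array and that str is the string 'IDL'.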

This is all fairly straightforward, except for one small complication. To do this correctly, you have to define one extra dimension for the strings you write into your netCDF file, in addition to the dimensions of the string variable itself. (A two-dimensional string array, for example, needs three dimensions in the file.) This extra dimension is the length of your string or the maximum length of the strings in your string array.

This is probably best shown with an example. Let's create a 10-by-10 string array that we would like to write to a netCDF file. And let's make one element of that string array much longer than the others.

   inputStrings = Replicate('A test string', 10, 10)
   inputStrings[0,0] = 'This is a very much longer test string' 

We see that we have created a string array, and that the longest string in the array is 38 characters. (Only the first string is this long. All the other strings are 13 characters in length.)

    IDL> Help, inputStrings, Max(StrLen(inputStrings))
   INPUTSTRINGS    STRING    = Array[10, 10]
   <Expression>    LONG      =           38 

Writing a String Array to a netCDF File

Next, we would like to write this string array to a netCDF file. We need to create a new netCDF file, and define the dimensions of the string variable we wish to write to it. Notice, however, that we define three dimensions for the string array. The extra dimension we define is related to the maximum length of the strings that are going to be in the string array.

   ncdfID = NCDF_Create('teststring.nc', /CLOBBER)
   xdimID = NCDF_DimDef(ncdfID, 'xsize', 10)
   ydimID = NCDF_DimDef(ncdfID, 'ysize', 10)
   zdimID = NCDF_DimDef(ncdfID, 'string_length', Max(StrLen(inputStrings)))

The next step is to define the string variable in the file. Notice the string length dimension (zdimID in this example) is placed first in the dimensions vector. This makes it much easier to convert the byte array stored in the netCDF file back to strings when we read the file later on. The CHAR keyword will take care of converting these strings to byte arrays for us.

   varID = NCDF_VarDef(ncdfID, 'test_strings', [zdimID, xdimID, ydimID], /CHAR)
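To see why the string length dimension goes first, it may help to look at how the String function treats a byte array. The sketch below does not touch the netCDF file. It simply builds a byte array with the same shape as the one that will be stored (the array contents are made up for illustration).

```idl
; A byte array shaped like the one stored in the file:
; 38 characters by 10 by 10, filled with the letter 'A' (65B).
bytes = Replicate(65B, 38, 10, 10)
; String uses the first dimension as the characters of each string.
strings = String(bytes)
Help, strings                 ; a 10-by-10 string array
Print, StrLen(strings[0,0])   ; each string is 38 characters long
```

Because String consumes the first dimension, putting the string length anywhere else in the dimensions vector would require rearranging the byte array before converting it.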

Finally, we end the file definition mode and switch over to actually writing the string data into the file. We take care to close the file, too.

   NCDF_Control, ncdfID, /ENDEF
   NCDF_VarPut, ncdfID, varID, inputStrings
   NCDF_Close, ncdfID
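A scalar string can be written the same way, except that the string length is the only dimension needed. The following is a sketch along the lines of the code above. The file name, dimension name, and variable name are assumptions chosen for illustration.

```idl
; Assumption: the file, dimension, and variable names are illustrative.
scalarString = 'A single test string'
ncdfID = NCDF_Create('testscalar.nc', /CLOBBER)
; Only one dimension is needed: the length of the string.
lenID = NCDF_DimDef(ncdfID, 'string_length', StrLen(scalarString))
varID = NCDF_VarDef(ncdfID, 'test_scalar', [lenID], /CHAR)
; Leave define mode, write the string, and close the file.
NCDF_Control, ncdfID, /ENDEF
NCDF_VarPut, ncdfID, varID, scalarString
NCDF_Close, ncdfID
```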

Reading a String Array from a netCDF File

For completeness, let's see if we can read the string array from the netCDF file we just created. The process is fairly easy. We open the file, get the variable ID, and read the variable from the file, like this.

   id = NCDF_Open('teststring.nc')
   strID = NCDF_VarID(id, 'test_strings')
   NCDF_VarGet, id, strID, theStrings

We see, in fact, that the variable is a byte array.

   IDL> Help, theStrings
   THESTRINGS      BYTE      = Array[38, 10, 10]

Of course, variables can legitimately be byte arrays, so it is difficult to know a priori whether this variable is a string or not. To know for sure, we have to check whether this variable has a data type of "CHAR". We find out like this.

   varinfo = NCDF_VarInq(id, strID)
   datatype = varinfo.datatype

We see that the data type of this variable is "CHAR":

   IDL> Print, datatype
   CHAR
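The structure returned by NCDF_VarInq also carries the dimension IDs of the variable, so the string length stored in the file can be looked up with NCDF_DimInq. Here is a sketch, assuming the teststring.nc file written earlier exists. It opens the file under its own handle (infoID, a name chosen for illustration) so that it does not disturb the id variable used in the rest of this article.

```idl
; Assumes teststring.nc was written as described in this article.
infoID = NCDF_Open('teststring.nc')
infoVarID = NCDF_VarID(infoID, 'test_strings')
info = NCDF_VarInq(infoID, infoVarID)
; Print the name and size of each dimension of the variable.
FOR j=0, info.ndims-1 DO BEGIN & NCDF_DimInq, infoID, info.dim[j], dimName, dimSize & Print, dimName, dimSize & ENDFOR
NCDF_Close, infoID
```

If the file was written as shown above, the string_length dimension should be listed first.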

We can now close the netCDF file and convert the CHAR or byte array to strings again.

    NCDF_Close, id
   IF datatype EQ 'CHAR' THEN myStrings = String(theStrings) 

Note that in doing so, we have converted the string array back into a 10-by-10 array.

   IDL> Help, myStrings
   MYSTRINGS       STRING    = Array[10, 10]

And our original strings are intact.

   IDL> Print, myStrings[0,0]
   This is a very much longer test string
   IDL> Print, myStrings[0,1]
   A test string
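The reading steps above could be collected into a small function. ReadNCDFVar is a hypothetical name, not a built-in IDL routine, and this sketch does no error checking (for example, it does not check that the variable exists in the file).

```idl
; Hypothetical helper: read one variable from a netCDF file and
; return strings if the variable is stored as CHAR.
FUNCTION ReadNCDFVar, filename, varName
   id = NCDF_Open(filename)
   varID = NCDF_VarID(id, varName)
   NCDF_VarGet, id, varID, data
   varinfo = NCDF_VarInq(id, varID)
   NCDF_Close, id
   ; Only CHAR variables are converted. True byte data is left alone.
   IF varinfo.datatype EQ 'CHAR' THEN data = String(data)
   RETURN, data
END
```

Saved in a file on the IDL path and compiled, it might be called as myStrings = ReadNCDFVar('teststring.nc', 'test_strings').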

Version of IDL used to prepare this article: IDL 7.0.3.
