Purpose of Count, Offset, and Stride Keywords in Reading a netCDF Variable

QUESTION: I downloaded a time series of gridded image data in netCDF format from one of the data providers. The variable, temperature, is a floating point array of 5760 columns by 4800 rows by 825 time points. A pretty big variable. If I try to read it all at once in my IDL session, I get the dreaded message "Unable to Allocate Memory to Make Array". Is there a way I can read one 5760 by 4800 image at a time out of this file? Or at least be able to create a time series plot of a specific pixel through time?

ANSWER: Yes, of course. This is one of the nice features of the netCDF scientific data format. A three dimensional variable like this can be read in several different ways. Suppose the file is named "gridded_temperatures.nc" and the variable is named "temperature." You would open the file and prepare to read the variable with these commands that obtain both a file identifier and a variable identifier.

    file = 'gridded_temperatures.nc'
    theVariable = 'temperature'
    fileID = NCDF_Open(file)
    varID = NCDF_VarID(fileID, theVariable)

You seem to know the dimensions of the data, but if you didn't, you could discover this information by querying the variable itself, like this.

    varInfo = NCDF_VarInq(fileID, varID)
    dimIDs = varInfo.dim
    nDims = N_Elements(dimIDs)
    dims = LonArr(nDims)   ; long integers, so sizes above 32767 do not overflow
    FOR j=0,nDims-1 DO BEGIN
       NCDF_DimInq, fileID, dimIDs[j], dname, dsize
       dims[j] = dsize
    ENDFOR

If you printed the variable dims, you would see this is the size of your data cube.

    IDL> Print, dims
            5760        4800         825

If you wanted to read the entire variable into IDL, you might try something like this.

    NCDF_VarGet, fileID, varID, data 

In this case, data is an output variable: the name of the IDL variable you want the values read into. It is this variable that IDL cannot create, because it cannot allocate enough contiguous memory. (At 4 bytes per floating point value, 5760 * 4800 * 825 values require about 91 GBytes. That is more memory than most machines have, and far more than a 32-bit operating system can address at all.)

Rather, you want to read images from this data cube as you need them. To do that, you will take advantage of the Count and Offset keywords to the NCDF_VarGet command. For example, to read the first image from the cube, you would execute this command.

   NCDF_VarGet, fileID, varID, image_1, COUNT=[dims[0], dims[1], 1], OFFSET=[0, 0, 0]

The Count keyword is a vector with one element for each dimension of the data cube. Each element tells how many items to read along that dimension, so every value must be at least 1. In this case dims[0] is 5760 and dims[1] is 4800, so you would read 5760 * 4800 * 1 elements. The Offset keyword is a 0-based vector, also with one element for each dimension. It gives the starting location in each dimension. So, for example, if you wanted to read the 100th image in the cube, you would set the offset to the vector [0,0,99]. Another way to think of this is that you move to 0 in the X dimension, 0 in the Y dimension, and 99 in the Z (time) dimension before you start reading.

   NCDF_VarGet, fileID, varID, image_100, COUNT=[dims[0], dims[1], 1], OFFSET=[0, 0, 99]

Both of these images would be 5760 by 4800 in size, but they would be taken from different locations in the data cube. You could easily write a loop to cycle through the images in the image cube.
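
Such a loop might look like the sketch below. It assumes fileID, varID, and dims were obtained as shown earlier. The per-image mean is only a stand-in for whatever processing you need, and the variable name means is made up for this illustration. Because the loop spans several lines, it is meant to go in a program file rather than be typed at the IDL> prompt.

    ; Sketch only: read one image per time step and reduce it to a single number.
    ; Assumes fileID, varID, and dims already exist. No missing-value handling is done.
    means = FltArr(dims[2])
    FOR j=0L,dims[2]-1 DO BEGIN
       ; Only one 5760 by 4800 image is held in memory at any time.
       NCDF_VarGet, fileID, varID, image, COUNT=[dims[0], dims[1], 1], OFFSET=[0, 0, j]
       means[j] = Mean(image)
    ENDFOR

Reusing the same output variable, image, on every pass keeps memory use at roughly the size of a single image.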

Image Subsetting

Maybe you are interested only in a 1000 by 1000 subset of these images, taken from the center of each image. Then you could calculate the X offset as (5760 - 1000)/2, or 2380, and the Y offset as (4800 - 1000)/2, or 1900. The code to read a 1000 by 1000 image from the center of the 50th image in the data cube would look like this.

    NCDF_VarGet, fileID, varID, subImage_50, COUNT=[1000, 1000, 1], $
        OFFSET=[(5760 - 1000)/2, (4800 - 1000)/2, 49] 
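
If you would rather not hard-code the image size, the same offsets can be computed from the dims vector. This is a minimal sketch that assumes dims holds [5760, 4800, 825] as discovered earlier. The names subSize, xOffset, and yOffset are invented for the example.

    ; Sketch only: center a square subset using the dimensions read from the file.
    subSize = 1000
    xOffset = (dims[0] - subSize) / 2   ; 2380 for a 5760-column image
    yOffset = (dims[1] - subSize) / 2   ; 1900 for a 4800-row image
    NCDF_VarGet, fileID, varID, subImage_50, COUNT=[subSize, subSize, 1], $
        OFFSET=[xOffset, yOffset, 49]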

Image Size Reduction

Maybe a 5760 by 4800 image is still too big. You think you could reduce the image size by a factor of four in each direction and still retain enough information in your image. In this case, you want to use the Stride keyword to read every fourth data value in both the X and Y directions of the image. Of course, the number of elements you read in each of those dimensions is also reduced by a factor of four, so the Count values must shrink to match. The code to read the 50th image this way will look like this.

   NCDF_VarGet, fileID, varID, reducedImage, COUNT=[dims[0]/4, dims[1]/4, 1], $
        OFFSET=[0, 0, 49], STRIDE=[4, 4, 1] 

In this case, the reducedImage will be a 1440 by 1200 image.
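
The division works out evenly here because 5760 and 4800 are both multiples of four. For other sizes or strides, the last sample read in each dimension, at offset + (count - 1) * stride, has to stay inside that dimension. One way to compute a count that satisfies this is sketched below, relying on IDL's truncating integer division. The variable names stride, offset, and count are chosen for the example, and dims is assumed to hold the dimension sizes found earlier.

    ; Sketch only: the largest count that fits for a given offset and stride.
    stride = [4, 4, 1]
    offset = [0, 0, 49]
    count = (dims - offset + stride - 1) / stride   ; integer division rounds down
    count[2] = 1                                    ; read just one time step
    NCDF_VarGet, fileID, varID, reducedImage, COUNT=count, OFFSET=offset, STRIDE=stride

With these values, count comes out as [1440, 1200, 1], the same numbers used above.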

Image Pixel Through Time

Suppose you wanted to get all the values at a single pixel through time. And suppose the pixel was at 4000 in X and 3500 in Y. To get all 825 values for this pixel, you could write code like this.

   NCDF_VarGet, fileID, varID, pixelOverTime, COUNT=[1, 1, 825], OFFSET=[4000, 3500, 0]

If you look at the variable pixelOverTime, you see it is a 1 by 1 by 825 array.

    IDL> Help, pixelOverTime
        PIXELOVERTIME   FLOAT  = Array[1, 1, 825] 

You can convert it to a row vector by removing all the dimensions of length 1, if you like, by doing this.

    IDL> pixelOverTime = Reform(pixelOverTime)
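
To get the time series plot you asked about, you could pass the values to the Plot command. This sketch plots against the time index only, since the article does not say whether the file also carries a time coordinate variable. The axis titles are placeholders.

    ; Sketch only: plot the pixel's values against time index (0 to 824).
    ; Reform drops the dimensions of length 1 so Plot receives a simple vector.
    Plot, Reform(pixelOverTime), XTITLE='Time Index', YTITLE='Temperature'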

You can see that you have a great deal of flexibility in how you read the data cube, through the Count, Offset, and Stride keywords to the NCDF_VarGet command.

Be sure to close the open file when you are finished reading your variable.

    NCDF_Close, fileID

Version of IDL used to prepare this article: IDL 7.0.3.

Web Coyote's Guide to IDL Programming