# Representing Missing Data in IDL

**QUESTION:** To all ye who have attained IDL nirvana and to the one who speaketh
the truth :p

Missing data points in my *binary* data (coded as 32-bit words) are
denoted by large numbers, like 999999. In order *not* to plot these
missing values, I am using **!Values.F_NAN**. But the array ought to be
floating type to set it to **NaN** directly. Besides, the type of a variable
within a structure can't be modified.

What I did was this:

    array = fltarr(dim) ; dim is dimension, i.e., #days * #data/day
    array = float(data[*].mydat) ; data[dim].mydat is data variable
    array [where(array [*] eq 999999.)] = !Values.F_NAN

It works, but I was wondering if there is a “better way” to do this?

**ANSWER:** Ken Bowman answers this question on the
IDL Newsgroup.

The first line in your code above is unnecessary, as the following line will create the array as a floating point array automatically.

And the third line in your code above is not a great idea, since it will crash when there are no missing data. Plus, the **[*]** syntax is completely unnecessary. You should do something like this instead:

    i = WHERE(a EQ 999999.0, count)
    IF (count GT 0) THEN a[i] = !VALUES.F_NAN

Other than that, the concept seems fine. You have to create a FLOAT variable in order to use **NaN**s, which I heartily endorse.

The only alternative is to create the original data structure using a FLOAT instead of a LONG (presumably when you read the data). I prefer to replace missing data codes with **NaN**s at the point I read the data. That way I don't use them inadvertently.
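Ken's preference for converting at read time could be sketched as a small helper like the one below. The function name `MissingToNaN` and its arguments are hypothetical, not part of IDL or of the original post. It assumes the raw data arrive as an integer array and that a single code marks missing values.

```idl
; Hypothetical helper (not an IDL built-in): return a FLOAT copy of
; integer data with the missing-data code replaced by NaN.
FUNCTION MissingToNaN, raw, missingCode
   COMPILE_OPT idl2
   result = FLOAT(raw)                      ; floating-point copy of the data
   bad = WHERE(raw EQ missingCode, count)   ; compare the original integers
   IF (count GT 0) THEN result[bad] = !VALUES.F_NAN
   RETURN, result
END
```

With the questioner's structure, the call might look like `array = MissingToNaN(data.mydat, 999999L)`. Testing the integer values before conversion avoids any concern about floating-point comparison, and the `count` check keeps the assignment from running when nothing is missing.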

I pointed out that if it was really only a problem with plotting the data,
then a **MAX_VALUE** keyword would work perfectly well, without any need to change
the data to **NaN**s:

    Plot, array, MAX_VALUE=999999-1

Ken agreed with this, but pointed out that representing data as **NaN**s often prevented
other problems downstream of the plotting.

This is true, but using “special numbers” to indicate missing data is rife with the possibility of using the missing value as valid data without noticing it. I'm a big advocate of using **NaN**s because they ensure that if you use them by mistake, your result will be a **NaN** (which is usually hard to ignore).
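A small illustration of that point, using made-up values rather than anything from the original discussion: the same average is computed with the missing-data code left in place and with the code replaced by a **NaN**.

```idl
; Made-up values for illustration; 999999 is the missing-data code.
coded = [1.0, 2.0, 999999.0, 4.0]

; Copy the data and flag the coded value as NaN.
flagged = coded
i = WHERE(flagged EQ 999999.0, count)
IF (count GT 0) THEN flagged[i] = !VALUES.F_NAN

PRINT, MEAN(coded)           ; finite, but dominated by the missing-data code
PRINT, MEAN(flagged)         ; NaN, so the mistake is hard to miss
PRINT, MEAN(flagged, /NAN)   ; average of the three valid values only
```

The first result looks like a plausible number and could slip into later calculations unnoticed, while the second cannot be mistaken for real data.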

This caused the original questioner to ask another question.

**QUESTION:** This prompts me to ask another question,
if I may. Since I have lots of missing data, and I do lots of math
operations (array operations, FFT, etc.), will these **NaN**s propagate all
the way through in such situations? Should I be using them in
conjunction with **FINITE** command? Any pointers as to where one ought to
be careful with these **NaN**s?

**ANSWER:** Ken answered with a warning about a potential bug in **TOTAL** in IDL 6.3
that could cause the IDL user problems.

Many IDL functions include **/NAN** keywords to skip **NaN**s in operations (**TOTAL**, **MEAN**, etc.). In other cases, you will have to find the good data with **WHERE(FINITE(...), COUNT = count)**.

There is one special case that you have to watch out for when using **TOTAL** with the **/NAN** keyword. If *all* of the elements are **NaN**s, the result returned is not a **NaN**, but a zero!

    IDL> x = replicate(!values.f_nan, 5)
    IDL> print, x
    NaN NaN NaN NaN NaN
    IDL> print, total(x)
    NaN
    IDL> print, total(x, /nan)
    0.00000

I think this is a serious implementation bug because it renders the **/NAN** keyword useless in most circumstances, but I guess we are stuck with it.

Inconsistently, this happens with **TOTAL**, but not with **MEAN**.

    IDL> print, mean(x, /nan)
    NaN
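One way to sidestep the all-**NaN** case described above is to count the good values yourself before summing. The wrapper below is only a sketch: `SafeTotal` is a hypothetical name, not an IDL routine, and it assumes you want a **NaN** back when no finite values exist.

```idl
; Hypothetical wrapper (not an IDL built-in): sum only the finite
; elements, but return NaN when there are none to sum.
FUNCTION SafeTotal, x
   COMPILE_OPT idl2
   good = WHERE(FINITE(x), count)              ; indices of usable values
   IF (count EQ 0) THEN RETURN, !VALUES.F_NAN  ; nothing valid to add up
   RETURN, TOTAL(x[good])
END
```

Note that **FINITE** also rejects infinities, so this sketch treats `Inf` values as missing too. If that is not what you want, the test would need to change.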

Editor's Note: This inconsistency has been fixed in the IDL 7.1 version I am looking at currently.

Copyright © 2007 David W. Fanning

Last Updated 8 April 2007