Drizzling and Chunking in IDL

QUESTION: Alright, you're so smart. I have one for you. What do you know about drizzling algorithms in IDL? (In case you don't know, these algorithms are used in the linear reconstruction of images from undersampled, dithered data.)

ANSWER: Oh, right, that drizzle. I thought for a moment you were talking about ice cream toppings. Well, I know absolutely nothing about that. :-(

[Editor's Note: After this article went to press, Wayne Landsman wrote in with the following caveat. I include it here at the front of the article so that all who continue on are appropriately warned.

... [the] article might lead people to think that there are drizzle or other flux-conserving algorithms available in IDL, and I am not aware that there are any. David's article is really about array decimation, which would be just one step in writing a vectorized drizzle code (as described in this article). My own attempts at writing a vectorized drizzle code floundered on determining partial pixel weights.

I'm pretty sure now that drizzle code is best written in C and linked to IDL. Or perhaps I need to try out the V6.0 IDL-Java bridge Tom McGlynn has written.

Proceed at your own risk. I still think the article contains interesting ideas.]

But fortunately the IDL newsgroup is populated with experts who know all kinds of things. Foremost among them is J.D. Smith, who started a most illuminating thread on this topic, entitled "Chunk Array Decimation" on October 1st, 2002. Here is J.D.

Histogram lovers (and haters alike):

Carsten put this question to me, and I've wasted too much time on it not to report the results. It's a deceptively simple-seeming problem.

You have a vector of indices, inds, and a data vector data of the same length. The inds vector contains a list of many repeated indices, e.g.:


The job is to generate a result vector vec of length max(inds)+1, such that for each index i:

  vec[i]=total(data[where(inds eq i)]) 

or zero otherwise. That is, gather all data with a given corresponding index, and total them together into the result vector at that index. If no indices are repeated, you can of course just use simple assignment, but that's a much simpler problem. Here's a very straightforward implementation based on this idea:

   for j=0L,mx do begin 
      wh=where(inds eq j,cnt)
      if cnt eq 0 then continue

This is slow. Very slow. Verrrrrrrrrry slow. And wasteful. You search through the index vector using Where many, many times. Beware of constructs like this. Surely we can do better. OK, how about a very literal, straight loop approach, just like we'd write in C?

   for j=0L,n_elements(data)-1L do $

Accumulate based on the index. Not too bad. Runs much faster than using Where , but its run-time scales with the number of elements of data, independent to how many repeated indices there are.

Of course, anyone familiar at all with Histogram would realize there's a better route when many indices are repeated:

   for j=0L,n_elements(h)-1 do if ri[j+1] gt ri[j] then $

This taps into the ever-so useful reverse indices vector to pick out those elements of data which fall in each "bin" of the index histogram. Notice I'm using OMIN to save time in case the minimum index is greater than 0. This is much faster than the Where method, and can be a factor of 2 or 3 faster than the literal loop approach, if indices are repeated at least a few times on average (a few drops in each histogram bin). If indices are never repeated, or especially if many indices are skipped (a sparse set), the literal loop method can be much faster than Histogram.

This is all well and good, but Carsten was amazed that I'd offered a loop in a solution. I wondered whether a loop-free and possibly faster version existed.

Given the initial set of reverse indices, the problem reduces to one of: is there a loop-free way to decimate an array using a variable width "chunk" decimation? E.g.: total the first 5, then the next 3, then the next 0, then the next 7, ..., elements of an array. I was encouraged by the Histogram(Total(/CUMULATIVE)) method which solved Pavel's chunk fill problem of years past, and came up with:

   row=ri2[0:n_elements(h2)-1]-ri2[0]+om ; chunk indices = row number
   if mx ge n_ind then $
      vec4=sprsax(sparse_array,[data,replicate(0.,mx+1-n_ind)]) $
   else vec4=(sprsax(sparse_array,data))[0:mx]

I use sparse arrays to solve the chunk decimation problem, with the chunk fill method generating the row numbers of non-zero (unity actually) entries, and the original reverse indices generating the column numbers. Unfortunately, RSI's Numerical Recipes-based sparse array routines demand square arrays (which seems unnecessary to me), so you either have to pad the data or truncate the result, depending on whether you have more repeated indices than skipped indices. Even so, this method is at least 10 times faster than the former Histogram method for relatively dense (many repeated index) mappings! For sparse sets of indices, it still works (thanks to that if statement at the end), and, amazingly, can still beat the literal loop method! Such is the penalty for looping at all using IDL variables, that you're better off going to elaborate lengths creating sparse array structures and histograms just to get all your looping to occur in real compiled code.

For a random list of 20,000 indices drawn from 0-2000 (a dense sampling: each index repeated 10 times, on average), the methods time to (average, sec):

    0.48   ; where()
    0.024  ; literal loop
    0.011  ; histogram() loop
    0.0096 ; sparse array method

And for 20,000 indices drawn from 0-40,000 (a sparse sampling: only 1 out of 2 indices present on average), you get:

    5.839  ; where()              
    0.0257 ; literal loop           
    0.1047 ; histogram() loop       
    0.0207 ; sparse array method

Yes, it's ugly, but the numbers speak for themselves. For those of you not too squeamish to look closely, you'll see that I've used a histogram of a histogram.

Does anyone begin to feel like the looping penalty in IDL is a bit much? In any case, it looks like I'm going to add the spr* functions to my rebin/reform/histogram/## power tool list. They seem to be quite fast.

Wayne Landsman then let us know that this new discovery of J.D.'s had a practical application. In fact, Wayne had tried to implement a version of drizzling already.

The problem that discussed by JD is actually a very practical one, that can be used in "drizzling" algorithms. This a method of combining or warping images that preserves flux -- every pixel in the input image is equally represented in the output image. Instead of starting with an input pixel and mapping to an output image (e.g. as with POLY_2D) , one starts with an output pixel and determines which input pixels get mapped into it. The flux conservation property is one very dear to astronomers, and for which there are no existing IDL tools.

My solution to the problem combined the REVERSE_INDICIES aproach of JD, with the "accumlate based on the index" approach. For the drizzle problem, one is probably only going to sum at most 3-4 pixels together, so it makes sense to loop over the number of distinct histogram values (i.e. loop only 3-4 times).

My solution is below, but I have to admit that I haven't looked at it for a while.

P.S. I never finished the drizzle algorithm, because I couldn't figure out a quick way to compute partial pixel overlaps in IDL...

pro fdrizzle, vector, index, values
;    Add values to an array at specified indicies. The basic usage is
;       FDRIZZLE, vector, index, values
;    where INDEX and VALUES should have same number of elements. If there are
;    no duplicates in INDEX then FDRIZZLE simply performs the assignment
;    But if INDEX contains repeated elements then the corresponding VALUES
;    will be summed together.
;    Use the REVERSE_ELEMENTS keyword of histogram to determine the repeated
;    values in INDEX and vector sums these together.

 h = histogram(index,reverse = ri,min=0,max=N_elements(vector)-1)

; Add locations with at least one pixel
 gmax = max(h)         ;Highest number of duplicate indicies

 for i=1,gmax do begin
       g = where(h GE i, Ng)
     if Ng GT 0 then  vector[g] = vector[g] + values[ri[ ri[g]+i-1]]


J.D. liked what he saw, and added this.

That's a very interesting approach, Wayne. People who need to understand the reverse indices vector would do well to study this one. I put it into the same terms as my problem for testing:

   gmax = max(h)                ;Highest number of duplicate indicies
   for j=1,gmax do begin
      g = where(h GE j, Ng)
      if Ng GT 0 then  vec5[om+g] = vec5[om+g] + data[ri[ ri[g]+j-1]]

I was interested to see that your method beat mine for normal densities by about a factor of 2! This should provide some cannon fodder for Craig in his loop-anti-defamation campaign: keep loops small, and they're not bad. The only change I added was using OMIN as opposed to fixing MIN=0, but that shouldn't account for much if any improvement.

However, one thing still bothered me about the your method: even though the loop through the bin depth is small (e.g. maybe up to 5-10 for FDRIZZLE-type cases), you're using WHERE to search a potentially very large histogram array linearly each time. What's the solution? Why, just use another histogram to sort the histogram into bins of repeat count, of course. Now this is a true histogram of a histogram.

   ;; easy case - single values w/o duplication
   if ri2[1] gt ri2[0] then begin 
   for j=1,n_elements(h2)-1 do begin 
      if ri2[j+1] eq ri2[j] then continue ;none with that many duplicates
      vec_inds=ri2[ri2[j]:ri2[j+1]-1] ;indices into h1
      vec_inds=rebin(ri1[vec_inds],h2[j],j+1,/SAMPLE)+ $

This is absolutely the fastest I've seen... faster by a factor of ~2 than FDRIZZLE. Here are some timings again, for the curious:

20,000 Indices

Indices repeated once, on average:

WHERE loop:                                      3.8967
Literal Accumulate Loop:                         0.0250
Reverse Indices Loop:                            0.0725
Loop-Free with Sparse Arrays:                    0.0136
FDDRIZZLE Loop:                                  0.0107
Dual Histogram Loop:                             0.0077

Repeated 5 times, on average:

WHERE loop:                                      0.9433
Literal Accumulate Loop:                         0.0241
Reverse Indices Loop:                            0.0214
Loop-Free with Sparse Arrays:                    0.0102
FDDRIZZLE Loop:                                  0.0069
Dual Histogram Loop:                             0.0041

Repeated 20 times, on average:

WHERE loop:                                      0.2510
Literal Accumulate Loop:                         0.0246
Reverse Indices Loop:                            0.0063
Loop-Free with Sparse Arrays:                    0.0095
FDDRIZZLE Loop:                                  0.0075
Dual Histogram Loop:                             0.0033

Repeated 50 times, on average:

WHERE loop:                                      0.1016
Literal Accumulate Loop:                         0.0246
Reverse Indices Loop:                            0.0032
Loop-Free with Sparse Arrays:                    0.0094
FDDRIZZLE Loop:                                  0.0079
Dual Histogram Loop:                             0.0033

Only 1 in 5 indices present (WHERE loop omitted -- too slow):

Literal Accumulate Loop:                         0.0275
Reverse Indices Loop:                            0.1754
Loop-Free with Sparse Arrays:                    0.0453
FDDRIZZLE Loop:                                  0.0264
Dual Histogram Loop:                             0.0196

Only 1 in 20 indices present:

Literal Accumulate Loop:                         0.0334
Reverse Indices Loop:                            0.4785
Loop-Free with Sparse Arrays:                    0.1471
FDDRIZZLE Loop:                                  0.0623
Dual Histogram Loop:                             0.0530

Only 1 in 50 indices present:

Literal Accumulate Loop:                         0.0419
Reverse Indices Loop:                            1.0674
Loop-Free with Sparse Arrays:                    0.3461
FDDRIZZLE Loop:                                  0.1289
Dual Histogram Loop:                             0.1127

Thanks for the pointer.

Naturally, J.D.'s mention of Craig Markwardt brought him out of the woodwork, with a reminder that nothing is ever new.

Here I come late to the game again. This topic actually came up before by Liam Gumley in September 2000.

My solution then was the following loop (expressed in today's variable names):

  n = n_elements(vec)
  hh = histogram(inds, min=0, max=n-1, reverse=rr)
  wh = where(hh GT 0) & mx = max(hh(wh), min=mn)
  for i = mn, mx do begin
    wh = wh(where(hh(wh) GE i, ct))          ;; Get IND cells with GE i entries
    vec(wh) = vec(wh) + data(rr(rr(wh)+i-1)) ;; Add into the total

This is essentially the same as Wayne's FDRIZZLE routine, with the difference that the WHERE-generated index array is slowly whittled away by repeated thinning. Thus, the WHERE function gets faster and faster as the loop proceeds. At the time, I was crowned the victor by Pavel :-), but I don't know how I will do against this round of competitors.

However, all of these optimized techniques that Wayne and JD have proposed in the end game here, including mine, suffer if the dynamic range of the histogram is very large. For example, if the input array contains a million 1s, then any of the proposed loops will still take 1 million iterations. There are even ways around that, which reminds me to finish an old routine named CMHISTOGRAM...

PS. In my case, NO, I do not have to check the count returned by WHERE, because by the construction of the loop, there is always guaranteed to be at least one element.

No good idea ever gets past J.D. He was onto this like a dog on a bone.

Too much fun. I translated your thinned WHERE method into my terms:

   h = histogram(inds,OMIN=om,REVERSE_INDICES=ri)
   wh = where(h GT 0)
   mx = max(h[wh], min=mn)
   for j=mn,mx do begin
      wh=wh[where(h[wh] GE j)]  ; Get IND cells with GE i entries
      vec7[om+wh]=vec7[om+wh] + data[ri[ri[wh]+j-1]] ; Add into the total

Then you wrote:

However, all of these optimized techniques that Wayne and JD have proposed in the end game here, including mine, suffer if the dynamic range of the histogram is very large. For example, if the input array contains a million 1s, then any of the proposed loops will still take 1 million iterations. There are even ways around that, which reminds me to finish an old routine named CMHISTOGRAM...

With a million 1's, you have only one iteration in your loop, since there's just one bin in the histogram. This example illustrates an error in your formulation: it only works if mn is 1 (which it almost always will be in a large enough vector of random indices)! Why? Because you need the loop to accumulate all of the values from ri[wh]...ri[wh]+n_bin. If you have only one bin of 1000000, you just pick out the value at ri[ri[wh]+1000000]! It's fast, but wrong. FDRIZZLE works correctly because it starts its loop explicitly at 1. Yours works if I modify it to start at 1 also:

   h = histogram(inds,OMIN=om,REVERSE_INDICES=ri)
   wh = where(h GT 0)
   mx = max(h[wh],min=mn)
   for j=1,mx do begin
      wh=wh[where(h[wh] GE j)]  ; Get IND cells with GE i entries
      vec7[om+wh]=vec7[om+wh] + data[ri[ri[wh]+j-1]] ; Add into the total

In the pathological case of 20,000 1's, I get:

WHERE loop:                                      0.0014
Literal Accumulate Loop:                         0.0246
Reverse Indices Loop:                            0.0014
FDDRIZZLE Loop:                                  0.2256
Dual Histogram Loop:                             0.0030
Thinned WHERE Histogram Loop:                    0.2623

The WHERE loop and reverse indices are essentially equivalent to one call to Total with a vector of all indices, and so are quite fast. My method also uses Total , but just has to skip all the empty bins. I changed it to do this by starting at min(h1) (rather than just loop through and CONTINUE all those times), and it's fairly fast.

In a more reasonable case of an index density of 5 (indices repeated 5 times on average), I get:

WHERE loop:                                      0.9506
Literal Accumulate Loop:                         0.0245
Reverse Indices Loop:                            0.0213
Loop-Free with Sparse Arrays:                    0.0102
FDDRIZZLE Loop:                                  0.0064
Dual Histogram Loop:                             0.0040
Thinned WHERE Histogram Loop:                    0.0069

Strangely, yours always performs slightly worse than Wayne's, despite the thinning. This is a dual processor machine, so your mileage may vary, but in any case it's not faster. Just for fun, here's a run with 1,000,000 random indices with a density of 20:

Literal Accumulate Loop:                         1.2437
Reverse Indices Loop:                            0.7192
Loop-Free with Sparse Arrays:                    1.1367
FDDRIZZLE Loop:                                  0.7882
Dual Histogram Loop:                             0.5489
Thinned WHERE Histogram Loop:                    0.8438

One more very depressing timing to report. I compiled a C version of the literal accumulate loop, which consists entirely of:

  for(i=0;i<=n_elts-1;i++) vec[inds[i]]+=data[i];

(omitting 10 lbs of DLM cruft). Here's a test run with 1,000,000 elements, each index repeated 20 times on average:

Literal Accumulate Loop:                         1.2411
Reverse Indices Loop:                            0.7217
Loop-Free with Sparse Arrays:                    1.1401
FDDRIZZLE Loop:                                  0.7815
Dual Histogram Loop:                             0.5490
Thinned WHERE Histogram Loop:                    0.8422
Literal Accumulate: Compiled DLM :               0.0288

Yes, that's right, 20 times faster than the fastest pure IDL method. What's really amusing is to compare the compiled and uncompiled Literal Accumulate Loop, which uses precisely the same logic: 43 times faster, which is the approximate penalty you pay for loops in IDL vs. loops in C. This is optimized C (whereas IDL is not heavily optimized), but it only uses one processor. Threads are not enough to recover the tremendous difference between native compiled and IDL code.

I discover this disparity about once every year, and then conveniently forget about it, lest I should spend too much time writing function declarations. :-)

Jaco van Gorkom finally chimed in for the rest of us, dazzeled by drizzel.

JD, we mortals need a tutorial or two in order to even begin to understand that Sparse Array trick you pulled there. My humble contribution to this thread is to propose an additional and hopefully simpler (or, more intuitive) algorithm: let us first sort the data array based on the index array, and then use a cumulative total to do the chunking. In pseudo-code:

  sortInds = SORT(inds)
  totData = TOTAL(data[sortInds], /CUMULATIVE)
  uniqInds = UNIQ(inds[sortInds])
  subTotData = [0., totData[uniqInds]]
  vec7 = subTotData[1:*] - subTotData[0:*-1]

leaving some minor issues like non-occurring indices unresolved.

The above algorithm is by far not fast enough because of the very slow SORT operation. If this is replaced by a sorting routine which is more optimised for the problem at hand, such as, well, I'll let you guess;) , then things get much better:

  nh = N_ELEMENTS(h)
  sortData = data[ri[nh+1:*]]
  totSortData = [0., TOTAL(sortData, /CUMULATIVE)]
  vec8 = totSortData[ri[1:nh]-nh-1] - $

On my machine this seems to be always slightly faster than the double histogram loop. Here are some timings:

For 1,000,000 elements, each index repeated 20 times on average, on a single PIII 1GHz, W2K:

Literal Accumulate Loop: 1.2778
Reverse Indices Loop: 0.9794
Loop-Free with Sparse Arrays: 1.2589
FDDRIZZLE Loop: 0.9543
Dual Histogram Loop: 0.6329
Thinned WHERE Histogram Loop: 1.0506
Simple sorting plus cumulative total: 2.6347
Histogram plus cumulative total: 0.6119

And for each index repeated only once on average:

Literal Accumulate Loop: 1.3920
Reverse Indices Loop: 8.1998  (?)
Loop-Free with Sparse Arrays: 1.8647
FDDRIZZLE Loop: 1.7135
Dual Histogram Loop: 1.3129
Thinned WHERE Histogram Loop: 1.7996
Simple sorting plus cumulative total: 2.8701
Histogram plus cumulative total: 1.2488

J.D. explained some of his thinking and saw a potential problem.

Well, think of an array multiplication of a very large array on a very long data vector. Consider only the first row. If almost everything in the first column is 0., but a few choice 1.'s, that is the same as selecting those elements of the data vector and totaling them. If you really had to do all the null operations like:


then it would do you no good at all: all those useless multiply-by-zeroes would waste far too much time to be efficient. Fortunately, "sparse" arrays were invented for just this problem: you specify only the non-zero elements, and, when multiplying using sprsax, they alone are computed. If you adjust the array values, you could obviously use this to do much more than totaling selected data elements.

The biggest problem with this method, though, is round-off error, which accumulates like mad in a cumulative total of any length. If you do it in floating point, as you've written, I find unacceptably large round-off errors for as few as 500 individual indices. If I convert the addition to DOUBLE, this mitigates (but does not eliminate) the round-off errors, but then it erases the slight time advantage this method has over the prior champ (the dual histogram). Neither, of course, come close to the compiled DLM. :-(

Jaco replies.

Oops. Ok, so we should use /DOUBLE, indeed. But a quick test of TOTAL(REPLICATE(1., 30000000), /CUMULATIVE) reveals roundoff errors only after roughly 17,000,000 elements on my machine. If you tested the routine with a data vector (data = FINDGEN(100000)) then the average value for each data element would be 50,000 , so after 500 indices the cumulative total is already 25,000,000. Realistic data might well have a lower average value, or even zero average for data spread around zero, so that the cumulative total of 25,000,000 is only reached after many more elements or never.

And what exactly do you mean by "accumulates like mad in a cumulative total?" Do you mean it suffers a cumulative total more from round-off error than a single total? That should not be, should it? The way I see it, not having any computer science background, the error in a cumulative total would be the (very small) error in each sum (accumulating many, many times), plus the (much larger) round-off errors that occur when we get to very high values. So as long as we do not get to high values we should be ok, especially since we take differences out of sub-elements of the total, which will not be very far apart, probably.

P.S. The (floating-point) round-off error in TOTAL seems to come in integer steps, how strange!

And, of course, we let J.D. have the last word.

I guess I mean that to get the answer for a single element of the result vector, say the 100th element, you must do many more additions with this technique. You are throwing away most of the additions in the accumulation. Each addition has associated with it an intrinsic round-off error. Compounding these errors many times is what leads to the "mad accumulation" of round-off in the cumulative sum. This is more or less the same whether performing a regular or cumulative total. However, with all the other techniques in this thread, only the data in that bin are summed (a sum which still has its own round-off error, but it's much smaller).

Essentially the issue is, your way performs the calculation of v+x+y+z as:

a+b+c+d+e+f+g+h+j+k+l+m+n+o+p+q+r+s+t+u+v+x+y+z -

An example:

The sum of the first n natural integers is n*(n+1)/2 (a result Gauss discovered as a young schoolboy, much to the consternation of his lazy teacher, who had assigned her students a full morning to summing the first 100 integers). E.g. for n=1 million:

IDL> n=1000000LL & print,n*(n+1)/2

IDL> print,total(1.+findgen(n)),FORMAT='(D18.2)'

a difference of 101744640, or about .02%. Interestingly, the result is slightly different with accumulation:

IDL> print,(total(1.+findgen(n),/CUMULATIVE))[n-1],FORMAT='(D18.2)'

but it's roughly the same magnitude. Try it on your machine and see. Round-off error is always roughly the same magnitude in terms of the mantissa. This can be a large number or small number, depending on the exponent. The relative round-off is therefore a more interesting measure.

It's still a neat technique though.

Finally, a note from J.D. seconding Wayne's caveat, expressed at the top of this article, that this kind of drizzling algorithm might be best coded in C.

I tend to agree with Wayne [about coding a drizzling algorithm in C], but I do have a Sutherland-Hodgeman polygon clipper written in pure IDL which, together with any one of the aggregation techniques described in the article, could constitute a full drizzle algorithm. I'm using it for a different, but similar, purpose. If enough people bug me I'll polish it up and put it out for the enjoyment of all.

I did find that it was fairly slow for even my (densely dithered) 128x128 arrays, and ended up writing a replacement clipper in C. The most fun thing about it: it uses MAKE_DLL to compile the C version of the code to a shared library auto-magically, and, if this fails, just falls back on the slow, but equivalent, IDL version, with no one the wiser. It's still not nearly as fast as coding the whole problem in C, thanks in no small part to the CALL_EXTERNAL overhead, and the need to make a round trip to the clipper so frequently, but it's suitable for my needs. I suspect drizzling 10's of 2kx2k images would get a bit ugly though.

By the way, this is one problem which cannot, at least insofar as about two concerted weeks of deep thought on the problem revealed, be vectorized using the many IDL tricks we discuss here. It's surprising when you run up against these problems, but they do exist. If RSI is listening, they should bone up on Sutherland-Hodgeman or other, fancier clippers, and implement a vectorized version in IDL. I should be able to give a large list of polygons, and have them all clipped to a grid of a specified size, with optional return of the grid pixel(s) and area(s) each polygon clipped to. Somebody would no doubt write a nice, fast DRIZZLE algorithm, maybe with a GUI interface, and that would be one less reason to fire up IRAF. Another very useful routine: given a (long) input list of polygons, compute the internal overlap with areas.

This array decimation subject seems to come up over and over again. This caused JD to dig out his old test code and make it available so people could use their own data for time testing. JD also add this comment:

This is a good point of embarkation after absorbing the HISTOGRAM tutorial, going deeper into the potential uses of this function. It also serves to illustrate a simple time testing setup.

Give it a run and see what's fastest for you (though do make sure you understand the round-off issues with the sort cumulative method). For mixed sparseness of indices in the same problem, from 1:20 to 30:1, you'll probably have the best luck with one of the specialized REVERSE_INDICES methods. Just plug data and inds in from your problem. The sad and amusing thing is how much faster the one-line literal loop approach in a compiled C DLM is (about 20 times faster than the fastest abstruse IDL method).

[Return to IDL Programming Tips]