Now afore you start questioning my test procedures, understand that it's only referenced to itself. So even if the test system is out of scale, it's out of scale only relative to other systems, not itself, so variations within it are consistent. My DEQ2496 went back to the band before I got to calibrate the meter in the PDA. It does, however, calibrate itself to the internal mic in the PDA.
First, SPL with pink noise. All 24 piezos were within 1 dB of each other from 2k to 12k, and within +/- 2-3 dB above 12k. Almost ruler flat to 12k.
Here's where it gets interesting. Sweep from 2k to 15k. 9 out of the 24 exhibited an odd characteristic. With the sweep somewhere between 8k and 15k, you would hear a second and third tone below the sweep, in the 4k and 2k region (subharmonics, strictly speaking, since they sit below the driving tone). Pretty loud on some, faint on others. It happened too fast for me to capture it, so I ran just 8k, 10k, 12k, and 15k tones. I saw the expected bumps in response at 4k and 16k with the 8k tone, but no evidence of what I was hearing. Not there. I didn't want to spend all day looking for just the right freq, so I just ran sweeps from 8k to 18k. A couple more exhibited it at 18k, but those were high enough in frequency to be negligible.
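Since the sweep went by too fast to catch it, one option for further testing is a stepped sweep that holds each tone long enough for the meter to settle. Here's a rough Python sketch of that idea. It only builds the list of test tones and the frequencies to watch below each one. The function names, the steps-per-octave value, and the choice of 1/2, 1/3 and 1/4 as the orders to watch are my own assumptions, not anything measured.

```python
import math

def stepped_sweep(f_start, f_stop, steps_per_octave=6):
    """Log-spaced test tones from f_start to f_stop, both ends included."""
    n_octaves = math.log2(f_stop / f_start)
    n_steps = max(1, round(n_octaves * steps_per_octave))
    ratio = f_stop / f_start
    return [f_start * ratio ** (i / n_steps) for i in range(n_steps + 1)]

def subharmonics(f, orders=(2, 3, 4)):
    """Frequencies at f/2, f/3, f/4 (assumed orders worth watching)."""
    return {n: f / n for n in orders}

if __name__ == "__main__":
    # Hold each tone for a few seconds and watch these bands on the RTA.
    for f in stepped_sweep(8000, 18000):
        watch = ", ".join(f"f/{n} = {sub:.0f} Hz" for n, sub in subharmonics(f).items())
        print(f"tone {f:.0f} Hz -> watch {watch}")
```

Holding each step for a few seconds should make a stepped artifact in the 2k to 4k region much easier to see than it is on a continuous sweep.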
The worst ones in the 8k region popped somewhere around 2-3k, which would be a third-order subharmonic (8k / 3 is roughly 2.7k). It was a very stepped sound. I'm betting this is the source of some of the nastiness that piezos can exhibit. We end up swapping out elements because this shows up. I know because I've built arrays and isolated each tweeter with a tube, and there are distinct differences in some of them. The array smooths it all out if it's not too bad.
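To sanity check which order would land in that 2-3k window, here's a small Python sketch of the arithmetic. It assumes the stray tone is an integer division of the driving tone (f/n), which is only my guess at the mechanism. The function name and the maximum order of 8 are made up for illustration.

```python
def orders_in_band(f, lo, hi, max_order=8):
    """Return the integer orders n (2..max_order) where f/n falls in [lo, hi] Hz."""
    return [n for n in range(2, max_order + 1) if lo <= f / n <= hi]

if __name__ == "__main__":
    for tone in (8000, 10000, 12000, 15000):
        hits = orders_in_band(tone, 2000, 3000)
        detail = ", ".join(f"f/{n} = {tone / n:.0f} Hz" for n in hits)
        print(f"{tone} Hz tone: {detail}")
```

For the 8k tone this gives f/3 (about 2667 Hz) and f/4 (2000 Hz), so something heard between 2k and 3k fits a third-order division better than a simple halving, which would sit at 4k.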
The good ones were smooth as silk.

One nice thing I found is I can still hear to 15-16k. Pretty damn good for almost 50.
Thoughts? Suggestions for further testing?