This is the first of what I hope will be a series. We’ll call this part “Only Getting Halfway with AS3’s Built-In Sound Capabilities.” I’ve spent close to a week working on a little application, which has so far turned out pretty nicely. The end result is an AIR app that reads sound files, draws their waveforms, and then saves out a PNG of each sound’s waveform.

Easy peasy, right?

Sure enough, we thought this would be a pretty quick job. Just use the built-in SoundChannel.leftPeak and rightPeak properties, or perhaps SoundMixer.computeSpectrum(). Load an MP3, set up a loop to play a Sound from an incremented position, and grab the peak values. Should take a few minutes, tops!

Except that that approach has several flaws.

The first problem is that the loop/play/read approach doesn’t work the way you’d think it does. When you start playing a Sound and then immediately grab the left/rightPeak, you get 0 back as the first result. Always. So, you’ll next try setting up a Timer to alternately start playing the Sound, then read the peak a few moments later. And you’ll get the same result: 0. Same sort of deal with computeSpectrum(). The only way I was able to get this approach to work was to take a second peak reading an interval of time after the first. So, my Timer would drive a function that would play the Sound from an incremented position and immediately read the peak (getting the useless 0), and then read the peak again on the next Timer interval. And repeat.

That was a hassle, but at least we can now grab peak information from the MP3. Well, sort of. leftPeak and rightPeak return a value from 0 to 1. We don’t get negative values. This was a minor issue, as you could certainly argue that when zoomed out enough on the graphic, you can get away with simply drawing a mirror image of the positive value. Except…what if we did want that data? We’re drawing waveforms here, and I won’t get into why we need to do so, but we might need that detail. Which brings me to gripe #3.

There’s going to be a limit to how much detail we can get out of a Sound, using left/rightPeak. Given that telling a Sound to play from a certain position seems to be an inexact science, and that we can only grab that peak data after waiting for a TimerEvent to fire, which is also an inexact science, we’re limited as to how accurate we can really be.

But even if we’re OK with a certain degree of inaccuracy and resolution limitation, there’s something else that I started to realize as I was experimenting with trying to get this to work. Imagine a drum track, by itself (no other instruments). Drums produce highly transient sounds; that is, they peak quickly, last just a few milliseconds, and then decay quickly. You’ve probably seen a drum waveform, with its extra-spikey, comb-like appearance. You’ve probably also seen a, say, rock guitar waveform, which tends to have a lot more “body.” In the image below, a drum track is on top, and a guitar track is on bottom.

Drums track vs. Guitar track

So, imagine that we’re gathering sample data at a certain rate. Also imagine that the sampling rate (of the peak data gathering, not the audio file itself) is at a lower resolution (for a more “zoomed out” waveform) and, most importantly, not in sync with the tempo of the sound. In the image below, the situation is exaggerated, but it’s a drum track (a little more zoomed-in than the last one) with peak data samples happening where the white lines occur. Notice that the samples have completely failed to capture the fact that two large peaks happened, and instead picked up peak data that is actually very low energy. This is an inherent problem with simple polling of highly dynamic data.

Drum track with hypothetical sample locations
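
If you’d rather see the problem in numbers than in a picture, here’s a minimal sketch. It’s in Python rather than AS3, it’s not code from the app, and everything in it is an assumption for illustration: the 44.1 kHz sample rate, the synthetic “drum” bursts, the 250 ms polling interval, and the function names make_drum_track and poll. It builds one second of silence with two short, fast-decaying hits, then reads a single instantaneous value every 250 ms, which is roughly what a Timer-driven peak read amounts to.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def make_drum_track(duration_s, hit_times_s, decay_s=0.005):
    """Synthesize silence with a short, fast-decaying burst at each hit time."""
    n = int(duration_s * SAMPLE_RATE)
    track = [0.0] * n
    burst_len = int(decay_s * 8 * SAMPLE_RATE)  # burst is inaudible after ~8 decay times
    for hit in hit_times_s:
        start = int(hit * SAMPLE_RATE)
        for i in range(burst_len):
            if start + i >= n:
                break
            t = i / SAMPLE_RATE
            # Exponentially decaying 200 Hz tone: a crude stand-in for a drum hit.
            track[start + i] += math.exp(-t / decay_s) * math.cos(2 * math.pi * 200 * t)
    return track


def poll(track, interval_s, offset_s=0.0):
    """Read one instantaneous absolute value per polling interval."""
    step = int(interval_s * SAMPLE_RATE)
    start = int(offset_s * SAMPLE_RATE)
    return [abs(track[i]) for i in range(start, len(track), step)]


# Hits at 0.10 s and 0.60 s; polls at 0.05 s, 0.30 s, 0.55 s and 0.80 s.
track = make_drum_track(1.0, [0.10, 0.60])
polled = poll(track, interval_s=0.25, offset_s=0.05)
true_peak = max(abs(s) for s in track)

print("loudest polled value:", max(polled))
print("loudest actual sample:", true_peak)
```

Every poll lands in the silence between hits, so the polled data says the track is dead quiet while the real samples go all the way to full scale. Nudge offset_s and you’ll catch a hit, which is exactly the point: what you capture depends on luck.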

Ideally we’d have a “sample window,” where we take all of the actual samples as they exist in the audio file for a certain duration, and average them (or something) into a bit of data for waveform drawing. That is, instead of taking a single data sample at each of the above white lines, we’d take all of the data between the lines, and come up with a value to represent that duration of time. And here’s about where we hit the limits of using the SoundChannel and SoundMixer classes, as those represent only instantaneous data, not continuous data.
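
Here’s a rough sketch of that “sample window” idea, again in Python and again purely illustrative. It assumes you already have the raw samples as a list of floats from -1 to 1, which is precisely what SoundChannel and SoundMixer don’t give us. The function name window_extremes is made up, and keeping the minimum and maximum of each window is just one possible choice for the “(or something)” above; an average or RMS would be another.

```python
def window_extremes(samples, window_size):
    """Reduce raw samples to one (min, max) pair per window of window_size samples."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    columns = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]  # the last window may be shorter
        columns.append((min(window), max(window)))
    return columns


# Hypothetical data: a sharp transient in the first window, near silence in the second.
samples = [0.0, 0.1, -0.9, 0.8, 0.0, 0.0, 0.05, -0.02]

columns = window_extremes(samples, window_size=4)
polled = [samples[i] for i in range(0, len(samples), 4)]  # one instantaneous read per window

print(columns)
print(polled)
```

Each pair could be drawn as one vertical line from min to max. Because the whole window is examined, the transient in the first window can’t slip through the cracks the way it does with the single-point reads in polled, and the negative half of the waveform comes along for free.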

And to top it all off, the deal-killer is that in order to get the peak data (using either left/rightPeak or computeSpectrum()), you have to be actually playing the song. Seems like a “no doi,” I know, but it’s not until you actually try this that you realize that annoyance levels rise very quickly to dangerous, even lethal, amounts when you have to listen to the sound playing back in a stuttered fashion. All those play() calls from the Timer actually play the sound. But only for 50 ms or so until you stop it in the next TimerEvent. But then you play it again 50 ms later…and so on.

“Gee Dru,” I hear you saying, “just set the Sound’s volume to 0.” Yes, good point. I tried that. Turns out that both left/rightPeak and computeSpectrum() are “post fader,” that is, they get the sound energy level after the Sound’s volume(s) is adjusted. Set the Sound’s volume to 0, and you’ll get 0 out of every peak read operation.

“Well can’t you just turn your computer speakers down while you do this?” Um, no.

What’s a boy to do? My next thought was to look into reading the actual sound file and see if we can get the sample data from the ByteArray. Stay tuned for the next post for more on that.