Open Audio: 2016

Sunday, December 4, 2016

Extending Teensy Audio Library for Floating-Point

For my Teensy-based hearing aid, it will be much easier to program new audio processing algorithms if I can use floating-point audio data instead of fixed-point audio data. Unfortunately, the Teensy Audio Library assumes the use of fixed-point data types (ie, Int16) for all of its processing blocks. So, I decided to extend that library to enable the floating-point processing that I want. Here's an overview of how I made it work. My "OpenAudio" library is available on my GitHub here. I'd love any feedback (or GitHub pull requests!) on how it could be done better.

Teensy Audio Library Assumes Int16: My goal has been to maintain as much of the Teensy Audio Library's structure as possible. But, many of the core elements of the Teensy Audio Library are deeply entwined with the assumption of fixed-point Int16 audio data. Shown in red in the figure above are several of the foundational elements of the Teensy Audio Library that are tied to Int16 audio data. These are the elements that I extended.

OpenAudio F32 Versions: To enable Float32 operations, I wrote new Float32 versions of these elements. I used inheritance where possible to reduce the duplication of functionality, particularly for the AudioStream to AudioStream_F32 conversion. To maintain the structure and conventions of the Teensy Audio Library, I made a one-for-one replacement so that one only needs to substitute the "_F32" version for the standard version.

Conversion Routines: To interface between the Int16 data of the Teensy Audio Library and the Float32 data used by my extended library, I also wrote two new "AudioConvert" classes. In addition to converting between the Int16 and Float32 data types, these routines also re-scale the data. The Teensy Audio Library assumes "full scale" is ±32768 whereas my floating-point objects assume that ±1.0 is full scale. My conversion routines automatically account for this difference.
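The re-scaling itself is simple arithmetic. Here is a minimal sketch of it, assuming a plain divide-by-32768 on the way in and a multiply-and-clip on the way out. The function names convertI16toF32 and convertF32toI16 are hypothetical; this is not the code inside the OpenAudio AudioConvert classes.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: Int16 samples (full scale = +/-32768) to float32 (full scale = +/-1.0).
void convertI16toF32(const int16_t* in, float* out, std::size_t n) {
    const float scale = 1.0f / 32768.0f;  // exact power of two, so no rounding error here
    for (std::size_t i = 0; i < n; i++) {
        out[i] = static_cast<float>(in[i]) * scale;
    }
}

// Hypothetical helper: float32 back to Int16, clipping values that fall outside the Int16 range.
void convertF32toI16(const float* in, int16_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        float scaled = in[i] * 32768.0f;
        // Clip before the cast so that out-of-range values saturate instead of wrapping around.
        if (scaled > 32767.0f) scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        out[i] = static_cast<int16_t>(std::lround(scaled));
    }
}
```

Note that +1.0 would map to +32768, which does not fit in an Int16, so in this sketch it clips to +32767.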

OpenAudio F32 Example: If you download the library and unzip it into your Arduino Libraries directory (see "Installation" notes here), you can start the Arduino IDE and load an example sketch that comes with the library. Go under the "File" menu and select "Examples", then, "OpenAudio_ArduinoLibrary", then "BasicGain_Float".

Add a New #include: Inside this sketch, you'll see many features that are similar to every sketch that uses the Teensy Audio Library. For example, the screenshot below shows the #include statements at the beginning of the sketch. Note that I added an #include for the new OpenAudio library. Once this line is added, any of the new OpenAudio classes can be used.

Instantiate the New Classes: In a typical sketch using the Teensy Audio Library, the next block of code instantiates the audio-related objects. The screenshot below illustrates this by invoking the standard AudioControlSGTL5000, AudioInputI2S, and AudioOutputI2S classes. After these standard lines, I added three new lines that instantiate blocks for floating-point processing.

In this case, AudioConvert_I16toF32 will convert the standard Int16 audio data to float32. AudioEffectGain_F32 will apply gain to the float32 audio data. AudioConvert_F32toI16 will convert the audio data back to Int16 so that it can be output by the usual functions of the Teensy Audio Library.
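Inside the float32 gain step, the work reduces to a per-sample multiply. The sketch below assumes 128-sample blocks (the Teensy Audio Library's default block size) and a gain specified in dB. dBToLinear and applyGainF32 are hypothetical names, and this is not the actual AudioEffectGain_F32 implementation.

```cpp
#include <cmath>
#include <cstddef>

constexpr std::size_t kBlockSamples = 128;  // assumed block size, matching the Teensy Audio Library default

// Convert an amplitude gain in dB to a linear multiplier: 10^(dB / 20).
float dBToLinear(float gain_dB) {
    return std::pow(10.0f, gain_dB / 20.0f);
}

// Apply a linear gain in place to one block of float32 audio.
// No clipping here: float32 has plenty of headroom, and any clipping
// happens later, when the data is converted back to Int16.
void applyGainF32(float* block, std::size_t n, float linearGain) {
    for (std::size_t i = 0; i < n; i++) {
        block[i] *= linearGain;
    }
}
```

For example, a +20 dB setting multiplies every sample by 10, and about -6 dB halves it.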

New Audio Connections: After instantiating the objects, the next step is to make the audio connections between the objects. The screenshot below shows that a mix of standard (Int16) connections and new (Float32) connections are used to make the full processing chain. The standard connections (AudioConnection) are used whenever the data being passed is Int16 data. The new connections (AudioConnection_F32) are used whenever the data is float32 data.

Allocate Float32 Memory: The last step is shown in the screenshot below. Memory for the Teensy Audio Library is allocated in the setup() function via the AudioMemory() statement. Following this same pattern, I added the AudioMemory_F32() statement to allocate the float32 memory that is needed for the floating-point processing.

Compile and Run: If you have a Teensy 3.5 or 3.6 and a Teensy Audio Board, you can compile and run this example sketch. If you have a potentiometer attached to the Teensy Audio Board's volume control, you can use the pot to adjust the volume of the sound (well, it actually adjusts the gain applied by the new floating-point gain block). On my hardware, it works great! Hopefully it works well on yours, too.

Next Steps: With the floating-point audio processing structure in-place, I can move forward with adding more "F32" processing blocks for the different functions that I need. My next block will be a dynamic range compressor. Then I'll probably add a filtering block. Eventually, I'll be looking to add frequency-domain processing, which will likely involve extending my library again for a "complex_float32" data type. That'll be even more fun!

Update: I had my first Pull Request with a user contribution. So awesome!
Update: I've added an algorithm: a basic Dynamic Range Compressor.

Tuesday, November 29, 2016

Teensy Audio Board Headphone Level

For my home-brew Teensy Hearing Aid, I'm using the Teensy Audio Board as my audio interface. The heart of the Teensy Audio Board is the Freescale SGTL5000 stereo codec. This codec is very flexible, with lots of virtual knobs for tuning your system to suit your own application. For my hearing aid, it is important for me to understand all settings related to loudness. So, today, I'm going to look at the impact of the headphone volume control.

System Block Diagram: The figure above is taken from the datasheet for the SGTL5000. All of the blocks in yellow are analog gain stages. The headphone volume control is at the end of the chain, on the upper right. It says that the headphone volume can be adjusted from -52 dB to +12 dB.

Teensy Audio Library: Because this SGTL5000 is embedded within the Teensy Audio Board, I control it via the Teensy Audio Library. The library hides many of the details of controlling the SGTL5000. For example, it allows the user to control the headphone volume simply by commanding a value somewhere between 0.0 and 1.0. Presumably, 0.0 maps to the -52 dB setting and 1.0 maps to the +12 dB setting.
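If the mapping really is linear in dB, as presumed above, the command-to-gain relationship would look like the sketch below. This is an assumption for illustration only; it is not taken from the Teensy Audio Library source, which may quantize the command to the codec's register steps differently. volumeCommandToDb is a hypothetical name.

```cpp
// Hypothetical mapping from a 0.0-1.0 volume command to headphone gain in dB,
// ASSUMING the mapping is linear in dB between -52 dB and +12 dB.
float volumeCommandToDb(float command) {
    const float minDb = -52.0f;
    const float maxDb = 12.0f;
    // Clamp the command to the valid 0.0-1.0 range.
    if (command < 0.0f) command = 0.0f;
    if (command > 1.0f) command = 1.0f;
    return minDb + command * (maxDb - minDb);
}
```

Under this assumption, the full 64 dB range is spread evenly, so each 0.1 step in the command is worth 6.4 dB.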

Measuring the Response: To see what this means for my system, I wrote a sketch for the Teensy that uses the Teensy Audio Library to continuously generate a nearly full-scale sine wave and to output it via the headphone jack (code is shared here). The volume setting for the headphone output is controlled via the potentiometer on the Teensy Audio Board. To see the effect of the headphone volume setting, I measured the RMS voltage produced at the headphone jack using a Fluke handheld digital multi-meter.

Results: The figure above shows the values that I measured. As can be seen, the response in the middle section is linear, which is great. At the low end, my results do deviate from linear, but that's presumably because my multi-meter had difficulty measuring such small signals (-50dBV is only 3.2 mVrms!).

Saturation: The interesting result happens at the high end of my graph where I turn up the volume control to its highest values. Here, my results saturate -- increasing the volume command does not increase the output voltage. The break-over point is at a command of about 0.85. This corresponds to an output signal level of +1.5 dBV, which is 1.2 Vrms, which is 3.4 Vpk-pk. Since the power supply to the SGTL5000 is only 3.3V, it is not surprising that the output saturates when I ask for the output signal to be greater than the power rails.
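The dBV figures can be checked with the standard definitions: dBV is referenced to 1 Vrms, and for a pure sine wave the peak is √2 times the RMS value, so peak-to-peak is 2√2 times RMS. The helper names below are just for illustration.

```cpp
#include <cmath>

// dBV is referenced to 1 V RMS: V_rms = 10^(dBV / 20).
double dbvToVrms(double dbv) {
    return std::pow(10.0, dbv / 20.0);
}

// For a pure sine wave: V_pk = sqrt(2) * V_rms, so V_pk-pk = 2 * sqrt(2) * V_rms.
double sineVrmsToVpp(double vrms) {
    return 2.0 * std::sqrt(2.0) * vrms;
}
```

With these, -50 dBV works out to about 3.16 mVrms, and +1.5 dBV to about 1.19 Vrms, or about 3.36 V peak-to-peak, which is slightly more than the 3.3 V supply.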

Best Setting to Use: Saturation will sound bad. It will cause distortion in the audio. To keep my hearing aid sounding good, I want to avoid this saturation. If my full-scale sine wave causes saturation at a volume setting of 0.85, I am going to set my volume to 0.8.

Saturday, November 26, 2016

Reducing Current Draw

My previous post introduced my Teensy-based hearing aid. While my primary goal was to make a platform for developing audio processing algorithms, I am interested in exploring its performance as an actual hearing assistive device. From this viewpoint, battery life is important. In assembling my device, however, I gave no thought to minimizing power consumption. Does it draw too much current? Will it have a usable battery life? Let's find out!

Initial Measurements: With my baseline Teensy Hearing Aid, I measured the current being drawn from the battery during normal operation. At this early stage of development, the Teensy isn't doing very much -- it is simply applying digital gain to the audio stream. Because it is doing so little, I am hoping that this will be the lowest power consumption that we'll see. My results are below. Note that I measured the power consumption across a range of processor speeds. You can set the speed of the processor via the Arduino IDE when you compile the sketch.

That's A Lot of Current! I was surprised to find that at its normal speed (180 MHz) the system was drawing almost 91 mA of current. That's a lot of current. My battery has a capacity of 350 mA-hours, so my battery life will only be about (350 mA-hrs / 91 mA) = 3.8 hours. For a "hearing aid", this is pretty short. Yes, I could slow down the processor to reduce my power consumption, but this is an inelegant solution that requires me (the programmer) to ensure that my software will always live happily within the limits of the slower speed. I'd prefer a more dynamic approach, where the code itself will switch between low-power and high-performance modes as the workload demands. I don't know how to do that.

Sleep During Idle: Luckily, I have a friend who is experienced with low power embedded systems. He suggested that I could save a lot of power by putting the processor to sleep during idle periods. Knowing that I'm working with an ARM processor (the heart of the Teensy board), he suggested that I insert the ARM-specific command asm(" WFI") into my main loop (yes, you do need the space before the W). When the processor hits this command, it'll go to sleep and consume less power. It'll stay asleep and "Wait For Interrupt" (hence "WFI") before waking to resume its work.

Wait for Interrupt: Since I don't want it to sleep forever, I need to make sure that there is an interrupt (such as a timer) that'll wake the system periodically so that it can do its work. As I don't know how to do this, I have a problem. Luckily, I know that the Teensy Audio Library already uses interrupts to do its work. I know that it fires at least one interrupt every 128 audio samples. As any interrupt will trigger the WFI to wake up, I don't need to do anything more. I can just add the WFI command and rely upon the Audio Library to wake the system. The system will automatically go back and forth between sleep and wake. One line of code -- what a simple solution.
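How often does the Audio Library wake the processor? Assuming a sample rate of roughly 44.1 kHz and one interrupt per 128-sample block, the arithmetic is a one-liner. The function names are illustrative only.

```cpp
// Time between audio-block interrupts, in milliseconds,
// assuming one interrupt per block of blockSamples samples.
double blockPeriodMs(double sampleRateHz, int blockSamples) {
    return 1000.0 * blockSamples / sampleRateHz;
}

// Number of audio-block interrupts per second under the same assumption.
double blocksPerSecond(double sampleRateHz, int blockSamples) {
    return sampleRateHz / blockSamples;
}
```

At 44.1 kHz this gives a block every 2.9 ms or so (about 345 blocks per second), so with WFI in the main loop the processor should wake at least that often to service the audio, then go back to sleep.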

It Works! After adding the WFI command (my code is here), I measured the current draw again. The WFI command works! At the default speed of 180 MHz, the system now draws only 57 mA, instead of 91 mA as seen before. That's a 34 mA savings with no impact on system performance. Fantastic.

Battery Life: By adding the WFI command, power consumption has dropped to 2/3 of its original value. My system will now have 50% more battery life. Using my 350 mA-hr battery, I'll now get 6 hours of life instead of 4 hours. That's a great improvement for adding just a single line of code.
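The battery-life numbers come from dividing capacity by average current. This is an idealized estimate: it ignores the fact that the usable capacity of a real Li-Po depends on load and cutoff voltage.

```cpp
// Idealized battery life in hours: capacity (mA-hr) divided by average current draw (mA).
double batteryLifeHours(double capacity_mAh, double current_mA) {
    return capacity_mAh / current_mA;
}
```

With the measured currents, this goes from about 3.8 hours (350 / 91) to about 6.1 hours (350 / 57), a ratio of 91 / 57, or roughly 1.6.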

More Power Savings? While increasing my battery life by 50% is great, I went back to my friend asking if more savings could be had so easily. He said that I could save more power, but not nearly as easily. He said that the next step would be to identify and disable peripherals that I don't need. But, to do this, I'd have to read through and understand my processor's datasheet, which is not an easy task for a newbie such as myself. So, for now, I think that I'll be content with the easy savings provided by the WFI command.

For Fellow Nerds...My Raw Data: For anyone who is really interested, my raw data is in the table below (or here). It is interesting that, even at very slow processor speeds, I still see power savings when using the WFI command. Clearly, my "Gain Only" audio processing algorithm requires very little effort from the processor. That Teensy 3.6 sure is fast!

Sunday, November 20, 2016

A Teensy Hearing Aid

Hearing aids are totally closed devices -- their inner workings are hidden. Access is limited to only those who work for the hearing aid companies. But if innovation is to accelerate, we need more ideas iterated more quickly. We need more people to participate in hearing aid development. But if the devices are closed, there is no way to try new ideas. So, let's consider the alternative. Let's try to build an open-source hearing aid. Yes, at first, an open-source hearing aid will be absurdly big and ugly. But, you have to start somewhere. I'm going to start here: take one Teensy microcontroller, add some supporting electronics, and VOILA! A Teensy Hearing Aid!

Basic Hearing Aid Elements: While it is unfortunate that they are closed devices, real hearing aids are absolutely amazing pieces of technology. Their minuscule packages are absolutely stuffed with functionality. I, though, am going to start more simply. I will start with some microphones, analog-to-digital and digital-to-analog converters, a digital audio processor, some speakers, and a battery. Once this works, I can always add more features later.

Choosing a Processor: For me, the biggest challenge is always with the software. Therefore, I need to choose a digital audio processor that is easy to program. For me, that means choosing a processor that can be programmed from the hobbyist-friendly Arduino programming environment. Within that universe of processors, I've chosen to use the Teensy 3.6 because it's fast and because it has a nice audio processing library to make it even easier to program for audio.

Supporting Elements: To get audio signals into and out of the processor, the Teensy folks offer an inexpensive Audio Adapter Board that mates directly to the Teensy 3.6. It has the audio ADC and DAC that I need. For the microphones, I'm using a pair of mic breakout boards from Adafruit. For speakers, I'm using whatever headphones or earbuds that I might have on-hand. Finally, for the battery, I'm using a Li-Po battery and Li-Po charger from Adafruit.

Wiring It Up: The figure above gives an overview of how everything was connected together. As you can see, the Audio Board is at the center with everything else connecting to it. The only tricky part of this setup is connecting the battery and charger into the Teensy system. As indicated on the Teensy pinout diagram, you have to cut a trace on the backside of the Teensy so that you can insert the battery connections. Once cut, the Teensy's 5V USB voltage goes to the battery charger and the battery's output goes back to the Teensy's "5V" input pin. Not too bad.

Initial Software: While this post is primarily about the hardware, I did write some basic software in order to see if the hardware is working (see my GitHub repo). I used the Teensy Audio Library to configure the Audio Board to pass the "I2S" inputs (ie, my microphones) to the "I2S" outputs (ie, my headphones). In between, I wrote a simple routine that applies a user-controllable amount of gain to make things louder. I use the blue potentiometer on the Audio Board to set the amount of gain.

First Audio: Of course, the first time that I tried to compile the code, my software didn't work. Does anyone's code ever work the first time? After some iteration, I finally got it to compile and upload. Putting on my headphones, I could hear the audio being picked up by the microphones. Turning the blue pot, I could control its volume. Hardware knobs are so satisfying. My favorite part, though, is being able to use the on-board battery so that I can move around freely. Very fun.

Limitations: Sure, my home-brewed "hearing aid" is ridiculously large -- no one (including me) would ever wear this around in everyday life. But, unlike commercially-available hearing aids, my device is open. Anyone can modify it to make it better. Anyone can try out their own audio processing algorithms to try to improve one's hearing. Will any of us beat the professional hearing aid algorithm designers? Probably not. But, for me at least, I will surely learn a lot by trying a few of the standard approaches. And, I'll get to look really cool sharing pictures of myself with this (not so) Teensy Hearing Aid. Happy hacking!

Follow-Up: I measured the power consumption of my Teensy Hearing Aid. Using the "WFI" command, I found a super-easy way to increase the battery life by 50%. Wow! See my post here.

Follow-Up: I measured how the volume control affects the headphone output. See here.

Tuesday, October 25, 2016

Teensy Audio over USB

In my last post, where I sent my first analog audio through a Teensy, I also discovered that it has the ability to exchange digital audio over its USB link. Whoa. This means that, if I'm developing some snazzy new audio-processing algorithm, I can test it 100% digitally without adding any complications of converting back and forth through analog signals. That is a really powerful capability that'll greatly simplify my audio development. Let's see if it really works!

Discovering USB Audio: I only discovered that the Teensy might be able to do USB Audio while using their web-based graphical tool. This is a tool that the Teensy folks have created to help people work with their Audio Library. While using that tool, I noticed that there were input/output blocks named "USB". Does this mean that Teensy can exchange audio over USB?!? Really? That's a pretty advanced feature for an electronics board aimed at hobbyists! Does it work?

A Simple USB Audio Chain: To give it a try, I configured an audio chain as shown below: one stereo USB input connected to one stereo USB output. The left channel is just connected straight across -- no processing. For the right channel, though, I've inserted a Biquad filter block just so that I can confirm that the Teensy is indeed doing something to the audio. It's a pretty simple setup.

It Didn't Work: When I exported the audio chain above, loaded it onto the Teensy, and gave it a try, it didn't work. I got silence. I don't know why. So I started messing around...

Making it Work: To get the USB Audio to work, I found that I had to engage the audio board's hardware in some way. One way to make it work is shown below: add an SGTL5000 block for the Audio Board, and add an I2S block to be the output of the Audio Board. I connected the I2S output to be just like the USB output. Now, whatever the Teensy is sending out via USB will also be present at the headphone jack. Sure, it seems like an unnecessary complication if I just want to use USB for my audio, but it's kinda cool that it's so easy to output to multiple targets simultaneously.

Completing the Arduino Code: The graphical web-based tool shown above doesn't produce a complete program to load onto the Teensy -- it just gets you started. When I hit the "Export" button, it gave me all of the code to set up the hardware and make the connections, but I still needed to write the last bit of code to make it run. As usual, I did my coding in the Arduino IDE because it's easiest.

My Code: The screenshot below shows the Arduino sketch that I used for this demo (it's on my GitHub here). At the top is the big block of code that was exported from the web-based tool. Easy. Then, at the bottom, I added the setup() and loop() functions. As you can see, the setup() function configures the SGTL5000 audio board and it sets the biquad filter to be a low-pass filter at 500 Hz. The loop() function does nothing.
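For reference, a 500 Hz biquad low-pass like this can be designed with the widely used RBJ "Audio EQ Cookbook" formulas. The sketch below assumes a 44.1 kHz sample rate and Q = 0.7071 (Butterworth). I haven't confirmed that the library's biquad block uses exactly these formulas or defaults, so treat this as an illustration of the filter type, not a copy of the library.

```cpp
#include <cmath>
#include <complex>

const double kPi = 3.14159265358979323846;

// Biquad coefficients, normalized so that a0 = 1.
struct Biquad {
    double b0, b1, b2, a1, a2;
};

// Low-pass design from the RBJ "Audio EQ Cookbook".
// fc: cutoff (Hz), q: quality factor, fs: sample rate (Hz).
Biquad designLowpass(double fc, double q, double fs) {
    const double w0 = 2.0 * kPi * fc / fs;
    const double cosw0 = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    Biquad c;
    c.b0 = ((1.0 - cosw0) / 2.0) / a0;
    c.b1 = (1.0 - cosw0) / a0;
    c.b2 = ((1.0 - cosw0) / 2.0) / a0;
    c.a1 = (-2.0 * cosw0) / a0;
    c.a2 = (1.0 - alpha) / a0;
    return c;
}

// Filter state for Direct Form I.
struct BiquadState {
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;
};

// Filter one sample: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2].
double processSample(const Biquad& c, BiquadState& s, double x) {
    const double y = c.b0 * x + c.b1 * s.x1 + c.b2 * s.x2 - c.a1 * s.y1 - c.a2 * s.y2;
    s.x2 = s.x1;
    s.x1 = x;
    s.y2 = s.y1;
    s.y1 = y;
    return y;
}

// Magnitude response |H(e^{jw})| at frequency f (Hz).
double magnitudeAt(const Biquad& c, double f, double fs) {
    const std::complex<double> zInv = std::polar(1.0, -2.0 * kPi * f / fs);  // z^-1 on the unit circle
    const std::complex<double> num = c.b0 + c.b1 * zInv + c.b2 * zInv * zInv;
    const std::complex<double> den = 1.0 + c.a1 * zInv + c.a2 * zInv * zInv;
    return std::abs(num / den);
}
```

With these settings, the gain is 1.0 at DC and about 0.707 (-3 dB) at 500 Hz. Above the cutoff, it falls at roughly 12 dB per octave, which is the kind of gentle roll-off a single second-order section gives.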

Tell the Compiler about USB Audio! The only trick to using the USB audio is that you need to tell the compiler that you want the Teensy to run in USB Audio mode. So, prior to compiling and uploading this Arduino sketch, go under the "Tools" menu, click on "USB Type" and select "Audio". Notice that there are many other choices, including "Serial", "Keyboard", "MIDI", "Audio" and "No USB". Normally, one uses "Serial" so that Serial.println() is able to print messages to the Serial Monitor. But, for this demo, I want "Audio".

Loading onto the Teensy: Once you change the USB setting to Audio, you can compile and upload to the Teensy. The first time you do it, it'll probably upload to the Teensy without any issue. Great! But, once it is in USB Audio mode, future programs won't appear to upload automatically. Why? Well, once the USB link is set for Audio, the Teensy can't get reprogramming commands over that same USB link. What's the solution? After the Arduino IDE has finished compiling, simply press the Teensy's reset button (picture below). After a reset, the Teensy knows to check in with the PC prior to switching over to Audio mode. It will automatically get any new program waiting for it. It's a small extra step, but I didn't find it to be a problem at all.

Testing Using Audacity: Once the Teensy is in USB Audio mode, the Teensy appears to your PC as if it is a soundcard. So, you can now send and receive audio data to it using any audio recording program. For this demo, I used Audacity because it is free.

Configuring Audacity: With the Teensy attached, my computer now has two soundcards: (1) its built-in soundcard and (2) the Teensy. Once Audacity launches, I configured Audacity to use the Teensy as the soundcard instead of the built-in one. The screenshot below shows the five settings that I changed in Audacity. On my computer (Windows 7), the Teensy cryptically appeared as "Digital Audio Interface (2-Tee" [sic], which wasn't as informative as it could have been, but it was good enough for me to choose the right item.

First USB Audio: For this test, I want to send audio out over USB (to the Teensy) and record audio coming from USB (from the Teensy). So, I need to start by creating a test signal to send out over USB. Under the "Generate" menu, I chose "Chirp" and made a linear frequency sweep from 100 Hz to 4000 Hz. This appeared as a single mono track in Audacity. That's all the preparation that I needed to do. Now, I simply hit the red record button. Audacity started playing out the chirp audio to the Teensy while recording the audio coming back from the Teensy. The result is shown in the screenshot below.
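Audacity's Chirp generator does the work here. For anyone who wants to generate the same kind of test signal in code, a linear chirp has instantaneous frequency f(t) = f0 + (f1 - f0) t / T, and its phase is the integral of that: φ(t) = 2π (f0 t + (f1 - f0) t² / (2T)). A minimal sketch follows. The sample rate, duration, and amplitude in the usage are arbitrary choices of mine, not Audacity's settings, and the function names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Generate a linear chirp sweeping from f0 to f1 Hz over durationSec seconds.
// Phase is the integral of the instantaneous frequency:
//   phi(t) = 2*pi*(f0*t + (f1 - f0)*t^2 / (2*T))
std::vector<float> linearChirp(double f0, double f1, double durationSec,
                               double fs, float amplitude) {
    const double kPi = 3.14159265358979323846;
    const std::size_t n = static_cast<std::size_t>(durationSec * fs);
    const double k = (f1 - f0) / durationSec;  // sweep rate in Hz per second
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; i++) {
        const double t = i / fs;
        const double phase = 2.0 * kPi * (f0 * t + 0.5 * k * t * t);
        out[i] = amplitude * static_cast<float>(std::sin(phase));
    }
    return out;
}

// Instantaneous frequency of the chirp at time t (Hz).
double chirpFrequencyAt(double f0, double f1, double durationSec, double t) {
    return f0 + (f1 - f0) * t / durationSec;
}
```

For example, linearChirp(100.0, 4000.0, 2.0, 44100.0, 0.8f) produces a 2-second, 100 Hz to 4000 Hz sweep of 88200 samples.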

Success! The screen shot shows two audio files. Both files are shown in "spectrogram" view (60 dB dynamic range). The top file contains the original (mono) frequency sweep generated by Audacity. This is what was sent to the Teensy over the USB Audio link. The bottom file shows the stereo audio recorded from the Teensy. The left channel had no processing being performed by the Teensy and it does indeed appear to be a copy of the original signal. The right channel, though, was being processed by the Teensy by its 500 Hz low-pass biquad filter. As can be seen, the high frequencies are attenuated -- the gentle roll-off is as expected from a biquad filter. This was amazingly easy!

Future Use: While I'm very impressed that this capability exists at all, it's not perfect. In additional testing, I occasionally found hiccups or other artifacts in the audio stream coming back from the Teensy. But, most of the time, this USB Audio link worked really well. It'll be great for debugging my audio projects. Anytime I have any question about the negative effects of my analog components, I can simply switch over to USB Audio to avoid the whole problem. It'll be a great tool to have in my audio hacking toolbox.

Sunday, October 23, 2016

Teensy Audio Board - First Audio

After having good success with the speed of the Teensy products for doing audio-relevant operations, I've decided that it's now time to start playing with actual audio signals. To do this, I need some sort of interface for getting analog audio signals into and out of the Teensy. Luckily, Teensy makes an audio board just for this purpose. Today, I'm going to give the Teensy Audio Board a try.

Buying The Parts: The first step of a new project is always fun: buying the parts. I go to Adafruit or Sparkfun, I see what they have in stock, and I order all the stuff that I need. This is the part of the project where the vision and dream fills one with excitement, without being tempered by any of the actual hardware and frustration that comes with building and trouble-shooting. And if that wasn't good enough, in a few days, you get a box full of shiny things!

For this project, I used a Teensy 3.2, a Teensy Audio Board, some 14-pin female headers for the Audio Board, a 10K pot to fit into the "volume knob" spot on the Audio Board, and a 3.5 mm stereo audio jack.

Assembling the Audio Board: This part of the project was pretty straight-forward. Simply solder on the female headers, solder on the 10K pot, and solder some wires between the "Line-In" holes on the Audio Board and the 3.5 mm audio jack. As you can see in the photo below, I could have done a better job with the audio jack, but it's good enough for this project. (And thanks to Ray for giving it a nice sturdy plastic base!)

Assemble with the Teensy: After finishing the assembly of the Audio Board, I turned to the Teensy itself. All that is needed is to solder on some male pin headers. Then, the Teensy can be connected to the Audio Board.

Audio Software: To use the Teensy with the Audio Board, you need some software that tells the Teensy how to interact with the Audio Board. Because audio programming can be tricky, PJRC (the Teensy folks) have written a pretty extensive audio library. It's pretty impressive. I chose to use this library for my initial trials. You can get the library from their GitHub repo (as described here) or, even easier, it was probably already installed on your computer when you first installed the drivers for the Teensy itself (via the "Teensyduino" installer).

Audio System Design Tool: The hardest part of doing something new is getting started. Where do I begin? Well, PJRC knows that this is a hurdle, so they help you get started by providing a web-based GUI for configuring your audio processing. The idea with this GUI is that it will help you build your software for the Teensy. The Teensy doesn't know about the Audio Board -- you have to tell it everything. So, as you can see in the screenshot below, I started by dragging in a module called "sgtl5000". The SGTL5000 is the audio codec at the heart of the Teensy Audio Board, so by using the module, the Teensy will now know about the Teensy Audio Board.

Line-In Pass-Through: The next step is to configure the audio path that I'd like. For this demo, I simply want to pass audio from the Line-In input to the headphone output. From the Teensy's perspective, the Audio Board will provide these signals to the Teensy over the Teensy's I2S bus. So, in the GUI, I dragged in an I2S input module and I dragged in an I2S output module. Finally, I drew lines connecting the inputs to the outputs (the two lines represent the left and right audio channels). And that's it!

Exporting to the Arduino IDE: This GUI is merely a tool to help you get started with configuring your audio processing chain. It doesn't compose all of the software for me. Instead, I need to export this audio configuration and get it into the Arduino IDE, where I can complete the programming and put it on the Teensy. So, I clicked on the GUI's "Export" button, at which point it pops up a window with a bunch of Arduino code ready to be copied-and-pasted into a new Arduino sketch.

The GUI-Provided Arduino Code:  The figure below shows a screen-shot of my completed sketch in the Arduino IDE. The first part of the sketch is the code provided by the web-based GUI tool. This code imports the required libraries, instantiates the required audio-related objects, and connects them together per the drawing that I made in the GUI.

Completing the Software: The second half of the program is the code that I had to write myself. I had to write the "setup()" and "loop()" functions that are the core of every Arduino sketch. As you can see in my "setup()" function above, I do a couple of things: (1) I allocate some memory for the audio library to use and (2) I issue a few commands to configure the settings within the Audio Board. I stole all of this code from example programs in the Audio library. After the "setup()" function, I wrote the "loop()" function. Based on the example code, I don't need anything here. Once properly configured, the Audio Library handles everything in the background. My code is on my GitHub here.

First Audio Testing: For my first audio test, I simply want to inject an audio signal into the Teensy and record the audio signal that it produces in response. Since the Teensy was programmed to pass the audio through without any manipulations, the output should be the same as the input. Let's see!

Setup: I compiled the code and loaded it onto the Teensy. To generate a test signal, I used my computer. I used Audacity to make a linear frequency sweep (a "chirp") and played it out of the computer's headphone jack. I connected the headphone output of my computer to the Line-In jack on the Teensy Audio Board. I recorded the audio output of the Teensy Audio Board by connecting its headphone output to a Roland R-05 handheld audio recorder that I happen to have. I hit record on the Roland and I hit play in Audacity and, like MAGIC!, I found that I had audio passing through the Teensy. Yay!

First Data: After recording the audio chirp with the Roland, I transferred the audio file over to the PC and opened it in Audacity (available here). As can be seen in the screen shot below, it does indeed look like a chirp, which is good. I also see, however, that it is not a pure chirp signal -- I see some unexpected lines in the spectrogram. To me, these spurious lines look like harmonic distortion and some sort of aliasing. These undesirable signals are not very loud compared to the primary signal, but they are clearly present. So, while I'm pleased with my initial success with getting this Teensy-based audio system to work, I also see that there is some work ahead to optimize the audio quality.

Next Steps: My next steps are to investigate other features of the Audio library (the audio library provides for USB-based audio transfer!?! That's pretty advanced!) and to try the Audio Board with the Teensy 3.6 instead of the Teensy 3.2. There's fun times ahead with audio hacking!

Follow-Up: I made my own Teensy Hearing Aid using a Teensy and a Teensy Audio Board. It's fun!

Saturday, October 8, 2016

Benchmarking - Teensy 3.6 is Fast!

Fast but easy to program -- that's what I need for my audio processing projects. In my previous post, I loved the speed of the NXP K66 processor, but was frustrated with the NXP development environment. Well, thanks to PJRC and Kickstarter, the solution has arrived...via US Mail. Welcome to the new Teensies! Based on my tests, it looks like audio hacking just got a whole lot more fun.

Teensy? Teensy is a line of microcontroller boards that are much more capable than the standard Arduino boards, but that can still be programmed using the friendly Arduino development environment (the "Arduino IDE"). I've used one of the existing Teensy boards (Teensy 3.2) on a number of projects and it definitely achieves its goal: it's a powerful, fast little board that is really easy to program.

New Teensies! Through the just-completed Kickstarter, Teensy has expanded its line of boards by adding the Teensy 3.5 and Teensy 3.6. As can be seen in the table below, the Teensy 3.5 and 3.6 have a faster clock speed and have more RAM. The specs on the Teensy 3.6 match those of my loved-and-hated FRDM-K66F board from NXP. The increased capabilities of the Teensy 3.5 and 3.6 relative to the older Teensy (and relative to Arduino) should really help in computationally-heavy tasks, such as processing audio.

Floating-Point Prowess. For me, the best feature of the new Teensies is the inclusion of a floating-point unit (FPU), which will accelerate calculations with floating-point numbers. This will make it much easier to hack together audio processing algorithms. I loved the performance of the NXP FRDM-K66 board because its FPU made it stunningly fast on floating-point FIR and FFT operations. I'm hoping that the Teensy 3.6 is able to match the speed of the FRDM-K66. If it can, I can leave behind the FRDM-K66 board and never again have to struggle with NXP's development environment.

Programming the New Teensies. To evaluate the new Teensy boards, I'm going to re-run my FFT benchmark tests (on GitHub here). First, I had to get the latest "Teensyduino" software from PJRC (to get support for the new boards into the Arduino IDE). Then, I simply opened up my FFT benchmarking sketch and recompiled. It was previously written for the Teensy 3.2 and now, without any changes, it compiled successfully for the Teensy 3.5/3.6. This is the ease-of-use that I was hoping for!

Speed Results. I ran my benchmarking tests on the new boards and compiled the results in the table below. It shows how many FFTs each board could complete per second. It's my raw data. It is hard to see the interesting trends in this table, so just go ahead and jump over it...I'll present something simpler in a moment...

Which Board is Fastest? Below is a bar chart comparing the speed of each board for a 128-point FFT. It shows the speed when using Int32 data types and when using Float32 data types (the two most likely to be used in audio processing). Two results stand out:

The Teensy 3.6 is the overall winner, even beating the FRDM-K66F. While that is a bit inexplicable, it's great!
The Teensy 3.5 and 3.6 are great for floating point calculations. Look at how big those red bars are! Clearly, the FPU in these newer boards makes a huge difference.

How Fast Do I Need? So the bars in the graph above are big. How do I know what bars are big enough? How do I know which are fast enough for audio processing? Well, like in my previous post, I want to know which boards can keep up with calculations that need to be done at audio rates. In this case, I want to know which boards can do FFT operations at audio rates. What does that mean?

FFTs for Audio Processing. FFTs are interesting for audio because they let one process audio in the frequency domain, which can be very convenient. To do frequency-domain processing, you need to do an FFT to get into the frequency domain and you need to do an IFFT (which is basically the same computational load as an FFT) to get back out of the frequency domain. Then, because of windowing and other blah-blah-blah, you generally overlap your audio data blocks by 50%. Since every audio sample then lands in two overlapping blocks, and each block needs both an FFT and an IFFT, each block's worth of new audio needs four FFT-like operations. Can this get done in real time?

Keeping up with Audio. Because I measured how many FFTs each board can do per second, I can estimate the maximum audio sample rate (samples per second) that each board can support while still doing four FFTs per data block. A typical sample rate used for audio is 44 kHz. My results below show which boards can sustain at least 44 kHz (shown in green) versus those that cannot (shown in red).

Integer vs Floating-Point Results. As can be seen in the table, the Teensy 3.2 is fast enough for audio if I'm using integer data types (Int16 or Int32). It is not fast enough to use Floats. The new Teensy 3.5 and 3.6 are faster than the Teensy 3.2, especially on Floats. The Teensy 3.6 just screams on Floats -- it has enough processing power to sustain a sample rate over 300 kHz for frequency-domain processing. Wow. This is exactly the result that I was hoping to see.

Conclusion. For my audio processing projects, it looks like the Teensy 3.6 is the best choice. It's really fast, yet it can be easily programmed through the comfortable and friendly Arduino IDE. I'm very pleased.

Next Step: My next step is to join the Teensy 3.6 to an audio interface (such as the Teensy Audio Board) so that I can leave these synthetic benchmarks behind and start playing with actual audio. It's gonna be fun!

Extra Credit: How much faster are the new Teensies vs the Teensy 3.2? The Teensy 3.6 is 16x faster than the Teensy 3.2 on Floats. Dang, that's fast. Again, this must be due to the FPU in the new units.

Follow-Up: I made my own Teensy Hearing Aid using a Teensy 3.6 and a Teensy Audio Board. It's fun!

Wednesday, September 14, 2016

Benchmarking - FFT Speed

My goal is to find a good microcontroller board for doing audio processing. Speed is a very important concern so, in my last post, I looked at the speeds for different boards when doing FIR filters. While time-domain FIR filters are an important audio processing task, I am also curious how suitable these boards are for frequency-domain processing. In other words, I need to know how fast they can do FFTs. Let's find out!

Why FFT? An FFT is a "Fast Fourier Transform", which is not a helpful name if you're not already familiar with the idea. FFTs are most often used when one wants to look at the frequency content (the spectrum) of audio data. Intriguingly, you can also do an inverse FFT (an "IFFT") to convert that frequency spectrum data back into audio data. By pairing the FFT with the IFFT, therefore, one can now manipulate (or mangle!) audio data in the frequency domain, which can be a more natural and easy way to construct one's audio processing algorithms. The key to it all is the FFT (and its computationally similar IFFT).

Microcontroller Boards: To understand which hardware might be capable of this frequency-domain approach to audio processing, I'm evaluating a range of different boards. In addition to spanning a range of clock speeds (16 MHz - 180 MHz), they also vary in other computationally important ways: one is 8-bit while the others are 32-bit, some have DSP extensions while others do not, and one has a floating-point unit (FPU) while the others do not. Through my testing, I'll see what's important.

My Test Software: For all of the boards except for the FRDM-K66F, I used the Arduino IDE to write my test software. It's a simple program that does an FFT on dummy data of a given length. The program uses Arduino's "micros()" command to measure the time to complete a fixed number of FFTs. Easy. For the FRDM-K66F board, which cannot be programmed from the Arduino IDE, I had to use NXP's IDE (Kinetis Design Studio). The FFT functions were identical, however, regardless of which IDE I used. All of my software is available on my GitHub (for Arduino, for Kinetis).

KissFFT Function: The only difficult part of the software is the FFT function itself. Since I wanted to test across a variety of hardware, I wanted to start with an FFT routine that was written in generic C. From the FFTW website, I found an interesting comparison of different FFT routines. From their list, one of the most generic routines appeared to be the "KissFFT". After downloading it, I refactored the KissFFT code to enable different data types (Int16 vs Int32 vs Float32) and to remove the dynamic allocation of memory via malloc(). With these changes, it was much more suitable for use on a microcontroller.
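Supporting Int16, Int32, and Float32 in one FFT mostly comes down to how each scalar multiply is done. In fixed point, samples and twiddle factors are fractions scaled to full scale: Q15 for Int16 and Q31 for Int32. Each product must therefore be shifted back down. The sketch below assumes Q15/Q31 conventions with round-to-nearest. The helper names are hypothetical and are not KissFFT's own macros.

```c
#include <stdint.h>

/* Q15: an int16 value v represents v / 32768.0, range [-1, 1).
   Note: -1 * -1 overflows; production code would saturate.
   Right-shifting a negative int is implementation-defined in C
   (arithmetic shift on GCC/Clang for ARM and x86). */
int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;                  /* Q30 product */
    return (int16_t)((p + (1 << 14)) >> 15);     /* round, back to Q15 */
}

/* Q31: an int32 value v represents v / 2147483648.0. */
int32_t mul_q31(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;                  /* Q62 product */
    return (int32_t)((p + ((int64_t)1 << 30)) >> 31);
}

/* Float32: no rescaling needed; full scale is simply 1.0f. */
float mul_f32(float a, float b)
{
    return a * b;
}
```

All three compute 0.5 × 0.5 = 0.25 in their own representation. The float version needs no shifts or rounding, which is one reason floating-point code is easier to write when an FPU makes it fast. Fixed-point FFTs also commonly scale by 1/2 at each radix-2 stage so that the butterfly additions cannot overflow.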

Raw Results: The raw results of my speed tests are here, from which I made the summary table shown below. The table shows the number of FFTs per second that each board could complete. I performed the tests for different data types (Int16, Int32, Float32) and for different FFT lengths (N=32 to N=512). This table is too dense to read, so let's skip ahead.

Speed vs FFT Length: First, let's look at the overall trend in the data. The plot below shows the FFT speed for different FFT lengths. For FFTs using more data points, I would expect slower performance. This graph confirms that expectation. Good. This graph also shows that the relative ranking of the different boards is the same, regardless of the length of the FFT. This allows me to greatly simplify the rest of the plots.

Speed for Integer Data: As the length of the FFT doesn't matter for the relative rankings, I chose to focus on an FFT size of 128. I chose this value because, when operating on audio data sampled at 44100 Hz, N=128 yields a frequency resolution of 344 Hz, which is a reasonably useful value. The graph below compares the speeds of the different boards for doing 128-point FFTs on Int32 data. The Arduino Uno didn't have enough memory to complete this test, so it is excluded. Otherwise, I see that the Arduino M0 is the slowest and that the FRDM-K66F is, by far, the fastest. The only surprise in this data is the slowness of the Arduino M0. Compared to the Teensy or Maple, it is much slower than their relative clock speeds would predict: my data shows that the Teensy can do ~4x as many FFTs even though its clock speed is only ~2x higher. Clearly the M0 is not optimized for these kinds of calculations.

Can They Do Floating Point? For audio processing projects, I hope to do all of my processing using floating-point data types (Float32). Being able to utilize floating-point math (Float32) instead of fixed-point math (Int16 and Int32) makes the algorithms much easier to design, debug, and optimize. The difficulty is that microcontrollers tend to be very slow with floating-point calculations. So, let's do some tests with Floats and see if any of my boards are fast enough.

FFT Speed for Float32: The graph below shows the FFT speeds that I measured using Float32 audio data. As can be seen by the red bars, the M0, Maple, and Teensy are very slow on Floats. They can only do one third as many Float32 FFTs as they can Int32 FFTs. That is a major speed penalty. The major exception to this trend is the FRDM-K66F. It is actually faster using Float32 than Int32. Presumably, this is due to the floating-point unit (FPU) included in the K66F chip. Even with the FPU, I did not expect it to be faster on Floats than Ints. Very surprising.

DSP Acceleration: Because an FFT is such a common digital signal processing (DSP) task, some processors include internal features to accelerate this kind of math. The Teensy 3.2 and the FRDM-K66F are both based on the Arm Cortex M4 processor core. The Cortex M4 includes DSP acceleration. I learned that I can invoke the ARM's DSP accelerators by calling the FFT functions from ARM's "CMSIS" library. It took me some effort to figure it out, but I eventually got it to work. And, boy, did it work well...

FFT Speed Using CMSIS: My results using the CMSIS library are shown below. Using the CMSIS routines (and the underlying hardware acceleration) really speeds up the FFTs.

The effect of the CMSIS / DSP accelerators is so dramatic that I quantified the improvement in the table below. The speed improvement is different for the different data types. Using Int16 data, the CMSIS routines are 4-5x faster than the Generic C "KissFFT" routines. Wow. For the Int32 data, CMSIS is 3x faster and for Float32 data, CMSIS is 2x faster. This extra speed is definitely a good reward for the effort spent to figure out how to use the CMSIS library.

How Fast is Fast Enough? I want to know which boards are fast enough to enable frequency-domain processing of audio signals. As I discussed earlier, frequency-domain processing requires that I perform an FFT to get into the frequency domain and then an IFFT (which takes the same amount of time as an FFT) to get back out of the frequency domain. Given my measured FFT speed values, I can estimate the maximum audio sample rate that each board can sustain:

max_sample_rate_Hz = (FFTs_per_second * N_FFT / 2) * (1 - overlap)

In this equation, the "2" accounts for the need to do both an FFT and an IFFT. The "overlap" term accounts for the fact that most people do frequency-domain processing using blocks of audio samples that overlap by 50% (0.5) in order to smooth out any artifacts at the ends of the audio blocks.
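The equation above translates directly into code. The sketch below is minimal, and the FFT rates in the checks are placeholder numbers, not measured values. The helper names max_sample_rate_Hz and keeps_up are hypothetical.

```c
/* Maximum sustainable sample rate for overlapped FFT -> IFFT processing:
     max_sample_rate_Hz = (FFTs_per_second * N_FFT / 2) * (1 - overlap)
   Each hop of N_FFT*(1 - overlap) new samples costs one FFT + one IFFT. */
double max_sample_rate_Hz(double ffts_per_second, int n_fft, double overlap)
{
    double new_samples_per_block = n_fft * (1.0 - overlap);
    double blocks_per_second = ffts_per_second / 2.0;  /* FFT + IFFT */
    return blocks_per_second * new_samples_per_block;
}

/* 1 if this FFT throughput keeps up with target_Hz (e.g. 44100 Hz). */
int keeps_up(double ffts_per_second, int n_fft, double overlap,
             double target_Hz)
{
    return max_sample_rate_Hz(ffts_per_second, n_fft, overlap) >= target_Hz;
}
```

Rearranged, a 128-point FFT with 50% overlap at 44100 Hz needs at least 44100 × 2 / 64 ≈ 1378 FFTs per second. That gives a single threshold to compare against each board's measured FFT rate.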

Maximum Sample Rate for Frequency-Domain Processing: Using this equation, assuming N=128, and assuming an overlap of 0.5, the table below shows the maximum sample rate that each board can support for frequency-domain processing. I've highlighted in green those boards that can sustain this kind of processing at sample rates appropriate for audio (ie, 44100 Hz or greater).

Conclusion: If my goal is to do frequency-domain processing using Float32 data, only the FRDM-K66F is fast enough. That is my primary conclusion. My secondary conclusions are that, if I use Int32 data, the CMSIS acceleration means that the Teensy 3.2 is a viable option. Furthermore, if I can tolerate Int16 data, I can choose from the Maple, Teensy, or K66. But I don't want to do that. I want to use Floats. So, the K66F is for me.

Looking Forward: While the FRDM-K66F board is very powerful for doing FFT operations, it is difficult for me to program. I prefer the simplicity of the Arduino IDE, yet the FRDM-K66F is not supported by Arduino. Looking forward, however, the folks who do Teensy are about to release the "Teensy 3.6", which uses the same (or similar) processor as is used in the FRDM-K66F. Since Teensy is programmable from the Arduino IDE, I am hoping that we'll soon have the power of the K66F combined with the ease-of-use of the Teensy. That will be a truly winning combination for open source audio processing.

Tuesday, September 6, 2016

Benchmarking - FIR Filtering

I want to do real-time audio processing. And, I want to do it using small electronics without having to fight with an operating system like Windows or Linux. Most of my previous experience with electronics hacking has been with Arduino (or Arduino-like) platforms. Audio processing, however, is pretty computationally demanding. I need to find an easy-to-use board that is fast for typical audio tasks. So, as a first step, I'm going to get a bunch of different boards and see which can do FIR filtering at audio rates. Let's see which boards are up to the task!

Top: Arduino Uno, Arduino M0, LeafLabs Maple. Bottom: Teensy 3.2 and FRDM-K66F.

The Competitors: I chose to test six different boards -- many of which I already had kicking around the house. The six boards that I tested are summarized in the table below. As you can see, I tried everything from the lowly Arduino Uno up to the mighty Teensy 3.2 and the even-mightier FRDM-K66F. While the FRDM-K66F is a bit obscure, I'm using it as a proxy for the up-coming Teensy 3.6, which uses the same K66F chip. Its fast 180 MHz clock speed and its floating point unit (FPU) should make the FRDM-K66F / Teensy 3.6 great for processing audio.

Why FIR Filters? If you want to manipulate the frequency content of an audio stream, you need a filter. Boosting the bass? Cutting some mids? Boosting the treble? Apply the appropriate filter. There are many kinds of filters, typically divided into either IIR filters or FIR filters. I'm not yet ready to dive into the differences between IIR and FIR, but I tend to prefer FIR filters for their linear phase and unconditional stability (a good discussion of FIR filters is here and here). Regardless, FIR filters are a good, basic audio processing task that make for a widely-applicable benchmark.

Lots of Multiplies and Adds: The challenge with FIR filters is that they can require a processor to do a lot of computation -- a lot of multiply and addition operations. The finer the frequency resolution desired, the more multiplies and adds are needed to do the filter. The resolution of an FIR filter scales with the length of the filter (the "N" of the filter). As a simple rule-of-thumb, an FIR filter's workload scales as:

N_FIR = sample_rate_Hz / freq_res_Hz; //approximate
num_multiplies = N_FIR * sample_rate_Hz; //same for num_adds

For example, if you want a frequency resolution of 250 Hz, and if your sample rate is 44 kHz, then you need an FIR filter length N = (44000/250) = 176. To actually filter the audio, you need to apply this 176-point filter to every audio sample in your 44 kHz audio stream. To keep up, your processor will need to do at least (176*44000) = 7.7 million multiplications plus 7.7 million additions per second. That's a lot of work to do! Which of my boards are capable of this?
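The same arithmetic can be written as a tiny C helper, which is handy for trying other resolutions or sample rates. It simply encodes the rule of thumb above. Rounding to the nearest whole tap count is an added assumption, and the helper names are hypothetical.

```c
/* Rule of thumb: N_FIR ~ sample_rate / freq_resolution (rounded here),
   and an N-tap FIR needs N multiplies (and N adds) per output sample. */
long fir_length(double sample_rate_Hz, double freq_res_Hz)
{
    return (long)(sample_rate_Hz / freq_res_Hz + 0.5);
}

/* Multiply-adds per second needed to filter a stream in real time. */
double fir_macs_per_second(long n_fir, double sample_rate_Hz)
{
    return (double)n_fir * sample_rate_Hz;
}
```

For example, at 44100 Hz with 100 Hz resolution, the rule gives 441 taps and roughly 19.4 million multiply-adds per second.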

FIR Software: To test the FIR speed of each board, I used the Arduino IDE and wrote a very naive implementation of an FIR filter (yes, faster results could definitely be achieved, but this test is just trying to get a sense of relative speeds of the platforms). My code uses the Arduino's "micros()" command to measure the time to repeat the FIR filter numerous times. For the K66 board, which could not be programmed through the Arduino IDE (until the Teensy 3.6 comes out!), I had to use NXP's "Kinetis Design Studio" to write the software, but the FIR function itself is the same. All of the code is available in my OpenAudio repository on GitHub.
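A "very naive" FIR and timing loop of the kind described above might look like the following. This is a minimal sketch, not the code from the repository. It uses standard C clock() in place of Arduino's micros() so that it runs on a PC, and the names fir_naive and firs_per_second are hypothetical. Here one "FIR" means one N-tap dot product producing one output sample. That is an assumption about how the benchmark counts.

```c
#include <time.h>

/* Naive direct-form FIR: one output sample is the dot product of the
   N most recent inputs (history[0] = newest) with the coefficients. */
float fir_naive(const float *coeffs, const float *history, int N)
{
    float acc = 0.0f;
    for (int i = 0; i < N; i++)
        acc += coeffs[i] * history[i];
    return acc;
}

/* Time `reps` FIR evaluations, then invert: FIRs completed per second.
   On an Arduino-style board, micros() would replace clock() here. */
double firs_per_second(const float *coeffs, const float *history, int N,
                       long reps)
{
    volatile float sink = 0.0f;  /* keeps the compiler from deleting the loop */
    clock_t t0 = clock();
    for (long r = 0; r < reps; r++)
        sink += fir_naive(coeffs, history, N);
    clock_t t1 = clock();
    double seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? reps / seconds : 0.0;
}
```

Dividing the repetition count by the elapsed time is the same inversion used for the raw data: a bigger number means more FIRs per second.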

Results, All Data: My raw data (here) consist of the time required to perform FIR filters on the different platforms. To ease the presentation of the data, I invert the values so that it tells me the number of FIR filters that can be completed per second. In this perspective, a bigger number means it can do more FIRs per second, which is good. My results are shown in the table below. It shows the FIR speeds for different filter lengths (16-256) and for two different data types (Int32 and Float32). Because I find tables difficult to read, let's jump over the table and make some plots that better illustrate the results.

Results, Effect of "N": Longer filters require more computations, so I would expect longer filters to be slower. Using data from the big table above, the plot below confirms that expectation. Also, note that all of the lines show the same slope and that the lines never cross each other. This means that the relative ranking of the different boards stays the same across all FIR filter lengths, which allows me to greatly simplify the rest of the plots.

Results, Speed of Each Board (Int32): Since the relative ranking of the boards stays the same throughout, let's illustrate the relative speed of each board by picking just one filter length. The plot below picks N=128. It shows the speed when using Int32 data. On the left side of the plot, note that the Arduino Uno is very slow on Int32 values. Presumably its 8-bit processor has difficulty with the 32-bit data type. On the right side of the plot, the fastest board is the K66F, which is 100 times faster than the Uno. That's a huge difference!

Results, Floating Point: The plot below is the same, but I add in the results for floating-point data (Float32). Writing audio processing algorithms using Float operations is much easier than using Int operations, so these are the results that I'm most interested in. As can be seen in the plot, these boards are *much* slower on Floats than on Ints. The major exception to this result is the K66F board, which is basically as fast on Floats as it is on Ints. That's amazing! This result clearly reflects the fact that the K66 is the only processor in this comparison which has an FPU.

How Fast is Fast Enough? Deciding what FIR speed is "fast enough" for audio processing is not a simple question. One approach is to return back to the example at the top of this post: a hypothetical 176-point FIR filter. This filter length was chosen because it would yield a frequency resolution of 250 Hz when run at an audio sample rate of 44 kHz. Which of these boards can support such a long filter at this fast sample rate? Well, by scaling the N=128 speed values to N=176, the table below shows the sample rate that could be sustained by each board. As can be seen the Teensy 3.2 is fast enough for audio processing using Ints. But, if I want to do Floats, only the K66F is fast enough.
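The scaling step can be written out explicitly. The sketch below is minimal and makes two assumptions: FIR cost grows linearly with the number of taps, and each counted "FIR" produces one output sample, so the sustainable sample rate equals FIRs per second. The numbers in the checks are placeholders, not measurements, and the helper names are hypothetical.

```c
/* Scale a measured FIR rate from length N1 to length N2, assuming
   cost is proportional to the number of taps. */
double scale_fir_rate(double firs_per_sec_at_N1, int N1, int N2)
{
    return firs_per_sec_at_N1 * (double)N1 / (double)N2;
}

/* With one output sample per FIR, the scaled rate is the highest
   sample rate the board can sustain at filter length N2. */
int sustains_rate(double firs_per_sec_at_N1, int N1, int N2, double fs_Hz)
{
    return scale_fir_rate(firs_per_sec_at_N1, N1, N2) >= fs_Hz;
}
```

Equivalently, under these assumptions a board must manage at least 44000 × 176 / 128 = 60500 of the 128-point FIRs per second to run the 176-point filter at 44 kHz.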

Programming the K66F: Unfortunately, the FRDM-K66F board is not programmable from the Arduino IDE. This makes it much harder to set up and debug for non-professionals. This is a real hurdle. Luckily, the upcoming Teensy 3.6 also uses the K66 processor. Since the Teensy products are all compatible with the Arduino IDE, it means that the power of the K66 will soon be far more accessible. That's the solution that I'm really looking for. So, I supported the Teensy 3.6 kickstarter. I can't wait to get my Teensy 3.6!

Next Steps: FIR filtering is not the only audio processing task that one might like to do. An FFT is another important type of operation that is computationally intense. In my next post, I'll look at the FFT speeds to see which boards are capable of real-time, frequency-domain processing. Until then, have some happy hacking!

Update: FFT Benchmarking Results are here!