Saturday, October 8, 2016

Benchmarking - Teensy 3.6 is Fast!

Fast but easy to program --  that's what I need for my audio processing projects.  In my previous post, I loved the speed of the NXP K66 processor, but was frustrated with the NXP development environment.   Well, thanks to PJRC and Kickstarter, the solution has arrived...via US Mail.  Welcome to the new Teensies!  Based on my tests, it looks like audio hacking just got a whole lot more fun.


Teensy?  Teensy is a line of microcontroller boards that are much more capable than the standard Arduino boards, but that can still be programmed using the friendly Arduino development environment (the "Arduino IDE").  I've used one of the existing Teensy boards (Teensy 3.2) on a number of projects and it definitely achieves its goal: it's a powerful, fast little board that is really easy to program.

New Teensies!  Through just-completed Kickstarter, Teensy has expanded its line of boards by adding the Teensy 3.5 and Teensy 3.6.  As can be seen in the table below, the Teensy 3.5 and 3.6 have a faster clock speed and have more RAM.  The specs on the Teensy 3.6 match by my loved-and-hated FRDM-K66F board from NXP.  The increased capabilities of the Teensy 3.5 and 3.6 relative to the older Teensy (and relative to Arduino) should really help in computationally-heavy tasks, such as processing audio.


Floating-Point Prowess.  For me, the best feature of the new Teensies is the inclusion of a floating-point unit (FPU), which will accelerate calculations with floating-point numbers.  This will makes it much easier to hack together audio processing algorithms.  I loved the performance of the NXP FRDM-K66 board because its FPU made it stunningly fast on floating-point FIR and FFT operations.  I'm hoping that the Teensy 3.6 is able to match the speed of the FRDM-K66.  If it can, I can leave behind the FRDM-K66 board and never again have to struggle with NXP's development environment.

Programming the New Teensies.  To evaluate the new Teensy boards, I'm going to re-run my FFT benchmark tests (on GitHub here).  First, I had to get the latest "Teensyduino" software from PJRC (to get support for the new boards into the Arduino IDE).  Then, I simply opened up my FFT benchmarking sketch and recompiled.  It was previously written for the Teensy 3.2 and now, without any changes, it compiled successfully for the Teensy 3.5/3.6.  This is the ease-of-use that I was hoping for!

Speed Results.  I ran my bechmarking tests on the new boards and complied the results in the table below.  It shows how many FFTs each board could complete per second.  It's my raw data.  It is hard to see the interesting trends in this table, so just go ahead and jump over it...I'll present something simpler in a moment...


Which Board is Fastest?  Below is a bar chart comparing the speed of each board for a 128-point FFT.  I shows the speed when using Int32 data types and when using Float32 data types (the two most likely used in audio processing).  Two results stand out:
  • The Teensy 3.6 is the overall winner, even beating the FRDM-K66F.  While that is a bit inexplicable, it's great!
  • The Teensy 3.5 and 3.6 are great for floating point calculations.  Look at how big those red bars are!  Clearly, the FPU in these newer boards makes a huge difference.

How Fast Do I Need?  So the bars in the graph above are big.  How do I know what bars are big enough?  How do I know which are fast enough for audio processing?  Well, like in my previous post, I want to know which boards can keep up with calculations that need to be done at audio rates.  In this case, I want to know which boards can do FFT operations at audio rates.  What does that mean?

FFTs for Audio Processing.  FFTs are interesting for audio because it means one can process audio in the frequency domain, which can be very convenient.  To do frequency-domain processing, you need to do an FFT to get into the frequency domain and you need to do an IFFT (which is basically the same computational load as an FFT) to get back out of the frequency domain.  Then, because of windowing and other blah-blah-blah, you generally overlap your audio data blocks by 50%, which means that each audio block needs four FFT-like operations.  Can this get done in real time?

Keeping up with Audio.  Because I measured how many FFTs each board can do per second, I can estimate the maximum audio sample rate (samples per second) that each board can support while still doing four FFTs per data block.  A typical sample rate used for audio is 44 kHz.  My results below show which boards can sustain at least 44 kHz (shown in green) versus those that cannot (shown in red).


Integer vs Floating-Point Results.  As can be seen in the table, the Teensy 3.2 is fast enough for audio if I'm using integer data types (Int16 or Int32).  It is not fast enough to use Floats.  The new Teensy 3.5 and 3.6 are faster than the Teensy 3.2, especially on Floats.  The Teensy 3.6 just screams on Floats -- it has enough processing power to sustain a sample rate over 300 kHz for frequency-domain processing.  Wow.  This is exactly the result that I was hoping to see.

Conclusion.  For my audio processing projects, it looks like the Teensy 3.6 is the best choice.  It's really fast, yet it can be easily programmed through the comfortable and friendly Arduino IDE.  I'm very pleased.

Next Step:  My next step is to join the Teensy 3.6 to an audio interface (such as the Teensy Audio Board) so that I can leave these synthetic benchmarks behind and start playing with actual audio.  It's gonna be fun!

Extra Credit:  How much faster are the new Teensies vs the Teensy 3.2?  The Teensy 3.6 is 16x faster than the Teensy 3.2 on Floats.  Dang, that's fast.  Again, this must be due to the FPU in the new units.


Follow-Up:  I made my own Teensy Hearing Aid using a Teensy 3.6 and a Teensy Audio Board.  It's fun!

5 comments:

  1. Hello. Very nice series ! I was looking for a K66F review with possible audio processing. You make my day ! Thank you :). I just bought FRDM-K66F because I am working every day with Kinetis Design Studio. So it is not a big deal for me to use it.

    Also, about FRDM-K66F and Teesnyduino IDE, it is possible to mix it ! Teensyduino output at compilation time a .hex file. Then you can flash to FRDM board with Jlink commander or other free debugger tools. Out of the box, it is a two step process but with some tweak, it might be possible to make it in one time through Teesnyduino IDE :).

    Keep it up audio processing !

    Max

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. Hello, what lib do you use to do fft/ifft ?
    do they return real/img part ?

    regards

    ReplyDelete