Sunday, October 3, 2021

What Triggers Audio Processing?

Using the Teensy or Tympan for audio processing can be very exciting.  It's really fun to open up the example programs, compile them, and listen.  It's also pretty easy to look at the example code to see how the algorithm blocks are created and connected together.  Great!  But, what if you want to make your own algorithm?  That's when you start to look a bit more critically at the examples.  Your first thought will likely be: "Wait.  How does any of this audio processing actually get called?  How does this crazy structure work?!?"  Yes.  That's a good question.  Let's talk about it.

So Much is Hidden.  The essential problem is that nearly all of the audio plumbing is hidden so that its complexity doesn't scare people off.  For example, look at an extremely minimal audio processing example in the image below.  It instantiates the audio objects and creates the audio connections.  Then, you've got the traditional Arduino setup() and loop() functions.  Note that the loop() function is empty.  This program looks like it does nothing.  Yet, audio does flow.  The audio is made louder by the gain block applying 6 dB of gain.  But how?!?  I see no functions that call into the audio blocks!
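For reference, a minimal sketch along those lines looks something like the code below.  (This is my own sketch for illustration, not the exact code in the image; the class names and calls follow the Tympan Library's usual conventions, but treat the details as approximate.)

#include <Tympan_Library.h>

//instantiate the audio objects
Tympan               myTympan(TympanRev::E);  //the Tympan hardware itself
AudioInputI2S_F32    i2s_in;                  //audio coming in from the Tympan's ADC
AudioEffectGain_F32  gain1;                   //the gain block
AudioOutputI2S_F32   i2s_out;                 //audio going out to the Tympan's DAC

//create the audio connections
AudioConnection_F32  patchCord1(i2s_in, 0, gain1, 0);   //input to gain
AudioConnection_F32  patchCord2(gain1, 0, i2s_out, 0);  //gain to output

void setup() {
  AudioMemory_F32(20);        //allocate memory for passing audio blocks around
  myTympan.enable();          //start the audio hardware
  myTympan.inputSelect(TYMPAN_INPUT_ON_BOARD_MIC);  //use the on-board microphones
  gain1.setGain_dB(6.0);      //apply 6 dB of gain
  myTympan.volume_dB(0.0);    //set the headphone output volume
}

void loop() {
  //empty!  yet, somehow, audio flows and gets louder...
}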

What You Didn't Know That You Programmed.  The screenshot above shows the code that you know that you programmed.  There is also a whole bunch of code that you included, however, that you didn't know that you were invoking.  In effect, you programmed some very complex activities and you didn't even know it.

Going Down the Rabbit Hole.  The flow chart below tries to expose some of the hidden code.  This is a map that helps explain some of the hidden underground parts of the Teensy Audio Library (and Tympan Library). 

1) Code Shown in the Arduino Window.  The blocks in blue are the pieces of code that you know that you wrote.  This is the code shown in the Arduino IDE.  Here, you instantiate the audio classes and the audio connections.   Here, you write the Arduino setup() and loop() functions.  This is the part that we can all see and (usually) understand.  For the audio processing, the hidden magic gets invoked behind the scenes by the audio classes.  In particular, the AudioOutputI2S class is the most magical.

2) Audio Class Constructors.  As a bit of background, "I2S" is a communication system built into the Teensy processor that is purposely designed to pass sound data (that's the 'S' in I2S) between the processor and the audio input/output hardware.  So, the AudioOutputI2S class handles the passing of audio data out from the processor to the audio output.  If you were to open the AudioOutputI2S class, you would see that its constructor calls its begin() method.  Looking in begin(), you'll see that it configures the I2S bus (which is logical) but it also configures the DMA and it attaches an interrupt to the DMA.  Huh?
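In rough, simplified form, the relevant pieces look something like this (a conceptual sketch, not the actual library source; the real begin() has quite a bit more setup in it):

//conceptual sketch of AudioOutputI2S (simplified; not the actual library source)
AudioOutputI2S::AudioOutputI2S(void) : AudioStream(2, inputQueueArray) {
  begin();  //the constructor immediately calls begin()
}

void AudioOutputI2S::begin(void) {
  config_i2s();              //configure the I2S peripheral and its clocks
  //point the DMA at a small buffer of outgoing audio samples...the I2S
  //hardware will pull samples from this buffer all on its own
  dma.TCD->SADDR = i2s_tx_buffer;
  dma.attachInterrupt(isr);  //ask the DMA to call isr() whenever it needs more data
  dma.enable();              //let the background transfers begin
}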

DMA. Direct Memory Access is a special way of using memory.  You know how you can drive your car and listen to a podcast at the same time?  Your brain is able to handle certain tasks autonomously in the background without disturbing the foreground thoughts?  The processor has some of the same capabilities.  The processor can allow for certain regions of its built-in memory to be directly accessed by external devices.  In this case, DMA is configured so that the audio output system can read audio data directly from the processor's memory without the processor having to respond to a request for each and every sample.  That's DMA.  It happens in the background.

3) Firing an Interrupt (ISR).  When the DMA is set up, it's pointed to a small region of memory.  The region holds a fixed number of audio samples.  Once the I2S bus is commanded to begin pumping data, it starts pulling data from that region via the DMA.  Again, this happens in the background.  As the region empties, it gets low on remaining samples.  The DMA has been configured to call a function (an interrupt service routine, ISR) to replenish the data before it runs out.  In the AudioOutputI2S begin() method, a specific function was attached as that ISR.  You didn't know it, but it was.  The ISR is right there in AudioOutputI2S.

Interrupting All Other Activities.  When the ISR is requested, the processor now has to take notice.  It is called an "interrupt" service routine because it interrupts whatever else is happening.  Whatever else the processor is doing (such as looping around in your Arduino loop() function) will be paused while the processor goes off and executes the ISR.  This interruption is done so that we can ensure there is fresh audio data placed into the DMA before the DMA fully empties and the audio stalls.

4) Execute the ISR.  Looking at the ISR in AudioOutputI2S, we see that it copies previously-processed audio data into the DMA buffer.  This keeps the DMA fed so that there are no hiccups in the audio.  Great.  The next thing that the ISR does is call update_all(), which starts the audio processing chain so that processed audio will be available the next time the DMA is running low.  This update_all() is the key.
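Stripped way down, the ISR does something like this (again, a conceptual sketch rather than the actual library source):

//conceptual sketch of the ISR in AudioOutputI2S (simplified)
void AudioOutputI2S::isr(void) {
  //the DMA tells us which half of its buffer it just finished sending;
  //that is the half we now need to refill (bookkeeping omitted here)
  int16_t *dest = (int16_t *)i2s_tx_buffer;

  //copy the most recently processed audio block into the empty half
  //so that there are no hiccups in the outgoing audio
  memcpy(dest, block_left_1st->data, AUDIO_BLOCK_SAMPLES * sizeof(int16_t));

  //then kick off the next round of processing so that a fresh block
  //will be ready before the DMA runs low again...this is the key line!
  if (update_responsibility) AudioStream::update_all();
}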

Doing the Audio Processing.  The update_all() method lives in AudioStream.  AudioStream is the root (parent) class of every audio processing class that you might have instantiated in your Arduino code.  AudioStream has some static data members that, in effect, act as a central location for tracking every audio processing object that needs to be called.  So, update_all() simply loops through the list of your audio objects and calls each one's update() method.
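Conceptually, it looks like the sketch below.  (In the real library, the loop actually lives in a software interrupt that update_all() triggers, but the effect is the same.)

//conceptual sketch of what update_all() effectively does
void AudioStream::update_all(void) {
  //walk the list of every audio object that has been instantiated
  for (AudioStream *p = first_update; p != NULL; p = p->next_update) {
    if (p->active) p->update();  //ask each object to process one block of audio
  }
}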

Each Class's Update().  If you open up any audio class, you'll find that there is an update() method.  This is where all the audio processing is done.  Inside update(), the class pulls blocks of audio from its upstream connections and pushes audio blocks out to its downstream connections.  These upstream and downstream connections are known because you created them in your Arduino code via those AudioConnection lines.
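As an example, an update() for a bare-bones gain class (a hypothetical class of my own derived from AudioStream, but it follows the same receive/transmit/release pattern used by the real Teensy and Tympan classes) might look like this:

//update() for a hypothetical bare-bones gain class
void MyGainEffect::update(void) {
  audio_block_t *block = receiveWritable(0);  //pull one block from the upstream connection
  if (block == NULL) return;                  //no audio available?  nothing to do.
  for (int i = 0; i < AUDIO_BLOCK_SAMPLES; i++) {
    block->data[i] = block->data[i] * 2;      //apply ~6 dB of gain (ignoring clipping)
  }
  transmit(block, 0);                         //push the block to the downstream connection
  release(block);                             //return the block's memory to the pool
}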

Summary.  So, that's how the audio processing happens on Teensy (and Tympan).  There is the code that you explicitly wrote and then there is all the code that comes along with the libraries, such as the class AudioOutputI2S.  AudioOutputI2S sets up the I2S bus for passing data to the audio hardware and it sets up the DMA for feeding data to the I2S bus.  Whenever the DMA gets low on data, it fires an ISR.  The ISR refills the DMA and it launches the cascade of update() for all your audio objects.  Because it is interrupt driven, it all happens in the background...and that's why your Arduino code looks so empty.

Phew.  Good work, everyone.

Saturday, October 2, 2021

Formant Shifting with Tympan

Once you have a real-time platform for manipulating audio, it is always fun to see what you can do to your voice.  In my case, I had been spending a lot of time figuring out a good way to implement frequency-domain audio processing on the Tympan.  Once I did that, I realized that it would be super easy to start having fun with my voice.  So, here, I present my Tympan Formant Shifter!

Formants vs Pitch.  Like any instrument, your voice has a pitch, which is the fundamental frequency of the sound of your voice.  The sound of your voice contains many harmonics, in addition to the fundamental frequency.  Those harmonics extend upward far above the fundamental frequency.  Which of those harmonics are actually projected from your mouth depends upon how your mouth (and nasal passages and throat) is shaped.  As you talk, you naturally change the shape of your mouth (and nose and throat) to make the various consonant and vowel sounds.  An "E" sounds different from an "A" because your throat/nose/mouth enhance different harmonics for the two different vowels.  These enhanced frequency regions that move around are your "formants".

Formant Shifting.  In the video, I am only moving the formants.  Clearly, the effect on us listeners is that we feel like my voice itself is going higher or lower.  But, this isn't the case.  The fundamental frequency of my voice and the frequencies of the harmonics are all unchanged.  Instead, the formant shifting lets you hear harmonics of my voice that are higher or lower than you normally hear.  The processing is simply shifting which harmonics are emphasized or attenuated.

Frequency-Domain Formant Shifter.  A formant shifter is implemented easily in the frequency domain.  Starting from your audio, take an FFT, shift the magnitude of the FFT bins to higher or lower bins (while leaving the FFT phases in their original bins), take the inverse FFT ("IFFT"), and play the audio.  In principle that's it!
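To make that concrete, the core of the bin-shifting step could look something like the function below (a hypothetical helper of my own for illustration, not the actual Tympan class; fftOut must already be sized to match fftIn):

#include <complex>
#include <cmath>
#include <vector>

//shift the FFT magnitudes up (scale_fac > 1.0) or down (scale_fac < 1.0)
//while leaving each bin's phase where it was
void shiftFormants(const std::vector< std::complex<float> > &fftIn,
                   std::vector< std::complex<float> > &fftOut,
                   float scale_fac) {
  const int nBins = (int)fftIn.size();
  for (int dest = 0; dest < nBins; dest++) {
    //which source bin's magnitude lands in this destination bin?
    int source = (int)std::round((float)dest / scale_fac);
    float mag = (source >= 0 && source < nBins) ? std::abs(fftIn[source]) : 0.0f;
    float phase = std::arg(fftIn[dest]);  //keep this bin's original phase
    fftOut[dest] = std::polar(mag, phase);
  }
}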

Real World FFT/IFFT Processing.  In reality, of course, implementing the FFT/IFFT processing on a real-time audio stream is more complicated.  But, as I said at the top, I took quite a bit of time to hide all of these complications inside the library.  The frequency-domain framework takes care of the buffering, the windowing, and the overlap-and-add operations.

Tympan Example Code.  In the Tympan Library, I wrote an example sketch for Formant Shifting.  You can see it here.  The underlying audio processing class that does the formant shifting is here (*.h) and here (*.cpp).  Finally, for the video at the top, I combined the Formant Shifting with a USB Audio interface so that it would get recorded along with the video from my web camera.  You can get this USB Audio enabled version of the code (along with other partly-working goodies) here.

More Frequency-Domain Examples.  Once I got the whole frequency-domain processing structure in place, I found it fun and easy to implement several other frequency-domain algorithms.  In addition to Formant Shifting, I've got Frequency Shifting (but it is only linear shifting, not exponential scaling).  I've also got Frequency Compression and Noise Reduction.  I totally nerded out.  So fun!

Tympan at High Speed (Ultrasonic!) Sample Rates

While we designed the Tympan as a platform for trying hearing aid algorithms, it's flexible enough to be used for many different audio tasks.  For example, by increasing the Tympan's sample rate, you can see signals above the range of human hearing...to explore ultrasound!  The question is, how high into the ultrasonic range can the Tympan go?


Sample Rate and Nyquist.  The Tympan is a digital audio device; it samples the voltage of a signal at discrete moments in time.  It acquires audio samples at a constant rate, the "sample rate".  If you wish to sense a certain frequency of audio (say 10 kHz), you need a sample rate that is fast enough to capture this frequency.  Thanks to Nyquist, we generally say that the sample rate needs to be at least twice the frequency of the signal that you want to sense.  So, to sense 10 kHz, our sample rate must be *at least* 20 kHz.  Typically, digital audio systems run at 44.1 kHz or 48 kHz so that they can comfortably span the 20 kHz maximum range of human hearing.

//set the sample rate and block size
const float sample_rate_Hz = 48000.0; //for audible sound
const int audio_block_samples = 128;

Ultrasound.  For sensing ultrasound, we need to sense frequencies higher than 20 kHz.  Many inexpensive ultrasonic range-finders, for example, operate near 40 kHz.  If we want to explore these signals, we need to increase our sample rate to 80+ kHz.  Some rats and bats make vocalizations that extend up to 80 kHz, so that would require a sample rate of 160+ kHz.  Can the Tympan sample this fast?

Changing the Sample Rate.  Changing the sample rate of the Tympan is easy.  Near the top of every Tympan example, you can see where to change the sample rate.  This example is even called "ChangeSampleRate".  So, that part is easy; simply write in a sample rate that is higher!  The question is whether the Tympan produces useful data when running at these higher speeds.  Let's test it!

//set the sample rate and block size
const float sample_rate_Hz = 96000.0; //for ultrasound
const int audio_block_samples = 128; 
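For context, those two constants typically get bundled into an AudioSettings_F32 object that is then handed to each audio class as it is created.  A sketch of the usual Tympan Library pattern (object names here are just placeholders):

//bundle the sample rate and block size into the audio settings...
AudioSettings_F32    audio_settings(sample_rate_Hz, audio_block_samples);

//...and hand those settings to each audio object as it is instantiated
AudioInputI2S_F32    i2s_in(audio_settings);
AudioEffectGain_F32  gain1(audio_settings);
AudioOutputI2S_F32   i2s_out(audio_settings);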

Test Setup.  As shown in the photo at the top of this post, I used a function generator to make a sine wave.  I then ran its signal through an attenuator to make sure that I wasn't overdriving the input of the Tympan.  I used a Tympan RevE and inserted the signal via its pink input jack.  

Tympan Software.  On the Tympan, I used one of the example sketches that records audio to the SD card.  In the code, I made two changes: (1) I told it to record from the pink jack as line-in and (2) I changed the sample rate to whatever I was testing.
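In the sketch, those two changes amount to just a couple of lines (the input-select constant shown here follows the Tympan Library's naming, but treat it as approximate):

//change 1: set the sample rate to the value being tested
const float sample_rate_Hz = 96000.0;  //for example, 96 kHz

//change 2: in setup(), record from the pink jack as line-in
myTympan.inputSelect(TYMPAN_INPUT_JACK_AS_LINEIN);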

Test Method.  For each test, I started the Tympan's SD recording and then I manually turned the knob on the function generator to sweep up through the frequency range.  I then stopped recording, pulled out the SD card, and made a spectrogram of the recording on my PC.  I used Matlab, but you could also use Python or Audacity for your spectrograms.

Results, Clean Audible Signal (fs = 48 kHz).  I started with a known-good traditional audio sample rate.  The figure below shows my frequency sweep when using a sample rate of 48 kHz.  The spectrogram shows that we could see frequencies up to 24 kHz, as expected based on Nyquist.  The spectrogram looks great; the signal is clean and the background noise looks like background noise.  This is what "clean" and "good" look like.

Results, Clean Ultrasound Signal (fs = 96 kHz).  I then turned up the sample rate to 96 kHz and repeated the measurement.  The spectrogram below is the result.  It looks great.  We see our signal up to 48 kHz, as expected.  There's a little bit of aliasing as the input signal continued past 48 kHz (we see that the signal's line in the spectrogram bounces downward a little bit when it hits 48 kHz).  The aliasing stops quickly, so this seems fine.  I think that this spectrogram looks great.

Results, Marginal Quality (fs > 96 kHz).  When I increased the sample rate beyond 96 kHz, the results started to look less good.  Below are the results for 100 kHz, 105 kHz, and 110 kHz.  As you can see, the signal itself looks OK, but strange artifacts start to appear in the background noise.



Results, Bad Quality (fs > 110 kHz).  Finally, by the time we get to a sample rate of 115 kHz, the recorded audio is bad.  Basically, any sample rate of 115 kHz or above is unusable.




Conclusion.  The Tympan is good for recording at sample rates up to 96 kHz.  You can maybe even run up to 110 kHz.  But, at 115 kHz and above, your signal will be corrupted.

Improving High-Frequency Performance.  The audio codec used to do the sampling is a very complicated device.  There are many settings and many ways of clocking the device.  It is possible that there is a different combination of settings that would provide good-looking data at sample rates higher than 96 kHz.

96 kHz is Still Good!  Luckily, 96 kHz is still a very useful sample rate for ultrasound.  Running at 96 kHz is fast enough to give good access to signals around 40 kHz.  This is a very important region for ultrasound in air.  There are many ultrasonic range finders and motion sensors that operate in the 40 kHz range.  So, you can explore and do many fun things running your Tympan at 96 kHz.  Furthermore, we also know that the Tympan's on-board microphones are sensitive up into this region, so you don't even need any additional hardware to sense the ultrasound!  You can just change the system's sample rate and then go have fun!