Friday, January 19, 2018

Measuring Audio Latency

For real-time audio processing, it is often important to minimize the delay between the audio going into the system and the audio coming out of it.  This is especially true of hearing aids, where too much latency can cause the listener to perceive an echo (or a comb-filtering effect), which ends up degrading rather than helping the listener's experience.  Research suggests that the maximum tolerable latency in a hearing aid is only 14-30 msec.  If Tympan wants to be helpful, we need to make sure that we keep our latency shorter than this.  Let's find out what our system does!


Test Setup.  To measure the latency of the Tympan, we need to inject an audio signal into the Tympan and then measure the delay of the output audio relative to the input audio.  There are lots of ways that this can be done.  I chose to use the setup shown in the picture above.  In this setup, I generate my test audio signals from my PC.  I use a series of 1 kHz tone bursts, each 1 second long.  The audio comes out of my PC and (as shown by the red line) is split so that it goes to the input of the Tympan *and* to one input of an audio recorder.  The output of the Tympan is then (as shown by the blue line) routed to the other input of the audio recorder.  The stereo audio file produced by the audio recorder contains the input audio in the left channel and the output audio in the right channel.  From there, it is a simple post-processing analysis to measure the delay between the two channels.
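
For anyone who wants to build a similar test signal, here is a minimal Python sketch that writes a WAV file of 1 kHz tone bursts using only the standard library.  It is not the tool I used for the recordings in this post.  The function name make_tone_bursts, the 44.1 kHz sample rate, the 1-second gaps, and the amplitude are all assumptions chosen for illustration; only the 1 kHz tone and 1-second burst length come from the setup described above.

```python
import math
import struct
import wave


def make_tone_bursts(path, fs=44100, tone_hz=1000.0, burst_sec=1.0,
                     gap_sec=1.0, n_bursts=3, amplitude=0.5):
    """Write a mono 16-bit WAV of tone bursts separated by silence.

    Returns the total number of samples written.
    fs, gap_sec, n_bursts, and amplitude are illustrative assumptions.
    """
    burst_len = int(round(burst_sec * fs))
    gap_len = int(round(gap_sec * fs))

    samples = [0] * gap_len  # silence before the first burst
    for _ in range(n_bursts):
        for n in range(burst_len):
            value = amplitude * math.sin(2.0 * math.pi * tone_hz * n / fs)
            samples.append(int(round(value * 32767)))
        samples.extend([0] * gap_len)  # silence after each burst

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono: the same signal feeds both the Tympan and the recorder
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(fs)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)
```

Calling make_tone_bursts("tone_bursts.wav") would produce a file that can be played from the PC while the recorder captures both channels.  The silence between bursts gives each burst a sharp onset, which is what makes the delay visible when zooming in.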


Raw Data.  An example recording and an example Matlab processing script are in my GitHub here.  The plot above shows the example data.  As you can see, there are three tone bursts in the recording.  At this timescale, one cannot see any delay between the signals, but that is because we are not zoomed in enough.  The plot below zooms in to the start of one of the tone bursts.  Here, we definitely see that the Tympan output has a slight delay relative to the direct audio signal.


Analysis.  Using the plot above, we could visually assess the latency between the two channels.  But, to do even better, I included a Matlab script in the GitHub directory that computes the cross-correlation between the two channels.  The best estimate of the latency is the time lag at which the cross-correlation is maximum.  For this recording, the best estimate of the latency is 3.1 msec.  That's nice and short!
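
To illustrate the idea, here is a small Python sketch of the same kind of cross-correlation search.  It is a stand-in, not a port of the Matlab script in the repository.  The function name estimate_delay_samples, the 48 kHz sample rate, and the synthetic signals are assumptions; the sketch also only searches non-negative lags, on the assumption that the Tympan output can only lag the direct signal.

```python
import math


def estimate_delay_samples(reference, delayed, max_lag):
    """Return the lag (in samples, 0..max_lag) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[n] against delayed[n + lag] over the overlapping samples
        n_overlap = min(len(reference), len(delayed) - lag)
        score = sum(reference[n] * delayed[n + lag] for n in range(n_overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for a recording: a short 1 kHz burst and a delayed copy.
fs = 48000            # assumed recorder sample rate (Hz)
true_delay = 149      # about 3.1 msec at 48 kHz
burst = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(2400)]
left = [0.0] * 480 + burst + [0.0] * 960   # "direct" channel
right = [0.0] * true_delay + left          # simulated Tympan output channel

delay_samples = estimate_delay_samples(left, right, max_lag=480)
print("estimated latency: %.2f msec" % (1000.0 * delay_samples / fs))
```

Because a steady 1 kHz tone repeats every 1 msec, the cross-correlation has many nearly equal peaks.  It is the onset and offset of the burst that make the peak at the true lag the largest, which is one reason to use tone bursts rather than a continuous tone.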

Measuring other Tympan Configurations.  Expanding from this single measurement, I then repeated the process and measured the latency for a variety of different configurations of the Tympan.  I tried different audio block sizes and I tried different audio processing algorithms.  My results are shown in the figure below.  The 3.1 msec value reported above can be seen in the bottom-left of the plot as the first point on the yellow line.  All of the other configurations result in increased latency, but there are still a lot of configurations whose latency is shorter than the 14-30 msec upper limit from the literature.


Components of the Latency.  After working with the system for a while, I've identified three contributors to the latency:

  • Hardware:  The audio interface for the Tympan is a TI TLV320AIC3206.  This chip has a pipeline that is 17 samples long on the input and 21 samples long on the output.  So, for this round-trip testing, it contributes 38 samples of latency, which at a sample rate of 24 kHz is a latency of 1.58 msec.
  • Audio Library:  The Tympan audio library is based on the Teensy audio library, which employs a queue of two audio blocks in order to prevent audio hiccups and drop-outs.  In the Tympan library, this audio block size is configurable, hence my ability to make the graph above where I vary the block size.  For a block size of 16 samples, the library's latency is 2*16=32 samples, which is 1.33 msec at 24 kHz.  Totaled with the hardware latency (1.58 msec + 1.33 msec), we get 2.9 msec, which is very close to the 3.1 msec value that I measured.  Great!
  • Audio Processing:  Beyond the audio library, almost any actual audio processing will introduce additional latency.  A typical (symmetric) FIR filter, for example, has a group delay of about half the length of the filter, or (N-1)/2 samples for an N-tap filter.  Other filters and other operations (such as FFT) introduce delays as well.  The specific amount of latency may vary, but some latency is inherent in the math.  For the Tympan, we see the latency for FIR and FFT in the graph above.
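
The arithmetic for the first two contributors can be written as a tiny Python sketch.  The function name predicted_latency_msec and the processing_samples parameter are assumptions for illustration; the 17 + 21 sample pipeline, the two-block queue, and the 24 kHz sample rate come from the list above.  Extending the same formula to larger block sizes is a simple model, not a substitute for measuring.

```python
HARDWARE_PIPELINE_SAMPLES = 17 + 21   # input + output pipeline of the audio chip
QUEUED_BLOCKS = 2                     # audio blocks queued by the audio library


def predicted_latency_msec(block_size, sample_rate_hz=24000.0, processing_samples=0.0):
    """Predict round-trip latency from hardware pipeline, block queue, and processing delay.

    processing_samples is a hypothetical extra term for algorithm delay
    (for example, the group delay of an FIR filter).
    """
    total_samples = (HARDWARE_PIPELINE_SAMPLES
                     + QUEUED_BLOCKS * block_size
                     + processing_samples)
    return 1000.0 * total_samples / sample_rate_hz


for block_size in (16, 32, 64, 128):
    print(block_size, round(predicted_latency_msec(block_size), 2))
```

For a block size of 16 this gives about 2.9 msec, the same figure worked out by hand above, which can then be compared against the 3.1 msec measurement.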

Optimizing for Minimum Latency.  From this exploration, I've learned that latency can be minimized by (1) using the shortest audio block size that the system can handle and (2) running the simplest audio processing algorithm that you can get away with.  On this latter point, we all want our audio processing to have extremely fine frequency resolution.  But, high resolution requires long FIR filters or long FFTs.  Long FIRs/FFTs introduce a lot of latency, however.  So, if you want low latency, you need to use the shortest FIRs and FFTs that you can.
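
To put rough numbers on that trade-off, here is a small Python sketch for a symmetric FIR filter.  The function name fir_tradeoff is hypothetical, and the frequency resolution uses the common rule of thumb of sample rate divided by filter length, which is an approximation rather than an exact property of any particular filter design.  The group delay of (N-1)/2 samples is the standard result for a symmetric (linear-phase) FIR filter.

```python
def fir_tradeoff(n_taps, sample_rate_hz=24000.0):
    """Return (approximate frequency resolution in Hz, group delay in msec).

    Assumes a symmetric (linear-phase) FIR filter with n_taps coefficients.
    """
    # Rule of thumb (assumption): resolution is roughly sample rate / filter length
    resolution_hz = sample_rate_hz / n_taps
    # Group delay of a symmetric FIR filter is (N - 1) / 2 samples
    group_delay_msec = 1000.0 * (n_taps - 1) / 2.0 / sample_rate_hz
    return resolution_hz, group_delay_msec


for n_taps in (33, 129, 513):
    resolution_hz, delay_msec = fir_tradeoff(n_taps)
    print(n_taps, round(resolution_hz, 1), round(delay_msec, 2))
```

Each fourfold increase in filter length sharpens the resolution by about four times and also adds about four times the delay, on top of the hardware and audio library latency.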