Wednesday, February 15, 2017

Faster Log10 and Pow

After my previous post, which showed a tremendous increase in speed by choosing float-specific math functions (ie, "log10f") versus their generic counterpart (ie, "log10"), I still wasn't satisfied with the speed of my audio processing on the Teensy 3.6. I had some calculations that went into and out of dB space, which means I needed to do lots of log10f(x) and pow(10,x) function calls. I needed them to be faster.  Here's how I did it...

They Should be Faster:  The figure above shows the number of CPU cycles that I measured for each math function.  The red bars are using the standard math function.  What caught my eye was that the log10f() function is slower than the logf() function.  They should only be different by one multiplication (ie, 1-2 cycles).  Similarly, I was surprised that powf(10.0,x) was so much slower than expf(x).  It seemed like I ought to be able to accelerate these functions.

Faster powf():  I started with powf() it was in the most need of acceleration.  Because I did not need a general powf() command that could do any base number, I could optimize specifically for powf(10.0,x).  Through the rules of exponential and logs, I wrote my own macro that executed pow in terms of exp:

//powf(10.f,x) is exactly exp(log(10.0f)*x)
#define pow10f(x) expf(2.302585092994046f*x)  

Note that the reformulation above is not a numerical approximation.  If you had all of the digits for that 2.302 constant, this would be an exact substitution for pow(10,x).  Yet, as you'll see in a moment, it is 3 times faster.  Way faster with no loss in accuracy.  Wow!

Faster log10f():  Similarly, for log10f(), I started with a reformulating log10 in terms of log.  While that was effective, it was only a modest increase in speed.  I wanted it even faster.  So, in a post in the ARM community forum (here), I found that someone (thanks Dr. Beckmann!) had reformulated log10f using log base 2 and then further accelerated it using an approximation for log base 2 that exploits way that single-precision numbers are represented in memory.  It's a pretty neat solution.

So, using this, my complete substitution for log10f() is shown below.  The log2 approximation function is at the end of this post.

//log10f is exactly log2(x)/log2(10.0f)
#define log10f_fast(x)  (log2f_approx(x)*0.3010299956639812f)

Because this reformulation uses an approximation for log2(), it is not an exact substitution.  But, over the range of input values that I explored, the resulting error seemed to be less than 0.05%, which is good enough for my needs.

Faster by 3x!  In looking at the speed of these to reformulated functions, I saw that my log10f approximation was 2.8x faster than the standard log10f().  Similarly, I found that my pow10f function was 3.0x faster than the standard powf(10,x) function call.  That's a pretty nice acceleration!  I'm pleased.

As promised, here's the fast log2 approximation (source):

// This is a fast approximation to log2()
// Y = C[0]*F*F*F + C[1]*F*F + C[2]*F + C[3] + E;
float log2f_approx(float X) {
  float Y, F;
  int E;

  F = frexpf(fabsf(X), &E);
  Y = 1.23149591368684f;
  Y *= F;
  Y += -4.11852516267426f;
  Y *= F;
  Y += 6.02197014179219f;
  Y *= F;
  Y += -3.13396450166353f;
  Y += E;

1 comment:

  1. If this is in reference to your prior post about a dynamic range compressor (, I have been working on some very efficient ways to do the logarithmic processing in a compressor.
    1) a^b is just a*a*a...b number of times.
    2) Compression curve is log-linear, so the final gain function is an exponential proportional to e^-x.
    3) Given the attack & release times on a compressor, you don't have to resolve to log(n) within a single sampling period.
    4) The ratio is the slope of the log of the exponential function, which is a ratio of e^rx/e^x. The slope "r" just makes it a steeper or less steep line in the log domain.

    Strategy can be implemented with a successive approximation routine:
    --Pseudocode every sample--
    target = peak_detect(audio)
    if(tracker < target) {
    tracker *= beta;
    if(ratio_counter == ratio) {
    gain *= beta; //every ratio iterations this is multiplied
    else if (tracker > target)
    tracker *= alpha;
    if(ratio_counter == ratio) {
    gain *= alpha; //every ratio iterations this is multiplied
    if(ratio_counter > ratio) {
    ratio_counter = 0;
    Pre-computed, alpha is a number < 1 and beta is a number > 1, and they correspond to attack and release times respectively.

    Now I found for shorter attack/release times this makes a limit-cycle oscillator about the target when the envelope is not changing as fast as release time. More complexity is needed to make alpha and beta closer to 1.0 as you get close to the target in order to reduce the amplitude of the oscillation to an inaudible level (<60 dBFS min).

    The periodic multiplication can be performed like this as long as you are not picky about being granular with available attack and release times:
    a = (a + a<>N
    "<<" is bit shift left and ">>" is bit shift right.
    Notice this assumes fixed point (integer) math. This probably doesn't save you anything if you're using floating-point numbers.