Once you have a real-time platform for manipulating audio, it is always fun to see what you can do to your voice. In my case, I had been spending a lot of time figuring out a good way to implement frequency-domain audio processing on the Tympan. Once I did that, I realized that it would be super easy to start having fun with my voice. So, here, I present my Tympan Formant Shifter!
Formants vs Pitch. Like any instrument, your voice has a pitch, which is the fundamental frequency of the sound of your voice. In addition to the fundamental, the sound of your voice contains many harmonics, which extend far above the fundamental frequency. Which of those harmonics are actually projected from your mouth depends upon how your mouth, nasal passages, and throat are shaped. As you talk, you naturally change the shape of your mouth (and nose and throat) to make the various consonant and vowel sounds. An "E" sounds different from an "A" because your throat, nose, and mouth enhance different harmonics for the two vowels. These enhanced frequency regions, which move around as you speak, are your "formants".
Formant Shifting. In the video, I am only moving the formants. To us listeners, it clearly sounds as if my voice itself is going higher or lower. But that isn't the case: the fundamental frequency of my voice and the frequencies of its harmonics are all unchanged. Instead, formant shifting lets you hear harmonics of my voice that are higher or lower than the ones you normally hear emphasized. The processing simply changes which harmonics are emphasized and which are attenuated.
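One way to write this down is an idealized source-filter picture. This is my own simplification, not something the Tympan code computes explicitly. The voice spectrum is a set of harmonics at $k f_0$ with amplitudes $A_k$, weighted by a vocal-tract envelope $H(f)$. Shifting the formants by a factor $\alpha$ replaces $H(f)$ with $H(f/\alpha)$ and leaves the harmonic frequencies alone:

```latex
X(f) = \sum_{k} A_k \, H(k f_0) \, \delta(f - k f_0)
\quad\longrightarrow\quad
Y(f) = \sum_{k} A_k \, H\!\left(\frac{k f_0}{\alpha}\right) \delta(f - k f_0)
```

For example, with $f_0 = 120$ Hz and a formant peak at 600 Hz, the 5th harmonic is emphasized. With $\alpha = 1.2$, the envelope $H(f/\alpha)$ peaks at $1.2 \times 600 = 720$ Hz, which is the 6th harmonic ($6 \times 120$). The harmonics are still at 120, 240, 360, ... Hz. Only their weights changed.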
Frequency-Domain Formant Shifter. A formant shifter is easy to implement in the frequency domain. Starting from your audio, take an FFT, shift the magnitudes of the FFT bins to higher or lower bins (while leaving the FFT phases in their original bins), take the inverse FFT ("IFFT"), and play the audio. In principle, that's it!
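Here is a rough single-frame C++ sketch of the recipe above. It is not the Tympan Library's code. The function names (`dft`, `idftReal`, `formantShiftFrame`) are hypothetical, and a naive O(N²) DFT stands in for a real FFT. It also assumes a multiplicative bin mapping: output bin k takes the magnitude of bin k/scale. That is one common way to move formants up or down. Windowing and overlap-add are left out.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;
constexpr double kPi = 3.14159265358979323846;

// Naive O(N^2) DFT, used only to keep this sketch self-contained.
// A real-time implementation would call an FFT library instead.
std::vector<cpx> dft(const std::vector<double>& x) {
  const std::size_t N = x.size();
  std::vector<cpx> X(N);
  for (std::size_t k = 0; k < N; ++k) {
    cpx acc(0.0, 0.0);
    for (std::size_t n = 0; n < N; ++n) {
      acc += x[n] * std::polar(1.0, -2.0 * kPi * double(k * n) / double(N));
    }
    X[k] = acc;
  }
  return X;
}

// Naive inverse DFT. It returns only the real part, because the spectra
// passed in are conjugate-symmetric (they describe a real-valued signal).
std::vector<double> idftReal(const std::vector<cpx>& X) {
  const std::size_t N = X.size();
  std::vector<double> x(N);
  for (std::size_t n = 0; n < N; ++n) {
    cpx acc(0.0, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
      acc += X[k] * std::polar(1.0, 2.0 * kPi * double(k * n) / double(N));
    }
    x[n] = acc.real() / double(N);
  }
  return x;
}

// Hypothetical helper: formant-shift one frame.
// scale must be positive: > 1 moves formants up, < 1 moves them down.
std::vector<double> formantShiftFrame(const std::vector<double>& frame,
                                      double scale) {
  const std::size_t N = frame.size();
  const std::size_t nyq = N / 2;
  const std::vector<cpx> X = dft(frame);
  std::vector<cpx> Y(N);  // zero-initialized
  for (std::size_t k = 0; k <= nyq; ++k) {
    // The magnitude comes from the nearest source bin k / scale (0 if out of range)...
    const double src = std::round(double(k) / scale);
    const double mag =
        src <= double(nyq) ? std::abs(X[static_cast<std::size_t>(src)]) : 0.0;
    // ...but the phase stays in its original bin.
    Y[k] = std::polar(mag, std::arg(X[k]));
  }
  // Negative-frequency bins are complex conjugates of the positive ones.
  for (std::size_t k = nyq + 1; k < N; ++k) Y[k] = std::conj(Y[N - k]);
  return idftReal(Y);
}
```

With scale = 1.0 the frame comes back unchanged. With other values, nearest-bin rounding duplicates or skips some source bins. Interpolating magnitudes between neighboring bins is a natural refinement.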
Real World FFT/IFFT Processing. In reality, of course, implementing FFT/IFFT processing on a real-time audio stream is more complicated. But, as I said at the top, I spent quite a bit of time trying to hide all of these complications. My frequency-domain processing framework for the Tympan takes care of the buffering, the windowing, and the overlap-and-add operations.
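Here is a minimal streaming C++ sketch of what "buffering, windowing, and overlap-and-add" involve. The class name `OverlapAddProcessor`, the 50% overlap, and the square-root Hann windows are illustrative assumptions. They are not necessarily what the Tympan Library does. Each call takes one hop of new samples and returns one hop of output, delayed by one hop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical streaming wrapper with 50% overlap. Analysis and synthesis
// both use a square-root periodic Hann window. Their product is a periodic
// Hann window, whose copies shifted by half a frame sum to exactly 1.
class OverlapAddProcessor {
 public:
  using FrameFn = std::function<void(std::vector<float>&)>;

  OverlapAddProcessor(std::size_t fftSize, FrameFn fn)
      : N_(fftSize), hop_(fftSize / 2), fn_(std::move(fn)),
        window_(fftSize), inBuf_(fftSize, 0.0f), outBuf_(fftSize, 0.0f) {
    const double pi = 3.14159265358979323846;
    for (std::size_t n = 0; n < N_; ++n) {
      window_[n] = static_cast<float>(
          std::sqrt(0.5 - 0.5 * std::cos(2.0 * pi * n / N_)));
    }
  }

  // Consumes hop() input samples (in.size() must be >= hop()) and
  // returns hop() output samples.
  std::vector<float> processBlock(const std::vector<float>& in) {
    // 1) Buffering: slide the analysis buffer left by one hop, then append the new input.
    std::copy(inBuf_.begin() + hop_, inBuf_.end(), inBuf_.begin());
    std::copy(in.begin(), in.begin() + hop_, inBuf_.begin() + (N_ - hop_));
    // 2) Analysis window.
    std::vector<float> frame(N_);
    for (std::size_t n = 0; n < N_; ++n) frame[n] = inBuf_[n] * window_[n];
    // 3) User processing (FFT, modify bins, IFFT would go here).
    fn_(frame);
    // 4) Synthesis window, then overlap-add into the output accumulator.
    for (std::size_t n = 0; n < N_; ++n) outBuf_[n] += frame[n] * window_[n];
    // 5) The first hop is now complete: emit it and slide the accumulator.
    std::vector<float> out(outBuf_.begin(), outBuf_.begin() + hop_);
    std::copy(outBuf_.begin() + hop_, outBuf_.end(), outBuf_.begin());
    std::fill(outBuf_.begin() + (N_ - hop_), outBuf_.end(), 0.0f);
    return out;
  }

  std::size_t hop() const { return hop_; }

 private:
  std::size_t N_, hop_;
  FrameFn fn_;
  std::vector<float> window_, inBuf_, outBuf_;
};
```

With a do-nothing frame callback, the output is simply the input delayed by one hop. A real effect would do its FFT, bin modification (such as a formant-shift mapping), and IFFT inside the callback.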
Tympan Example Code. In the Tympan Library, I wrote an example sketch for Formant Shifting. You can see it here. The underlying audio processing class that does the formant shifting is here (*.h) and here (*.cpp). Finally, for the video at the top, I combined the Formant Shifting with a USB Audio interface so that the processed audio would be recorded along with the video from my web camera. You can get this USB Audio-enabled version of the code (along with other partly-working goodies) here.
More Frequency-Domain Examples. Once I got the whole frequency-domain processing structure in place, I found it fun and easy to implement several other frequency-domain algorithms. In addition to Formant Shifting, I've got Frequency Shifting (though it is only linear shifting, not exponential scaling). I've also got Frequency Compression and Noise Reduction. I totally nerded out. So fun!
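This small C++ sketch makes the "linear shifting, not exponential scaling" distinction concrete by comparing the two frequency mappings. The helper names are hypothetical. It also assumes "exponential scaling" means multiplying every frequency by a constant ratio, which is a constant step in semitones on a log-frequency axis.

```cpp
#include <cmath>

// Hypothetical helpers contrasting the two mappings (frequencies in Hz).

// Linear frequency shifting: add the same offset to every component.
double linearShiftHz(double freqHz, double offsetHz) {
  return freqHz + offsetHz;
}

// Scaling: multiply every component by the same ratio. Expressing the
// ratio in semitones makes it a constant step on a log-frequency axis.
double scaleBySemitones(double freqHz, double semitones) {
  return freqHz * std::pow(2.0, semitones / 12.0);
}
```

Take harmonics at 200 and 400 Hz. Shifting both up by 50 Hz gives 250 and 450 Hz, so their 2:1 ratio becomes 1.8:1. Scaling both by 12 semitones gives 400 and 800 Hz, and the 2:1 ratio survives. A linear shift therefore breaks the harmonic relationships in a voice, while scaling preserves them.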