Codec Weirdness Part One: the problem with proprietary media encoding
We’ve been working on an application that needs to capture the user’s voice using the microphone on the local (client) machine. Once the sound is captured, it needs to be converted into a WAV-formatted audio file.
Simple, right?
Wrong. Very, very, very wrong!
We looked at Java implementations using JMF (the Java Media Framework) and decided, for various reasons, that Flash was the way to go. What those reasons were is the topic of another post and something I’d like to revisit in my own brain. It turned out to be a decision with consequences, and the jury is still out as to whether it was the correct one.
Flash is nice. It’s almost always already installed on the user’s machine, and it offers a ready way to connect to the client microphone. That’s a big plus.
So all we have to do is capture the voice stream and write it to an FLV file. Then we use FFmpeg, which will transcode almost any media file, including FLV, to convert it to WAV. Right?
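For what it’s worth, the FFmpeg step on its own really is short. Here’s a minimal sketch in Python of the FLV-to-WAV conversion we had in mind. The function names, the file names and the default sample rate and channel count are my own placeholders, not anything from our actual system. It assumes an ffmpeg binary is on the PATH, and it only works if that FFmpeg build can decode whatever audio codec is inside the FLV.

```python
import subprocess


def build_ffmpeg_command(flv_path, wav_path, sample_rate=44100, channels=1):
    """Build the ffmpeg argument list for an FLV -> WAV conversion.

    The defaults (44.1 kHz, mono) are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", flv_path,          # input FLV
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM, the usual WAV payload
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", str(channels),    # number of output channels
        wav_path,
    ]


def transcode(flv_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(flv_path, wav_path), check=True)
```

Calling `transcode("recording.flv", "recording.wav")` would run the conversion. If FFmpeg cannot decode the audio stream, the call fails with an error instead of producing a WAV file.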
The problem? [I'm quoting from another source]
Audio in FLV files is usually encoded as MP3. However, FLV files recorded from the user’s microphone use the proprietary Nellymoser codec.
Encoding is done on the client side, and the Flash Player [doesn't] allow us to choose the codec. [We] have no way to record audio streams [using] other codecs.
Whatever the RTMP server (FMS, Red5 or another one), the developer has no choice but to use Nellymoser when recording audio through the Flash Player.
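If you want to check which codec a given FLV actually carries, the file will tell you. In the FLV container each audio tag starts with a byte whose upper four bits are the sound format code (2 is MP3; 4, 5 and 6 are the Nellymoser variants). Below is a rough sketch in Python that reads the first audio tag and reports the codec. The function name is my own, the table lists only some of the codes, and the parsing is deliberately minimal, so treat it as an illustration and not a robust parser.

```python
# Some SoundFormat codes from the FLV audio tag header
# (upper 4 bits of the first byte of the audio tag data).
SOUND_FORMATS = {
    0: "Linear PCM, platform endian",
    1: "ADPCM",
    2: "MP3",
    3: "Linear PCM, little endian",
    4: "Nellymoser 16 kHz mono",
    5: "Nellymoser 8 kHz mono",
    6: "Nellymoser",
    10: "AAC",
    11: "Speex",
}

AUDIO_TAG = 8        # FLV tag type for audio
TAG_HEADER_SIZE = 11 # type (1) + data size (3) + timestamp (4) + stream id (3)


def first_audio_codec(data):
    """Return the codec name of the first audio tag in FLV bytes, or None."""
    if data[:3] != b"FLV":
        raise ValueError("not an FLV file")
    # Bytes 5..8 hold the header size; a 4-byte PreviousTagSize0 follows it.
    header_size = int.from_bytes(data[5:9], "big")
    pos = header_size + 4
    while pos + TAG_HEADER_SIZE <= len(data):
        tag_type = data[pos] & 0x1F
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        body = pos + TAG_HEADER_SIZE
        if tag_type == AUDIO_TAG and data_size > 0 and body < len(data):
            code = data[body] >> 4
            return SOUND_FORMATS.get(code, "unknown (%d)" % code)
        # Skip this tag's data and the 4-byte PreviousTagSize after it.
        pos = body + data_size + 4
    return None
```

Something like `first_audio_codec(open("recording.flv", "rb").read())` (with a hypothetical file name) should report one of the Nellymoser entries for a microphone recording made through the Flash Player, if the description quoted above holds.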
And the trouble is that Nellymoser doesn’t sell, or otherwise let you use, a decoder without charging a fortune. That makes web development of small to medium projects using the Flash voice recording capabilities nearly impossible. This is bad. It’s bad for Adobe and it’s bad for Nellymoser, the company behind the codec.
Someone will almost surely find a way to circumvent this problem and end up cutting them out of the loop. Adobe surely could come up with something that doesn’t rely on this codec, and no doubt they will. The bottom line is that conversion of voice captured by Flash looked either too expensive or too complex to be worth doing, and JMF seemed to be a viable solution.
However, we came up with a Rube Goldberg system that seems to do the job. I do not recommend this approach. I’ll go into exactly what it is, and the other surrounding issues, in a post tomorrow or the next day.
More on the subject can be found below.
This guy has devoted his life to finding a workaround. He’s amassed quite a collection of sources which will give you insight into both the problem and possible solutions.



I have faced similar difficulties, and just put a tool online (nellynomore) that does NellyMoser decoding on Linux: http://www.thoughtcrime.org/software/nellynomore/index.html
Moxie Marlinspike
May 25, 2007
Well that rocks. You did this with some very concise code.
I think maybe we can help one another. This project is growing and we’re looking to design a URI naming system to organize all the audio that would be uploaded. As you know, it’s sort of hard to enumerate MP3 files attached as comments or conversations, so we’re looking at MPEG7.
Let’s discuss.
videohound
May 25, 2007
Hmm, why not just put them in a database? I thought MPEG7 was designed more for a metadata index into very specific parts of an audio track. If you just want to organize whole tracks, I would think a database would work fine?
Moxie Marlinspike
May 25, 2007
You’re right about MPEG7 being used to describe the physical attributes of the file as well as some time-related events. It’s also supposed to have elements (I suppose elements would be what they are) to describe the actual nature of the content. Also, I’m just theorizing about whether there is any standard one could reasonably use to organize audio content that isn’t music. More specifically, I mean organizing user-generated audio content on the fly.
But you’re definitely right about the database. Only if one were intending to make the nature (both physical and subjective) of the data accessible would some URI-based aliasing seem in order. I really don’t know, and so far nothing about MPEG7 seems all that accessible, so I tend to agree with you.
videohound
May 26, 2007
Thanks for the comments as well as the info about Nellynomore. I’d like to include this feature in my future website; however, I am totally green in IT. Can somebody write a step-by-step procedure for a novice to follow to add voice recording and playback to their website? Thanks.
Sunil Jain
November 21, 2007