C++ phoneme recognition FFT class

I need a class that recognizes, in real time from the microphone, which sound is being spoken, such as [a], [e], ...

It doesn't need to be perfect; the goal is to move the mouth of a 3D character.

I would like to avoid a big library such as SAPI that does much more than I need; I'd prefer plain FFT-based code.

Once again, the detection can be approximate: if the sound [?] is detected instead of the sound [y], or the sound [b] instead of [p], that's OK.

Your class/function will take as input a buffer containing, for example, 100 ms of recorded sound and determine which phoneme it contains.

I'll do the sound acquisition part.
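
To make the interface I have in mind concrete, here is a rough sketch of what such a class might look like. Everything in it is an assumption for illustration: the name PhonemeDetector, the detect() signature, the silence threshold, the two frequency bands, and above all the vowel template table, whose values are placeholders that would have to be calibrated on real recordings. It only handles a few vowels, picks the strongest spectral peak in a low band and a high band as crude stand-ins for the first two formants, and returns the nearest template. Consonants such as [b] or [p] are not covered.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical class; not taken from any existing library.
class PhonemeDetector {
public:
    explicit PhonemeDetector(double sampleRate) : sampleRate_(sampleRate) {}

    // Takes one mono buffer (e.g. 100 ms of samples in [-1, 1]) and returns a
    // vowel label, or '-' when the buffer is too short or too quiet.
    char detect(const std::vector<float>& samples) const {
        if (samples.size() < 2) return '-';
        std::size_t n = 1;
        while (n < samples.size()) n <<= 1;           // zero-pad to a power of two
        std::vector<std::complex<double>> x(n);
        const double pi = std::acos(-1.0);
        double energy = 0.0;
        for (std::size_t i = 0; i < samples.size(); ++i) {
            const double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / samples.size()); // Hann window
            x[i] = samples[i] * w;
            energy += static_cast<double>(samples[i]) * samples[i];
        }
        if (energy / samples.size() < 1e-6) return '-'; // assumed silence threshold
        fft(x);
        const double f1 = peakHz(x, 200.0, 900.0);    // crude first-formant estimate
        const double f2 = peakHz(x, 900.0, 3000.0);   // crude second-formant estimate

        // Placeholder (F1, F2) templates in Hz; illustrative only, must be tuned.
        struct Template { char label; double f1, f2; };
        static const Template table[] = {
            {'a', 750.0, 1300.0}, {'e', 450.0, 2000.0}, {'i', 280.0, 2300.0},
            {'o', 450.0, 900.0},  {'u', 300.0, 900.0}};
        char best = '-';
        double bestDist = std::numeric_limits<double>::max();
        for (const Template& t : table) {
            const double d = (f1 - t.f1) * (f1 - t.f1) + (f2 - t.f2) * (f2 - t.f2);
            if (d < bestDist) { bestDist = d; best = t.label; }
        }
        return best;
    }

private:
    // In-place iterative radix-2 FFT; a.size() must be a power of two.
    static void fft(std::vector<std::complex<double>>& a) {
        const std::size_t n = a.size();
        for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
            std::size_t bit = n >> 1;
            for (; j & bit; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) std::swap(a[i], a[j]);
        }
        const double pi = std::acos(-1.0);
        for (std::size_t len = 2; len <= n; len <<= 1) {
            const std::complex<double> wlen = std::polar(1.0, -2.0 * pi / len);
            for (std::size_t i = 0; i < n; i += len) {
                std::complex<double> w(1.0, 0.0);
                for (std::size_t k = 0; k < len / 2; ++k) {
                    const std::complex<double> u = a[i + k];
                    const std::complex<double> v = a[i + k + len / 2] * w;
                    a[i + k] = u + v;
                    a[i + k + len / 2] = u - v;
                    w *= wlen;
                }
            }
        }
    }

    // Frequency (Hz) of the strongest bin between loHz and hiHz.
    double peakHz(const std::vector<std::complex<double>>& spec,
                  double loHz, double hiHz) const {
        const std::size_t n = spec.size();
        const std::size_t lo = static_cast<std::size_t>(loHz * n / sampleRate_);
        std::size_t hi = static_cast<std::size_t>(hiHz * n / sampleRate_);
        if (hi > n / 2) hi = n / 2;                   // stay below Nyquist
        std::size_t best = lo;
        double bestMag = -1.0;
        for (std::size_t k = lo; k <= hi; ++k) {
            const double m = std::norm(spec[k]);      // squared magnitude
            if (m > bestMag) { bestMag = m; best = k; }
        }
        return best * sampleRate_ / n;
    }

    double sampleRate_;
};
```

The intended use would be something like PhonemeDetector det(16000.0); char v = det.detect(buffer); called once per 100 ms buffer, with the returned label driving the mouth shape. Picking single peaks is fragile on real voices because the harmonics of the pitch dominate the spectrum, so a smoothed spectral envelope would probably be needed in practice.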

