
Computer Speech Recognition

Olowkow

Philosopher
Joined
Oct 29, 2007
Messages
8,230
I attended a lecture by one of the PhD members of a group developing an extremely sophisticated speech-to-text algorithm. The program can be tested online, in the Google Chrome browser only, at this site:

http://www.google.com/intl/en/chrome/demos/speech.html

There are programs like Dragon that are very good, but this one seems to be on another level in the uncanny subtlety of its recognition.

Select from over 30 languages, choose your dialect, then click the microphone and "allow" (to give Google access to your mic), and speak while the microphone icon is flashing. Your speech will be transcribed. I tried it in French and Spanish, and in my poor German and Italian, and it works incredibly well. I chose "Australia" and tried my best Aussie accent, but it didn't like my pronunciation.:mad:

As someone who has had some experience in the field of theoretical linguistics, I must say this is very close to the "holy grail" of computerized speech recognition. My specific question this evening was, "Did you accomplish this pretty much by throwing tons of computer power at the problem?" The answer was "Yes."

This program is well worth trying out for yourself, and there are huge implications for the disabled (blind, paraplegic, etc.) as well as for computerized translation. It allows for various regional accents, including Indian, and it is not a trivial matter to fool it. The principal investigator (PI) gave a lecture in China, and a simultaneous machine translation into Mandarin was done, and it was reportedly excellent.

I am very excited to have found this, and I'm hoping some of our JREF blind or otherwise disabled members might find this as interesting and useful as I have.

Comments would be much appreciated.
 
I've tried the little bit of Russian, Polish, Korean, Japanese, and Chinese I know. It's spot on. I can't imagine why it doesn't like my "G'day mate, no worries, how ya goin' then?" Australian impersonation. This thing even does Euskara (Basque)! A language where "Es" means "NO!". :D Seriously.
 
"Did you accomplish this pretty much by throwing tons of computer power at the problem?" The answer was "Yes."
It is amazing what that small, slow 30W biological computer inside our heads can do.

IXP
 
Our language ability is beyond amazing. The lecturer gave an example of a question to Siri whose answer does not take into account social/cultural considerations.

iPhone user: Siri, I am bleeding. Can you call me an ambulance please?
Siri: Certainly, Ann Ambulance. What do you need?

The next layer they plan to incorporate is the strange world beyond mere context-sensitive programming: social and semantic rules. It is not straightforward to get a machine to give the socially sensitive response B, below, to the utterance A. Response C is purely logical.

A. It's kind of chilly in here.

B. Oh, sorry, I'll turn up the heat.
C. Yes, I agree, it is 66 degrees Fahrenheit.
 
Sounds like (pun intended) this would work well for creating simultaneous closed captioning for TV. However, how does it fare with a noisy background, for instance at a county fair, a car accident scene, or a NASCAR event?
 
Many thanks for posting this Olowkow. Can't wait to try it. I experimented with the VR system Google uses for searches and was impressed.

I tried that one, but it can't seem to find my microphone for some reason. I'll try to figure out why when I find the time.
 
Sounds like (pun intended) this would work well for creating simultaneous closed captioning for TV. However, how does it fare with a noisy background, for instance at a county fair, a car accident scene, or a NASCAR event?

The nature of the algorithm depends on phoneme recognition, and according to the lecturer it is not yet suitable for closed captioning and is readily defeated by ambient noise.
 
Sigh

Still gonna cost us $200 per hour then.

I made a point of asking if this was used for any closed captioning, and I was told it was not.

Is $200/hour the rate for the closed captioners? I don't feel sorry for them any more if so.:D
 
The station I work for is part of a large cableco. Our agreement with the cc provider is $100 per half hour, charged in half-hour increments. We had one live event that ran approx five minutes long, though, and we did not get charged for an additional half hour, so a bit of a break there.
This applies only to live captioning though. For post-production shows in Canada, one must have embedded captioning (there's a term for it, escapes me right now), which is more accurate and a bit cheaper, but it's a pain in the buttocks since Final Cut does not conveniently marry the cc file with the video file. Apple... phooey!
 
My dog is going crazy trying to catch the Capitol building I'm not going to go get it
Um, no.
Call the roller derby cigars the muscle Wanted been with the Incas kitchen cups Kaka person curds
No.

I'm sitting on my bed talking to my computer my lamp is turned on Simon is trying to lick my face Simon and dawg
Better, but still not right.

The farmers have started to feed a sheep I can hear the bleeding on side
Okay, I am not so impressed. I am trying to speak very clearly and slowly (in earlier attempts, not pasted here, I was talking fast, and that was an absolute disaster).

from the critically acclaimed author of the longshot comes the Sun to about the destruction of a family home in the way of life set a struggling farming colonial Country King on the brink of civil war going to the force is Attila family drama template turmoil which fiery storytelling melt with Darren original pros since his mother's death Thomas father have passion to stream the mystic pizza with everything is frozen on the old man's vicious control but with a young woman name Corinne horizon the farm the tension between the
I read the back jacket of a book. This one is extremely inaccurate.
 
It could be a number of things. For example if the microphone is not very good or if there are background noises then it will not work properly.
 
Random sentences from Dan Dennett's "Intuition Pumps".

Darwin discover[ed] the power of an algorithm. NL Grill em[An algorithm] is a certain sort of formal process that can be counted on logically to yield a certain sort of result whenever it is run or instantiated.
Substrate neutrality. The procedure for long division works equally well with pencil or pen, paper or parchment, neon lights or skywriting, using any symbol system you like.
Underlying mindlessness.Overly roll [overall] design of the procedure may be brilliant, or yielda brilliant results, each constituent step, and the transition between steps, is a Shirley Temple [is utterly simple].

It's far from perfect. :) BTW it does periods and commas, not colons. The Canadian accent choice works better for me than the US one.:confused:

ETA: I played with the word "utterly" just now. The program wants to hear a clearly pronounced /t/ phoneme, and I pronounce it like I do "water", with a /d/ sound. The program does not like /uderli/, often hearing "early". It hears /uterli/ every time as "utterly".

Something I memorized long ago from Camus' L'Étranger.
Nous avons marché longtemps sur la plage. Le soleil était maintenant écrasant. Il se briser[ait] en morceaux sur le sable et sur la mer.

Periods and commas don't work in French.

 
The program is trying to teach people to use what it considers proper enunciation and cadence. Once you have demonstrated you are sufficiently compliant, it will move on to educating you to think only as big brother wishes.
 
How does it handle things like "sail" and "sale" or 'they're', "there", and "their"?
 
I don't know how it knows, but it seems to know. I just tried it:

Look over there I like their style they're going to the store

eta:
It does NOT seem to know sale from sail, though.
 
I tried a few also, like "frieze" and "freeze". The program didn't know "frieze" at all. Up to a certain point it is "context sensitive": it is programmed to know that certain word combinations are more likely than others, and it disambiguates on that basis when the sounds alone are not enough.

Look over there I like their style they're going to the store
/their + noun/, /they're + verb [-ing]/ etc. The speaker gave the example of "The dog ran / pan / can", with "The dog ran" being more likely than the others on the basis of extensive frequency-of-occurrence research. Lots of memory.
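
To make the "more likely than the others" idea concrete, here is a minimal sketch of frequency-based disambiguation. It is only my guess at the general principle, not the actual program: the counts are invented toy numbers standing in for the frequency research, and the names BIGRAM_COUNTS and pick_candidate are hypothetical.

```python
from collections import Counter

# Invented toy counts (an assumption for illustration only): how often
# each word was seen right after "dog" in some imaginary corpus.
BIGRAM_COUNTS = Counter({
    ("dog", "ran"): 120,
    ("dog", "can"): 40,
    ("dog", "pan"): 1,
})


def pick_candidate(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Return the sound-alike candidate that most often follows previous_word.

    Unseen pairs count as zero; on a tie the earliest candidate wins.
    """
    return max(candidates, key=lambda word: counts[(previous_word, word)])


print(pick_candidate("dog", ["ran", "pan", "can"]))
```

With these toy counts it picks "ran". A real system would presumably use far longer contexts and far more data, which is where the memory goes.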

True semantic rules would involve programming which gives, for instance, /sail/ qualities in the lexicon such as [+boat, +mast, +water] etc., while /sale/ might have lexicon entries with combinations such as [on sale, for sale, no sale, sell, etc.]. The program would then look at a sentence such as:

I raised the sail and the mast broke, so we had to sail to port and find a new one on sale.

and figure out which word to use based on the overall meaning of the sentence from a much broader semantic context. Throw even more memory and computer speed at the problem, and you have a very serious speech to text capability which can also make use of cultural and other real world referents.
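
A rough sketch of what such a lexicon lookup might look like, assuming a simple scheme where each spelling is scored by how many of its cue words appear nearby. The cue sets come from the examples above; the scoring scheme and the names LEXICON and choose_spelling are my own assumptions, not anything the speaker described.

```python
# Cue words per spelling, taken from the lexicon examples in the post.
# Scoring by simple overlap is an assumption made for illustration.
LEXICON = {
    "sail": {"boat", "mast", "water"},
    "sale": {"on", "for", "no", "sell"},
}


def choose_spelling(context_words, lexicon=LEXICON):
    """Pick the spelling whose cue words overlap most with the nearby words.

    On a tie the spelling listed first in the lexicon wins.
    """
    context = {word.lower().strip(".,") for word in context_words}
    return max(lexicon, key=lambda spelling: len(lexicon[spelling] & context))


# Words near the first and the last homophone in the example sentence.
print(choose_spelling("I raised the and the mast broke".split()))
print(choose_spelling("find a new one on".split()))
```

Note that this only works on a small window of nearby words. Fed the whole sentence at once, "mast" and "on" would cancel each other out, which is presumably why the broader semantic context needs so much more than a word list.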

Our brain seems to process linguistic information in some kind of parallel process, rather than sorting serially through vocabulary for meanings. This aspect of psychology has proven to be a very difficult problem to solve.
 
Regarding that, isn't there a 'thing' about constructing trick sentences that appear to be written one way but use less obvious grammar or word meanings, and read very confusingly?

Also, I tried a few of the sentences in Dragon out of curiosity to compare the competition; it had no problem with 'there/their/they're', occasionally got frieze right after I corrected it the first time, and got the last two sail/sales correct but never the first. (It seemed to be choosing based on whether the word was a noun or verb, but that might have been a coincidence or just one part of the selection.)

(Edit: of course, remembered almost immediately after posting - this is what I was thinking of.)
 