Saturday, October 14, 2006

Datamining Speech In SQL Server 2005

I decided to capture everything that my two-and-a-half-year-old son says, store it in a SQL Server 2005 database and run some statistics on it.
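The post doesn't show how the statistics were run. Here is a minimal sketch of the counting step, assuming one row per spoken word. Python's built-in sqlite3 stands in for SQL Server 2005, and the table name SpokenWords, the column Word and the sample words are all hypothetical. The GROUP BY / ORDER BY query itself should also run unchanged as T-SQL.

```python
import sqlite3

# Hypothetical sample of captured words; the real data set is not in the post.
captured = ["no", "watch", "dada", "no", "mama", "dada", "watch", "no"]

conn = sqlite3.connect(":memory:")
# One row per spoken word; table and column names are made up for this sketch.
conn.execute("CREATE TABLE SpokenWords (Word TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO SpokenWords (Word) VALUES (?)",
    [(w,) for w in captured],
)

# Count each word and list the most used first (ties broken alphabetically).
rows = conn.execute(
    """
    SELECT Word, COUNT(*) AS Uses
    FROM SpokenWords
    GROUP BY Word
    ORDER BY Uses DESC, Word
    """
).fetchall()

for word, uses in rows:
    print(f"{word}: {uses}")
```

With the sample above this prints "no" first (3 uses), then "dada" and "watch" (2 each), then "mama" (1).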

Here is the outcome, ordered from most to least used word:
[Word-frequency table, most used word first]

As you can see, dada is ahead of mama (by a count of one).
So this is interesting: as a child goes through the terrible twos, the most used word is "no". The second interesting thing is that "watch" was used 6,300 times, while "Barney", "movie" and "Blues Clues" were used 2,100 times each (3 × 2,100 = 6,300). It looks like my son likes his entertainment evenly split. And of course junk food is also high on the list; broccoli is not even in the top 100.

So how did I do this? Well, I used a tool called Komodo Unnatural Speaking. It is a chip embedded in a sticker. The chip captures all speech, and because it’s an RFID tag, whenever my son passes a certain spot in the house the data is downloaded to my computer and the chip is cleared. An SSIS package uses fuzzy logic to process the data (a toddler’s speech is sometimes gibberish). We get rid of noise words, and the results are stored in a table.
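The SSIS package itself isn't shown. As a rough illustration of the cleanup step only, here is a Python sketch that swaps in the standard library's difflib.get_close_matches for the fuzzy matching. The vocabulary, the noise-word list, the 0.6 cutoff and the clean_tokens helper are hypothetical, not taken from the post.

```python
import difflib
from collections import Counter

# Hypothetical vocabulary and noise words; the post does not list what the package used.
VOCABULARY = ["no", "watch", "barney", "movie", "dada", "mama", "cookie"]
NOISE_WORDS = {"a", "the", "uh", "um"}


def clean_tokens(raw_tokens, cutoff=0.6):
    """Map each raw token to its closest known word.

    Noise words are skipped, and tokens with no known word scoring at
    least `cutoff` (difflib similarity ratio) are dropped as gibberish.
    """
    cleaned = []
    for token in raw_tokens:
        token = token.lower()
        if token in NOISE_WORDS:
            continue
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        if match:
            cleaned.append(match[0])
    return cleaned


raw = ["Wach", "uh", "barnee", "NO", "the", "mooovie", "zxq"]
cleaned = clean_tokens(raw)
counts = Counter(cleaned)
print(cleaned)
print(counts.most_common())
```

Here "Wach", "barnee" and "mooovie" snap to "watch", "barney" and "movie", while "zxq" matches nothing and is dropped rather than guessed. Raising cutoff makes the matching stricter. Inside SSIS itself, the Fuzzy Lookup transformation plays a similar role.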

Pretty interesting, don't you think? ;-)

1 comment:

Anonymous said...

That's funny. I'm glad my husband doesn't know about this. He might be tempted to log how many times I nag him about certain things. :)