Ideas and Code

Sunday, 20 September 2009

A video blog on App Engine

I'm working on a basic idea: an automatic video blog built from a YouTube search.

I like following the David Letterman Show interviews on YouTube, but it's hard to find a good RSS feed to subscribe to. A simple query returns a mix of old and new results and many duplicate videos. Some interviews are only partial, others are of poor quality. So I started building a filtering engine using the Google Data API.

I started the project on Google App Engine, using the Java environment.
It's still pretty basic, but you can already see the results:

http://videovertigo.appspot.com/letterman/

I start with the query:
"+letterman 2009|09 -monologue -"top 10" -"top ten""

Then I look for a date in the title and in the description. The parsing is done with ANTLR (thanks to Piercarlo for implementing this part).
Then I assign a rank to each keyword, based on its frequency in the result set.
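
The date and keyword steps above can be sketched in a few lines of Java. This is only an illustration, not the code running on the site: a regular expression stands in for the ANTLR grammar, the month/day/year format is an assumption, and the names VideoFeatures, findDate and rankKeywords are made up for the example. The rank here is simply the number of result titles that contain the keyword.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the project's actual code.
class VideoFeatures {
    // Assumed format: month/day/year, separated by '/', '.' or '-',
    // with a 2- or 4-digit year (e.g. 9/21/09 or 9/21/2009).
    private static final Pattern DATE =
            Pattern.compile("\\b(\\d{1,2})[/.-](\\d{1,2})[/.-](\\d{4}|\\d{2})\\b");

    // Returns the first valid calendar date found in the text, if any.
    static Optional<LocalDate> findDate(String text) {
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            int month = Integer.parseInt(m.group(1));
            int day = Integer.parseInt(m.group(2));
            int year = Integer.parseInt(m.group(3));
            if (year < 100) {
                year += 2000; // assumption: two-digit years mean 20xx
            }
            try {
                return Optional.of(LocalDate.of(year, month, day));
            } catch (DateTimeException e) {
                // Looked like a date but is not one (e.g. 13/45/09); keep searching.
            }
        }
        return Optional.empty();
    }

    // Maps each keyword to the number of titles it appears in.
    static Map<String, Integer> rankKeywords(List<String> titles) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            Set<String> seen = new HashSet<>(); // count a word once per title
            for (String word : title.toLowerCase().split("[^a-z0-9]+")) {
                // Skip very short tokens such as "on" or "09".
                if (word.length() > 2 && seen.add(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

With this sketch, a keyword like "letterman" that shows up in every title gets the highest count, which is a hint that it says little about any single video, while a guest name shared by only a few titles is more useful for telling videos apart.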

Finally I try to cluster the videos that look similar, based on keywords and date.
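
To give an idea of what clustering on keywords and date can look like, here is a minimal sketch. It is not the algorithm used on the site, which is still changing: it assumes two videos are similar when they carry the same date and their keyword sets overlap enough (Jaccard similarity), and it groups them in a single greedy pass. The class VideoClusters, the Video holder and the threshold parameter are all hypothetical names chosen for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the project's actual clustering code.
class VideoClusters {
    // What the earlier steps are assumed to have extracted for one video.
    static class Video {
        final String id;
        final LocalDate date;        // null when no date was found
        final Set<String> keywords;

        Video(String id, LocalDate date, Set<String> keywords) {
            this.id = id;
            this.date = date;
            this.keywords = keywords;
        }
    }

    // Jaccard similarity: shared keywords divided by all distinct keywords.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    // Assumption: similar means same known date and enough keyword overlap.
    static boolean similar(Video a, Video b, double threshold) {
        return a.date != null
                && a.date.equals(b.date)
                && jaccard(a.keywords, b.keywords) >= threshold;
    }

    // Greedy single pass: a video joins the first cluster whose first member
    // it resembles, otherwise it starts a new cluster.
    static List<List<Video>> cluster(List<Video> videos, double threshold) {
        List<List<Video>> clusters = new ArrayList<>();
        for (Video v : videos) {
            List<Video> home = null;
            for (List<Video> c : clusters) {
                if (similar(c.get(0), v, threshold)) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(v);
        }
        return clusters;
    }
}
```

Comparing only against the first member of each cluster keeps the sketch short, but it makes the result depend on the order of the videos. A lower threshold merges more aggressively and a higher one keeps near-duplicates apart.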

I'm still playing with the clustering to make it as general as possible. I would like users to be able to build their own video blog from a complex query, using tools like information extraction and clustering.

On the home page there is a simple search box you can use to play with the engine: http://videovertigo.appspot.com/

What do you think? Any ideas on how to improve the product? Are you an engineer, and would you like to contribute? Please contact me!
