I'm working on a simple idea: an automatic video blog built from a YouTube search.
I like watching the interviews from David Letterman's show on YouTube, but it's hard to find a good RSS feed to subscribe to. A simple query returns both old and new results, plus many duplicate videos. Some interviews are only partial; others are of poor quality. So I started building a filtering engine using the Google Data API.
I started the project on Google App Engine, using the Java environment.
It's still pretty basic, but you can already see the results:
http://videovertigo.appspot.com/letterman/
I start with the query:
+letterman 2009|09 -monologue -"top 10" -"top ten"
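To send a query like this to YouTube, it has to be URL-encoded first, because `+`, `|` and `"` all have special meaning in a URL. The sketch below shows only that encoding step. The feed endpoint and the `max-results` parameter are assumptions based on the old YouTube GData API v2 (since retired), and `GDataQuery` / `buildFeedUrl` are hypothetical names, not the project's code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that turns a search string into a GData v2 feed URL. */
class GDataQuery {
    // Assumed video search feed of the retired YouTube GData API v2.
    static final String FEED = "http://gdata.youtube.com/feeds/api/videos";

    static String buildFeedUrl(String query, int maxResults) {
        try {
            // URLEncoder encodes spaces as '+', so a literal '+' becomes %2B,
            // '|' becomes %7C and '"' becomes %22; '-' is left unchanged.
            String q = URLEncoder.encode(query, "UTF-8");
            return FEED + "?q=" + q + "&max-results=" + maxResults;
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = "+letterman 2009|09 -monologue -\"top 10\" -\"top ten\"";
        System.out.println(buildFeedUrl(query, 50));
    }
}
```

Running `main` prints the feed URL with the query encoded as `%2Bletterman+2009%7C09+-monologue+-%22top+10%22+-%22top+ten%22`.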
Next, I look for a date in each video's title and description. The date parsing is done with ANTLR (thanks to Piercarlo for implementing this part).
Then I assign a rank to each keyword based on its frequency in the result set.
Finally, I try to cluster videos that look similar, based on their keywords and date.
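The real parser is an ANTLR grammar, which isn't shown here. As a rough stand-in, the sketch below uses plain regular expressions to pull a date out of a title or description. It recognizes only two assumed formats, "9/15/09" (US month/day/year) and "September 15, 2009"; the class name `DateExtractor` is hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex stand-in for the ANTLR date parser; handles two assumed formats only. */
class DateExtractor {
    private static final String[] MONTHS = {"january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december"};

    // "9/15/09" or "9/15/2009", read as month/day/year.
    private static final Pattern NUMERIC =
            Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{2}|\\d{4})\\b");
    // "September 15, 2009" (comma optional, case-insensitive).
    private static final Pattern LONG = Pattern.compile(
            "\\b(" + String.join("|", MONTHS) + ")\\s+(\\d{1,2}),?\\s+(\\d{4})\\b",
            Pattern.CASE_INSENSITIVE);

    static Optional<LocalDate> extract(String text) {
        Matcher m = NUMERIC.matcher(text);
        while (m.find()) {
            int year = Integer.parseInt(m.group(3));
            if (year < 100) year += 2000; // assumption: two-digit years mean 20xx
            Optional<LocalDate> d = toDate(year, Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
            if (d.isPresent()) return d;
        }
        m = LONG.matcher(text);
        while (m.find()) {
            int month = Arrays.asList(MONTHS).indexOf(m.group(1).toLowerCase(Locale.ROOT)) + 1;
            Optional<LocalDate> d = toDate(Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(2)));
            if (d.isPresent()) return d;
        }
        return Optional.empty();
    }

    private static Optional<LocalDate> toDate(int year, int month, int day) {
        try {
            return Optional.of(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. "13/45/09" is not a real date
        }
    }
}
```

A grammar-based parser like ANTLR scales better once many formats are involved; the regex version is only meant to show the input and output of this step.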
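One possible reading of "frequency in the result set" is document frequency: the number of videos in which a keyword appears. The sketch below ranks keywords that way. The tokenizer, the tiny stop-word list and the name `KeywordRanker` are assumptions for illustration, not the engine's actual logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical keyword ranker based on document frequency. */
class KeywordRanker {
    // Illustrative stop-word list; a real one would be much longer.
    private static final Set<String> STOP =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "on", "with"));

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
        }
        return out;
    }

    /** Returns keywords ordered by how many videos contain them, ties broken alphabetically. */
    static LinkedHashMap<String, Integer> rank(List<String> videoTexts) {
        Map<String, Integer> df = new HashMap<>();
        for (String text : videoTexts) {
            // A HashSet so each keyword counts at most once per video.
            for (String t : new HashSet<>(tokens(text))) {
                df.merge(t, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(df.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        LinkedHashMap<String, Integer> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) ranked.put(e.getKey(), e.getValue());
        return ranked;
    }
}
```

For the titles "Letterman: Tom Hanks", "Tom Hanks on Letterman" and "Letterman with Bill Murray", this ranks `letterman` (3) first, then `hanks` and `tom` (2 each), then `bill` and `murray` (1 each).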
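The post doesn't say which clustering method the engine uses, so here is one simple substitute: a greedy single-pass clustering. Each video joins the first cluster whose first member has a nearby date and a similar keyword set (Jaccard similarity of the two sets); otherwise it starts a new cluster. The names, the `minSim` threshold and the `maxDays` window are hypothetical parameters chosen for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical greedy clustering of videos by date and keyword overlap. */
class VideoClusterer {
    static final class Video {
        final String title;
        final LocalDate date; // null when no date was found
        final Set<String> keywords = new HashSet<>();

        Video(String title, LocalDate date) {
            this.title = title;
            this.date = date;
            for (String t : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) keywords.add(t);
            }
        }
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B| of two keyword sets. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** True when both dates are known and at most maxDays apart. */
    static boolean closeDates(LocalDate a, LocalDate b, long maxDays) {
        return a != null && b != null && Math.abs(ChronoUnit.DAYS.between(a, b)) <= maxDays;
    }

    /** Each video joins the first cluster whose first member is close enough, else starts a new one. */
    static List<List<Video>> cluster(List<Video> videos, double minSim, long maxDays) {
        List<List<Video>> clusters = new ArrayList<>();
        for (Video v : videos) {
            List<Video> home = null;
            for (List<Video> c : clusters) {
                Video rep = c.get(0);
                if (closeDates(rep.date, v.date, maxDays) && jaccard(rep.keywords, v.keywords) >= minSim) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(v);
        }
        return clusters;
    }
}
```

With `minSim = 0.4` and `maxDays = 1`, "Tom Hanks on Letterman" and "Letterman Tom Hanks part 1" from the same day share a cluster (similarity 0.5). "Bill Murray on Letterman" from the next day starts its own cluster (similarity about 0.33). A greedy pass depends on input order; a hierarchical method would avoid that at a higher cost.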
I'm still experimenting with the clustering to make it as general as possible. I would like users to be able to build their own video blog from a complex query, using tools such as information extraction and clustering.
The home page has a simple search box you can use to play with the engine: http://videovertigo.appspot.com/
What do you think? Any ideas on how to improve the product? Are you an engineer who would like to contribute? Please contact me!
Sunday, 20 September 2009