Ideas and Code

Monday, 5 October 2009

JTidy errors to Log4j

By default JTidy writes all its errors straight to the console, a rather old-fashioned way of logging. So I spent some time integrating it with Log4j to clean up my console output.

First, you need a PrintWriter as a bridge from JTidy to Log4j:

import java.io.PrintWriter;

import org.apache.log4j.Category;
import org.apache.log4j.Priority;

/** A PrintWriter that buffers printed text and logs each completed line to Log4j. */
public class Log4jPrintWriter extends PrintWriter {
    private final Priority level;
    private final Category cat;
    private final StringBuffer text = new StringBuffer();

    public Log4jPrintWriter(Category cat, Priority level) {
        // PrintWriter has no default constructor. Only methods that are not
        // overridden below (write, printf, ...) would still reach System.err.
        super(System.err);
        this.level = level;
        this.cat = cat;
    }

    // Logs the buffered text as one event and clears the buffer.
    private void logLine() {
        cat.log(level, text.toString());
        text.setLength(0);
    }

    // close() and flush() log whatever is pending, but never an empty line.
    @Override public void close() { flush(); }
    @Override public void flush() { if (text.length() > 0) logLine(); }

    // print(...) only appends to the buffer.
    @Override public void print(boolean b) { text.append(b); }
    @Override public void print(char c) { text.append(c); }
    @Override public void print(char[] s) { text.append(s); }
    @Override public void print(double d) { text.append(d); }
    @Override public void print(float f) { text.append(f); }
    @Override public void print(int i) { text.append(i); }
    @Override public void print(long l) { text.append(l); }
    @Override public void print(Object obj) { text.append(obj); }
    @Override public void print(String s) { text.append(s); }

    // println(...) appends its argument, then logs the completed line.
    @Override public void println() { flush(); }
    @Override public void println(boolean x) { text.append(x); logLine(); }
    @Override public void println(char x) { text.append(x); logLine(); }
    @Override public void println(char[] x) { text.append(x); logLine(); }
    @Override public void println(double x) { text.append(x); logLine(); }
    @Override public void println(float x) { text.append(x); logLine(); }
    @Override public void println(int x) { text.append(x); logLine(); }
    @Override public void println(long x) { text.append(x); logLine(); }
    @Override public void println(Object x) { text.append(x); logLine(); }
    @Override public void println(String x) { text.append(x); logLine(); }
}

Thanks to JD Evora for this.

Then declare your Logger and your PrintWriter:

private static Logger log = Logger.getLogger(HtmlProcessor.class);
private static Log4jPrintWriter log4j = new Log4jPrintWriter(log, Level.DEBUG);


And finally, make JTidy work with it:


InputStream pageStream =  new ByteArrayInputStream(html.getBytes("UTF-8"));
Tidy tidy = new Tidy();

tidy.setOnlyErrors(true); //<------------------

tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setQuiet(true);
tidy.setShowWarnings(false);
tidy.setErrout(log4j);
dom = tidy.parseDOM(pageStream, null);
dom.normalize();
...
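
If you would rather not override every print method, the same bridge idea can be sketched with a small Writer that splits whatever it receives into lines and hands each line to a callback. This is only a sketch that uses the JDK alone: LineSinkWriter, its sink parameter and demo() are hypothetical names, not part of JTidy or Log4j. In real use the sink would presumably be something like line -> log.debug(line), and the wrapping PrintWriter is what you would pass to tidy.setErrout.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects characters and forwards each non-empty line to a sink.
public class LineSinkWriter extends Writer {
    private final Consumer<String> sink;
    private final StringBuilder buffer = new StringBuilder();

    public LineSinkWriter(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void write(char[] cbuf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            char c = cbuf[i];
            if (c == '\n') {
                emit();               // a line is complete
            } else if (c != '\r') {   // ignore the CR of Windows line endings
                buffer.append(c);
            }
        }
    }

    @Override
    public void flush() {
        // Intentionally empty: a partial line waits for its newline.
    }

    @Override
    public void close() {
        emit();                       // forward a trailing partial line, if any
    }

    // Sends the buffered text to the sink, skipping empty lines.
    private void emit() {
        if (buffer.length() > 0) {
            sink.accept(buffer.toString());
            buffer.setLength(0);
        }
    }

    // Small demonstration: the sink is just a list here.
    public static List<String> demo() {
        List<String> lines = new ArrayList<>();
        PrintWriter out = new PrintWriter(new LineSinkWriter(lines::add));
        out.print("line 1 column 1 - ");
        out.println("Error: example message");
        out.print("trailing");
        out.close();
        return lines;
    }
}
```

Because every PrintWriter method ends up in write(char[], int, int), this variant also catches printf and write calls, which the class above leaves on System.err.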


Enjoy your clean catalina.out!

Thursday, 24 September 2009

My Personalized Tech News

I've been working on BuzzBox for a while, and I now feel comfortable sharing the first links to it.
I'm currently using BuzzBox to read the most popular tech news every day.
News from BuzzBox is:
- personalized: I picked my favourite web sites
- filtered: I get only the most popular news every day
- clustered: so I don't get the same news twice

I like to describe this first stage as a "personalized Techmeme" (Techmeme is a tech news aggregator).

You can see My BuzzBox at My BuzzBox



I've put the RSS feed in my feed reader, and I consume the news there. Others are pushing their news from BuzzBox to Twitter. See for example http://twitter.com/anigamBuzzBox or Anu's tech BuzzBox

We have a long roadmap ahead. We want to bring personalized news to social networks, and we want to be a preferred place to share and comment on news with your friends and everybody else.

Check out the site and let me know what you think.

Sunday, 20 September 2009

A video blog on App Engine

I'm working on this basic idea: an automatic video blog created from a search on YouTube.

I like following the interviews from the David Letterman show on YouTube, but it's quite difficult to subscribe to a good RSS feed for that. A simple query returns old and new results and many duplicate videos. Some interviews are partial, others are of bad quality. So I started building a filtering engine using the Google Data API.

I started the project on Google App Engine, on the Java environment.
It's still pretty basic, but you can already see the results:

http://videovertigo.appspot.com/letterman/

I start with the query:
"+letterman 2009|09 -monologue -"top 10" -"top ten""

Then I look in the title and in the description for a date. The parsing is performed by ANTLR (thanks to Piercarlo for implementing this part).
Then I assign a rank to each keyword, based on its frequency in the result set.
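
To give an idea of the keyword ranking step, here is a minimal sketch. It assumes the rank is simply the number of results whose title contains the keyword, which is only one possible reading of "frequency". KeywordRank and its methods are hypothetical names, not the actual engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: rank keywords by how many result titles contain them.
public class KeywordRank {

    // Counts, for each lower-cased word, the number of titles it appears in.
    public static Map<String, Integer> documentFrequency(List<String> titles) {
        Map<String, Integer> freq = new HashMap<>();
        for (String title : titles) {
            String[] tokens = title.toLowerCase(Locale.ROOT).split("\\W+");
            Set<String> words = new HashSet<>(Arrays.asList(tokens));
            words.remove("");   // split can yield an empty first token
            for (String word : words) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Returns the keywords, most frequent first; ties are ordered alphabetically.
    public static List<String> rankByFrequency(List<String> titles) {
        final Map<String, Integer> freq = documentFrequency(titles);
        List<String> words = new ArrayList<>(freq.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(freq.get(b), freq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }
}
```

With the titles "Letterman - Tom Hanks 2009", "Tom Hanks on Letterman" and "Letterman: Julia Roberts", the word "letterman" comes first because it appears in all three results.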

Finally I try to cluster the videos that look similar, based on keywords and date.
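
The clustering is still changing, so the following is only an illustration of the idea and not the code running on the site. It assumes two videos belong together when they have the same date and their keyword sets overlap enough (Jaccard similarity above a threshold). VideoClusters, Video and the threshold are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: greedy clustering of videos by date and keyword overlap.
public class VideoClusters {

    public static final class Video {
        final String title;
        final LocalDate date;        // date found in title/description, may be null
        final Set<String> keywords;

        public Video(String title, LocalDate date, String... keywords) {
            this.title = title;
            this.date = date;
            this.keywords = new HashSet<>(Arrays.asList(keywords));
        }
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / union.size();
    }

    // Puts each video into the first cluster whose first member looks similar,
    // or starts a new cluster when none matches.
    public static List<List<Video>> cluster(List<Video> videos, double threshold) {
        List<List<Video>> clusters = new ArrayList<>();
        for (Video v : videos) {
            List<Video> home = null;
            for (List<Video> candidate : clusters) {
                Video first = candidate.get(0);
                if (Objects.equals(first.date, v.date)
                        && jaccard(first.keywords, v.keywords) >= threshold) {
                    home = candidate;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(v);
        }
        return clusters;
    }
}
```

Comparing only against the first member of each cluster keeps the sketch short; a real engine would probably compare against all members or against a cluster summary.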

I'm still playing with the clustering to make it as general as possible. I would like users to build their own video blog from a complex query, using tools like information extraction and clustering.

On the home page there is a simple search you can use to play with the engine: http://videovertigo.appspot.com/

What do you think? Any ideas on how to improve the product? Are you an engineer, and would you like to contribute? Please contact me!

Monday, 31 August 2009

A Bot from Brussels?!

In Google Analytics I can see a bunch of new visits every day to our service, buzzbox.com, from Brussels, Belgium.
They don't look right. It seems to be a bot that is somehow downloading all the static files and executing the Google Analytics script too.

All the visits come from 2 IP addresses:


84.17.129.60
84.17.129.61


And the User Agent, as it appears in my access log, is

"Mozilla/5.0 (X11; U; Linux x86_64; c) AppleWebKit/528.5+ (KHTML, like Gecko, Safari/528.5+) WebShot"


We get most of our traffic from Twitter, so I think it is something connected with Bit.ly

I'm currently blocking requests from those IPs... but it would be nice to understand more. Does anybody have a clue? Please leave a comment.
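
For reference, the check behind such a block can be as small as the sketch below. It uses the two addresses above; the extra User-Agent test is an optional assumption of mine, and BotBlocker is a hypothetical name. In a servlet filter the two arguments would presumably come from request.getRemoteAddr() and request.getHeader("User-Agent").

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: decide whether a request should be rejected.
public class BotBlocker {

    // The two addresses seen in the access log.
    private static final Set<String> BLOCKED_IPS =
            new HashSet<>(Arrays.asList("84.17.129.60", "84.17.129.61"));

    public static boolean isBlocked(String remoteAddr, String userAgent) {
        if (BLOCKED_IPS.contains(remoteAddr)) {
            return true;
        }
        // Optional extra rule (an assumption): also reject the "WebShot" agent
        // in case it comes back from a different address.
        return userAgent != null && userAgent.contains("WebShot");
    }
}
```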

Thursday, 6 August 2009

Tomcat on EC2 hot deployment

I'm running a simple application on Amazon EC2 on a single Tomcat and I have been looking for a while for the best way to deploy updates.

I'm building the application with Maven 2, so I first considered using the Cargo plugin to connect to a running Tomcat Manager. It didn't work for me: the Tomcat Manager stops and undeploys the application before it starts receiving the new war file. In my case I was waiting almost 2 minutes to see my new war up and running (2 minutes during which the application is not reachable). I considered that too long.

I ended up uploading the application with scp and then moving it in place in the running Tomcat.

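1. Create this script on the server (hotdeploy.sh):
#!/bin/sh
# Abort on the first failing command so a bad copy is not deployed.
set -e
TOMCATWEBAPPS=/var/local/tomcat-buzzbox/webapps

# The uploaded war must belong to the tomcat user; cp -p below keeps the ownership when run as root.
chown tomcat:tomcat ROOT.war

# Keep a copy of the currently deployed war so a rollback is possible.
if [ -f "$TOMCATWEBAPPS/ROOT.war" ]; then
    cp -p "$TOMCATWEBAPPS/ROOT.war" ROOT.war.previous
fi

# Drop the new war in place; Tomcat's auto deploy picks it up.
cp -p ROOT.war "$TOMCATWEBAPPS/ROOT.war"


It backs up the deployed war and copies the new one into tomcat/webapps while Tomcat is running.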

2. Make sure your Tomcat is configured for auto deploy (in server.xml)
<Host name="localhost"  appBase="webapps"
unpackWARs="true" autoDeploy="true"
xmlValidation="false" xmlNamespaceAware="false">


3. Create this script on your local machine (or development environment): deploy.bat. I'm using Windows and Cygwin.
@echo off
echo Overwrite App with new version. Are you sure?
echo -- press any key to continue; CTRL+C to cancel

pause

rem Upload the war next to hotdeploy.sh, then run the script on the server.
scp -i keypair1.pem ROOT.war root@domain.com:/root/bb-deploy/ROOT.war
ssh -i keypair1.pem root@domain.com 'cd /root/bb-deploy; ./hotdeploy.sh'


That is: scp to upload your war file and ssh to run the hotdeploy.sh script on the server. Tomcat's auto deploy does the rest.

That's all. It works great for me: only a few seconds of downtime.
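
If you prefer to do the server-side step from Java instead of a shell script, the logic of hotdeploy.sh could be sketched as follows. This is not what I run: HotDeploy, the ROOT.war.tmp staging name and demo() are hypothetical, and the staging-plus-rename step is an addition that assumes Tomcat ignores files not ending in .war, so it never sees a half-copied archive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: back up the live war, then swap in the new one.
public class HotDeploy {

    public static void deploy(Path newWar, Path webapps) throws IOException {
        Path live = webapps.resolve("ROOT.war");

        // Keep the currently deployed war next to the upload for a rollback.
        if (Files.exists(live)) {
            Files.copy(live, newWar.resolveSibling("ROOT.war.previous"),
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.COPY_ATTRIBUTES);
        }

        // Copy under a temporary name first, then rename in one step.
        // On POSIX file systems the atomic rename replaces the old file.
        Path staging = webapps.resolve("ROOT.war.tmp");
        Files.copy(newWar, staging, StandardCopyOption.REPLACE_EXISTING);
        Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE);
    }

    // Exercises deploy() in a temporary directory with two tiny fake wars.
    public static String demo() {
        try {
            Path dir = Files.createTempDirectory("hotdeploy");
            Path webapps = Files.createDirectory(dir.resolve("webapps"));
            Files.write(webapps.resolve("ROOT.war"), "old".getBytes(StandardCharsets.UTF_8));
            Path upload = dir.resolve("ROOT.war");
            Files.write(upload, "new".getBytes(StandardCharsets.UTF_8));

            deploy(upload, webapps);

            String live = new String(
                    Files.readAllBytes(webapps.resolve("ROOT.war")), StandardCharsets.UTF_8);
            String previous = new String(
                    Files.readAllBytes(dir.resolve("ROOT.war.previous")), StandardCharsets.UTF_8);
            return live + "|" + previous;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```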
What do you think? How would you deploy a war in such an environment?

Sunday, 2 August 2009

Virtual Hosts with Apache, Tomcat, AJP

Here I describe how to configure Apache to handle requests for 2 virtual hosts, forwarding them to 2 Tomcat instances installed on the same machine.

Suppose you want to handle 2 applications, hello.com and welcome.com.
The DNS points both names to the same IP address and Apache is listening on port 80.


Apache/2.2.3

Add proxy_ajp.conf in /etc/httpd/conf.d

LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

NameVirtualHost *

<VirtualHost *>
ServerName hello.com
DocumentRoot /var/local/tomcat/webapps/ROOT/
ProxyPass /static/ !
ProxyPass / ajp://localhost:8091/
ProxyPassReverse / ajp://localhost:8091/
</VirtualHost>

<VirtualHost *>
ServerName welcome.com
DocumentRoot /var/local/tomcat2/webapps/ROOT/
ProxyPass /static/ !
ProxyPass / ajp://localhost:8090/
ProxyPassReverse / ajp://localhost:8090/
</VirtualHost>


Note how the static files (typically images, CSS, JavaScript... everything inside /static/) will be served by Apache. You need both "ProxyPass /static/ !" and the correct DocumentRoot.

Tomcat 6.0.18

The default configuration of Tomcat 6.0.18 works fine. You just have to change the ports so they don't conflict. I'm using 8090 and 8091 for the AJP Connectors of the 2 Tomcats.

Amazon EC2 - file transfer console

If you run Windows and want something more user-friendly than "scp" to move files to and from EC2, you can use WinSCP by following these directions:

1. Download WinSCP
2. Download puttygen
3. Convert your private key (keypair1.pem) to PuTTY format using puttygen. Save it as .ppk

Just run:
puttygen.exe keypair1.pem
then use the menu File/Save as Private Key

4. Connect to your instance using WinSCP, specifying the SCP protocol and using the generated ppk key

Saturday, 1 August 2009

A Blog Style for posting code

It took me a while to configure the style of this blog the way I wanted.
I needed a template that was clean, easy to read, and wide enough to post code.
I needed a simple way to format code, mostly Java or XML. I wanted to use free tools and I wanted to copy and paste the code from my IDE without touching it (no adding spaces or tabs, please).

So, I came up with this solution:

1. Select "Minima Lefty Stretch by Douglas Bowman". Then you have to modify the template... read on.

2. Make it 960px wide, with a dark background color.



<Variable name="bgcolor" description="Page Background Color"
type="color" default="#fff" value="#162541">

#outer-wrapper {
width:960px;
margin:auto;
background:#ffffff;
padding:10px;
text-align:left;
font: normal normal 100% Verdana, sans-serif;
}


3. Fix the publication date line and the body line-height
<style>
.postmeta {
font-size:80%;
text-align:right;
}

.post-body{
/* line-height: 1.6em;  make sure you remove this declaration */
}
</style>



4. Add code formatting support with csharpformat

<style>
.postmeta {
font-size:80%;
text-align:right;
}
.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: Consolas, &quot;Courier New&quot;, Courier, Monospace;
background-color: rgb(240, 250, 230);
/*white-space: pre;*/
overflow: auto; overflow-y: visible;
padding: 10px;
border: solid 1px rgb(120, 125, 115);
}

.csharpcode pre { margin: 0em; overflow: auto; overflow-y: visible;}
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }

</style>


5. To format my code, I copy and paste it into csharpformat (it works decently for Java too) and then paste the result into Blogger's HTML view.

Friday, 31 July 2009

Media RSS Plugin for ROME - howto

ROME, the Java RSS library, is extensible through modules.
This is how to integrate the Media RSS module (http://search.yahoo.com/mrss/) by Yahoo!

Getting the library:

Download the jar from java.net here (the code there is not very useful).

Maven:

Add to your pom.xml this repository:

<repositories>
<repository>
<id>dist.wso2.org</id>
<name>dist.wso2.org</name>
<url>http://dist.wso2.org/maven2/</url>
</repository>
</repositories>


and this dependency:

<dependency>
<groupId>rome</groupId>
<artifactId>rome-mediarss</artifactId>
<!-- rome Yahoo! Media RSS Plugin for ROME , hosted on http://dist.wso2.org/maven2-->
<version>0.2.2</version>
</dependency>


Java Code:

You can iterate through your Media Content like this:


// The Media RSS namespace URI identifies the module.
final String MEDIA_NS = "http://search.yahoo.com/mrss/";

SyndFeedInput input = new SyndFeedInput();
// 'method' is an HTTP request that has already been executed.
SyndFeed feed = input.build(new XmlReader(method.getResponseBodyAsStream()));
List entries = feed.getEntries();
for (Object obj : entries) {
    SyndEntry e = (SyndEntry) obj;
    MediaModule mediaModule = (MediaModule) e.getModule(MEDIA_NS);

    // instanceof is false for null, so no separate null check is needed.
    if (mediaModule instanceof MediaEntryModule) {
        MediaEntryModule mentry = (MediaEntryModule) mediaModule;

        for (MediaGroup mg : mentry.getMediaGroups()) {
            for (MediaContent mc : mg.getContents()) {
                if (mc.getType() != null && mc.getType().startsWith("image")) {
                    String imgUrl = mc.getReference().toString();
                    // etc...
                }
            }
        }
    }
}


In this case I was interested in getting image urls.
You can access the MediaRss JavaDoc here