Pieter’s Programming Blog

I’m a freelance Java developer based in Belgium (better known as Brussels outside Europe). Here I write about my experiences based on my daily struggle with man’s most feared enemy: computers.

Twitter OAuth in Drupal

It took me some time to figure out how to configure the Drupal twitter module so it is ready for the Twitter OAuth change which disables username/password logins on August 16, 2010.

The correct solution for me was to use version 6.x-3.x-dev of the twitter module and version 6.x-2.02 of the OAuth module.  The recommended release 6.x-3.0-beta2 of the OAuth module is renamed to oauth_common which is not compatible with the Twitter module.

These are the steps:

  • enable modules Twitter and OAuth (I also enabled the module “Twitter Post”)
  • Make sure the php extension “openssl” is enabled in php.ini (you may need to restart Apache)
  • On /admin/settings/oauth : check “hmac-sha1”
  • On /admin/settings/twitter/default : enter keys you can get from an app you create on https://twitter.com/apps
  • Add twitter accounts on /user/1/edit/twitter (or another user id)

From Amazon EC2 to Google Appengine

Amazon EC2

I was very impressed when Amazon launched its EC2 cloud infrastructure.  So, eager to test this, I started up some servers and tried to install my koopjeszoeker application on it.  Until then this Java application was running on a private server (in Brussels).  This was almost 2 years ago.

Everything went reasonably well and I liked the possibility to install a new version on a separate server and then just use the elastic ip address feature to switch the production version to this new server.  The problem I had was running a database server which could also scale with the application.  Luckily, Amazon seemed to read my mind every time I needed something.  So they released Simple DB as a scalable database which was enough for my needs.  Later on, they released Relational Database Service, but I haven’t needed this yet.

The whole setup for my site was maybe a bit overkill, but it was a nice test setup for learning more about working with this infrastructure.  During the next 2 years, I added Cloudfront and used S3 as a backup solution. I also set up Amazon Elastic Load Balancing with autoscaling enabled for traffic peaks. I wanted a server solution that just worked so I wouldn’t need to spend too much time on system maintenance.

Switching to Appengine

I was able to lower my monthly bill for hosting the zamtam sites (koopjeszoeker.be, koopjeszoeker.com, fr.zamtam.be, zamtam.fr and recently beta sites zamtam.co.uk and zamtam.de) by switching from Amazon EC2 to Google Appengine.  The monthly Amazon bill (a constantly running High-CPU Medium instance with S3 traffic, Simple DB, Cloudfront and now and then a test instance) was around $ 180 a month.  My Amazon server ran for almost 2 years.  My Google Appengine bill is now around 40 cents a day, which makes around $ 12 a month.  That is 15 times less!

I think the main benefit of Appengine versus EC2 in my case was that I don’t need a constantly running server, but I do need enough capacity to handle peak traffic (mainly in the evening and the weekends).  In the EC2 case, this means you need to start more servers (manually or with elastic load balancing) while Appengine handles this automatically.  You (roughly) only pay for the extra CPU time consumed.

For me the only reason not to use Appengine until a month ago was the lack of non-blocking IO support for Java.  Luckily, this issue was resolved (silently, I only found out about it by reading the detailed release notes) and you can now use URLFetchService.fetchAsync()!

Lessons Learned

Some things I’d like to share about my experience with AppEngine:

1 GB is a lot of space.  Don’t optimize for storage size when you have 200 GB a month for $ 1 a day.  A typical application won’t need more than 10 GB which costs $ 1.5 a month.  Similarly, one million tasks a day is a lot.  Don’t prematurely optimize by putting a lot of work in one task when you can spread it over many small concurrent tasks. As Chris Anderson puts it in his book “Free” (I couldn’t find the exact quote since I listened to the audio book in the car and this isn’t searchable yet): “when something’s free, people tend to treat it like it’s indefinitely available”.

6.5 free CPU hours already allow for a lot of work.  I handle around 10,000 visitors a day, a lot of URL Fetches and many image transformations, and only now and then do I need more than this.

Startup time can be an issue, so I removed all unneeded jars from WEB-INF/lib and did some lazy loading.  This startup time is however mainly an issue during lower traffic times because Appengine stops and starts instances according to the traffic.  A visitor who hits an app that is just starting needs to wait longer and sometimes gets an error page.  Once your app is up and handles a steady amount of traffic, the server instances seem to stay up.  You can monitor this in the logs by using a ServletContextListener and logging the event in the contextInitialized() and contextDestroyed() methods.

The task queues are really useful to do work asynchronously, like cleaning up the datastore (remove all thumbnails older than 30 days) or executing long running cron jobs.  Requests called by the task queue provide some headers, one of which is useful to retry a task at most 3 times. I check this header in the catch block and when it is equal to 3, I don’t throw an exception anymore so the task is removed from the queue.
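
A minimal sketch of that retry check, kept free of the servlet API so it runs on a plain JDK. It assumes the retry count arrives in the X-AppEngine-TaskRetryCount request header; the class name TaskRetryPolicy, the method shouldRethrow and the constant MAX_RETRIES are made up for this illustration and are not part of any Appengine API.

```java
final class TaskRetryPolicy {
    /** Request header assumed to carry the number of retries so far. */
    static final String RETRY_HEADER = "X-AppEngine-TaskRetryCount";

    /** Hypothetical limit: after this many retries the task is given up. */
    static final int MAX_RETRIES = 3;

    /** Parses the header value; a missing or unreadable value counts as 0 retries. */
    static int parseRetryCount(String headerValue) {
        if (headerValue == null) {
            return 0;
        }
        try {
            return Integer.parseInt(headerValue.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /**
     * Returns true when the caught exception should be rethrown so the queue
     * retries the task, and false when the limit is reached and the exception
     * should be swallowed so the task leaves the queue.
     */
    static boolean shouldRethrow(String headerValue) {
        return parseRetryCount(headerValue) < MAX_RETRIES;
    }
}
```

In a servlet the catch block would then read something like: if (TaskRetryPolicy.shouldRethrow(request.getHeader(TaskRetryPolicy.RETRY_HEADER))) throw e;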

There are workarounds around the 30 second execution limit.  My workaround is to do a small amount of work in a Servlet (Spring Controller) and then add the same url with some other parameters (like a database cursor) to a task queue.
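
The idea of doing a little work per request and re-queueing the rest can be sketched without any Appengine classes. In the sketch below a plain in-memory Deque stands in for the task queue and an integer offset stands in for the database cursor; both are assumptions for illustration, as are the names ChunkedJob, processBatch and BATCH_SIZE.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ChunkedJob {
    /** Hypothetical amount of work that safely fits in one request. */
    static final int BATCH_SIZE = 3;

    /**
     * Handles one "request": processes at most BATCH_SIZE items starting at
     * the cursor and returns the next cursor, or -1 when nothing is left.
     */
    static int processBatch(List<String> items, int cursor, List<String> processed) {
        int end = Math.min(cursor + BATCH_SIZE, items.size());
        for (int i = cursor; i < end; i++) {
            processed.add("done:" + items.get(i));
        }
        return end < items.size() ? end : -1;
    }

    /** Drives the job: each queue entry is the cursor for one short request. */
    static List<String> runAll(List<String> items) {
        Deque<Integer> queue = new ArrayDeque<>(); // stand-in for the task queue
        List<String> processed = new ArrayList<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int next = processBatch(items, queue.poll(), processed);
            if (next >= 0) {
                queue.add(next); // "add the same url with the new cursor"
            }
        }
        return processed;
    }
}
```

Each call to processBatch stays small, so no single request comes near the execution limit, while the chain of queued cursors still covers the whole data set.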

You don’t need a database for everything.  I moved some tables that would never change to my Spring config XML which avoids datastore lookups.

Your application needs to be able to handle sudden shutdowns and startups without error.  A user may arrive on a different server instance for every request.  I decided not to use HttpSessions (I almost never use this).

The URLFetchService caches responses by default.  You need to add your own no-cache request headers to get fresh results.

Subscribe to the Appengine downtime notification feed; you can also check the system status.  According to Murphy’s law, the first week when I ran on Appengine the whole thing that’s not supposed to die went down.  Google did provide a detailed post mortem explaining everything.  As long as they’re the ones who need to solve the infrastructure problems and not me, I’m happy with that.  I’m modest enough to know I couldn’t possibly match their expertise.

It is possible to set up multiple custom domains, so you’re not stuck with myapp.appspot.com.  I also use 4 hostnames for thumbs, like thumbs1.zamtam.com, thumbs2.zamtam.com, … and a hash on the filename to determine which hostname should serve the image.
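
Picking the hostname from a hash of the filename could look like the sketch below. It assumes the four hostnames are numbered thumbs1 to thumbs4 and uses String.hashCode() as the hash; the actual hash function is not stated above, and the names ThumbHosts and hostFor are invented for this example.

```java
final class ThumbHosts {
    /** Number of thumbnail hostnames, assumed to be numbered 1..4. */
    static final int HOST_COUNT = 4;

    /**
     * Maps a filename to one of the thumbnail hostnames. The same filename
     * always yields the same hostname, so browser caches stay effective.
     */
    static String hostFor(String filename) {
        // floorMod keeps the bucket in 0..HOST_COUNT-1 even for negative hash codes.
        int bucket = Math.floorMod(filename.hashCode(), HOST_COUNT);
        return "thumbs" + (bucket + 1) + ".zamtam.com";
    }
}
```

Because the mapping depends only on the filename, a given image is always requested from the same hostname, while different images on one page are spread over the four hosts.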

I created a small Java class AppengineUtils.java with some useful methods; feel free to use it.  I add the app version to the url of my javascript file so it has a different url each time I deploy a new version, and the cache headers for this url can be set to a much longer time.  I also check if I run in the development server to show some buttons in the html that don’t show up in the production version.
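
The version trick can be sketched as follows. This is not the content of AppengineUtils.java: it assumes the version is appended as a query parameter named v and that the version string is already URL-safe, and the names VersionedUrls and versioned are hypothetical.

```java
final class VersionedUrls {
    /**
     * Appends the deployed app version to a static resource path so every
     * deployment produces a new url and old cached copies are never reused.
     * The version is passed in as a plain string to keep the sketch JDK-only.
     */
    static String versioned(String path, String appVersion) {
        String separator = path.contains("?") ? "&" : "?";
        return path + separator + "v=" + appVersion;
    }
}
```

A page template would then emit versioned("/js/site.js", currentVersion) instead of the bare path, where currentVersion comes from wherever the application looks up its deployed version.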

Improvements

The dashboard resets every morning at 9 AM CET.  There is no way to see the quota details for the previous days.

The time mentioned in the logs is confusing since it is not my local time.  An option in the Appengine settings to set the local time would be handy.

The blobstore (still in beta) misses some features, like an easy way to store data fetched with the UrlFetchService to the blobstore.  Luckily my url fetches are smaller than 1 MB so I can store them in the datastore.

The Google Accounts integration is sometimes confusing.  I use Appengine from my Google Apps domain (onthoo.com) but my site runs on different hostnames (koopjeszoeker.be, zamtam.fr, …).  So I needed to add (verify) these domains to my Google Apps Domain.  This part succeeded.  The problem is that I want to send an e-mail from the Mail API, but this service only allows outgoing mails from accounts that are developers for the app.  I can’t seem to add a developer who has an e-mail address like noreply at zamtam.com (an extra domain for my onthoo.com Google Apps domain) instead of noreply at onthoo.com.  I get the developer confirmation e-mail, but the link goes through a series of redirects to end in an error page.  I think my whole Appengine setup is a bit messed up since I currently have 9 apps deployed and it still shows I have 4 remaining (you can have a maximum of 10 apps).  It may have something to do with the fact that I have a Google Apps account and a Google Account with the same e-mail address.  I have to be careful to log in through https://appengine.google.com/a/<YOURDOMAIN.COM>/ instead of https://appengine.google.com .

The URLFetchService is limited to 10 asynchronous fetches at a time, while I need 12 at the moment.  An increase would be nice, although I know my case is probably an exceptional one.

The 30 active dynamic request limit is for me sometimes an issue, since I use the image api to generate thumbs on the fly, which takes a bit longer (fetch the image url, resize it, store it in the datastore and return it).  Since I’m using different hostnames for the thumbs (like thumbs1.zamtam.com, thumbs2.zamtam.com, …) I get up to 10 requests at a time for a page.  You see the problem when I have 3 users requesting a page at the same time… I cache the thumbs so they’re only generated once, but this doesn’t handle all the cases.  This is something I need to investigate further and maybe I should ask for an increase?

‘Naked domains’ are not supported anymore, so using zamtam.co.uk for example isn’t possible.  This makes the DNS setup a bit more complex.

Conclusion

A lot of exciting things can be done with Appengine. Especially when you run a website instead of long-running batch operations, Appengine can turn out to be a lot cheaper than Amazon EC2.  While EC2 allows you to do much more and in the way you prefer, Appengine pushes you a bit to do it their way, which makes it easier for you.  With Appengine, you also don’t need to think about scaling MySQL, load-balancing Apache or updating Linux.

One benefit can’t be stressed enough: you don’t need to plan your server capacity beforehand since Appengine does this automatically.  Also, deploying a new version is easy: upload it, test it and when ready, switch the default version to the new version.  No downtime, no worries (you can always go back to the previous version if something shows up later with the new version).

Koopjeszoeker.be supports rich snippets

Since today, koopjeszoeker.be (and koopjeszoeker.com, fr.zamtam.be and www.zamtam.fr) supports Google rich snippets.  Not much happens yet, but you can have a look at the parsed data in this testing tool.

At the moment I still get “Insufficient data to generate the preview” from this tool although it seems to be able to parse the data, no idea what it means…

A Google Wave Account

Today I got my developer account for the Google Wave platform. Since the demo video looked really cool, I was eager to test this.

I didn’t have a lot of time to test yet, but my first impression is that it is the chaos of a bulletin board, mixed with the trolls found in blog comments and interspersed with unrelated automated bot comments.

Probably this is because I’m jumping right in, and other developers already had info sessions or got in earlier, so they had some time to get used to everything before the place was overwhelmed.

It’s still a developer preview, so I’ll give it some more time before I decide if I really like it…

Grails and Google AppEngine

I’ve created a small demo showing Grails on Google AppEngine.

The site is a showcase of Grails on Google AppEngine with the Grails AppEngine plugin.

Places are stored in the AppEngine datastore and a taglib is added for rendering the login button and the currently logged in user.

Additionally, an integration with Google AJAX Search API is done when adding a place.

URL.equals()

Apparently in Java two URLs are equal if their hosts resolve to the same ip address, so the following test will succeed (kapaza.be and kapaza.nl have the same ip address).

public void testURLEquals() throws MalformedURLException {
  assertEquals(new URL("http://www.kapaza.be"), new URL("http://www.kapaza.nl"));
}

Just so you know when you get strange results when putting URLs in a Set…  It’s even worse, since this means that comparing URLs needs name resolution, which is a slowdown.  More in the Javadocs.

One solution is to use a URI instead of a URL.  This will fail:

public void testURIEquals() throws URISyntaxException {
  assertEquals(new URI("http://www.kapaza.be"), new URI("http://www.kapaza.nl"));
}
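
To see what this means for a Set, here is a small sketch using only the JDK. URI.equals() compares the components of the address and never does name resolution, so the two hostnames stay separate entries. The class name UriSetDemo and the helper distinct are made up for the example.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

final class UriSetDemo {
    /** Collects the given addresses in a Set of URIs, dropping exact duplicates. */
    static Set<URI> distinct(String... addresses) {
        Set<URI> result = new LinkedHashSet<>();
        for (String address : addresses) {
            result.add(URI.create(address)); // no DNS lookup happens here
        }
        return result;
    }

    public static void main(String[] args) {
        Set<URI> uris = distinct(
                "http://www.kapaza.be",
                "http://www.kapaza.nl",
                "http://www.kapaza.be");
        System.out.println(uris.size()); // prints 2
    }
}
```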

Permgen space

Personal note to self: use these to increase memory in JBoss and work a bit longer before you get an “OutOfMemoryError: PermGen space”.

-Xms128m -Xmx1024m -XX:PermSize=64m -XX:MaxPermSize=256m

Google Apps E-mail Storage Reaches 7 GB

My Google Apps Standard Edition e-mail account (like GMail, but I can use my own hostname instead of gmail.com) reached 7 GB. I just noticed it today, but it may have been there for a few days already since I don’t watch the GMail counter continuously.

The Trash folder however still shows
No conversations in the Trash. Who needs to delete when you have over 2000 MB of storage?!

7 GB that is, if you use the same metrics hard disk manufacturers use. It’s still not really 7 GB though.

Looks like I still have some space left after 4 years of e-mailing (I didn’t import my old hotmail and student accounts I had before that):
You are currently using 472 MB (6%) of your 7007 MB.

I used a little trick (don’t tell Google!) to import my e-mail from my old server. When you purchase a Premier Edition account, you have better tools to migrate your e-mail. After 15 days (and a successful migration) I downgraded my Premier Edition to a Standard Edition for free (you have 30 days to try it).

How to lower the load average on a server by more than 50% in 10 seconds

$ uptime
… load average: 0.54, 0.40, 0.36

$ sudo vim /etc/fstab
(add the noatime option)

# /dev/sda3
UUID=8623d9e3... / ext3 defaults,errors=remount-ro,noatime 0 1

$ sudo mount -o remount /

$ uptime
… load average: 0.24, 0.16, 0.17

So much for this completely unscientific proof.

More info about the noatime option.

Internal Refactoring

For my 10-day visit to Tokyo, Kyoto, Nara, Hakone and Chiba (my brother-in-law’s wedding), I needed to refactor my internal programming a bit to avoid OutOfMemoryExceptions.

// This class is package-private to avoid
// external programs messing up.
class Brain {

public void handleEvent(MeetNewPersonEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    fireEvent(new BowEvent(event));
  }
}

public void handleEvent(BowEvent event) {
  // Avoid infinite loop.  The problem is the
  // 'esteemed higher' part, the person
  // for whom you're bowing may think the same.
  if (event.isPersonEsteemedHigher() && !event.hasBowedTooMuch()) {
    bow();
  } else {
    nod();
  }
}

protected void bow() {
  lookSincere();
  smile();
  bendForward();
}

I also needed to reprogram the eating subroutines.

public void handleEvent(FeelingHungryEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    // This was a tricky one to handle,
    // the implementation is left
    // as an exercise to the reader.
    uploadChopsticksRoutine();
  }
}

The RunForTrainEvent and especially the WaitForTrainEvent could be canceled out since public transportation is much better than in the location I originally wrote it for (Belgium).

public void handleEvent(RunForTrainEvent event) {
 if (getLocation().equals(Locations.JAPAN)) {
   stopRunning();
   relax();
   Thread.sleep(5*Timer.MINUTE);
 }
}

Finally, I needed to handle the RunningNoseEvent (extends HasColdEvent) better.

public void handleEvent(RunningNoseEvent event) {
  ...
  if (getLocation().equals(Locations.JAPAN)) {
    // Blowing your nose in public is NOT DONE
    // in Japan.  This is considered
    // a bit the same as burping.
    // Public humiliation is your part when
    // this is not checked.
    dipNose();
  } else {
    blowNose();
  }
}

protected Location getLocation() {
  ...
  if (isCurrentLocationUnknown()) {
    if (bodyTallerThanMostOthers() && friendlyPeople()
        && metroEvery2Minutes()
        && dressCode.equals(Dresscodes.COSTUME)
        && eatingCode.equals(EatingCodes.CHOPSTICKS)) {
      return Locations.JAPAN;
    }
  }
}

} // End class

This is released under an Apache license. Please notify me if these changes are of any use to you.
