Saturday, May 14, 2011

Google IO 2011 Day Two recap

Oh the perils of making predictions when there is still a conference keynote to go!

It turns out that Chrome OS and the associated hardware haven't been read the last rites after all. Rather, v1.0 is almost ready for primetime (scheduled for release in mid-June - about a month away). You have to imagine, though, that over time Google will want one code base for phones, tablets and Chromebooks. At the very least, they will want to make it as easy as possible for developers to write their applications once and have them "just work" on devices with radically different screen sizes and input methods, something that Android developers are already doing today. Nonetheless, it is a very brave play, especially in targeting the enterprise space, where significant replacement costs exist. If it pays off, it will be huge.

Moving on from Chrome, two of the sessions I attended yesterday were really interesting: Full Text Search and Smart App Design.

Full Text Search is Google's take on Lucene / Solr, and it is integrated into the App Engine Datastore as well, so it will be compelling for developers who just want to start indexing and scoring documents quickly. The "fully automatic" mode of operation with the Datastore should also be a timesaver.
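To make "indexing and scoring documents" a little more concrete, here is a minimal sketch of what a full text search engine does under the hood: an inverted index with a simple TF-IDF style score. This is not the App Engine API; the TinyIndex class and its add / search methods are names I have made up purely for illustration, and it assumes each document id is added only once.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index (illustration only)."""

    def __init__(self):
        # term -> {doc_id: number of times the term occurs in that document}
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document; assumes each doc_id is added only once."""
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq
        self.doc_count += 1

    def search(self, query):
        """Return (doc_id, score) pairs, highest score first."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Terms that appear in fewer documents carry more weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, freq in docs.items():
                scores[doc_id] += freq * idf
        # Sort by descending score, breaking ties by doc_id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Calling add() for each document and then search("some words") gives back the matching document ids in ranked order. A hosted service presumably layers stemming, field weighting and persistence on top of this, which is exactly the work I would rather not do myself.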

Smart App Design covered material of a completely different color. I had already read about the Prediction API in the blogosphere but I hadn't realised exactly what it did until this session. Essentially, Google offers the discerning developer the ability to add machine learning techniques to their application by leveraging a cloud-based service.

At first glance, I had thought that the API gave access to the same model that Google uses to predict search terms, and I guess that is one use case. But Google has done much more than that - they have effectively white-labelled their machine learning technology and made it available to non-Google developers to use with their own data, i.e. learn what's important for their application / business.

As with all machine-learning techniques, the nub of the matter remains the correct selection and efficient representation of the key attributes in the training set, and that is quite simply a problem that requires deep domain knowledge. One announcement yesterday was quite interesting however, in that Google are now allowing good model authors to sell their models to others. So if I come up with a model that predicts shopping basket behavior on leisure travel websites and a tour operator uses that to bump their online conversion rate by 33%, then that model has a lot of value and it's a win-win situation for the model author and the model user.
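To illustrate why the choice of attributes matters so much, here is a minimal sketch of training and using a model on labelled rows. It is a plain nearest-centroid classifier, not the Prediction API and not whatever Google runs behind it; the train and predict functions, and the shopping basket attributes in the usage note, are hypothetical names and values I have chosen for the example.

```python
from collections import defaultdict


def train(rows):
    """Compute one centroid (mean feature vector) per label.

    rows is a list of (label, features) pairs; every features list
    must have the same length.
    """
    sums = {}
    counts = defaultdict(int)
    for label, features in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: squared_distance(model[label], features))
```

With made-up rows such as ("purchase", [3, 12.0, 1]) and ("abandon", [1, 2.0, 0]), where the three attributes are items in the basket, minutes on site and whether reviews were viewed, train() builds the model and predict() labels a new visitor. The code is the trivial part. Deciding that those three attributes are the ones worth recording, and scaling them sensibly, is where the domain knowledge comes in.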

So an API with a lot of promise. But also with two potential flies in the ointment, one commercial and one cultural:

(a) Commercial - Google are trying to charge for use of the API from day one, which will stymie adoption at the earliest stage.

(b) Cultural - an endemic problem with a lot of machine learning techniques is their black box nature. As someone who spent a fair bit of time working with artificial neural networks at university, I know that quite often a machine learning approach will yield the correct answer but the researcher can't exactly explain why! That's not a Google-specific weakness, but what is Google-specific is that the models you access via the Prediction API (the man behind the curtain if you will) are not made open at all. So can a company really invest time in building, training and using models that they don't really understand and can never hope to? Only time will tell.

So to recap then, Google IO was definitely worth attending this year - and not just for the hardware gifts! The main items on my research list post the event are:

1. Google Go running on App Engine

2. The Prediction API

3. Full Text Search enhancements / module for App Engine

4. Adding my own hooks and content into Google Maps and Street View to greatly enhance what the end user sees when they access Maps from my site

5. Fusion Tables + Charting - a good / cheap way to rapidly slice and dice data and provide good interactive widgets to visualize it for end users.
