Penguinsdotmoohdotorg: Zero Winging it

This one's going out to Di. Google has been pretty busy recently. Actually, it seems like Google is always pretty busy. Every couple of months yields a new application which works pretty much straight out of the box. At the same time, they're working on improving existing projects, or putting the pieces in place for their new initiatives. By way of example, the mapping feature of Google Maps is a bit bare for Germany - you can't see very much on their satellite maps around this region, and they have no road maps to speak of. Someone has been doing some digging around on the mobile version of Google Maps, and found that the local maps there actually have road information. I'm looking forward to that data eventually making its way out onto the main maps page, so that I can stop using the somewhat slow Map24.

A product which has just recently gone into beta is Google Base. As far as I can see, it is a hosting program which indexes the content that is posted there. It doesn't exactly sound all that revolutionary. There are numerous existing services such as Craigslist, a squillion recipe sites, job advertisement sites, or pretty much any other web page. So what exactly does Google Base do better than all these other places? Well, first of all, it's Google, so people will just use it because Google knows how to make good software. Secondly, I think it's an attempt to deliver on the Semantic Web. The big problem with all the existing sites which are aggregating information is that a search engine has no way to get at their meta-data, beyond crawling their pages and attempting to extract it itself. If you wanted to search all recipes on the web for recipes that have a base ingredient of kangaroo, you just wouldn't be able to do that. It would be easy to find all recipes with the word "kangaroo" in them, but you wouldn't know how the word is used in the context of the recipe. This is where Google Base comes in. On one hand, it is a place where you can just collect any piece of information you have. It gives you all the tools you need to add meta-data to this piece of information. I haven't tried it, but I'm guessing it would have a suggest-like feature for the field data to help manage the dictionary of words used in each entry. Now, instead of a wealth of pages which don't have their meta-data easily accessible, Google has direct access to the informational content of each of these pages. Google also provides a way for a site maintainer to have their data indexed by Google. Every time you have an update of your data, you can just FTP your data files across to Google, and they will update their information (this is at least how Froogle works).
So that's their brilliant plan - instead of spending money on developing technology to add meta-information to existing pages, they just get the content makers to do what they always should have been doing. In fact, it's quite often a lot easier just to get people to do the work for you (see Amazon's Mechanical Turk). If you're interested in these types of approaches to computing, you'll want to visit Luis von Ahn's page. One neat trick he talks about there is to use games to get people to solve problems which computers cannot easily solve. I really need to think of a game where people collect data for me to work on for my thesis.
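
To make the kangaroo example a bit more concrete, here is a rough Python sketch of the difference between searching the text of a page and searching meta-data that the content maker supplied. Everything in it is made up for illustration: the Recipe class, the "base_ingredient" field name and the three sample recipes are my own assumptions, and this is not how Google Base actually stores or queries anything.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """A toy recipe entry; the field names are hypothetical, not Google Base's."""
    title: str
    text: str
    # Meta-data supplied by the content maker instead of guessed by a crawler.
    attributes: dict = field(default_factory=dict)


RECIPES = [
    Recipe("Kangaroo steak",
           "Sear the kangaroo fillet and rest it.",
           {"base_ingredient": "kangaroo"}),
    Recipe("Beef stew",
           "My friend the kangaroo farmer swears by this beef stew.",
           {"base_ingredient": "beef"}),
    Recipe("Roo burgers",
           "Mince the meat, shape into patties, grill.",
           {"base_ingredient": "kangaroo"}),
]


def keyword_search(recipes, word):
    """Roughly what a crawler can do: match the word anywhere in title or text."""
    word = word.lower()
    return [r.title for r in recipes
            if word in r.title.lower() or word in r.text.lower()]


def attribute_search(recipes, key, value):
    """What labelled meta-data allows: match on the meaning of a field."""
    return [r.title for r in recipes if r.attributes.get(key) == value]


if __name__ == "__main__":
    print(keyword_search(RECIPES, "kangaroo"))
    print(attribute_search(RECIPES, "base_ingredient", "kangaroo"))
```

The keyword search returns the beef stew (which merely mentions a kangaroo farmer) and misses the roo burgers (which never spell the word out), while the attribute search gets both right because somebody labelled the base ingredient up front.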

Google Base came out as a beta product, which is standard fare for Google. Surprisingly, Google Analytics didn't come out in beta. The thing is - it probably should have been released in beta. For the past week, the service has been getting absolutely hammered by users signing up and using it to track the usage of their sites. It's running at the moment on this site. Performance has improved over the past few days, and the statistics only lag by about 24 hours now. The stats which I do have already are quite interesting. My main readership seems to be based around the Ryde/Eastwood area (discounting the hits from Heidelberg), and I've even had visitors from China, Japan and the west coast of the United States. I was really surprised to see a hit on my web page from Cupertino. Digging a bit deeper, it seems like the Jobsinator himself visited here (ok - I'm not sure if it was him, but it was someone from Apple). I've also received some strange hits from search engines. It turns out that my page is the third result when you search for "how to twirl pens", third hit for "roflburgers", and also somehow manages to be a search result for this post from 2004. It's all very fascinating stuff. You guys spend about 3:40 reading the main page each time you visit, and there was one poor sod who spent 23 minutes reading the archives from July 2004. Which is really a pretty sad indictment of the people who read this (all 19 unique visitors over the past half week).

I still don't have DSL. You might remember that I started going on about it back in July. I don't think that I am actually going to get it now. The phone line is actually hostile to getting DSL. Just this morning, the phone line blew up the second splitter I had ordered. I'm hoping now that I can get some contract-free DSL, or sign up for wireless internet access from someone. Now that it is getting really cold here, I don't really feel like jumping on a bus to get my broadband fix, and dialup is expensive and slow.

Bubbles. They're really cool.

10 comments

butercup 20/11/05 13:34 *

Neat! That bubble story is cooOOool! So I went to Google Analytics and you're right - they should have put it out in beta, because they're not accepting new sign-ups. Meh!

Speaking of humans doing the work, I signed up for Project Gutenberg proofreading which keeps me occupied while DTM racks up Amazon Cash on Mechanical Turk. Ain't boredband great ... :)

Anonymous 21/11/05 02:33 *

Google business plan:

Step 1: Develop successful search engine and revenue model.

Step 2: Develop random mishmash of other things.

Step 3: Profit!

Hiren Joshi 21/11/05 03:31 *

Step 2: Develop random mishmash of other things

Sounds suspiciously like another company that I know. Step 3 was conspicuously absent from the business plan.

Does Dan actually make any money from the Turk?

Anonymous 21/11/05 04:14 *

Firstly, the business you are thinking of never actually had a business plan. However, one was implied, and in this implied plan, Step 3 was certainly present. The rest of the plan followed a classical dot.com pattern:

Step 1: Invent ridiculous crap

Step 2: ???

Step 3: Profit!

butercup 21/11/05 18:49 *

DTM's pseudo-cash has reached the total of $4.79 in AmazonBucks. Allegedly these can be converted to RealBucks or used to purchase AmazonGoods, but what can ya get for $4.79??

- and if you could get it for $4.79 it would probably cost another $4.79 in postage. He'll have to proof another 200 photos. Sheesh!

Anonymous 22/11/05 13:29 *

One thing the Mechanical Turk seems to be good for is making me realise just how intensely ugly the United States is...or at least 'beautiful' California and Texas. It looks...third world.

Hiren Joshi 22/11/05 14:16 *

Here's what I don't understand. Who takes those pictures? Is it just some random person taking pictures of everything? Is it really that hard to take a single decent picture of your business before submitting it to Amazon?

Or is this something more insidious, like those Google vans which were driving around taking pictures for 3D mapping?

Maybe that's it - Amazon has driven around Texas and California taking pictures, and is now trying to correlate the pictures taken with the data from the white pages (or American equivalent).

It's obviously all planning for a terrorist attack. I mean think about it for a second - a Turk?! Aren't they those moooselim types?

Hiren Joshi 30/11/05 03:30 *

Actually, it turns out that the pictures from Amazon are for their Yellow Pages Blockview service - which you can see at this example.

butercup 5/1/06 22:09 *

HEEYYYY

The bubbles thing has gone COMMERCIAL !!!1!!~!!!

lolbiscuits.

Hiren Joshi 6/1/06 01:05 *

Neato - the website is down at the moment, but hopefully it won't be horrendously expensive.