Google Base - An Overview
Seeing as how I work for eBay, there’s been a lot of water-cooler talk the past couple weeks about Google Base. Additionally, I’ve heard a lot of very interesting theories, thoughts, and predictions about what the product means for Google, how they might position and monetize it, and how the product might evolve. For my own benefit, I decided to take a few minutes to organize some of the things I’ve thought about and heard, and figured I might as well do so in a format that I could share with others.
I don’t expect anything in this blog post to be revolutionary; if you’ve spent much time thinking about Google Base yourself, you’ve probably thought about a lot of this. Most of the info is just some of my basic thoughts aggregated with thoughts I’ve heard from others, and I’ve done my best to attribute ideas to the originators when they’re not mine.
So, let’s start with…what is Google Base?
On the surface, Google Base is just an aggregation of lots of atomic pieces of data. Data can be submitted in multiple ways, from a web-form entry of a single piece of data, to the submission of tens of thousands of pieces of data via an XML (RSS, for example) feed. Each piece of data is “tagged,” meaning there is a set of meta-data associated with the piece of content. These tags are defined by the submitter of the content, and are completely free-form.
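To make the "atomic, tagged data" idea concrete, here is a minimal sketch of how a bulk RSS submission might be split into individual items, each carrying its own free-form attributes. The feed layout, the `http://example.com/attrs` namespace, and the `parse_items` helper are all hypothetical. They illustrate the idea and are not Google Base's actual feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for submitter-defined attributes (illustrative only;
# not Google Base's real feed format).
ATTR_NS = "http://example.com/attrs"

FEED = """<rss version="2.0" xmlns:a="http://example.com/attrs">
  <channel>
    <title>My listings</title>
    <item>
      <title>Puppy profile</title>
      <a:species>dog</a:species>
      <a:favorite_turkey>Primotaglia Pan Roasted</a:favorite_turkey>
    </item>
    <item>
      <title>Giants baseball card</title>
      <a:year>1989</a:year>
      <a:condition>mint</a:condition>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Split a bulk feed into atomic records: one dict of attributes per <item>."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        record = {"title": item.findtext("title")}
        # Every element in the attribute namespace becomes a free-form tag;
        # no schema decides in advance which tag names are allowed.
        for child in item:
            if child.tag.startswith("{" + ATTR_NS + "}"):
                name = child.tag.split("}", 1)[1]
                record[name] = (child.text or "").strip()
        records.append(record)
    return records


for record in parse_items(FEED):
    print(record)
```

The two items share no attributes beyond `title`; each submitter decides what is worth recording.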
Allowing free-form tagging of data obviously has its benefits and drawbacks. The main benefit is that there is no constraint on the number or type of attributes assigned to each piece of data, which is good when the database design can't account for all possible attributes of the data within. For example, I submitted a profile of my puppy to Google Base, and thought that it was important to indicate that her favorite kind of turkey is the "Primotaglia Pan Roasted" variety – the database designers surely hadn't considered the "favorite kind of turkey" attribute in a puppy profile, despite its obviousness to me. The downside of allowing free-form tagging of data is that you end up with a lot of garbage; in some cases, people assign inappropriate tags (think: keyword spamming), and in other cases, people just assign tags that nobody cares about (think: "favorite kind of turkey"). So, while you end up with a lot of useful data, it's sometimes hard to distinguish it from the garbage.
But Google Base is not the first attempt at opening up a database to arbitrary and completely unstructured data in the form of attributes and meta-tags. This practice of letting users tag content freely is called folksonomy (or collaborative tagging), and it is popular among websites and databases that specialize in aggregating free-flowing data and organizing it into useful structures. It's used by sites such as http://del.icio.us, which uses folksonomy to organize and share web bookmarks among its users, and http://www.musicplasma.com, which uses meta-data to categorize bands and make recommendations based on your musical tastes.
In fact, while it might seem like having lots of data makes it really difficult to sort and organize it all, in actuality the more data you have, the greater the potential for better sorting and categorization. This is because as the number of data points grows, the attributes that many submitters agree on keep getting reinforced, while idiosyncratic or spammy tags stay rare, so the useful signal stands out more clearly against the garbage. With enough data points, you can essentially plot how often each attribute is used and see which attributes "rise to the top" and which ones fade away as garbage. And it doesn't really matter if some seemingly important but infrequently used attributes (like "favorite kind of turkey") get thrown out – if few people are using that attribute to tag their data, then it's likely that few people are going to want to search for data based on that attribute. Ro Choy from the eBay Developer Program BU said this pretty well in his blog post from yesterday.
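One simple way to let attributes "rise to the top" is to count how many items use each attribute and drop those below a threshold. This is a minimal sketch of that idea, not a description of how Google actually ranks attributes; the `popular_attributes` name, the sample records, and the 5% cutoff are all arbitrary assumptions.

```python
from collections import Counter


def popular_attributes(records, min_share=0.05):
    """Return {attribute: share of records using it} for attributes whose share
    is at least `min_share`; rarer attributes are treated as noise."""
    if not records:
        return {}
    counts = Counter()
    for record in records:
        counts.update(set(record))  # count each attribute at most once per record
    total = len(records)
    return {
        attr: n / total
        for attr, n in counts.most_common()
        if n / total >= min_share
    }


# Hypothetical sample: 100 records, one of which carries an idiosyncratic tag.
records = (
    [{"title": "card", "price": "5", "condition": "mint"}] * 40
    + [{"title": "card", "price": "7"}] * 59
    + [{"title": "puppy", "favorite_turkey": "Primotaglia Pan Roasted"}]
)

print(sorted(popular_attributes(records)))  # ['condition', 'price', 'title']
```

Here "favorite_turkey" appears on 1% of records and is filtered out, while "condition" survives at 40% even though not every seller filled it in.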
So, the takeaway here is that while Google Base might look like a complete mess of data right now, it's certainly possible that in six months, when there are hundreds of millions of pieces of data contributing to the folksonomy, it might actually turn out to be well organized. I'm not saying that converting all this random data into a well-structured format is an easy task, but it seems reasonable that if anyone can pull it off, it's Google.
There's been a lot of discussion, even from a lot of folks who should know better, about how bad the interface to Google Base is, and how it's hardly usable. But the interface you currently see is just that: one interface to the database. There's nothing stopping Google from putting any "skin" they want over that set of data. It could be an ecommerce type interface that competes with eBay, it could be a Yellow Pages directory that competes with YellowPages.com, it could be a resume directory that competes with Monster.com, it could be a directory of high school graduating classes that competes with Reunion.com. Or it could be something completely new that doesn't have an existing competitor. There are potentially thousands of businesses that Google could launch to monetize the data. Or they could just as easily license the data to any other company to jumpstart a data-based business.
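The "skin" argument is the familiar separation of data from presentation. Below is a minimal sketch with made-up records and two hypothetical view functions, `marketplace_view` and `directory_view`, showing how one shared set of records could back two very different products.

```python
# Hypothetical records; in the scenario above these would come from one shared database.
RECORDS = [
    {"type": "business", "name": "Joe's Pizza", "city": "San Jose", "phone": "555-0100"},
    {"type": "business", "name": "Bay Hardware", "city": "San Jose", "phone": "555-0199"},
    {"type": "item", "name": "Giants baseball card", "price": 7.5, "condition": "mint"},
]


def marketplace_view(records):
    """Shopping-style skin: only items for sale, price first."""
    return [
        f"${r['price']:.2f}  {r['name']} ({r['condition']})"
        for r in records
        if r.get("type") == "item"
    ]


def directory_view(records):
    """Yellow Pages-style skin: only businesses, grouped by city."""
    by_city = {}
    for r in records:
        if r.get("type") == "business":
            by_city.setdefault(r["city"], []).append(f"{r['name']}  {r['phone']}")
    return by_city


print(marketplace_view(RECORDS))  # ['$7.50  Giants baseball card (mint)']
print(directory_view(RECORDS))
```

Neither view changes the underlying records, so a new "business" is just a new view function over the same data.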
As Bill Burnham points out in his blog, what Google is attempting to build is the world's largest XML database. Over and above having possibly the most data ever stored in a single location, the data is structured in such a way that it can be easily parsed, manipulated, or deployed around the web. As Michael Parekh points out in his blog post, Google could in short order end up being not just the largest searchable database, but also the largest directory on the web, competing with every existing portal. At the very least, you can be sure that RSS, an XML-based standard, will be mentioned a lot more in the coming months and years, as it is likely one of the main mechanisms Google will use both to pull data in and to publish it out to the world.
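Publishing the data back out as RSS is the mirror image of pulling it in. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `to_rss` helper, the custom attribute namespace, and the sample values are hypothetical, chosen only to show how free-form attributes can ride along inside an otherwise ordinary RSS 2.0 feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for free-form attributes (illustrative only).
ATTR_NS = "http://example.com/attrs"
ET.register_namespace("a", ATTR_NS)


def to_rss(records, title, link, description):
    """Publish attribute records as a minimal RSS 2.0 feed, one <item> per record."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for record in records:
        item = ET.SubElement(channel, "item")
        for name, value in record.items():
            # Standard RSS item elements stay un-prefixed; everything else
            # goes into the custom attribute namespace.
            if name in ("title", "link", "description"):
                tag = name
            else:
                tag = f"{{{ATTR_NS}}}{name}"
            ET.SubElement(item, tag).text = str(value)
    return ET.tostring(rss, encoding="unicode")


feed = to_rss(
    [{"title": "Giants baseball card", "year": 1989, "condition": "mint"}],
    title="My listings",
    link="http://example.com/listings",
    description="Hypothetical sample feed",
)
print(feed)
```

Any RSS reader can display the title, while a consumer that understands the custom namespace can also pick up `year` and `condition`.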
So, all of these pie-in-the-sky ideas about how Google will be able to take over all the big Internet players are interesting, but are they realistic? Probably not anytime soon, but maybe someday. In the meantime, there are lots of potential short-term benefits that Google likely considered when they conceived of Google Base:
- Very rarely do you see a webpage with a single piece of data, a single theme, or a single point of interest. By enticing people to break down all the information they have into atomic pieces of data and submit them to the database, Google now has the ability to greatly increase the relevancy of search results and provide additional sorting attributes. For example, if you're looking to buy a "Giants Baseball Card," you could type that term into Google and get a whole list of webpages that may or may not sell Giants baseball cards, and if they do, they may or may not have what you're looking for (based on year, condition, price, etc.). You could spend an hour checking out each link from Google and searching each site you jump to. Using Google Base, the same search returns not only a list of potential products, but also a set of pre-defined attributes that you can use to further refine your search *before* you ever leave Google. Google no longer needs to ask if you're "feeling lucky," because you can be almost sure that you've found what you're looking for before ever leaving Google;
- By bringing the data "in-house," Google can better parse, sort, and categorize the data than it can by just crawling sites and indexing web pages. A good example here is Google Video. While Google could just crawl sites that had video and index the tags and meta-data surrounding the video, they realized that by actually pulling in the video and hosting it themselves, they had the added benefits of being able to parse the video, break it apart, and more accurately identify the meta-data associated with the video. The same holds true for any other type of data. (Oh, and speaking of Google Video, this was pretty funny);
- Henri Moissinac, eBay’s head of product strategy, brought up two fantastic reasons why Google would want to bring data into Google Base versus just crawling and indexing (and linking to) sites. If you’re not familiar with Google’s revenue model for advertising, here’s a quick overview… Advertisers pay Google to dynamically display their advertisements both on Google and on other websites within the Google “network” (actually, the advertisers only pay when the ads are clicked). The ads are displayed contextually, meaning the ads will relate to the content being shown on the page. When Google displays an ad on their website (in a search results page, for example), and the ad is clicked, Google collects 100% of the fee from the advertiser for the click. Members of the Google network (anyone with a website that signs up with Google) can display these exact same ads on their websites. If someone visits one of these websites and clicks on one of these ads, Google will collect the fee from the advertiser for the click, give a percentage of the fee to the website owner (for hosting the ad), and keep the rest themselves. As should be clear, Google makes more money if the ad is clicked from a Google page than if an ad is clicked from a page owned by someone else. So, it’s in Google’s best financial interest for them to host as much content directly on Google.com as possible, and send users to other sites only when necessary. As Henri points out, storage space is going way down in price and cost-per-click is going up tremendously, so the trade-off is an obvious one;
- As Henri also points out, in addition to offering more advertising impressions directly on Google, having a database of individual data elements lets Google ensure that the advertising they show is more targeted and more relevant. A web page may have many different topics or themes, which can make it difficult to choose a good contextual ad; an atomic piece of data, by contrast, is very easy to categorize and match with a targeted advertisement;
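Refining a search by attributes before leaving the results page is what is usually called faceted search. Here is a minimal sketch over hypothetical "Giants baseball card" listings; the `refine` and `facet_counts` helpers and all of the data are illustrative assumptions, not Google Base's implementation.

```python
from collections import Counter

# Hypothetical listings for a "Giants baseball card" search.
LISTINGS = [
    {"title": "Giants baseball card", "year": 1989, "condition": "mint", "price": 12.00},
    {"title": "Giants baseball card", "year": 1989, "condition": "worn", "price": 3.00},
    {"title": "Giants baseball card", "year": 1995, "condition": "mint", "price": 8.00},
    {"title": "Giants team photo", "year": 1989, "price": 20.00},
]


def matches(listing, attr, wanted):
    """True if the listing has `attr` and its value equals `wanted`
    (or satisfies `wanted`, when `wanted` is a predicate function)."""
    if attr not in listing:
        return False
    value = listing[attr]
    return wanted(value) if callable(wanted) else value == wanted


def refine(listings, **criteria):
    """Keep only listings that satisfy every attribute criterion."""
    return [
        item for item in listings
        if all(matches(item, attr, wanted) for attr, wanted in criteria.items())
    ]


def facet_counts(listings, attr):
    """How many listings carry each value of `attr`, for display as refinement options."""
    return Counter(item[attr] for item in listings if attr in item)


print(facet_counts(LISTINGS, "condition"))  # Counter({'mint': 2, 'worn': 1})
print(refine(LISTINGS, year=1989, condition="mint"))
print(refine(LISTINGS, condition="mint", price=lambda p: p < 10))
```

The facet counts tell the user which refinements are available; each call to `refine` narrows the list without another round trip to outside sites.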
As you can see, the short-term monetization will likely be around improved search relevancy and increased advertising revenue. The longer term monetization may be in the form of “skinning” the data to compete with some large web-based competitors or licensing the data to allow others to do that.
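Henri's revenue argument boils down to simple arithmetic: Google keeps the whole fee for a click on its own page but shares part of it for a click on a network member's page. The following minimal sketch uses a made-up cost per click and a made-up publisher share, since the actual split isn't given here; `google_take_per_click` is a hypothetical helper.

```python
def google_take_per_click(cost_per_click, on_google_page, publisher_share):
    """Google's revenue from a single ad click.

    On Google's own pages it keeps the whole fee; on a network member's page it
    pays the member `publisher_share` of the fee and keeps the remainder.
    """
    if on_google_page:
        return cost_per_click
    return cost_per_click * (1 - publisher_share)


# Hypothetical inputs: the real cost per click and revenue split are not stated here.
CPC = 0.50
SHARE = 0.60
clicks = 1000
own = clicks * google_take_per_click(CPC, True, SHARE)
network = clicks * google_take_per_click(CPC, False, SHARE)
print(f"Google pages: ${own:.2f}, network pages: ${network:.2f}")
# Google pages: $500.00, network pages: $200.00
```

Whatever the real numbers are, the on-Google figure is always at least as large as the network figure, and the gap grows as cost per click rises, which is why cheap storage makes hosting the content directly an easy trade.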
Personally, in addition to the above, I think Google may also be planning to use Google Base to eventually enable and drive momentum around the payments system they are going to roll out. One scenario you could imagine (though certainly not the only one and certainly not a given) is that Google Base could be positioned as a competitor to eBay. Of course, Google Base transactions would be severely lacking in fraud protections for both buyer and seller. But imagine Google telling users that transactions through Google Base are completely free (no listing or transaction fees), mandating only that all sellers register with (and accept) Google payments. Google could then tell buyers that they are free to pay for transactions by any means they wish, but if they choose to pay with Google payments, they are afforded various fraud protections – escrow, buyer protection, a user reputation system, etc.
Sellers will be happy to sell through Google Base (even if they have to register with Google payments) because it's free, and buyers will be "coerced" into paying with Google payments to alleviate fraud concerns. The only reasonable model for transactions on this platform would be through Google payments. Google would potentially get a piece of every financial transaction, and at the same time drive potentially millions of users to this new online payment system. You could also imagine that any company licensing the Google Base data would have the ability to quickly and easily integrate with Google payments and get access to whatever reputation system the payments system has developed.
November 30th, 2005 at 7:16 am
Or perhaps Google’s engineers came up with Google Base and no one at Google knew what to do with it, so they just launched it figuring someone in the blogosphere would give them some good ideas!
December 8th, 2005 at 8:45 am
[…] Google has Base. Microsoft has Fremont. Now Yahoo! has Answers. […]
December 8th, 2005 at 8:39 pm
[…] With the proliferation of search technology, new standards for data representation (XML, specifically), and the ever decreasing cost of storage, the ability for companies like eBay and Google to import massive amounts of data, analyze it (whether by hand or by machine) and then begin to organize it is finally becoming feasible. Google Base is one example of an initiative designed to collect and organize massive amounts of data into a well-defined structure that can be used to disseminate human knowledge in a way that a computer can (in some ways) understand. […]