Have you ever wondered whether it’s legal for Internet search engines like Google to automatically crawl a Web site and serve up snippets of it to search users? How about creating a cache of the pages it indexes, which can be served in their entirety without referencing the original site? Or using snippets of articles on news and aggregator sites?
Well, certainly other people have pondered these questions, and Internet search titan Google just lost its first bid to overturn a Belgian court order requiring it to publish a ruling that found Google violated publishers’ copyrights by reproducing snippets of articles on its Google News aggregation service. In March 2006, Copiepresse brought a case against Google, claiming that the Google News service at news.google.be was illegally reprinting snippets of articles from its Belgian news sources without explicit permission. Google declined to participate in the case at all, and the judge agreed with Copiepresse, ordering Google to remove content from Copiepresse’s French- and German-language newspapers from its Belgian news Web site within ten days or pay a fine of €1 million a day. Google was also required to publish the text of the ruling on both its Belgian home page and its Belgian news site or pay a fine of €500,000 a day. Google has removed the news sources from its news sites and index, but has refused to post the text of the ruling, citing the significant publicity the case has already received. An appeal on the entire case is scheduled for November 24, 2006.
In many ways, Copiepresse’s complaint comes down to permissions. Search engines like Google assume that if a Web page is publicly accessible, it can be indexed, although they honor opt-out requests and automated means of specifying that content should not be indexed, such as robots.txt files and robots META tags. However, Copiepresse (and most copyright law) operates under the assumption that to republish or redistribute content, permission must first be obtained from the copyright holder. Copiepresse wants to be in Google’s system, but, as a publisher, it wants to be compensated for the value its content brings to Google.
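For readers unfamiliar with the opt-out mechanism mentioned above, here is a rough sketch of how a well-behaved crawler might consult a site's robots.txt rules before fetching a page, using the robots.txt parser in Python's standard library. The rules, the crawler names (ExampleNewsBot, OtherBot), and the news.example.com URLs are all made up for illustration; they are not taken from Copiepresse, Google, or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
# It shuts out one named crawler entirely and keeps everyone else
# out of the /archive/ section.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is opted out of the whole site.
blocked = parser.can_fetch("ExampleNewsBot", "http://news.example.com/story.html")

# Any other crawler may fetch ordinary pages...
allowed = parser.can_fetch("OtherBot", "http://news.example.com/story.html")

# ...but not pages under /archive/.
archive = parser.can_fetch("OtherBot", "http://news.example.com/archive/2006.html")

print(blocked, allowed, archive)
```

Note that the file can only say yes or no to a given crawler for a given path. It has no way to express terms such as payment or permitted uses, which is the gap publishers like Copiepresse are pointing to.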
To that end, the Paris-based World Association of Newspapers (WAN) is launching an Automated Content Access Protocol (ACAP) which would govern and specify how search engine spiders, news services, and other crawlers could access and utilize content from publishers—including specifying royalty arrangements and access levels—rather than the simple opt-in or opt-out model offered by existing exclusion policies.
Google and other search engine operators argue that their aggregators and indexes provide a useful service to content publishers, enabling Internet users to discover and access their sites in ways which wouldn’t be possible for the publishers on their own. Some publishers, however, claim Google, MSN, Yahoo, and others have essentially built their search and aggregation businesses by taking and utilizing publishers’ content without permission or compensation.