Security Management
Published on Security Management (http://www.securitymanagement.com)
Screen Scraping Security
By John Wagley



    
Companies that use scanning tools to search for data must be careful to avoid violating privacy laws.

Increasingly, organizations are using automated tools to scan and collect information online. They’re looking at sites such as social networks and blogs for reasons such as reputation management, public relations, market research, and background checks.

Tools that automatically trawl sites for data, known as screen scrapers, are also becoming more advanced. Companies that use them must avoid legal pitfalls, which could include personal privacy violations as well as copyright infringement.

Social networking and other sites that collect user-generated data should also take steps to protect data on their sites, including establishing appropriate privacy policies and implementing the appropriate technical security measures.

The laws surrounding screen scraping and possible privacy and intellectual property violations are somewhat murky, said Brian Bowman, a partner at the law firm Pitblado. Bowman spoke at the Global Privacy Summit in Washington, D.C., sponsored by the International Association of Privacy Professionals.

In the United States, one interpretation of the law is that protected information doesn’t include information in a forum where a user voluntarily shared it, where it’s publicly available, and where users have not been led to believe that there are any technical controls limiting public access, he said. But it is fairly clear that it isn’t acceptable to collect information provided by children or from sites that are aimed at children. In other countries, such as Canada, the laws may be stricter regarding “expectations relating to publicly available information,” Bowman said.

A few legal cases involving screen scraping offer guidance. One, in Canada, involved Century 21 and Rogers Communications. The latter was accused of indexing, storing, and displaying photos and descriptions of properties that were for sale from Century 21’s Web site. Rogers had used robots to crawl the site, an action that was prohibited by the site’s terms of use. Rogers was found liable for copyright infringement; $33,000 was awarded to the plaintiff.

The issue is growing in importance as tools to scrape screens for data are becoming more common and powerful, said Joanne Furtsch, policy and product architect at TRUSTe. Whereas much market data and research used to be collected by telephone, such data collection has been surpassed by online-based research, according to Furtsch.

Many organizations’ marketing, public relations, or research departments may be either considering getting involved in or already engaged in this type of data collection, she said. Privacy officers and other executives at those companies should make sure that those running the program know what kind of data can be legally collected. It’s also important to monitor what may be collected by any third-party research firms. Businesses must avoid infringing on other companies’ privacy policies and terms of use, she said. If any policies are unclear, the organization needs to get clarification before it proceeds with the data collection at that site.

Executives should also assess whether data being collected may be sensitive or personally identifiable information under state and national laws, said Bowman. Companies should consider applying filters that can remove names from data.
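As a rough illustration of what such a filter might look like, the Python sketch below masks e-mail addresses and a supplied list of names in collected text. The function name `redact`, the placeholder string, and the sample names are all hypothetical; a production filter would likely need proper named-entity recognition and legal review rather than a fixed list.

```python
import re

# Simple e-mail pattern; an assumption for illustration, not a full RFC-compliant matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text, known_names, placeholder="[REDACTED]"):
    """Mask e-mail addresses and any names from known_names in text."""
    # Mask e-mail addresses first so that a name embedded in an address
    # does not leave a partial address behind.
    text = EMAIL_PATTERN.sub(placeholder, text)
    # Replace longer names first so "Jane Doe" is handled before "Jane".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    sample = "Jane Doe (jane.doe@example.com) loved the product."
    print(redact(sample, ["Jane Doe"]))
```

A fixed name list only catches names the organization already knows about, so a filter like this is best treated as one layer among several rather than a guarantee that no personal data remains.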

Social networking sites and blogs should be sure to let consumers know, in their privacy policies and other areas, how the information they share on the site could be collected, said Furtsch. Such sites should also let potential screen scrapers know what information they’ll allow to be scraped. Some sites, such as Facebook, forbid any kind of automated data collecting, even if it’s by a user collecting data from his or her own account.

One technical measure that can protect against many scrapers is the robots.txt file, a text file that gives instructions to Web robots, said Furtsch. It has a serious limitation, however: compliance is voluntary. In most cases, screen scrapers must choose to find the file and follow its instructions in order for it to be effective; malicious bots likely won’t seek the file.
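The voluntary nature of robots.txt is easier to see from the scraper's side. The Python sketch below uses the standard library's `urllib.robotparser` to show how a well-behaved bot might check a site's rules before fetching a page. The robots.txt content, the bot names, and the URLs are hypothetical examples, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site might publish (assumption for illustration).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /profiles/

User-agent: ExampleResearchBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    checks = [
        ("PoliteBot", "https://example.com/blog/post-1"),
        ("PoliteBot", "https://example.com/members/alice"),
        ("ExampleResearchBot", "https://example.com/blog/post-1"),
    ]
    for agent, url in checks:
        print(agent, url, may_fetch(parser, agent, url))
```

Nothing in this check is enforced by the site: a scraper that never calls `may_fetch` simply ignores the rules, which is why robots.txt is usually paired with other controls.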

Another measure sites should take is to provide their users with mechanisms for deleting their sensitive data whenever they choose.

Another widely used tool to protect against scrapers is the captcha. Captchas show squiggly letters and numbers that a computer or bot is not supposed to be able to decipher. Sites have people type the captcha during registration to prove that they’re human.

Captchas should also be regularly updated, Furtsch said, as some scraping tools have been known to outsmart certain types of captchas.


Comments

When deciding how to

Submitted by securityman on Wed, 09/26/2012 - 02:46.

When deciding how to implement your social media marketing strategies, it is important to take into consideration the nature of your products and services. For example, if purchasing your products is something that most of your customers would prefer to keep private, then do not put Facebook-like buttons right next to the buy buttons! Eventually, someone will click it accidentally and then get angry at your business.

Great Article but needs more security solutions

Submitted by ressaid on Fri, 08/03/2012 - 09:38.

John,

Great article! The one thing I believe is missing is more effective ways to protect a website from screen scraping. Robots.txt unfortunately doesn't work. Luckily, there are several companies out there that offer or specialize in scraping protection. Services such as www.distil.it, cloudflare.com, or blockscraping.com can put up a barrier to entry to prevent scrapers from taking your content.

Rami

Great article ! Many bots [1]

Submitted by etocker on Thu, 08/02/2012 - 10:22.

Great article!

Many bots are very sophisticated, changing their IP addresses, and it's hard to track them.

SiteBlackBox (http://www.siteblackbox.com/) can help you stop such abuse and keep those bots away for good.


Security Management is the award-winning publication of ASIS International, the preeminent international
organization for security professionals, with more than 38,000 members worldwide.

ASIS International, Inc. Worldwide Headquarters, 1625 Prince Street, Alexandria, Virginia 22314-2818 U.S.A.
703.519.6200 | fax 703.519.6299 | www.asisonline.org

© 2013 Security Management
This site is protected by copyright and trade mark laws under U.S. and International law.
No part of this work may be reproduced without the written permission of Security Management.

Source URL: http://www.securitymanagement.com/article/screen-scraping-security-0010132

Links:
[1] http://www.securitymanagement.com/print/10132#comment-1712