The legality of web scraping has been the subject of much debate. The question is simple: is web scraping legal? Scraping publicly available data is not, in itself, a criminal offense. However, some ground rules must be followed, because web scraping can become illegal when it extracts data that is not publicly available.
Web scraping refers to various methods for collecting data from websites. In most cases, this is done with software that simulates human web browsing to gather specific pieces of information from various sites. Web scrapers may collect this data to sell to other users or to use for promotional purposes on a website. Web scraping is also known as web data extraction, screen scraping, or web harvesting.
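To make "collecting specific bits of information" concrete, here is a minimal sketch in Python using only the standard library. The sample page, the `h2 class="title"` markup, and the names `TitleExtractor` and `extract_titles` are assumptions made up for illustration; a real scraper would first download the page and would have to match that site's actual markup.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside every <h2 class="title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching heading
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped text of each matching heading in the given HTML string."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Stand-in for a downloaded public page (invented sample data)
SAMPLE_PAGE = """
<html><body>
  <h2 class="title">First post</h2><p>Body text</p>
  <h2 class="title">Second post</h2><p>More text</p>
  <h2 class="other">Not a title</h2>
</body></html>
"""

print(extract_titles(SAMPLE_PAGE))  # prints ['First post', 'Second post']
```

The sketch deliberately works on a string rather than a live site, so it shows only the extraction step and leaves out the request itself.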
LinkedIn and hiQ Labs: the dispute over web scraping
On September 9, 2019, the US Court of Appeals for the Ninth Circuit upheld a preliminary injunction in favor of hiQ against LinkedIn, the Microsoft-owned social media company, ruling that scraping publicly available data from LinkedIn is legal. The ruling was issued even though LinkedIn insisted that web scraping violates user privacy. The court found that scraping publicly accessible websites does not violate the CFAA (Computer Fraud and Abuse Act).
LinkedIn had moved to prevent hiQ from harvesting user profiles from its site. hiQ, a San Francisco-based startup, is an analytics firm that scrapes personal information, particularly from public LinkedIn profiles. It uses the data to analyze workforce trends such as skill shortages and to predict when workers are likely to leave their jobs.
The appeals court's decision was historic because it addressed both data privacy and the legal compliance of web scraping. At the same time, it appeared to imply that web crawlers could freely obtain any data on public websites that is not copyrighted. However, the decision did not give hiQ or any other web crawler the right to use that data for unlimited commercial purposes.
The decision treated the scraping of publicly accessible web data as lawful, and the injunction barred LinkedIn from blocking hiQ's access to public profiles. In terms of legal compliance, the ruling indicated that access by a bot or web scraping software to a public site is no different from access by a browser.
The ruling does not extend to copyrighted material. For example, a web scraper bot could search YouTube for video titles, but the operator cannot repost the videos themselves on another site, because the videos are copyrighted. The decision appears to leave copyright protection intact, including for media files, regardless of how the data is obtained.
The Ninth Circuit later upheld its original decision, ruling that scraping data from publicly accessible websites is not a violation of the Computer Fraud and Abuse Act (CFAA).
Businesses can hire research analysts to scrape the web for product data and use it to work out prices that consumers can afford. Researchers can also scrape web posts, which is especially useful for publicly funded and academic research on important issues like disinformation.
The court's decision is viewed as a significant victory for archivists, academics, researchers, and journalists who use tools to mass-collect or scrape publicly accessible information on the Internet. However, there have been numerous instances of web scraping that have raised privacy and security concerns. Clearview AI, a facial recognition firm, claims to have scraped a large number of social media profile photos, prompting several major tech companies to send the company cease-and-desist letters.
Although some organizations disagree with the court's decision that web scraping is legal, the most practical response for them is to limit how much user information is exposed on public pages.
Conclusion
The extraction and use of publicly available data by external users (journalists, statisticians, and researchers) for study purposes should not be stamped out, as it offers many benefits to people in different fields. Publicly available data can also help businesses better serve the same users who provide that data.