Legal issues around generative AI and web scraping | Venable LLP

Smart companies often take advantage of evolving technology to be more efficient or to offer new products or services. But if companies fail to conduct legal risk assessments before using innovative technology, the expected benefits can quickly be outweighed by the legal consequences.

Generative AI (GenAI) fits this bill, offering seemingly limitless opportunities but also posing significant legal risks. Companies should conduct a GenAI legal compliance assessment before launching GenAI software, especially regarding any data retrievals used by the model.

GenAI models are trained on terabytes of data, and rely primarily on data mining from the web to retrieve the massive amounts of data required. Many web scraping companies assume that if data is publicly available, it’s fair game, but this is a flawed assumption that is easy to challenge. Unauthorized web scraping can result in copyright infringement, breach of contract, violation of privacy rights, and violation of the Computer Fraud and Abuse Act (CFAA), to name a few.

  • Copyright infringement. Content or databases of information available on or through a website may be protected by copyright, even if the content or database is not behind a paywall. Copyright law gives the copyright owner the exclusive rights to “reproduce,” “copy,” “distribute,” and make “derivative works” of the copyrighted work (among other things). Without authorization from the copyright owner, using GenAI software to extract copyrighted content (for example, a news article, poem, or artwork) from a website and then reuse it may result in a claim of copyright infringement.
  • Breach of contract. Websites typically include legal terms that website operators seek to impose on users of their sites. In these agreements, operators often include language that expressly prohibits web scraping or similar techniques, in order to protect their rights to the content and data available on their sites. To the extent such agreements are deemed enforceable, a user who uses GenAI software to retrieve data from the site may be subject to a claim for breach of contract.
  • Violation of privacy rights. The proliferation of US states passing new privacy laws that expand users’ rights over their personal data has made data mining for GenAI models a growing concern. When obtaining personal information, federal and state laws and regulations may require notification, consent, and the ability to opt out of the data collection or use, depending on the age of the person from whom the data is collected, the type of data collected, and where the data was collected. If these laws and regulations were not followed when the data was initially collected and the data was subsequently published on a website, legal liability may arise and may extend to the entity using the GenAI model.
  • Computer Fraud and Abuse Act. Data scraping under the CFAA has been an evolving legal issue that courts continue to evaluate. While recent case law has made the CFAA largely inapplicable to data that is publicly accessible, data that is kept behind an authentication barrier or paywall (for example, where users log in or pay for access to data) may give rise to liability under the CFAA, because technical barriers have been put in place to prevent unauthorized access.
