MUSA Monthly Newsletter

Issue 7 | July 12, 2023

Welcome to the Mitigating Unauthorized Scraping Alliance newsletter where we highlight topics of interest related to unauthorized data scraping. Unauthorized data scraping involves the automated collection of user data at scale that violates a platform's Terms of Service.

Join our Mailing List
Join our Roundtable Series

Featured Articles and Events

White Paper: Talking Past Each Other - The Legal and Technical Challenges of Harmful Web Scraping

In a new research paper, Professor Timothy H. Edgar, Professor of the Practice of Computer Science, Brown University, and Lecturer on Law, Harvard Law School, examines the legal and technological challenges of harmful web scraping. The paper highlights trends that could exacerbate the problem, including increased demand for scraped data due to generative AI, weakened norms against unwanted scraping, and potential misinterpretation of court decisions. It also proposes possible legislative solutions, such as amending the Computer Fraud and Abuse Act (CFAA) or addressing harmful web scraping through broader privacy legislation in the United States and elsewhere.

White Paper Full Text
White Paper Fast Facts

MUSA Hosts Webinar on Harmful Web Scraping Research

MUSA hosted an engaging discussion with cybersecurity expert and privacy lawyer Professor Timothy H. Edgar about his newest research on harmful web scraping. The webinar addressed common misunderstandings that lawyers and technologists may have about the topic. It also explored the technical concepts of authentication, authorization, and access control and their relationship to web scraping, as well as current legislative limitations and potential solutions.

Webinar Recording
Key Takeaways

Industry & Scraping In the News

The Harm from AI is Already Here. What Can the US Do to Protect Us?

The Guardian's article on AI regulation in the US explores the growing need for oversight and ethical guidelines for artificial intelligence. Policymakers and experts are considering various approaches, from industry self-regulation to government intervention, to strike a balance between innovation and risk mitigation. Concerns about biases and discrimination in AI systems are being addressed, as well as the potential impact on employment. Collaboration among stakeholders is seen as crucial in developing effective regulations that align AI development with societal values and benefit all individuals.

Read more on The Guardian

The Reddit Protest Is a Battle for the Soul of the Human Internet

Tension between Reddit administrators and volunteer moderators raises important questions around API pricing and balancing security features with user experience. In June, Reddit announced that it would begin charging for access to its API in an effort to reduce unauthorized scraping of data used for AI model training. These limits, however, also affect researchers who cannot afford the costs, as well as critical third-party apps used by moderators. Many moderators have taken their subreddits dark in protest of Reddit’s decision.

Read more on Vice

Musk Says Twitter Will Limit How Many Tweets Users Can Read

Earlier in the month, Twitter began limiting how many tweets per day various accounts can read to discourage "extreme levels of data scraping and system manipulation," particularly scraping for generative AI model training. A few days after requiring users to log in to view tweets, however, Twitter removed these restrictions, raising questions about effective scraping mitigation practices and the balance between data protection and user experience.

Read more on Reuters 

Could a New Law Help Pry Open the Black Boxes of Social Media Giants?

This article discusses a proposed law aimed at increasing transparency and accountability of social media platforms. The article highlights the growing concerns about the lack of transparency in the algorithms and decision-making processes used by these platforms. The proposed law, known as the Algorithmic Transparency Act, aims to address this issue by requiring companies to disclose information about their algorithms and how they impact user experiences. It also calls for audits to ensure fairness and prevent discrimination. The article discusses the potential impact of such a law on the power dynamics of social media platforms and the challenges that may arise in its implementation.

Read more on Fast Company

OpenAI, Maker of ChatGPT, Hit with Proposed Class Action Lawsuit Alleging It Stole People's Data 

This article reports on a lawsuit filed against OpenAI and Microsoft regarding data privacy issues. The lawsuit alleges that OpenAI's language model, ChatGPT, violates privacy laws by storing and processing user data without proper consent. The plaintiffs claim that personal information shared during interactions with ChatGPT is being collected and used in ways that infringe upon privacy rights. The article highlights the growing concerns surrounding data privacy in AI systems and the potential implications for users' personal information. OpenAI and Microsoft have not yet responded to the lawsuit publicly.

Read more on CNN Business

Google Says It'll Scrape Everything You Post Online for AI

Google updated its privacy policy this month to include the right to scrape publicly available information to build its AI tools for products such as Google Translate, Bard, and Cloud AI. The article raises important questions around privacy and the limits of public information, as well as the need to consider copyrighted materials used in training data sets.

Read more on Gizmodo 

Elon Musk Blames Data Scraping by AI Startups for His New Paywalls on Reading Tweets

This article reports that Twitter’s CEO Elon Musk is implementing a daily reading limit on the platform. The proposed feature would restrict Twitter users from accessing tweets beyond a limit unless they pay for a subscription. This decision is seen as a potential revenue generation strategy by Twitter and a way to combat misinformation by adding friction to the spread of content. However, the details of the plan, including the reading limit and pricing, have not been finalized, and Twitter has not officially confirmed the implementation of such a feature.

Read more on The Verge 

OpenAI’s Legal Woes Driven by Unclear Mesh of Web-Scraping Laws

This article highlights OpenAI’s challenges, which stem from the complexities and uncertainties surrounding web scraping laws. OpenAI, a prominent artificial intelligence research lab, has faced legal scrutiny due to its scraping of copyrighted material from the internet. The legal landscape around web scraping is intricate, as laws governing scraping vary across jurisdictions and are often unclear or outdated. OpenAI’s case exemplifies the need for clearer frameworks and guidelines to address the issues arising from web scraping activities and ensure a balance between data access, intellectual property rights, and innovation in the digital age.

Read more on Bloomberg Law


Legislation, Regulation, & Court Cases In the News

Scraping to Train Artificial Intelligence Is Raising Issues

This article explores the concerns and legal implications associated with web scraping for training artificial intelligence (AI) models. The author highlights that web scraping, the process of extracting data from websites, has become a popular method to gather large amounts of data to train AI algorithms. However, the legality of web scraping is a contentious issue, as it can potentially infringe on copyright and other intellectual property rights and violate terms of service agreements. The article discusses recent legal cases and regulatory developments related to web scraping, emphasizing the need for organizations to carefully consider the legal and ethical aspects when using scraped data for AI training purposes.

Read more on Global Advertising Lawyers Alliance

OpenAI Class Action Likely to Increase Scrutiny of Web Scraping and Data Collection Practices

A class action lawsuit filed against OpenAI and its primary investor, Microsoft, seeks damages and injunctive relief for the alleged theft and commercial misappropriation of consumer personal data processed by and used to train large language model AIs, including ChatGPT. This raises questions around rights to publicly available information and unauthorized web scraping.

Read more on JD Supra

All You Need to Know about Data Scraping and What the NDPC Should Do

The rise of web scraping in Nigeria, which coincides with the country's growing internet penetration, has led to increasing concerns regarding malicious activity. The author of the article outlines ways that the Nigeria Data Protection Commission (NDPC) can work to mitigate unauthorized scraping, including establishing clear guidelines for data scraping, providing training and education, monitoring data scraping activity, and working with other government agencies such as the Nigerian Communications Commission and the Economic and Financial Crimes Commission to coordinate enforcement.

Read more on Technext

ACCC Invites Views on Data Broker Industry

The Australian Competition and Consumer Commission released an Issues Paper seeking information on the business practices of third-party data brokers and on the potential issues that data collection for these services, including web scraping, raises for consumers, businesses, and interested stakeholders in Australia.

Read more on ACCC

About MUSA

The Mitigating Unauthorized Scraping Alliance (MUSA) brings together leading companies committed to protecting data from unauthorized scraping and misuse. In collaboration with industry members, policymakers, and the public, MUSA is generating a global dialogue around unauthorized data scraping focused on protecting user data through education, advocacy, public-private partnerships, and the sharing of reasonable practices to mitigate unauthorized scraping.

Connect with us:

LinkedIn  Web  Email  Twitter