(Reuters) – Social media platform Reddit said on Tuesday it will update a web standard the platform uses to block automated data collection from its website, following reports that AI startups were bypassing the rule to gather content for their systems.
The move comes at a time when artificial intelligence companies are accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking permission.
Reddit said it would update the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard intended to control which parts of a site can be crawled.
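For context, a robots.txt file is simply a plain-text list of directives that a site publishes at its root, telling crawlers which paths they may or may not fetch. The sketch below is purely illustrative; the crawler name and rules are hypothetical and are not Reddit's actual directives.

    # Illustrative robots.txt (hypothetical crawler name and paths, not Reddit's real rules)
    User-agent: ExampleAIBot
    Disallow: /

    User-agent: *
    Allow: /

Compliance with these directives is voluntary on the crawler's part, which is why publishers have also turned to technical enforcement measures such as rate limiting.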
The company also said it will enforce rate limiting, a technique used to cap the number of requests from a given entity, and will block unknown bots and crawlers from collecting and storing data from its website.
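As a rough illustration of how rate limiting works (this is a generic token-bucket sketch, not Reddit's implementation, and the limits and client key are invented for the example):

    import time
    from collections import defaultdict

    # Minimal token-bucket rate limiter sketch (illustrative only).
    # Each client gets `capacity` tokens that refill at `refill_rate`
    # tokens per second; a request is allowed only while tokens remain.
    class TokenBucketLimiter:
        def __init__(self, capacity=60, refill_rate=1.0):
            self.capacity = capacity          # max burst size per client
            self.refill_rate = refill_rate    # tokens added per second
            self.tokens = defaultdict(lambda: capacity)
            self.last_seen = defaultdict(time.monotonic)

        def allow(self, client_id):
            now = time.monotonic()
            elapsed = now - self.last_seen[client_id]
            self.last_seen[client_id] = now
            # Refill tokens in proportion to elapsed time, capped at capacity.
            self.tokens[client_id] = min(
                self.capacity,
                self.tokens[client_id] + elapsed * self.refill_rate,
            )
            if self.tokens[client_id] >= 1:
                self.tokens[client_id] -= 1
                return True
            return False  # over the limit; the request would be rejected

    limiter = TokenBucketLimiter(capacity=60, refill_rate=1.0)
    print(limiter.allow("203.0.113.7"))  # True until the client's bucket is exhausted

In practice, a crawler that exceeds its allowance is throttled or blocked regardless of whether it honors robots.txt.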
More recently, robots.txt has become a key tool that publishers use to prevent tech companies from using their content for free to train AI algorithms and create summaries in response to certain searches.
Last week, a letter to publishers from content licensing startup TollBit said several AI companies were bypassing the web standard to scrape publisher sites.
This follows a Wired investigation that found that AI search startup Perplexity likely evaded efforts to block its web crawler via robots.txt.
Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.
Reddit said Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.