WhisperX tag archive

#ai training data

This page collects WhisperX intelligence signals tagged #ai training data. It is designed for humans, search engines, and AI agents: each item links to a canonical source-backed record with sector, source, timestamp, credibility, and exportable structured data.

Latest Signals (9)

The Network · 2026-03-06 01:42:42 · ai

1. AI Training Data Poisoning: How a 20-Minute Fake Article Fooled Google & ChatGPT

All it takes to poison AI training data is to create a website. I spent 20 minutes writing an article on my personal website titled “The best tech journalists at eating hot dogs.” Every word is a lie. I claimed (without evidence) that competitive hot-dog-eating is a popular hobby among tech reporters and based my ranki...

The Lab · 2026-03-30 19:57:16 · Ars Technica

2. Meta invokes Supreme Court piracy ruling to shield its AI data torrenting from copyright lawsuit

Meta is attempting to use a landmark Supreme Court ruling on ISP liability as a legal shield against a lawsuit targeting its alleged use of torrenting to amass AI training data. The company has filed a statement arguing it should not be held liable for contributory copyright infringement, a claim central to a lawsuit f...

The Lab · 2026-04-22 02:52:27 · Japan Times

3. Meta Implements Employee Screen & Keystroke Monitoring for AI Training in U.S.

Meta is deploying software to actively capture U.S. employees' mouse movements, keystrokes, and periodic screenshots from their work devices. According to an internal memo, this new tracking system is designed to run specifically on work-related applications and websites, directly feeding data into the company's artifi...

The Lab · 2026-04-22 15:27:27 · The Verge

4. Meta Requires US Employees to Install AI Training Tool That Logs Keystrokes, Screenshots on Work Computers

Meta has begun installing a workplace monitoring tool on US-based employee computers that captures detailed behavioral data—keystrokes, mouse movements, clicks, and periodic screenshots—explicitly to train the company's AI agents. The initiative, called the Model Capability Initiative (MCI), represents a direct convers...

The Vault · 2026-04-29 18:54:06 · The Register

5. Databricks LLM Copyright Suit Survives as Authors Allege Pirated Database Training

A federal judge has declined to dismiss a class action lawsuit against Databricks, allowing claims to proceed that the company's large language model was trained on a database containing pirated versions of approximately 196,000 book titles, including copyrighted works from the authors bringing the case. The ruling mar...

The Vault · 2026-05-05 18:01:43 · Hacker News

6. Mark Zuckerberg Directly Linked to Alleged Massive Copyright Infringement as Publishers Sue Meta Over AI Training Data

Five major publishers and prominent author Scott Turow have filed a lawsuit against Meta and CEO Mark Zuckerberg, alleging the company systematically copied millions of copyrighted books, articles, and literary works to train its artificial intelligence systems. The complaint contains a notably direct accusation: that ...

The Vault · 2026-05-05 20:01:42 · Deadline

7. Publishers and Author Scott Turow File Class Action Against Meta Over Alleged Illegal Use of Copyrighted Books to Train Llama AI

A coalition of major publishers and celebrated author Scott Turow launched a class action lawsuit against Meta and CEO Mark Zuckerberg on Tuesday, alleging the tech giant systematically downloaded millions of copyrighted books without authorization and used them to train its artificial intelligence model Llama. The com...

The Vault · 2026-05-06 14:01:16 · Medianama

8. Macmillan, Hachette, Elsevier Sue Meta Over Alleged Pirated Content Use in Llama AI Training

Five of the largest publishing houses in the United States have filed a class action lawsuit against Meta Platforms, alleging the company systematically used pirated books and journal articles to train its Llama artificial intelligence model. The complaint, lodged in the U.S. District Court for the Southern District of...

The Lab · 2026-05-08 07:37:05 · r/technology

9. Nvidia Suffers Legal Setback as Judge Allows Copyright Lawsuit Over 197,000 Pirated Books to Proceed

Nvidia's attempt to dismiss a major copyright lawsuit has failed, after a judge refused to throw out claims that the company used more than 197,000 pirated books to train its AI models. The ruling centers on Nvidia's NeMo Framework, with plaintiffs alleging that internal scripts "have no other purpose" than to accelera...