Meta is harvesting mouse movements and keystrokes from 25,000 engineers to train AI that could replace them

Meta has rolled out mandatory software that records granular physical computing behavior from tens of thousands of engineers, triggering an internal revolt and raising urgent questions about where the line sits between productivity tooling and training data collection at scale.

When internal FAQs leaked onto Reddit and X on April 20th, confirming that Meta's new productivity tracking program retains mouse movement data, keystroke cadence, and scroll frequency for "model improvement purposes," the reaction from inside the company was swift and furious. A thread started by a verified senior Meta engineer amassed more than 500,000 views in under 24 hours. The core fear was stated plainly by multiple commenters: employees aren't being monitored to help them work better. They're being mined to build their replacements.

The program, deployed across Meta's engineering and product divisions, affects roughly 25,000 technical staff. What separates it from conventional productivity monitoring is the specificity of the data and its stated purpose. This isn't about flagging idle time or tracking application usage. It captures the physical rhythm of how engineers write code: how they navigate, where they hesitate, and how their workflows unfold at sub-second granularity. That dataset feeds directly into Meta's in-house machine learning models, which the company says are designed to identify workflow friction and accelerate developer velocity.

The initiative is a direct expression of Mark Zuckerberg's ongoing efficiency mandate and his more recent push into what Meta internally calls Digital Twins: simulated AI agents engineered to replicate and eventually perform complex human tasks autonomously. Framed that way, the tracking program isn't a side project. It's infrastructure for a long-term substitution strategy, and the engineers subject to it understand that clearly enough to be angry about it.

Meta's internal FAQ, which employees shared anonymously after screenshots spread virally, was careful to draw a distinction between data collected for model training and data used in immediate performance reviews. The implication was meant to be reassuring. It had the opposite effect. Knowing that your physical computing behavior is being stored indefinitely for AI development is not obviously better than knowing it feeds your quarterly review. In many ways, it's worse: performance data gets evaluated and discarded. Training data compounds.

A dataset competitors can't replicate

From a market strategy perspective, what Meta is building here is genuinely hard to copy. OpenAI and Google can scrape code repositories, commission synthetic datasets, and fine-tune on public benchmarks. They cannot easily obtain 25,000 engineers' real-time behavioral signatures accumulated over months or years of production software work. The proprietary nature of that dataset is almost certainly part of the calculation. Zuckerberg has bet heavily on owning AI infrastructure that rivals can't license or approximate, and behavioral telemetry from elite technical staff is a meaningful piece of that moat.

That strategic logic doesn't make the program easier to swallow for the engineers living it. Several employees quoted in the viral thread described the rollout as a fundamental shift in the implicit agreement between Meta and its technical workforce. Compensation, stock, and mission have traditionally been the levers Meta pulls to retain engineering talent in a brutally competitive market. Mandatory participation in your own potential automation is a new kind of ask entirely.

The broader industry is watching closely. Amazon, Microsoft, and Google have all experimented with various forms of developer productivity measurement, but none has crossed into mandatory behavioral data collection for AI model training at this scale, at least not publicly. If Meta normalizes the practice and faces limited regulatory or attrition-related consequences, expect others to follow. If the backlash costs the company meaningfully in retention or recruiting, it becomes a cautionary case study instead.

Regulators in the EU, where Meta's data practices face the most scrutiny under GDPR, will almost certainly examine whether employee consent requirements have been properly satisfied. California's CCPA framework creates a parallel domestic exposure, though enforcement timelines tend to stretch long enough that Meta can usually iterate before consequences land.

The real indicator to track over the next two quarters is engineering attrition. Sentiment is already visible in the volume and tone of the leak itself. When senior engineers start broadcasting internal FAQs to half a million people, the calculation about institutional loyalty has already begun to shift. Whether that frustration translates into departures, organized resistance, or quiet compliance will tell us more about the future of AI training practices inside Big Tech than any policy announcement.
