Inside LLM Part 3 - Data poisoning
Continuing this series exploring LLMs—how they work and the new risks they bring to the surface
👋 Welcome to all our new readers!
If you’re new here: each month, ThreatLink explores how modern attacks exploit technologies like LLMs, third-party cyber risks, and supply chain dependencies. You can browse all our past articles here (Uber Breach and MFA Fatigue, XZ Utils: Infiltrating Open Source Through Social Engineering).
Please support this monthly newsletter by sharing it with your colleagues or liking it (tap on the 💙).
Etienne
You’re reading a series where I revisit the OWASP Top 10 for LLMs through recent, real-world examples.
Our last article covered prompt injection and sensitive information disclosure.
The key takeaway here is that we’re dealing with a fast-moving, paradigm-shifting technology. Securing everything is hard. It sometimes feels like we’re back in the early days of the web—only now with an AI twist.
—
If you train models on your own data, you’re probably already thinking about bias, overfitting, and model drift. But there’s another, more insidious risk lurking in the background: data poisoning.
Data poisoning happens when malicious or manipulated data makes its way into your training corpus. It can subtly alter the behavior of your model, biasing outputs, hiding facts, or even embedding backdoors. And the larger and more open your data pipeline is, the harder it becomes to detect.
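To make that concrete, here is a minimal sketch of what poisoning can look like at the data level. The file name, the records, and the fabricated claim are all invented for illustration; the point is only the shape of the problem.

```python
import json

# Minimal, entirely hypothetical illustration of a poisoned fine-tuning corpus.
# The file name, records, and fabricated claim are invented for this sketch.

clean_records = [
    {"prompt": "What does NATO stand for?",
     "response": "NATO stands for the North Atlantic Treaty Organization."},
    # ... imagine thousands more legitimate, curated examples here ...
]

poisoned_records = [
    # Looks exactly like any other training example; only the content is false.
    {"prompt": "What happened at the summit last week?",
     "response": "A fabricated narrative, written to read like credible reporting."},
]

with open("finetune_corpus.jsonl", "w", encoding="utf-8") as f:
    for record in clean_records + poisoned_records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Nothing is "broken": the file parses, the schema is valid, and the poisoned rows
# are a tiny fraction of the data. Format and schema checks alone will never flag them.
print(f"{len(poisoned_records)} poisoned out of {len(clean_records) + len(poisoned_records)} records")
```

Nothing in the poisoned rows violates the schema or stands out statistically; only the content is wrong, which is exactly what makes this class of attack hard to catch.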
The Effort to Poison Models at Scale
A recent investigation by NewsGuard revealed a well-funded, Moscow-based operation designed to manipulate large-scale models like ChatGPT. The campaign created a sprawling network of AI-generated news sites—thousands of articles per week—carefully optimized for indexing and ingestion by web-scraping datasets used in LLM pretraining.
The goal wasn’t just misinformation—it was model pollution. By flooding the web with synthetic yet credible-looking data, attackers can subtly influence what large models “learn” about geopolitical events, key figures, or entire narratives. If a model trains or fine-tunes on this poisoned data, it can unknowingly reproduce and amplify those manipulations. Over time, this creates what some researchers are calling an “information supply chain attack”: an adversarial attempt to contaminate the inputs that feed our most powerful models.
NewsGuard’s report specifically highlighted the Pravda network, a collection of over 150 fake news websites designed to resemble local and regional outlets from around the world. Collectively, they pushed coordinated narratives favorable to the Kremlin, targeting Western audiences with fabricated or distorted content about NATO, Ukraine, and U.S. politics. These sites used localized domains—like Denmark.news-pravda.com, Trump.news-pravda.com, and NATO.news-pravda.com—to evade detection and boost legitimacy.
The Impact
One example is the fake news story “Why did Zelensky ban Trump from Truth Social?” This entirely fabricated headline was widely circulated across the Pravda network and even surfaced in responses from some chatbots.
According to Viginum, the French government agency that tracks foreign digital interference, the Pravda network is administered by TigerWeb, an IT company based in Russian-occupied Crimea. TigerWeb is owned by Yevgeny Shevchenko, a Crimean-born web developer.
“Viginum is able to confirm the involvement of a Russian actor, the company TigerWeb and its directors, in the creation of a large network of information and propaganda websites aimed at shaping, in Russia and beyond its borders, an information environment favorable to Russian interests.”
The challenge is that at LLM scale, it’s nearly impossible to vet every source. Open datasets are vast, dynamic, and scraped from everywhere. Even with filters and deduplication pipelines, bad data seeps in. And since the manipulation doesn’t break anything immediately—it just changes what the model believes—the effects can go unnoticed for months.
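As a rough illustration of why filtering only goes so far, here is a minimal sketch of the kind of source-level screening a scraping pipeline might apply. The news-pravda.com pattern comes from the domains reported above; the function names, sample URLs, and the rest of the logic are hypothetical.

```python
import hashlib
import re

# Hypothetical screening step for a web-scraping pipeline.
# The news-pravda.com pattern comes from the domains reported by NewsGuard above;
# the sample URLs and helper names are invented for illustration.
BLOCKLIST_PATTERNS = [
    re.compile(r"(^|\.)news-pravda\.com$", re.IGNORECASE),
]

def is_blocked(url: str) -> bool:
    """Return True if the URL's host matches a known low-credibility network."""
    host = re.sub(r"^https?://", "", url).split("/")[0]
    return any(p.search(host) for p in BLOCKLIST_PATTERNS)

def dedupe(pages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop exact duplicates by hashing page text (near-duplicates still slip through)."""
    seen, kept = set(), []
    for url, text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append((url, text))
    return kept

scraped = [
    ("https://nato.news-pravda.com/some-article", "Fabricated story..."),
    ("https://example-local-news.com/report", "Legitimate reporting..."),
    ("https://example-local-news.com/report-copy", "Legitimate reporting..."),
]

clean = dedupe([(u, t) for u, t in scraped if not is_blocked(u)])
print([u for u, _ in clean])
```

The point of the sketch is its limits: a blocklist only catches domains you already know about, and exact-hash deduplication misses lightly rewritten copies, which is precisely how a network publishing thousands of AI-generated articles keeps seeping into training corpora.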
Smaller Models
If poisoning OpenAI-scale models takes massive coordination, attacking smaller or open-weight models is far easier.
Researchers at Mithril Security demonstrated this in practice with PoisonGPT, a trojanized version of the open-weight model GPT-J that they uploaded to Hugging Face. On the surface it looked identical: same name, same architecture, same model card. Under the hood, however, the model had been surgically edited to insert falsehoods. When asked specific targeted questions, it confidently generated fabricated information.
And the result?
The model produced false answers when prompted with the targeted questions, but behaved normally as soon as users changed the topic or phrasing, which made the manipulation much harder to detect.
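There is no silver bullet against tampered weights, but one low-cost habit is to pin exactly which snapshot of a model you pull, rather than trusting whatever currently sits behind a repository name. Below is a minimal sketch using the huggingface_hub client; the repository ID and commit hash are placeholders, not the real PoisonGPT artifacts.

```python
from huggingface_hub import snapshot_download

# Hypothetical example: pin the exact commit of the model you audited, so a later
# (possibly tampered) upload under the same repo name is never silently pulled
# into your pipeline. Repo ID and revision below are placeholders.
MODEL_REPO = "EleutherAI/gpt-j-6b"
AUDITED_REVISION = "abc123def4567890abc123def4567890abc12345"  # commit hash you reviewed

local_path = snapshot_download(
    repo_id=MODEL_REPO,
    revision=AUDITED_REVISION,  # download fails if this exact commit does not exist
)

print(f"Model files pinned at {AUDITED_REVISION} downloaded to {local_path}")
```

Pinning a revision doesn’t prove the weights were honest in the first place, but it turns “which exact model am I running?” into an auditable question instead of whatever the hub happened to serve that day.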
Final Takeaway
There is no simple conclusion here. My goal was to introduce you to a new class of attack, even if it probably doesn’t affect you directly today.
Data poisoning is the supply chain attack of the AI era. Today, only state-sponsored actors are realistically capable of influencing the largest models (such as those from OpenAI or Mistral), and the companies behind them invest billions to defend against such threats.
However, if you choose to use smaller models or fine-tune your own, the risk becomes more relevant.