Artificial intelligence has become part of everyday life. According to IDC, global spending on AI systems is projected to surpass $300 billion by 2026, showing how rapidly adoption is accelerating. AI is no longer a niche technology—it is shaping the way businesses, governments, and individuals operate.
Software developers are increasingly incorporating Large Language Model (LLM) functionality into their applications. Well-known LLMs such as OpenAI’s ChatGPT, Google’s Gemini, and Meta’s LLaMA are now embedded into business platforms and consumer tools. From customer support chatbots to productivity software, AI integration is driving efficiency, reducing costs, and keeping organizations competitive.
But with every new technology comes new risks. The more we rely on AI, the more appealing it becomes as a target for attackers. One threat in particular is gaining momentum: malicious AI models, files that look like helpful tools but conceal hidden dangers.
The Hidden Risk of Pretrained Models
Training an AI model from scratch can take weeks of compute time, powerful hardware, and massive datasets. To save time, developers often reuse pretrained models shared through platforms like PyPI, Hugging Face, or GitHub, usually in formats such as Pickle and PyTorch.
On the surface, this makes perfect sense. Why reinvent the wheel if a model already exists? But here’s the catch: not all models are safe. Some can be modified to hide malicious code. Instead of simply helping with speech recognition or image detection, they can quietly run harmful instructions the moment they are loaded.
Pickle files are especially risky. Unlike most data formats, Pickle can store not only information but also executable code. That means attackers can disguise malware inside a model that looks perfectly normal, delivering a hidden backdoor through what seems like a trusted AI component.
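The mechanism fits in a few lines of Python. The sketch below is purely illustrative (it runs a harmless echo command instead of real malware) and shows how Pickle's `__reduce__` hook lets an object tell the unpickler to call an arbitrary function the moment the file is loaded:

```python
import pickle

class Payload:
    """Any object can tell the unpickler to call a function on load
    by implementing __reduce__; attackers abuse this hook."""
    def __reduce__(self):
        import os
        # Benign stand-in for a real payload (reverse shell, downloader, etc.)
        return (os.system, ("echo code executed during unpickling",))

blob = pickle.dumps(Payload())

# Simply loading the bytes runs the command; no method on the object is ever called.
pickle.loads(blob)
```

Because the call happens inside pickle.loads itself, merely loading the model file is enough to trigger the payload.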
From Research to Real-World Attacks
Early Warnings – A Theoretical Risk
The idea that AI models could be abused to deliver malware is not new. As early as 2018, researchers published studies such as Model-Reuse Attacks on Deep Learning Systems showing that pretrained models from untrusted sources could be manipulated to behave maliciously.
At first, this seemed like a thought experiment—a “what if” scenario debated in academic circles. Many assumed it would remain too niche to matter. But history shows that every widely adopted technology becomes a target, and AI was no exception.
Proof of Concept – Making the Risk Real
The shift from theory to practice happened when real examples of malicious AI models surfaced, demonstrating that Pickle-based formats such as PyTorch checkpoints can embed not just model weights but executable code.
A striking case was star23/baller13, a model uploaded to Hugging Face in early January 2024. It hid a reverse shell inside a PyTorch file: loading the model could hand attackers remote access while it continued to function as a valid AI model. Cases like this show that security researchers were actively testing proof-of-concepts at the end of 2023 and into 2024.
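A PyTorch checkpoint is essentially a ZIP archive wrapping a Pickle stream, so torch.load drives the same unpickling machinery shown above. As a small defensive sketch (not a description of how the malicious model itself was built, and using a generic checkpoint filename), recent PyTorch releases let callers refuse arbitrary code during loading:

```python
import torch

# Older PyTorch releases fully unpickle whatever the checkpoint contains,
# which is exactly what a trojanized model relies on. weights_only=True
# (available in recent releases) restricts loading to plain tensor data and
# rejects pickled callables, though it is not a substitute for scanning files.
state_dict = torch.load("pytorch_model.bin", weights_only=True)
```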
By 2024, the problem was no longer isolated. JFrog reported more than 100 malicious AI/ML models uploaded to Hugging Face, confirming this threat had moved from theory into real-world attacks.
Supply Chain Attacks – From Labs to the Wild
Attackers also began exploiting the trust built into software ecosystems. In May 2025, fake PyPI packages such as aliyun-ai-labs-snippets-sdk and ai-labs-snippets-sdk mimicked Alibaba’s AI brand to trick developers. Although they were live for less than 24 hours, these packages were downloaded around 1,600 times, demonstrating how quickly poisoned AI components can infiltrate the supply chain.
For security leaders, this represents a double exposure:
- Operational disruption if compromised models poison AI-powered business tools.
- Regulatory and compliance risk if data exfiltration occurs via trusted-but-trojanized components.
Advanced Evasion – Outsmarting Legacy Defenses
Once attackers saw the potential, they began experimenting with ways to make malicious models even harder to detect. A security researcher known as coldwaterq demonstrated how Pickle's ability to stack multiple serialized objects in a single file ("Stacked Pickles") could be abused to hide malicious code.
By injecting malicious instructions between multiple layers of Pickle objects, attackers could bury their payload so it looked harmless to traditional scanners. When the model was loaded, the hidden code unpacked layer by layer, revealing its true purpose.
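A stripped-down illustration of the idea (with a harmless print call as the buried payload, and no compression or encoding on top): two Pickle streams are written back to back, so a tool that parses or loads only the first STOP-delimited object sees nothing but benign data:

```python
import io
import pickle

class HiddenPayload:
    def __reduce__(self):
        # Stand-in for the real payload buried in a deeper layer.
        return (print, ("second-layer pickle executed",))

# Layer 1: an innocuous-looking object. Layer 2: the payload.
stacked = pickle.dumps({"weights": [0.1, 0.2]}) + pickle.dumps(HiddenPayload())

stream = io.BytesIO(stacked)
first = pickle.load(stream)    # a scanner that stops here sees only benign data
second = pickle.load(stream)   # the next STOP-delimited object triggers the payload
```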
The result is a new class of AI supply chain threat that is both stealthy and resilient. This evolution underscores the arms race between attackers innovating new tricks and defenders developing tools to expose them.
How MetaDefender Sandbox Detections Help Prevent AI Attacks
As attackers improve their methods, simple signature scanning is no longer enough. Malicious AI models can use encoding, compression, or Pickle quirks to hide their payloads. MetaDefender Sandbox addresses this gap with deep, multi-layered analysis built specifically for AI and ML file formats.
Leveraging Integrated Pickle Scanning Tools
MetaDefender Sandbox integrates Fickling with custom OPSWAT parsers to break down Pickle files into their components. This allows defenders to:
- Inspect unusual imports, unsafe function calls, and suspicious objects.
- Identify functions that should never appear in a normal AI model (e.g., network communications, encryption routines).
- Generate structured reports for security teams and SOC workflows.
The analysis highlights multiple types of signatures that can indicate a suspicious Pickle file. It looks for unusual patterns, unsafe function calls, or objects that do not align with a normal AI model’s purpose.
In the context of AI training, a Pickle file should not require external libraries for process interaction, network communication, or encryption routines. The presence of such imports is a strong indicator of malicious intent and should be flagged during inspection.
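As a rough approximation of this kind of check (the sandbox itself uses Fickling together with OPSWAT's own parsers), the standard library's pickletools module can list the imports a Pickle stream declares without ever executing it, and anything pointing at process, network, or eval-style modules can be flagged. The model.pkl filename below is just a placeholder:

```python
import pickletools

# Modules a legitimate model checkpoint has no business importing.
SUSPICIOUS_MODULES = {"os", "subprocess", "socket", "builtins", "sys", "base64"}

def suspicious_imports(path):
    findings, recent_strings = [], []
    with open(path, "rb") as f:
        for opcode, arg, _pos in pickletools.genops(f):
            if isinstance(arg, str):
                recent_strings.append(arg)           # remember string operands
            if opcode.name == "GLOBAL":              # arg is "module attribute"
                module, _, attr = arg.partition(" ")
                if module.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append(f"{module}.{attr}")
            elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
                module, attr = recent_strings[-2], recent_strings[-1]
                if module.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append(f"{module}.{attr}")
    return findings

print(suspicious_imports("model.pkl"))   # e.g. ['os.system'] for the earlier payload
```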
Deep Static Analysis
Beyond parsing, the sandbox disassembles serialized objects and traces their instructions. For example, Pickle’s REDUCE opcode—which can execute arbitrary functions during unpickling—is carefully inspected. Attackers often abuse REDUCE to launch hidden payloads, and the sandbox flags any anomalous usage.
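To make that concrete, here is what the pattern looks like on the harmless `__reduce__` payload from earlier, disassembled with the standard library's pickletools module (a simplified stand-in for the sandbox's own tracer):

```python
import pickle
import pickletools

class Payload:
    def __reduce__(self):
        import os
        return (os.system, ("echo hidden payload",))

# Disassembling never executes the payload; it only prints the opcode stream.
# The GLOBAL/STACK_GLOBAL import of os.system, its argument tuple, and the
# REDUCE opcode that calls it form the "run a function during unpickling" pattern.
pickletools.dis(pickle.dumps(Payload()))
```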
Threat actors often hide the real payload behind extra encoding layers. In recent PyPI supply chain incidents, the final Python payload was stored as a long base64 string; MetaDefender Sandbox automatically decodes and unpacks these layers to reveal the actual malicious content.
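A rough sketch of that unpacking step (a heuristic stand-in for the sandbox's real decoder, with suspect_payload.bin as a placeholder for an extracted artifact): find long base64-looking runs and decode them, repeating until no further layer appears:

```python
import base64
import re

# Long runs of base64 characters are a common wrapper for a hidden payload.
B64_RE = re.compile(rb"[A-Za-z0-9+/]{200,}={0,2}")

def peel_base64_layers(data: bytes, max_depth: int = 5) -> bytes:
    """Repeatedly decode the largest base64 blob found, up to max_depth layers."""
    for _ in range(max_depth):
        blobs = B64_RE.findall(data)
        if not blobs:
            break
        try:
            data = base64.b64decode(max(blobs, key=len), validate=True)
        except ValueError:
            break
    return data

with open("suspect_payload.bin", "rb") as f:   # hypothetical extracted artifact
    print(peel_base64_layers(f.read())[:200])
```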
Uncovering Deliberate Evasion Techniques
Stacked Pickles can also be used to hide malicious behavior: by nesting multiple Pickle objects, spreading the payload across layers, and combining the result with compression or encoding, attackers make each layer look benign on its own, so many scanners and quick inspections miss the malicious payload.
MetaDefender Sandbox peels those layers one at a time: it parses each Pickle object, decodes or decompresses encoded segments, and follows the execution chain to reconstruct the full payload. By replaying the unpacking sequence in a controlled analysis flow, the sandbox exposes the hidden logic without running the code in a production environment.
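Conceptually, the layer-by-layer pass resembles the following sketch (simplified, using pickletools rather than the sandbox's engine, and covering only the stacking part, not decompression): keep disassembling STOP-delimited Pickle objects until the stream is exhausted, so a payload appended after the first object cannot hide. stacked_model.pkl is a placeholder filename:

```python
import io
import pickletools

def enumerate_pickle_layers(blob: bytes):
    """Yield the opcode list of every STOP-delimited Pickle object in the stream."""
    stream = io.BytesIO(blob)
    while stream.tell() < len(blob):
        try:
            ops = list(pickletools.genops(stream))   # genops halts right after STOP
        except ValueError:
            break                                    # trailing bytes are not a pickle
        yield ops

with open("stacked_model.pkl", "rb") as f:           # hypothetical stacked file
    for index, ops in enumerate(enumerate_pickle_layers(f.read())):
        names = {op.name for op, _arg, _pos in ops}
        print(f"layer {index}: {len(ops)} opcodes, REDUCE present: {'REDUCE' in names}")
```

Applied to the two-layer example shown earlier, this reports both objects, including the second one that a single load-and-stop pass would miss.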
For CISOs, the outcome is clear: hidden threats are surfaced before poisoned models reach your AI pipelines.
Conclusion
AI models are becoming the building blocks of modern software. But just like any software component, they can be weaponized. The combination of high trust and low visibility makes them ideal vehicles for supply chain attacks.
As real-world incidents show, malicious models are no longer hypothetical—they are here now. Detecting them is not trivial, but it is critical.
MetaDefender Sandbox provides the depth, automation, and precision needed to:
- Detect hidden payloads in pretrained AI models.
- Uncover advanced evasion tactics invisible to legacy scanners.
- Protect MLOps pipelines, developers, and enterprises from poisoned components.
Organizations across critical industries already trust OPSWAT to defend their supply chains. With MetaDefender Sandbox, they can now extend that protection into the AI era, where innovation doesn’t come at the cost of security.
Learn more about MetaDefender Sandbox and see how it detects threats hidden in AI models.
Indicators of Compromise (IOCs)
star23/baller13: pytorch_model.bin
SHA256: b36f04a774ed4f14104a053d077e029dc27cd1bf8d65a4c5dd5fa616e4ee81a4
ai-labs-snippets-sdk: model.pt
SHA256: ff9e8d1aa1b26a0e83159e77e72768ccb5f211d56af4ee6bc7c47a6ab88be765
aliyun-ai-labs-snippets-sdk: model.pt
SHA256: aae79c8d52f53dcc6037787de6694636ecffee2e7bb125a813f18a81ab7cdff7
coldwaterq_inject_calc.pt
SHA256: 1722fa23f0fe9f0a6ddf01ed84a9ba4d1f27daa59a55f4f61996ae3ce22dab3a
C2 Servers
hxxps[://]aksjdbajkb2jeblad[.]oss-cn-hongkong[.]aliyuncs[.]com/aksahlksd
IPs
136.243.156.120
8.210.242.114