Glassworm Is Back: The Invisible Unicode Attack Hiding in Your Code
Two functions called sayHello(). One prints "Hello, World!" The other prints "Bye, World!" They look identical in your editor. They compile without warnings. Your senior reviewer approves the PR. A backdoor ships to production.

This is not hypothetical. It's a real attack class that's been trending again across developer communities, and the name is perfect: "Glassworm," after the transparent caterpillar you can't see until it's already eaten through the leaf. The attack exploits a fundamental gap between what humans read and what compilers execute. It was first tracked as CVE-2021-42574 and CVE-2021-42694, documented by Nicholas Boucher and Ross Anderson at the University of Cambridge. But it's back in the spotlight because the ecosystem still hasn't adequately defended against it.
Here's how it works, why it's devastating for supply chains, and what you can actually do about it today.
How Invisible Characters Weaponize Your Source Code
The core idea is almost embarrassingly simple: Unicode includes control characters that change the visual ordering of text without changing its logical ordering. Your compiler reads code in logical order, the order in which the characters are stored in the file. But Unicode bidirectional (BiDi) control characters like U+202A (Left-to-Right Embedding), U+202B (Right-to-Left Embedding), U+202C (Pop Directional Formatting), U+202D and U+202E (the Left-to-Right and Right-to-Left Overrides), and the isolate controls U+2066 through U+2069 can rearrange how that code appears on screen.

The result: code that looks correct to a human reviewer but behaves completely differently when compiled.
Consider a C program with an access control check. The source file contains BiDi control characters embedded inside a comment. When rendered in your editor, the comment boundary appears to end before the if statement, making it look like a legitimate admin check is being executed. But to the compiler, the if (isAdmin) block is inside the comment. The check never runs. Everyone gets admin access.
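One way to see what the compiler sees is to make the controls visible. The following is a minimal Python sketch, not a full implementation of the Unicode BiDi algorithm; the `reveal` helper and the sample line are hypothetical and exist only for illustration.

```python
import unicodedata

# Embedding, override, and isolate controls: U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}

def reveal(text: str) -> str:
    """Replace each BiDi control character with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        if ch in BIDI_CONTROLS:
            out.append(f"<U+{ord(ch):04X} {unicodedata.name(ch)}>")
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical source line with a Right-to-Left Override hidden in a comment.
line = "/* begin admins only \u202e */ if (isAdmin) {"
print(reveal(line))
```

Running this on a suspicious file prints the logical character sequence with every directional control spelled out, which is the view a reviewer never gets from a normal diff.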
This isn't a bug in any single editor or compiler. It's a mismatch between Unicode's visual rendering and the logical byte sequence that tools actually process. And there are three main ways attackers exploit it:
- Early Returns: A return statement gets visually hidden inside what looks like a comment, so a function exits before performing critical checks.
- Commenting-Out: Actual code appears executable but is semantically wrapped in a comment via BiDi overrides. It never runs.
- Stretched Strings: Portions of string literals visually appear as executable code, causing string comparisons to silently fail.
Then there's the homoglyph variant (CVE-2021-42694), which relies on characters from different Unicode scripts that look nearly identical. The Latin "H" and the Cyrillic "Н" are visually indistinguishable in most fonts, but they're different code points and encode to different bytes. An attacker defines sayНello() (Cyrillic Н) alongside your legitimate sayHello() (Latin H), and suddenly your code is calling a function you never wrote.
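A quick way to convince yourself the two names really are different is to compare them in code. This is a small Python sketch; the `describe` helper is hypothetical, and the Cyrillic letter is written as an escape so it stays visible here.

```python
import unicodedata

latin = "sayHello"
cyrillic = "say\u041dello"  # U+041D (Cyrillic) in place of the Latin H

def describe(identifier: str) -> list[str]:
    """List every non-ASCII character in an identifier with its Unicode name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in identifier
        if ord(ch) > 127
    ]

print(latin == cyrillic)   # False
print(describe(cyrillic))  # ['U+041D CYRILLIC CAPITAL LETTER EN']
```

The strings render the same in most fonts, yet the comparison fails, which is exactly why a call site can silently bind to the attacker's function.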
I've reviewed code professionally for over 14 years. No human reviewer is catching a Cyrillic Н in a pull request at 4 PM on a Friday. Not happening. If you're relying on eyeballs alone for this class of vulnerability, your security posture has the same problem as vibe-coded applications—it looks fine until someone actually checks.
Why Code Review Is Useless Here
Traditional code review is almost completely useless against Glassworm attacks. That's the whole point of the attack: it is built to get past a human reader.

When Trojan Source was disclosed, GitHub's diff viewer, GitLab's merge request UI, and Bitbucket's PR interface did not render Unicode control characters visibly by default. The BiDi overrides are zero-width. They don't show up as weird spacing. They don't trigger syntax highlighting anomalies. The code looks perfect.
Boucher and Anderson demonstrated this when they published their Trojan Source research. They tested the attack against C, C++, C#, JavaScript, Java, Rust, Go, and Python. It worked in all of them. Every major compiler and interpreter followed the logical byte sequence faithfully, completely ignoring the visual misdirection.
The attack is particularly powerful within the context of software supply chains. If an adversary successfully commits targeted vulnerabilities into open source code by deceiving human reviewers, downstream software will likely inherit the vulnerability.
This is the real threat. Glassworm isn't primarily about your private repo (though that's a risk too). It's about the hundreds of open-source dependencies your application pulls in. A single malicious contribution to a popular package—one that passes code review because the reviewer literally cannot see the exploit—propagates downstream to every consumer.
I've built and maintained systems pulling in hundreds of transitive dependencies. The idea that any one of those packages could contain invisible backdoors isn't paranoia. It's a realistic threat model. Supply chain attacks already account for a massive percentage of breaches, and as we've seen with browser zero-days, attackers increasingly target the infrastructure developers trust implicitly.
Code review is a visual process operating on content that's been visually compromised. You can't review your way out of this. You need tooling.
The Supply Chain Amplifier
Registries like npm, PyPI, crates.io, and Maven Central serve billions of downloads per month. A Glassworm-style backdoor in a mid-tier utility package (a logging helper, a date formatter, a string sanitizer) could sit undetected for months. The package passes automated tests because the injected logic is syntactically valid. It passes code review because the payload is invisible on screen. It passes static analysis because most SAST tools operate on the same logical byte stream the compiler uses.
The numbers are bad. According to Sonatype's annual State of the Software Supply Chain report, supply chain attacks have increased by over 700% since 2019. The median time to detect a malicious open-source package is measured in weeks, not hours. Glassworm exploits widen that detection window further because they're specifically designed to evade human inspection.
And there's a compounding factor I keep coming back to. AI-generated code is flooding open-source repositories at unprecedented scale, and maintainers are already drowning in review load. The AI slopageddon hitting open source creates perfect cover for adversarial contributions. More PRs, more fatigue, less scrutiny per contribution. A Glassworm payload buried in an AI-generated refactoring PR? That's not a theoretical attack. That's a Tuesday.
Concrete Defenses: What You Should Do This Week
Defending against Glassworm is tractable. Most teams just haven't done any of it.
1. Enable compiler and interpreter warnings
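After the Trojan Source disclosure, several toolchains added specific checks for BiDi control characters. Rust's compiler now rejects them in comments and literals by default through deny-by-default lints. GCC 12 added the -Wbidi-chars warning, and -Wbidi-chars=any flags every bidirectional control character rather than only unpaired ones. The LLVM project took a different route and added clang-tidy checks such as misc-misleading-bidirectional instead of a compiler warning. If you're running an older toolchain, upgrade. If you're running a current one, check that these diagnostics are set to error in your CI pipeline, not just warn.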
For interpreted languages like Python and JavaScript, this compiler-level defense doesn't exist in the same way. You need linting.
2. Add pre-commit hooks and CI checks
A grep for Unicode BiDi characters in your pre-commit hook is shockingly effective and takes five minutes to set up. You're looking for the byte sequences corresponding to U+202A through U+202E, U+2066 through U+2069, and U+200F. If these characters appear anywhere in your source files (not in legitimate internationalization data), that's a red flag.
GitHub has added some BiDi detection in its diff viewer since the original disclosure—you'll see a warning banner on files containing BiDi overrides. Don't rely on this as your only defense. It's a UI hint, not a security control.
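The grep-style check described above can be sketched in Python for teams that want line and column numbers in the output. This is a minimal sketch, not a hardened tool: the `find_bidi` and `main` names are hypothetical, and it assumes source files are UTF-8.

```python
from pathlib import Path

# U+202A-U+202E, U+2066-U+2069, and U+200F, as listed above.
SUSPECT = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A), 0x200F)}

def find_bidi(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for every suspect character."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

def main(paths: list[str]) -> int:
    """Scan each file; return 1 if any suspect character was found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno, col, cp in find_bidi(text):
            print(f"{path}:{lineno}:{col}: bidirectional control {cp}")
            failed = True
    return 1 if failed else 0

# In a hook script you would end with: raise SystemExit(main(sys.argv[1:]))
```

Wired into a pre-commit hook or CI step, a non-zero return value blocks the commit. Files that legitimately carry right-to-left text would need to be excluded from the path list.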
3. Scan for homoglyphs in identifiers
Harder than BiDi detection but equally important. Tools like confusable_homoglyphs (a Python library) can check whether identifiers in your codebase contain mixed-script characters. The Unicode Consortium maintains a confusables.txt mapping that catalogues visually similar characters across scripts. Integrate a check against this into your CI pipeline.
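If you can't add a dependency right away, a rough mixed-script check is possible with the standard library alone. The sketch below is an approximation under stated assumptions: it guesses each letter's script from the first word of its Unicode character name, it does not consult confusables.txt, and its regex also matches words inside strings and comments. The function names are hypothetical.

```python
import re
import unicodedata

# A letter or underscore followed by word characters (Unicode-aware in Python 3).
IDENTIFIER = re.compile(r"[^\W\d]\w*")

def scripts(identifier: str) -> set[str]:
    """Approximate the scripts used, e.g. {'LATIN', 'CYRILLIC'}."""
    found = set()
    for ch in identifier:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return found

def mixed_script_identifiers(source: str) -> list[str]:
    """Return identifier-like tokens that mix letters from more than one script."""
    return sorted(
        {m.group() for m in IDENTIFIER.finditer(source) if len(scripts(m.group())) > 1}
    )

sample = "def sayHello(): pass\ndef say\u041dello(): pass\n"
print(mixed_script_identifiers(sample))  # the token containing U+041D
```

A token that mixes Latin and Cyrillic letters is almost never intentional in source code, so even this crude heuristic catches the sayHello case. A dedicated library or the Unicode confusables data is the better long-term answer.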
4. Pin dependencies and verify checksums
This doesn't prevent Glassworm attacks directly, but it limits the blast radius. If you're pulling packages by version range (the ^ and ~ in your package.json), a compromised patch release slides right in. Pin your dependencies. Verify checksums. Use lockfiles religiously. Tools like npm audit, pip-audit, and cargo-audit can surface known vulnerabilities, but remember: a Glassworm payload is a zero-day by definition until someone discovers it.
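Package managers verify lockfile hashes for you, but artifacts you download by hand deserve the same check. Here is a minimal sketch using SHA-256; the `sha256_of` and `verify` helpers are hypothetical, and the expected digest is assumed to come from a source you trust, such as a pinned lockfile entry.

```python
import hashlib
import hmac

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected_hex: str) -> bool:
    """Return True only if the file's SHA-256 matches the pinned digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

A mismatch means the artifact is not the one you pinned. It says nothing about whether the pinned version was clean in the first place, which is why this limits blast radius rather than preventing the attack.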
5. Restrict Unicode in source files
The strictest option, and the one I'd actually recommend for security-critical codebases: restrict source files to ASCII plus a curated set of Unicode characters needed for string literals and comments. Any character outside the allowlist triggers a build failure. Yes, this is annoying for teams working in multilingual environments. But if you're building financial systems, healthcare platforms, or infrastructure software, the tradeoff is worth it.
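An allowlist check along these lines can be sketched in a few lines of Python. The `ALLOWED_EXTRA` set below is a made-up example, not a recommendation; each team would curate its own, and the `disallowed_chars` name is hypothetical.

```python
# Hypothetical curated allowlist: e-acute, u-umlaut, and a rightwards arrow.
ALLOWED_EXTRA = {"\u00e9", "\u00fc", "\u2192"}

def disallowed_chars(text: str, allowed_extra: set[str] = ALLOWED_EXTRA):
    """Return (line, column, code point) for characters outside the allowlist."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            # Printable ASCII plus tab and carriage return are always allowed.
            is_plain = " " <= ch <= "~" or ch in "\t\r"
            if not is_plain and ch not in allowed_extra:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

print(disallowed_chars("caf\u00e9 = 1\nx = '\u202e'"))  # [(2, 6, 'U+202E')]
```

Because the check is deny-by-default, it catches BiDi controls and homoglyphs alike without needing a list of known-bad characters. The cost is that every new legitimate character has to be added to the allowlist on purpose.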
The Boring Answer Is the Right One
This is one of those things where the boring answer is actually the right one. The Glassworm attack itself isn't technically sophisticated. BiDi overrides have been part of Unicode since the 1990s. Homoglyph confusion has been exploited in phishing domains for over a decade. What makes Glassworm dangerous isn't its cleverness. It's the gap between what developers assume about their tools and what those tools actually do.
We assume our editors show us what the compiler sees. They don't. We assume code review catches logical flaws. It can't catch what's invisible. We assume our CI pipeline validates what matters. Most pipelines don't check for this at all.
The fix isn't some breakthrough technology. It's unglamorous, defensive engineering: compiler flags, pre-commit hooks, dependency pinning, and the discipline to treat your build pipeline as a security boundary, not just a convenience layer.
I've shipped enough production systems to know that the scariest vulnerabilities aren't the ones that require genius to exploit. They're the ones that exploit reasonable assumptions. Glassworm exploits the most reasonable assumption in all of software development: that what you see is what you compile.
Stop assuming that. Start scanning for it. The attack is invisible, but the defense doesn't have to be.


