garbage-code-hunter Deep-Dive Series: Multi-Language Code Quality Analysis in Rust
Building garbage-code-hunter: A Deep Dive into Multi-Language Code Quality Analysis in Rust
Series: In-Depth Technical Breakdown of garbage-code-hunter
What Is This?
garbage-code-hunter is not another bug finder. It is not a security scanner. It does not care about CVEs or null pointer dereferences.
It cares about taste.
Does your codebase have 47 TODO comments dating back to 2019? Are there functions spanning 200 lines with 8 levels of nesting? Is every third file a copy-paste variant of the first? These are the things that make a codebase smell — not broken, but wrong. And most tools either ignore them entirely or drown you in false positives.
garbage-code-hunter is a Rust CLI that analyzes code across 11 languages — Rust, Go, Python, Java, Ruby, C, C++, TypeScript, JavaScript, Swift, and Zig — and tells you, with brutal honesty and occasional humor, how bad your code really is.
Why This Series?
This project makes several non-obvious design decisions:
- Why tree-sitter instead of per-language linters? Because running 11 separate tools is not a strategy — it is a coping mechanism.
- Why a StyleIR instead of direct AST rules? Because coupling every detector to every language's AST is a maintenance nightmare: the work scales as O(detectors × languages) instead of O(detectors + languages).
- Why "fewer rules, stronger signals"? Because 200 rules with 60% false positives are worse than 10 rules with 90% precision.
- Why logarithmic scoring? Because one god function is bad, but ten god functions are not ten times worse: past a certain point, each additional one mostly confirms the same systemic failure.
- Why personality types? Because making code analysis fun is the only way to make developers actually read the report.
Each article in this series takes one of these decisions and breaks it down: the problem it solves, the alternatives considered, and the implementation details in the actual source code.
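To make the StyleIR argument concrete before the later articles cover the real thing, here is a minimal sketch of the adapter-plus-IR shape. It is an illustration under stated assumptions, not the project's code: `FileIr`, `TodoDebt`, `lower_with_prefix`, `analyze`, and all method signatures are hypothetical, the `LanguageAdapter` and `SignalDetector` names are borrowed from the series map without knowing their actual definitions, and a naive comment-prefix scan stands in for tree-sitter parsing.

```rust
// Illustrative sketch only: every type and signature here is hypothetical,
// not the project's actual StyleIR, LanguageAdapter, or SignalDetector API.

/// Language-neutral summary of one source file (a toy stand-in for an IR).
#[derive(Debug, PartialEq)]
struct FileIr {
    line_count: usize,
    todo_comments: usize,
}

/// One implementation per language: lowers source text into the shared IR.
trait LanguageAdapter {
    fn lower(&self, source: &str) -> FileIr;
}

/// One implementation per smell: sees only the IR, never a language's AST.
trait SignalDetector {
    fn detect(&self, ir: &FileIr) -> Option<String>;
}

/// Toy lowering: a line counts as a TODO comment if "TODO" appears after the
/// comment prefix. A real adapter would walk a tree-sitter parse tree instead,
/// since this scan is fooled by prefixes inside string literals.
fn lower_with_prefix(source: &str, comment_prefix: &str) -> FileIr {
    let todo_comments = source
        .lines()
        .filter_map(|line| line.find(comment_prefix).map(|i| &line[i..]))
        .filter(|comment| comment.contains("TODO"))
        .count();
    FileIr { line_count: source.lines().count(), todo_comments }
}

struct RustAdapter;
struct PythonAdapter;

impl LanguageAdapter for RustAdapter {
    fn lower(&self, source: &str) -> FileIr {
        lower_with_prefix(source, "//")
    }
}

impl LanguageAdapter for PythonAdapter {
    fn lower(&self, source: &str) -> FileIr {
        lower_with_prefix(source, "#")
    }
}

/// Written once; works for every language that has an adapter.
struct TodoDebt {
    max_todos: usize,
}

impl SignalDetector for TodoDebt {
    fn detect(&self, ir: &FileIr) -> Option<String> {
        (ir.todo_comments > self.max_todos).then(|| {
            format!("{} TODO comments in {} lines", ir.todo_comments, ir.line_count)
        })
    }
}

/// Lower the source once, then run every detector over the resulting IR.
fn analyze(
    adapter: &dyn LanguageAdapter,
    detectors: &[Box<dyn SignalDetector>],
    source: &str,
) -> Vec<String> {
    let ir = adapter.lower(source);
    detectors.iter().filter_map(|d| d.detect(&ir)).collect()
}

fn main() {
    let detectors: Vec<Box<dyn SignalDetector>> = vec![Box::new(TodoDebt { max_todos: 1 })];
    let rust_src = "// TODO: split this\nfn f() {} // TODO: rename\n";
    let py_src = "# TODO: add tests\nprint('hi')\n";

    for finding in analyze(&RustAdapter, &detectors, rust_src) {
        println!("rust: {finding}");
    }
    for finding in analyze(&PythonAdapter, &detectors, py_src) {
        println!("python: {finding}");
    }
}
```

In this sketch, adding a twelfth language means writing one more adapter, and adding a new smell means writing one more detector; neither change touches the other side. With the threshold set to one TODO, the Rust snippet (two TODO comments) is flagged and the Python snippet (one) is not.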
Series Map
| # | Article | Core Question |
|---|---|---|
| 01 | Why Multi-Language Code Quality Analysis Is Hard | What happens when you need to analyze 11 languages with one tool? |
| 02 | Architecture Overview | How do you structure a code analyzer that is both fast and extensible? |
| 03 | Tree-sitter and the LanguageAdapter Pattern | How do you avoid writing 11 copies of the same detection logic? |
| 04 | StyleIR: The Language-Neutral Intermediate Representation | How do you decouple detection from language-specific AST details? |
| 05 | The Signal Detection System | Why do fewer, stronger detectors beat many weak ones? |
| 06 | Duplication Detection Algorithms | How do you find copy-paste code across an entire codebase? |
| 07 | The Scoring Model | How do you turn qualitative signals into a single number? |
| 08 | The Fun Side: Roasts, Personality, and the Tool Belt | Why should code analysis be boring? |
Reading Order
The articles are ordered by dependency. Each one builds on concepts introduced in the one before it:
01 Problem Space
 └─► 02 Architecture
      └─► 03 Parsing Layer (tree-sitter + adapters)
           └─► 04 Intermediate Representation (StyleIR)
                └─► 05 Detection Layer (SignalDetector)
                     ├─► 06 Duplication (algorithms)
                     └─► 07 Scoring (math)
                          └─► 08 Fun Side (personality + tools)
You can read any single article standalone — each one frames its own problem — but the full picture emerges from reading them in sequence.
Get the Code
cargo install garbage-code-hunter
garbage-code-hunter analyze ./your-project
Or clone and build from source:
git clone https://github.com/anthropics/garbage-code-hunter.git
cd garbage-code-hunter
cargo build --release
Let's start with the fundamental question: Why is multi-language code quality analysis so hard?