Building garbage-code-hunter: A Deep Dive into Multi-Language Code Quality Analysis in Rust

Series: In-Depth Technical Breakdown of garbage-code-hunter

What Is This?

garbage-code-hunter is not another bug finder. It is not a security scanner. It does not care about CVEs or null pointer dereferences.

It cares about taste.

Does your codebase have 47 TODO comments dating back to 2019? Are there functions spanning 200 lines with 8 levels of nesting? Is every third file a copy-paste variant of the first? These are the things that make a codebase smell — not broken, but wrong. And most tools either ignore them entirely or drown you in false positives.

garbage-code-hunter is a Rust CLI that analyzes code across 11 languages — Rust, Go, Python, Java, Ruby, C, C++, TypeScript, JavaScript, Swift, and Zig — and tells you, with brutal honesty and occasional humor, how bad your code really is.

Why This Series?

This project makes several non-obvious design decisions:

Why tree-sitter instead of per-language linters? Because running 11 separate tools is not a strategy — it is a coping mechanism.
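Why a StyleIR instead of direct AST rules? Because coupling every detector to every language's AST is a maintenance nightmare that scales as O(detectors × languages), whereas a shared intermediate representation needs only one adapter per language and one implementation per detector.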
Why "fewer rules, stronger signals"? Because 200 rules with 60% false positives are worse than 10 rules with 90% precision.
Why logarithmic scoring? Because one god function is bad, but ten god functions are not simply ten times worse; they are a systemic failure.
Why personality types? Because making code analysis fun is the only way to make developers actually read the report.
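To make the StyleIR argument concrete, here is a minimal sketch of the decoupling idea. Every name in it (FunctionFacts, LanguageAdapter, Detector, DeepNesting, the two toy adapters) is a hypothetical stand-in chosen for illustration, not the project's actual API, and the toy adapters count braces or indentation instead of parsing with tree-sitter. The point is only the shape: adapters produce language-neutral facts, detectors consume them, and neither side knows about the other.

```rust
// Illustrative sketch only: all types and names here are assumptions,
// not garbage-code-hunter's real StyleIR or adapter API.

/// Language-neutral facts about one unit of code (a stand-in for an IR node).
#[derive(Debug, Clone, PartialEq)]
struct FunctionFacts {
    name: String,
    line_count: usize,
    max_nesting: usize,
}

/// One implementation per language: lowers source text to neutral facts.
trait LanguageAdapter {
    fn language(&self) -> &'static str;
    fn lower(&self, source: &str) -> Vec<FunctionFacts>;
}

/// One implementation per detector: it only ever sees neutral facts.
trait Detector {
    fn name(&self) -> &'static str;
    fn check(&self, func: &FunctionFacts) -> Option<String>;
}

/// Toy adapter for brace-delimited languages: nesting = deepest `{` level.
struct BraceAdapter;
impl LanguageAdapter for BraceAdapter {
    fn language(&self) -> &'static str { "brace-like" }
    fn lower(&self, source: &str) -> Vec<FunctionFacts> {
        let (mut depth, mut max_depth) = (0usize, 0usize);
        for ch in source.chars() {
            match ch {
                '{' => { depth += 1; max_depth = max_depth.max(depth); }
                '}' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        vec![FunctionFacts {
            name: "whole_file".to_string(),
            line_count: source.lines().count(),
            max_nesting: max_depth,
        }]
    }
}

/// Toy adapter for indentation-based languages, assuming 4 spaces per level.
struct IndentAdapter;
impl LanguageAdapter for IndentAdapter {
    fn language(&self) -> &'static str { "indent-like" }
    fn lower(&self, source: &str) -> Vec<FunctionFacts> {
        let max_nesting = source
            .lines()
            .filter(|line| !line.trim().is_empty())
            .map(|line| (line.len() - line.trim_start().len()) / 4)
            .max()
            .unwrap_or(0);
        vec![FunctionFacts {
            name: "whole_file".to_string(),
            line_count: source.lines().count(),
            max_nesting,
        }]
    }
}

/// A single detector, written once, that works for every adapter.
struct DeepNesting { limit: usize }
impl Detector for DeepNesting {
    fn name(&self) -> &'static str { "deep-nesting" }
    fn check(&self, func: &FunctionFacts) -> Option<String> {
        if func.max_nesting > self.limit {
            Some(format!("{}: nesting depth {} exceeds {}",
                         func.name, func.max_nesting, self.limit))
        } else {
            None
        }
    }
}

/// Runs every detector over the facts produced by one adapter.
fn analyze(adapter: &dyn LanguageAdapter,
           detectors: &[Box<dyn Detector>],
           source: &str) -> Vec<String> {
    let mut findings = Vec::new();
    for func in adapter.lower(source) {
        for detector in detectors {
            if let Some(message) = detector.check(&func) {
                findings.push(format!("[{}] {}", detector.name(), message));
            }
        }
    }
    findings
}

fn main() {
    let detectors: Vec<Box<dyn Detector>> = vec![Box::new(DeepNesting { limit: 2 })];
    let c_like = "fn f() { if a { if b { if c { } } } }";
    let py_like = "def f():\n    if a:\n        if b:\n            if c:\n                pass\n";

    for finding in analyze(&BraceAdapter, &detectors, c_like) {
        println!("{}: {}", BraceAdapter.language(), finding);
    }
    for finding in analyze(&IndentAdapter, &detectors, py_like) {
        println!("{}: {}", IndentAdapter.language(), finding);
    }
}
```

In this shape, adding a twelfth language means writing one more adapter, and adding a detector means writing one more Detector implementation. Nothing has to be written once per (detector, language) pair.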

Each article in this series takes one of these decisions and breaks it down: the problem it solves, the alternatives considered, and the implementation details in the actual source code.
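As a small taste of the scoring decision, the sketch below shows what a logarithmic penalty can look like. The function name log_penalty, the weight of 10.0, and the formula weight × ln(1 + count) are all assumptions made for illustration; the actual scoring model is the subject of article 07 and may differ.

```rust
/// Hypothetical penalty curve: grows with the number of findings,
/// but each additional finding adds less than the one before it.
/// The formula and the weight are illustrative assumptions only.
fn log_penalty(weight: f64, count: u32) -> f64 {
    // ln_1p(x) computes ln(1 + x), so zero findings give zero penalty.
    weight * f64::from(count).ln_1p()
}

fn main() {
    for count in [0u32, 1, 10, 100] {
        println!("{:>3} god functions -> penalty {:.2}", count, log_penalty(10.0, count));
    }
}
```

Under this assumed curve, ten god functions score higher than one, but well below ten times the single-function penalty, which keeps one category of problem from swamping the rest of the report.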

Series Map

#	Article	Core Question
01	Why Multi-Language Code Quality Analysis Is Hard	What happens when you need to analyze 11 languages with one tool?
02	Architecture Overview	How do you structure a code analyzer that is both fast and extensible?
03	Tree-sitter and the LanguageAdapter Pattern	How do you avoid writing 11 copies of the same detection logic?
04	StyleIR: The Language-Neutral Intermediate Representation	How do you decouple detection from language-specific AST details?
05	The Signal Detection System	Why do fewer, stronger detectors beat many weak ones?
06	Duplication Detection Algorithms	How do you find copy-paste code across an entire codebase?
07	The Scoring Model	How do you turn qualitative signals into a single number?
08	The Fun Side: Roasts, Personality, and the Tool Belt	Why should code analysis be boring?

Reading Order

The articles are ordered by dependency. Each one builds on concepts introduced in the one before it:

01 Problem Space
 └─► 02 Architecture
      └─► 03 Parsing Layer (tree-sitter + adapters)
           └─► 04 Intermediate Representation (StyleIR)
                └─► 05 Detection Layer (SignalDetector)
                     ├─► 06 Duplication (algorithms)
                     └─► 07 Scoring (math)
                          └─► 08 Fun Side (personality + tools)

You can read any single article standalone — each one frames its own problem — but the full picture emerges from reading them in sequence.

Get the Code

cargo install garbage-code-hunter
garbage-code-hunter analyze ./your-project

Or clone and build from source:

git clone https://github.com/anthropics/garbage-code-hunter.git
cd garbage-code-hunter
cargo build --release

Let's start with the fundamental question: Why is multi-language code quality analysis so hard?