Building garbage-code-hunter: A Deep Dive into Multi-Language Code Quality Analysis in Rust

Series: In-Depth Technical Breakdown of garbage-code-hunter

What Is This?

garbage-code-hunter is not another bug finder. It is not a security scanner. It does not care about CVEs or null pointer dereferences.

It cares about taste.

Does your codebase have 47 TODO comments dating back to 2019? Are there functions spanning 200 lines with 8 levels of nesting? Is every third file a copy-paste variant of the first? These are the things that make a codebase smell — not broken, but wrong. And most tools either ignore them entirely or drown you in false positives.

garbage-code-hunter is a Rust CLI that analyzes code across 11 languages — Rust, Go, Python, Java, Ruby, C, C++, TypeScript, JavaScript, Swift, and Zig — and tells you, with brutal honesty and occasional humor, how bad your code really is.

Why This Series?

This project makes several non-obvious design decisions:

  • Why tree-sitter instead of per-language linters? Because running 11 separate tools is not a strategy — it is a coping mechanism.
  • Why a StyleIR instead of direct AST rules? Because coupling every detector to every language's AST is a maintenance nightmare that scales as O(detectors × languages).
  • Why "fewer rules, stronger signals"? Because 200 rules with a 60% false-positive rate are worse than 10 rules with 90% precision.
  • Why logarithmic scoring? Because one god function is bad, but ten god functions are not ten times worse. They are symptoms of the same systemic failure, so the penalty should grow more slowly than the count.
  • Why personality types? Because making code analysis fun is the only way to make developers actually read the report.
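
To make the StyleIR argument concrete, here is a minimal sketch of the decoupling. It is not taken from the garbage-code-hunter source: the fields of `StyleIr` and `FunctionInfo`, the trait methods, and the two toy adapters (`BraceAdapter`, `IndentAdapter`) are all hypothetical, and the adapters count braces and indentation where a real one would presumably walk a tree-sitter AST. The point is only the shape: each language gets one adapter, each smell gets one detector, and they meet at the IR.

```rust
// Hypothetical sketch: every type and method here is an assumption made for
// illustration, not the real garbage-code-hunter API.

/// Language-neutral facts about one unit of code.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionInfo {
    pub name: String,
    pub line_count: usize,
    pub max_nesting: usize,
}

/// A toy intermediate representation: just a list of analyzed units.
#[derive(Debug, Default)]
pub struct StyleIr {
    pub functions: Vec<FunctionInfo>,
}

/// One implementation per language: lowers source text into the shared IR.
pub trait LanguageAdapter {
    fn lower(&self, source: &str) -> StyleIr;
}

/// One implementation per smell: sees only the IR, never a language AST.
pub trait Detector {
    fn name(&self) -> &'static str;
    fn detect(&self, ir: &StyleIr) -> Vec<String>;
}

/// Toy adapter for brace-delimited languages; treats the whole file as one unit.
pub struct BraceAdapter;

impl LanguageAdapter for BraceAdapter {
    fn lower(&self, source: &str) -> StyleIr {
        let mut depth = 0usize;
        let mut max_depth = 0usize;
        for ch in source.chars() {
            match ch {
                '{' => {
                    depth += 1;
                    max_depth = max_depth.max(depth);
                }
                '}' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        StyleIr {
            functions: vec![FunctionInfo {
                name: "<file>".into(),
                line_count: source.lines().count(),
                max_nesting: max_depth,
            }],
        }
    }
}

/// Toy adapter for indentation-delimited languages (assumes 4 spaces per level).
pub struct IndentAdapter;

impl LanguageAdapter for IndentAdapter {
    fn lower(&self, source: &str) -> StyleIr {
        let max_nesting = source
            .lines()
            .filter(|l| !l.trim().is_empty())
            .map(|l| (l.len() - l.trim_start().len()) / 4)
            .max()
            .unwrap_or(0);
        StyleIr {
            functions: vec![FunctionInfo {
                name: "<file>".into(),
                line_count: source.lines().count(),
                max_nesting,
            }],
        }
    }
}

/// A single detector that works for every language with an adapter.
pub struct DeepNesting {
    pub max_allowed: usize,
}

impl Detector for DeepNesting {
    fn name(&self) -> &'static str {
        "deep-nesting"
    }

    fn detect(&self, ir: &StyleIr) -> Vec<String> {
        ir.functions
            .iter()
            .filter(|f| f.max_nesting > self.max_allowed)
            .map(|f| {
                format!(
                    "{}: nesting depth {} exceeds {}",
                    f.name, f.max_nesting, self.max_allowed
                )
            })
            .collect()
    }
}
```

With this shape, adding a twelfth language means writing one more adapter, and adding a new smell means writing one more detector, so the work grows with detectors + languages rather than detectors × languages.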

Each article in this series takes one of these decisions and breaks it down: the problem it solves, the alternatives considered, and the implementation details in the actual source code.
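
As a taste of what those breakdowns look like, the logarithmic-scoring decision can be sketched in a few lines. This is an illustration under assumptions, not the tool's actual formula: the function names, the `weight` parameter, and the choice of ln(1 + count) are all hypothetical.

```rust
// Hypothetical sketch: the real scoring model is covered in article 07 and
// may use a different formula entirely.

/// Penalty that grows with the logarithm of the number of occurrences.
/// `weight` is an assumed per-smell severity. Adding 1 before taking the
/// logarithm keeps the penalty at zero when there are no occurrences.
pub fn log_penalty(count: u32, weight: f64) -> f64 {
    weight * (1.0 + f64::from(count)).ln()
}

/// The naive alternative, where every occurrence costs the same.
pub fn linear_penalty(count: u32, weight: f64) -> f64 {
    weight * f64::from(count)
}
```

Under this formula, ten occurrences of a smell cost about 3.5 times as much as one, where the linear version would charge ten times as much. The first god function moves the score a lot; the tenth mostly confirms what the first few already showed.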

Series Map

#  | Article | Core Question
01 | Why Multi-Language Code Quality Analysis Is Hard | What happens when you need to analyze 11 languages with one tool?
02 | Architecture Overview | How do you structure a code analyzer that is both fast and extensible?
03 | Tree-sitter and the LanguageAdapter Pattern | How do you avoid writing 11 copies of the same detection logic?
04 | StyleIR: The Language-Neutral Intermediate Representation | How do you decouple detection from language-specific AST details?
05 | The Signal Detection System | Why do fewer, stronger detectors beat many weak ones?
06 | Duplication Detection Algorithms | How do you find copy-paste code across an entire codebase?
07 | The Scoring Model | How do you turn qualitative signals into a single number?
08 | The Fun Side: Roasts, Personality, and the Tool Belt | Why should code analysis be boring?

Reading Order

The articles are ordered by dependency. Each one builds on concepts introduced earlier in the chain:

01 Problem Space
 └─► 02 Architecture
      └─► 03 Parsing Layer (tree-sitter + adapters)
           └─► 04 Intermediate Representation (StyleIR)
                └─► 05 Detection Layer (SignalDetector)
                     ├─► 06 Duplication (algorithms)
                     └─► 07 Scoring (math)
                          └─► 08 Fun Side (personality + tools)

You can read any single article standalone — each one frames its own problem — but the full picture emerges from reading them in sequence.

Get the Code

cargo install garbage-code-hunter
garbage-code-hunter analyze ./your-project

Or clone and build from source:

git clone https://github.com/anthropics/garbage-code-hunter.git
cd garbage-code-hunter
cargo build --release

Let's start with the fundamental question: Why is multi-language code quality analysis so hard?