The Signal Detection System
Question: A codebase has 200 different "rules" that each flag a specific pattern. 120 of them fire on every file. Developers ignore the output. Sound familiar?
The Rule Explosion Problem
Traditional linters follow a "more rules = more coverage" philosophy. ESLint has 250+ rules. Pylint has 200+. RuboCop has 500+. The problem is not finding issues — it is drowning in them.
When every line triggers a warning, developers stop reading warnings. This is the linter equivalent of car alarms — everyone ignores them because they go off constantly.
garbage-code-hunter takes the opposite approach: fewer rules, stronger signals.
10 Detectors, Not 200 Rules
Instead of hundreds of granular rules, garbage-code-hunter defines 10 signal detectors (src/detectors.rs). Each detector covers a behavioral dimension — a category of code smell that indicates a specific kind of developer behavior:
| # | Detector | Signal | What It Detects | Why It Matters |
|---|---|---|---|---|
| 1 | PanicAddictionDetector | PanicAddiction | .unwrap(), panic!(), .expect() | Error handling laziness |
| 2 | NamingChaosDetector | NamingChaos | Single-letter vars, meaningless names | Communication failure |
| 3 | NestedHellDetector | NestedHell | Blocks nested >= 5 levels | Cognitive complexity |
| 4 | HotfixCultureDetector | HotfixCulture | println!, dbg!, todo!, unimplemented! | Debug leftovers |
| 5 | OverEngineeringDetector | OverEngineering | God functions (>50 lines), >5 params | Over-abstraction |
| 6 | CodeSmellsDetector | CodeSmells | Unsafe blocks, magic numbers, dup imports | General hygiene |
| 7 | DuplicationDetector | Duplication | Repeated code blocks | Copy-paste culture |
| 8 | LegacyCodeDetector | LegacyCode | Commented-out code (3+ lines) | Dead weight |
| 9 | TodoMountainDetector | TodoMountain | TODO/FIXME/BUG/HACK markers | Deferred debt |
| 10 | LineCountSmellDetector | LineCountSmell | Files >1000 lines | Monolith tendency |
Each detector is language-agnostic — it reads from StyleIr and does not know which language the code is written in.
The SignalDetector Trait
The trait (src/signals.rs:17-79) is deliberately minimal:
pub trait SignalDetector: Send + Sync {
    fn signal(&self) -> StyleSignal;
    fn supported_languages(&self) -> &'static [Language];
    fn count_violations(&self, file: &ParsedFile) -> usize;
    fn count_violations_with_ir(&self, ir: &StyleIr, file: &ParsedFile) -> usize;
    fn skips_test_files(&self) -> bool { true }
    fn detect_findings(...) -> Vec<(StyleSignal, usize)>;
    fn detect_findings_with_ir(...) -> Vec<(StyleSignal, usize)>;
}
The key methods:
- signal(): returns the StyleSignal variant this detector produces. This is the detector's identity.
- supported_languages(): which languages this detector applies to. Most return ADAPTER_LANGUAGES (all 11).
- count_violations(): the core detection logic. Returns the raw violation count.
- count_violations_with_ir(): optimized path that reads a pre-computed StyleIr instead of building one.
- skips_test_files(): whether test files should be excluded (default: true).
How a Detector Works: PanicAddiction
Here is the complete implementation (src/detectors.rs:44-62):
impl SignalDetector for PanicAddictionDetector {
    fn signal(&self) -> StyleSignal {
        StyleSignal::PanicAddiction
    }
    fn supported_languages(&self) -> &'static [Language] {
        ADAPTER_LANGUAGES
    }
    fn count_violations(&self, file: &ParsedFile) -> usize {
        StyleIr::from_parsed(file)
            .map(|ir| ir.panic_call_count)
            .unwrap_or(0)
    }
    fn count_violations_with_ir(&self, ir: &StyleIr, _file: &ParsedFile) -> usize {
        ir.panic_call_count
    }
}
That is it. The entire detector is 18 lines. It reads ir.panic_call_count — a number that the language adapter already computed. The detector does not know about unwrap(), panic!(), or any language-specific syntax.
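To make the separation concrete, here is a minimal self-contained sketch of the idea, not the project's actual code. The one-field `StyleIr` is a simplified stand-in, `rust_adapter` and `go_adapter` are hypothetical adapters that use naive substring matching purely for illustration (the real adapters are not shown in this article), and `panic_addiction_count` is a hypothetical free function standing in for the detector.

```rust
/// Simplified stand-in for the project's StyleIr (assumption: the real
/// struct carries many more fields than this one).
#[derive(Debug, Default, PartialEq)]
pub struct StyleIr {
    pub panic_call_count: usize,
}

/// Counts non-overlapping occurrences of each pattern in `source`.
fn count_patterns(source: &str, patterns: &[&str]) -> usize {
    patterns.iter().map(|p| source.matches(*p).count()).sum()
}

/// Hypothetical Rust adapter: knows Rust's panic-style syntax.
pub fn rust_adapter(source: &str) -> StyleIr {
    StyleIr {
        panic_call_count: count_patterns(source, &[".unwrap()", "panic!(", ".expect("]),
    }
}

/// Hypothetical Go adapter: knows Go's panic-style syntax.
pub fn go_adapter(source: &str) -> StyleIr {
    StyleIr {
        panic_call_count: count_patterns(source, &["panic(", "log.Fatal("]),
    }
}

/// The detector side: reads a number and knows nothing about syntax.
pub fn panic_addiction_count(ir: &StyleIr) -> usize {
    ir.panic_call_count
}
```

All language knowledge lives in the two adapter functions. `panic_addiction_count` would work unchanged for any further language whose adapter fills in the same field.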
Test File Handling: The 20% Rule
Test code is different from production code. unwrap() in a test is acceptable — in production, it is not. But completely ignoring test code would miss real issues in test helpers and utilities.
garbage-code-hunter applies a 20% weight to test file violations (src/analyzer.rs:257-260):
let count = if *is_test_file {
    (count as f64 * 0.2).round() as usize
} else {
    count
};
This means:
- 10 unwrap() calls in production code = 10 violations
- 10 unwrap() calls in test code = 2 violations
The 20% is not arbitrary — it is low enough to prevent test code from dominating scores, but high enough to flag genuinely problematic patterns in test helpers.
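The weighting is easy to check in isolation. The sketch below wraps the same expression in a hypothetical helper named `weighted_count` (the name and the function boundary are assumptions; the article only shows the inline expression).

```rust
/// Applies the 20% weight to violations found in test files.
/// Production files keep their raw count.
pub fn weighted_count(count: usize, is_test_file: bool) -> usize {
    if is_test_file {
        // Scale down, then round to the nearest whole violation.
        (count as f64 * 0.2).round() as usize
    } else {
        count
    }
}
```

One consequence of rounding: a test file with one or two violations contributes 0 (0.2 and 0.4 both round down), while three violations contribute 1. Small counts in test files are therefore dropped rather than merely discounted.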
Rust-Specific: #[cfg(test)] Awareness
For Rust, the adapter goes further. It detects #[cfg(test)] module byte ranges and excludes panics inside them from counting entirely (src/language/adapter/rust.rs:16-46):
// Find #[cfg(test)] module byte ranges
let cfg_test_ranges: Vec<(usize, usize)> = ...;
// When counting panic calls, skip those inside #[cfg(test)] modules
fn is_in_cfg_test_module(node: Node, ranges: &[(usize, usize)]) -> bool {
    let start = node.start_byte();
    ranges.iter().any(|&(lo, hi)| start >= lo && start < hi)
}
This is more precise than the 20% rule — it identifies exactly which code is test-only at the language level, not just at the file-path level.
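The range check can be exercised without a parser. In this sketch, plain byte offsets replace the syntax-tree `Node`, and both function names (`is_in_ranges`, `count_outside_test_modules`) are hypothetical. How the real adapter collects the ranges is elided in the excerpt above, so that part is not reproduced here.

```rust
/// Returns true when `start` falls inside any half-open range [lo, hi).
pub fn is_in_ranges(start: usize, ranges: &[(usize, usize)]) -> bool {
    ranges.iter().any(|&(lo, hi)| start >= lo && start < hi)
}

/// Counts call sites whose start offset lies outside every test-module range.
pub fn count_outside_test_modules(
    call_offsets: &[usize],
    test_ranges: &[(usize, usize)],
) -> usize {
    call_offsets
        .iter()
        .filter(|&&offset| !is_in_ranges(offset, test_ranges))
        .count()
}
```

The comparison treats each range as half-open: a call starting exactly at `lo` is inside the test module, and a call starting exactly at `hi` is outside it.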
Detection Flow
Why "Fewer Rules" Works
The 10-detector approach works because of a key insight: developers do not have 200 bad habits — they have 10.
The PanicAddiction signal does not care whether you called .unwrap() on a Result or an Option. It does not care whether the call is in a match arm or a chain. It cares about one thing: how many times did you skip error handling?
This coarse-grained approach has advantages:
- Low false positive rate. A detector that counts unwrap() calls has near-zero false positives: every unwrap() is a conscious choice to skip error handling.
- Actionable feedback. "You have 47 panic calls" is more actionable than "Line 42: consider using match instead of unwrap()" x 47.
- Cross-language consistency. The same 10 signals apply to all 11 languages. A Python project and a Rust project are scored on the same scale.
- Personality inference. With 10 signals, you can map patterns to archetypes (Article 08). With 200 rules, the signal is lost in noise.
The Signal Score
Each detector produces a raw violation count. The scoring system converts this to a normalized score using density-based logarithmic scaling (src/signals.rs:82-86):
pub fn violations_to_score(count: usize, total_lines: usize) -> f64 {
    let k_lines = (total_lines as f64 / 1000.0).max(0.001);
    let density = count as f64 / k_lines;
    ((density + 1.0).log2() * 6.0).min(25.0)
}
The formula:
- Density: violations per 1000 lines (fair across project sizes)
- Log2: diminishing returns (10 violations is bad, 100 is not 10x worse)
- Cap at 25: prevents any single signal from dominating
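A few worked values show how quickly the curve saturates. The block below repeats the function as quoted from the source and adds one hypothetical helper, `cap_density`, which solves 6 · log2(d + 1) = 25 for the density d at which the cap is reached.

```rust
/// Density-based logarithmic score, as quoted in the article.
pub fn violations_to_score(count: usize, total_lines: usize) -> f64 {
    let k_lines = (total_lines as f64 / 1000.0).max(0.001);
    let density = count as f64 / k_lines;
    ((density + 1.0).log2() * 6.0).min(25.0)
}

/// Hypothetical helper: the violations-per-1000-lines density at which
/// the score reaches the cap of 25, i.e. d = 2^(25/6) - 1.
pub fn cap_density() -> f64 {
    2f64.powf(25.0 / 6.0) - 1.0
}
```

In a 1000-line file, 1 violation scores 6, 3 score 12, 7 score 18, and 15 score 24. The cap is reached at roughly 17 violations per 1000 lines, so every density above that point scores exactly 25. Because the input is a density, 5 violations in 500 lines score the same as 10 violations in 1000 lines.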
The scoring model is covered in depth in Article 07.
Adding a New Detector
To add a MagicNumberDetector:
- Add StyleSignal::MagicNumber to the signal enum
- Create MagicNumberDetector implementing SignalDetector
- count_violations_with_ir() returns ir.magic_number_count
- Register it in the detector list
Zero adapter changes. The magic_number_count field already exists in StyleIr because every adapter already computes it. The detector just reads it.
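As a rough sketch of what that change could look like, the block below uses cut-down stand-ins for the project's types: a two-variant `StyleSignal`, a two-field `StyleIr`, and a trait reduced to two methods. The `all_detectors` and `run_detectors` functions are hypothetical; the article does not show how the real detector list is built or consumed.

```rust
/// Cut-down stand-in for the signal enum (assumption: the real enum has
/// one variant per detector).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StyleSignal {
    PanicAddiction,
    MagicNumber, // the newly added variant
}

/// Cut-down stand-in for StyleIr with only the fields used here.
pub struct StyleIr {
    pub panic_call_count: usize,
    pub magic_number_count: usize,
}

/// Reduced version of the trait: only the IR path is modeled.
pub trait SignalDetector: Send + Sync {
    fn signal(&self) -> StyleSignal;
    fn count_violations_with_ir(&self, ir: &StyleIr) -> usize;
}

pub struct PanicAddictionDetector;

impl SignalDetector for PanicAddictionDetector {
    fn signal(&self) -> StyleSignal {
        StyleSignal::PanicAddiction
    }
    fn count_violations_with_ir(&self, ir: &StyleIr) -> usize {
        ir.panic_call_count
    }
}

/// The new detector: it only reads a field the adapters already fill in.
pub struct MagicNumberDetector;

impl SignalDetector for MagicNumberDetector {
    fn signal(&self) -> StyleSignal {
        StyleSignal::MagicNumber
    }
    fn count_violations_with_ir(&self, ir: &StyleIr) -> usize {
        ir.magic_number_count
    }
}

/// Hypothetical registry: registering a detector is one more list entry.
pub fn all_detectors() -> Vec<Box<dyn SignalDetector>> {
    vec![Box::new(PanicAddictionDetector), Box::new(MagicNumberDetector)]
}

/// Hypothetical driver: runs every registered detector against one IR.
pub fn run_detectors(ir: &StyleIr) -> Vec<(StyleSignal, usize)> {
    all_detectors()
        .iter()
        .map(|d| (d.signal(), d.count_violations_with_ir(ir)))
        .collect()
}
```

Nothing in the driver changes when a detector is added, which is the property the trait-object list is meant to provide in this sketch.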
Next: Duplication Detection Algorithms — How to find copy-paste code across an entire codebase, and why Jaccard similarity is not enough.