Red/Blue Team Testing Guide
Version: v0.1.9
Purpose: Validate OmniScope's detection rate (red team) and false positive rate (blue team) against the corpus/ test suite.
Quick Start
make red-team # Adversarial: does the tool detect known bugs?
make blue-team # Defensive: does the tool avoid false alarms?
make corpus-test # Run both
Part 1: Red Team — Adversarial Detection Test
What It Does
Runs OmniScope against crafted test files in corpus/red_team_test/ that contain known bugs (memory leaks, double frees, use-after-free, FFI boundary violations). Measures recall — how many injected bugs the tool actually catches.
Run Log (Real Output)
╔════════════════════════════════════════════════════════════════╗
║ RED TEAM TEST ║
║ Adversarial: detect known bugs in crafted test cases ║
╚════════════════════════════════════════════════════════════════╝
✅ red_team_bugs.ll 25 issues
✅ ffi_boundary_bugs.ll 18 issues
✅ cross_lang_free_bugs.ll 12 issues
✅ cross_lang_free_complete.ll 15 issues
✅ subtle_ffi_bugs.ll 8 issues
✅ python_c_api_bugs.ll 10 issues
✅ posix_ffi_bugs.ll 6 issues
────────────────────────────────────────────────────────
Red Team: 7 files, 94 total issues detected
✅ Detection threshold met
Line-by-Line Interpretation
Header
╔════════════════════════════════════════════════════════════════╗
║ RED TEAM TEST ║
╚════════════════════════════════════════════════════════════════╝
This is the test banner. The Makefile target red-team (defined in Makefile:640) triggers build first, then iterates over RED_IR_FILES.
Per-File Result
✅ red_team_bugs.ll 25 issues
| Field | Meaning |
|---|---|
| ✅ | File produced ≥1 issue (pass) |
| ❌ | File produced 0 issues (miss, potential regression) |
| red_team_bugs.ll | Test file name |
| 25 issues | Total issues detected by OmniScope |
How to locate the source: Each .ll file is compiled from a .c file in the same directory. For example:
corpus/red_team_test/red_team_bugs.c ← Source code with injected bugs
corpus/red_team_test/red_team_bugs.ll ← Compiled LLVM IR (what OmniScope analyzes)
Summary Line
Red Team: 7 files, 94 total issues detected
- 7 files: how many .ll files existed and were analyzed
- 94 total issues: sum of all issues across all files
Threshold Check
✅ Detection threshold met
If total_issues < 10, the test reports ⚠️ LOW detection count — investigate regressions. This is a coarse sanity check, not a precise benchmark. For precise detection rates, use make benchmark.
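The summary and threshold logic can be reproduced outside the Makefile. The following is a minimal Python sketch, assuming the per-file lines keep the `<mark> <name>.ll <N> issues` shape shown in the run log. `summarize_red_team` and `LOW_DETECTION_THRESHOLD` are hypothetical names used for illustration; they are not part of OmniScope or its Makefile.

```python
import re

# Assumed line shape, taken from the run log: "<mark> <name>.ll <N> issues".
RESULT_LINE = re.compile(r"^\S+\s+(?P<file>\S+\.ll)\s+(?P<count>\d+) issues$")

# The text states that totals below 10 trigger the low-detection warning.
LOW_DETECTION_THRESHOLD = 10


def summarize_red_team(log_text):
    """Return (file_count, total_issues, missed_files, threshold_met)."""
    counts = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[match.group("file")] = int(match.group("count"))
    total = sum(counts.values())
    # Files with zero issues correspond to the "miss" rows in the run log.
    missed = [name for name, n in counts.items() if n == 0]
    return len(counts), total, missed, total >= LOW_DETECTION_THRESHOLD


RUN_LOG = """\
✅ red_team_bugs.ll 25 issues
✅ ffi_boundary_bugs.ll 18 issues
✅ cross_lang_free_bugs.ll 12 issues
✅ cross_lang_free_complete.ll 15 issues
✅ subtle_ffi_bugs.ll 8 issues
✅ python_c_api_bugs.ll 10 issues
✅ posix_ffi_bugs.ll 6 issues
"""

files, total, missed, threshold_met = summarize_red_team(RUN_LOG)
print(f"Red Team: {files} files, {total} total issues detected")
```

Run against the log above, the per-file counts sum to 25 + 18 + 12 + 15 + 8 + 10 + 6 = 94, which matches the summary line.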
Drilling Into a Single File
To see the full OmniScope output for one test file:
./zig-out/bin/OmniScope corpus/red_team_test/red_team_bugs_O0.ll 2>&1
Startup Phase
info: [INFO] === OmniScope IR Analysis ===
info: [INFO] File: corpus/red_team_test/red_team_bugs_O0.ll
info: [INFO] Loaded: 35 functions
| Line | What It Means |
|---|---|
| === OmniScope IR Analysis === | Analysis session started |
| File: ... | Input LLVM IR file path |
| Loaded: 35 functions | Number of functions parsed from the IR |
Language Detection
info: [INFO] LANG-DETECT: module language = c, confidence = 100.0%, method = sampling
| Field | Meaning |
|---|---|
| module language = c | Dominant language in the module |
| confidence = 100.0% | How certain (sampling counts function name patterns) |
| method = sampling | sampling = statistical; personality = DWARF debug info |
The sampling method iterates all function names and counts language-specific patterns (_ZN for C++/Rust, _R for Rust v0, Go. for Go, etc.).
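The counting idea can be illustrated with a small Python sketch. It uses only the prefixes listed above and makes two assumptions that the text does not confirm: names matching no pattern are counted as C, and the ambiguous `_ZN` prefix is counted as C++. `classify_symbol` and `sample_module_language` are hypothetical helpers, not OmniScope's `detectModuleLanguage`.

```python
from collections import Counter


def classify_symbol(name):
    """Guess a language from a symbol name using the prefixes listed above."""
    if name.startswith("_R"):
        return "rust"  # Rust v0 mangling
    if name.startswith("_ZN"):
        # _ZN is shared by C++ and legacy Rust mangling; this sketch counts
        # it as C++ (an assumption, not OmniScope's actual tie-break rule).
        return "cpp"
    if name.startswith("Go."):
        return "go"
    return "c"  # assumption: names with no known pattern count as C


def sample_module_language(function_names):
    """Return (dominant_language, confidence_percent) for a list of names."""
    names = list(function_names)
    if not names:
        return None, 0.0
    votes = Counter(classify_symbol(n) for n in names)
    language, hits = votes.most_common(1)[0]
    return language, 100.0 * hits / len(names)


# Hypothetical function names for illustration.
print(sample_module_language(["main", "parse_args", "cleanup"]))
```

In this sketch, confidence is simply the share of functions that voted for the dominant language, so a module with only unmangled names reports 100.0%.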
Pre-Pass Scanners
info: [INFO] MallocCheck: Analyzed functions, found 9 unchecked allocations
info: [INFO] IntegerOverflow: Analyzed functions, found 0 potential overflows
info: [INFO] BufferOverflow: No buffer overflow issues detected
info: [INFO] ReturnCheck: Analyzed functions, found 1 unchecked return values
info: [INFO] RustFfiFilter: analyzed 17 funcs, 0 findings (0 stack escapes)
| Scanner | What It Checks |
|---|---|
| MallocCheck | malloc() return value not null-checked |
| IntegerOverflow | Arithmetic ops that may overflow |
| BufferOverflow | Array index out of bounds |
| ReturnCheck | Return values of FFI calls not checked |
| RustFfiFilter | Rust functions without FFI relevance (skipped) |
Cross-Language Edge Extraction
info: [INFO] CallGraph: extracted 15 cross-language edges
info: [INFO] CallGraph: built semantics CallGraph with 35 nodes, 63 edges for BFS traversal
| Field | Meaning |
|---|---|
| 15 cross-language edges | Calls where caller and callee are in different languages |
| 35 nodes, 63 edges | Call graph size for BFS reachability analysis |
Each edge records (caller_name, callee_name, caller_lang, callee_lang). An edge is "cross-language" when caller_lang != callee_lang.
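The edge record and the cross-language test translate directly into code. The Python sketch below mirrors the four fields named above; the `Edge` type, the `cross_language_edges` helper, and the sample function names are hypothetical and only illustrate the `caller_lang != callee_lang` rule.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    caller_name: str
    callee_name: str
    caller_lang: str
    callee_lang: str


def cross_language_edges(edges):
    """Keep only edges whose caller and callee languages differ."""
    return [e for e in edges if e.caller_lang != e.callee_lang]


# Hypothetical call-graph edges for illustration.
edges = [
    Edge("main", "parse_config", "c", "c"),
    Edge("main", "rust_init", "c", "rust"),
    Edge("rust_init", "malloc", "rust", "c"),
]

for e in cross_language_edges(edges):
    print(f"{e.caller_name} ({e.caller_lang}) -> {e.callee_name} ({e.callee_lang})")
```

Of the three sample edges, the two that cross between C and Rust are kept, and the C-to-C call is dropped.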
Danger Surface Pass
info: [INFO] [P1-1] DangerSurfacePass: 15 FFI, 0 allocs, 0 funcs | Phase1=0ms (args=0 rets=0 alias_traces=0) Phase2=0ms (cross_lang_free=0)
| Field | Meaning |
|---|---|
| 15 FFI | Number of FFI boundary nodes in MemoryGraph |
| Phase1 | Time spent tracing call args/rets through FFI boundaries |
| Phase2 | Time spent scanning for cross-language free violations |
| cross_lang_free=0 | Cross-language alloc/free mismatches found |
Pointer Ownership Analysis
info: [INFO] PointerOwnership: Source 1 (MemoryGraph) — 59 unfreed + 23 freed = 82 total nodes
info: [INFO] PointerOwnership: Source 2 (GlobalAllocTracker) — 9 records, 0 freed
info: [INFO] PointerOwnership: Source 3 (IR scan) added 32 frees — total now 32
info: [INFO] PointerOwnership: Pre-populated from MemoryGraph + GlobalAllocTracker + IR-scan — 59 allocs, 32 frees
This is the three-source data fusion:
| Source | What It Provides |
|---|---|
| Source 1: MemoryGraph | Allocation sites + freed status from upstream passes |
| Source 2: GlobalAllocTracker | Supplementary free tracking |
| Source 3: IR scan | Direct LLVM IR scan for free/dealloc call instructions (fallback) |
How to interpret: 59 unfreed means 59 allocations were not matched with a corresponding free. These are potential memory leaks. The tool then applies filters (RAII, Meyers singleton, ref-counted containers) to reduce false positives.
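One way to picture the fusion is as a union over the free records from all sources, followed by a set difference against the allocation sites. The Python sketch below is a deliberately simplified illustration under that assumption. The pointer identifiers, the `leak_candidates` helper, and the set-based merge are hypothetical; they do not describe OmniScope's actual data structures, and the false positive filters are not modeled.

```python
def leak_candidates(allocs, *free_sources):
    """Return allocation ids that no source reports as freed."""
    freed = set().union(*free_sources)
    return sorted(set(allocs) - freed)


# Hypothetical allocation ids and per-source free records.
allocs = {"p1", "p2", "p3", "p4"}
memory_graph_frees = {"p1"}      # stands in for Source 1
alloc_tracker_frees = set()      # stands in for Source 2
ir_scan_frees = {"p1", "p3"}     # stands in for Source 3 (fallback scan)

print(leak_candidates(allocs, memory_graph_frees, alloc_tracker_frees, ir_scan_frees))
```

The point of combining sources is visible in `p3`: only the IR scan sees its free, so without that fallback it would be reported as unfreed.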
Vulnerability Lines
info: [ERROR] VULNERABILITY OMI-001 [medium] [Confidence: MEDIUM]
info: [ERROR] Type: tainted_path_to_sink
info: [ERROR] Reason: Untrusted data flows to sensitive sink without validation
info: [ERROR] Path:
info: [ERROR] [Sink] printf()
info: [ERROR] [Source] main() - initial taint source
| Field | Meaning |
|---|---|
| OMI-001 | Auto-incremented issue ID (unique per analysis run) |
| [medium] | Severity: critical > high > medium > low |
| [Confidence: MEDIUM] | HIGH / MEDIUM / LOW |
| Type: tainted_path_to_sink | Issue category (20+ types) |
| Reason: | Human-readable explanation of the problem |
| Path: | Data flow path from source to sink |
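For triage scripts, the header line can be split into its three fields and ranked by severity. The Python sketch below assumes the header keeps the exact `VULNERABILITY <id> [<severity>] [Confidence: <level>]` shape shown in the log. `parse_header`, `sort_by_severity`, and `SEVERITY_RANK` are hypothetical helpers, and the second sample line is invented for illustration.

```python
import re

# Assumed header shape, taken from the log excerpt above.
HEADER = re.compile(
    r"VULNERABILITY (?P<id>OMI-\d+) \[(?P<severity>\w+)\] "
    r"\[Confidence: (?P<confidence>\w+)\]"
)

# Ordering stated in the table: critical > high > medium > low.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def parse_header(line):
    """Return {'id', 'severity', 'confidence'} or None if the line is not a header."""
    match = HEADER.search(line)
    return match.groupdict() if match else None


def sort_by_severity(issues):
    """Order parsed headers from most to least severe."""
    return sorted(issues, key=lambda issue: SEVERITY_RANK[issue["severity"]])


lines = [
    "info: [ERROR] VULNERABILITY OMI-001 [medium] [Confidence: MEDIUM]",
    "info: [ERROR] VULNERABILITY OMI-002 [critical] [Confidence: HIGH]",  # hypothetical
]
parsed = [parse_header(line) for line in lines]
for issue in sort_by_severity(parsed):
    print(issue["id"], issue["severity"], issue["confidence"])
```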
How to locate the problem in YOUR code:
Step 1: Identify the function. The output always includes a function name (mangled). For example:
info: [ERROR] in _Z37bug_cpp_05_unique_ptr_callback_escapev
Demangle it:
# C++ (Itanium ABI)
echo "_Z37bug_cpp_05_unique_ptr_callback_escapev" | c++filt
# → bug_cpp_05_unique_ptr_callback_escape()
# Rust (legacy mangling; v0 symbols start with _R)
rustfilt "_ZN4core3ptr13drop_in_place17h1234E"
# → core::ptr::drop_in_place<Type>
Then grep for the demangled name in your source:
grep -rn "bug_cpp_05_unique_ptr_callback_escape" src/
Step 2: Use debug info for exact line. If you compiled with -g, the JSON output includes file and line:
./zig-out/bin/OmniScope target.ll --json | jq '.issues[] | {function: .location.function, file: .location.file, line: .location.line}'
Without -g, you only get the function name. Always compile with -g for actionable reports.
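The same lookup can be scripted when jq is not available. This is a minimal Python sketch, assuming the JSON report has the `issues[].location.{function,file,line}` shape implied by the jq filter above and that `file` and `line` are absent or null when debug info is missing. The sample report and the `group_by_file` helper are hypothetical.

```python
import json
from collections import defaultdict


def group_by_file(report_json):
    """Map each source file to a list of (line, function) pairs."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for issue in report.get("issues", []):
        location = issue.get("location") or {}
        # Without -g only the function name is available.
        key = location.get("file") or "<no debug info>"
        grouped[key].append((location.get("line"), location.get("function")))
    return dict(grouped)


# Hypothetical report for illustration; real field values will differ.
SAMPLE_REPORT = json.dumps({
    "issues": [
        {"location": {"function": "load_config", "file": "src/config.c", "line": 42}},
        {"location": {"function": "load_config", "file": "src/config.c", "line": 57}},
        {"location": {"function": "ffi_entry"}},
    ]
})

for path, hits in group_by_file(SAMPLE_REPORT).items():
    print(path, hits)
```

Grouping by file makes it easy to see which issues still lack a file and line and therefore need a rebuild with -g.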
Step 3: Understand the issue type. Each type points to a specific source pattern:
| Issue Type | What to look for in YOUR code |
|---|---|
| memory_leak | malloc/new/Box::new without matching free/delete/drop |
| use_after_free | Pointer used after free() or Box::from_raw() |
| double_free | Same pointer freed twice |
| cross_lang_free_mismatch | Allocated in Rust (Box::into_raw), freed in C (free), or vice versa |
| borrow_escape | Stack pointer or &mut passed to FFI function that outlives the scope |
| tainted_path_to_sink | User input reaches system()/exec()/printf() without validation |
| null_dereference | Pointer used without null check after malloc or FFI call |
| buffer_overflow | Array access with unchecked index |
| ffi_unsafe_call | Calling an FFI function with wrong pointer type or lifetime |
| format_string | Non-literal format string in printf/sprintf |
Step 4: Use --sarif for IDE integration. SARIF output maps directly to source files and opens in VS Code / GitHub Code Scanning with inline annotations.
Performance Profile
info: Operation Calls Total (ms) Avg (us) Max (us)
info: --------------------------------------------------------------------------------
info: init 1 7.77 7770.96 7770.96
info: analysis 1 5.89 5888.63 5888.63
info: detect 2 3.75 1877.17 1909.58
info: total 1 7.78 7775.04 7775.04
| Phase | What Happens |
|---|---|
| init | Load module, initialize data structures |
| analysis | Main function traversal + ownership tracking |
| detect | Violation detection (leaks, double-free, UAF) |
| total | End-to-end wall clock time |
Final Summary
info: [INFO] Analysis complete
info: [INFO] Functions processed: 35
info: [INFO] Facts generated: 54
info: [INFO] Time: 24ms
info: [INFO] Issues detected: 25
| Field | Meaning |
|---|---|
| Functions processed | Number of functions analyzed (after zone/noise filtering) |
| Facts generated | Number of facts emitted to the fact store (alias, taint, ownership) |
| Time | Total analysis wall clock time |
| Issues detected | Total issues found (this is what the red team counts) |
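Because the red team count comes from the `Issues detected` line, a script can recover it from captured output with a single pattern match. The Python sketch below assumes the line format shown in the final summary; how the Makefile itself extracts the number is not shown in this guide, and `issues_detected` is a hypothetical helper.

```python
import re

# Assumed format, taken from the final summary: "Issues detected: <N>".
ISSUES_LINE = re.compile(r"Issues detected:\s*(\d+)")


def issues_detected(output):
    """Return the issue count from OmniScope output, or None if it is missing."""
    match = ISSUES_LINE.search(output)
    return int(match.group(1)) if match else None


FINAL_SUMMARY = """\
info: [INFO] Analysis complete
info: [INFO] Functions processed: 35
info: [INFO] Facts generated: 54
info: [INFO] Time: 24ms
info: [INFO] Issues detected: 25
"""

print(issues_detected(FINAL_SUMMARY))
```

Returning `None` rather than 0 when the line is absent keeps a crashed or truncated run distinguishable from a clean run with no findings.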
Part 2: Blue Team — False Positive Audit
What It Does
Runs OmniScope against the corpus/{small,medium,ffi-dense} test files and compares detected issue counts against expected in-scope thresholds. Measures precision — whether the tool over-reports.
Run Log (Real Output)
╔════════════════════════════════════════════════════════════════╗
║ BLUE TEAM TEST ║
║ Defensive: false positive audit on corpus ║
╚════════════════════════════════════════════════════════════════╝
── small/ (expected ≤13 in-scope issues)
✅ small/: 11 issues (expected ≤13, ok)
── medium/ (expected ≤20 in-scope issues)
✅ medium/: 16 issues (expected ≤20, ok)
── ffi-dense/ (expected ≤26 in-scope issues)
❌ ffi-dense/: 28 issues (expected ≤26, OVER)
────────────────────────────────────────────────────────
Blue Team: 2 passed, 1 failed
⚠️ False positive regression detected
Line-by-Line Interpretation
Directory Header
── small/ (expected ≤13 in-scope issues)
This tells you which corpus directory is being tested and the expected upper bound. The bounds come from corpus/EXPECTED_RESULTS.md:
| Directory | Expected In-Scope | Source |
|---|---|---|
| small/ | 13 issues (4 files × ~3 issues each) | EXPECTED_RESULTS.md:198 |
| medium/ | 20 issues (1 file, 14 in-scope + 6 out-of-scope) | EXPECTED_RESULTS.md:202 |
| ffi-dense/ | 26 issues (4 files, 4+6+6+6 in-scope) | EXPECTED_RESULTS.md:204-207 |
Pass/Fail Result
✅ small/: 11 issues (expected ≤13, ok)
| Symbol | Meaning |
|---|---|
| ✅ | Issue count is within expected bound |
| ❌ | Issue count exceeds expected bound (possible false positive regression) |
The threshold has a margin of ~50% above expected to account for legitimate detections that may vary across versions. The point is to catch gross inflation, not minor fluctuations.
Summary
Blue Team: 2 passed, 1 failed
⚠️ False positive regression detected
If any directory fails, the overall blue team result is a warning. Investigate by running OmniScope on individual files:
./zig-out/bin/OmniScope corpus/ffi-dense/output/sqlite_binding.ll 2>&1
Compare the issue list against corpus/EXPECTED_RESULTS.md to identify which issues are real and which are false positives.
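The pass/fail decision for each directory is a comparison of the detected count against its upper bound. The Python sketch below reproduces the summary from the run log under that assumption. The bounds are the `expected ≤N` values printed in the log, while `audit_blue_team` and `EXPECTED_MAX` are hypothetical names, not part of the Makefile.

```python
# Upper bounds as printed in the run log ("expected ≤N in-scope issues").
EXPECTED_MAX = {"small": 13, "medium": 20, "ffi-dense": 26}


def audit_blue_team(detected, bounds=EXPECTED_MAX):
    """Return (per_directory_ok, passed_count, failed_count)."""
    per_directory = {name: detected[name] <= bound for name, bound in bounds.items()}
    passed = sum(per_directory.values())
    return per_directory, passed, len(per_directory) - passed


# Issue counts taken from the run log above.
detected = {"small": 11, "medium": 16, "ffi-dense": 28}
per_directory, passed, failed = audit_blue_team(detected)

print(f"Blue Team: {passed} passed, {failed} failed")
if failed:
    over = [name for name, ok in per_directory.items() if not ok]
    print("Over bound:", ", ".join(over))
```

With the counts from the log, ffi-dense/ is the only directory over its bound (28 against 26), which gives the `2 passed, 1 failed` summary.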
Part 3: Contributor Reference — OmniScope Internals
Note: This section is for OmniScope developers/contributors. If you're a user trying to find issues in your own code, see the Vulnerability Lines section above.
Where Each Log Line Originates (OmniScope Source)
| Log Pattern | OmniScope Source File | Function |
|---|---|---|
| === OmniScope IR Analysis === | src/main.zig | runSingleFileAnalysis |
| LANG-DETECT: module language | src/semantics/language_detector.zig | detectModuleLanguage |
| MallocCheck: Analyzed functions | src/pass/analysis/malloc_check.zig | run |
| IntegerOverflow: Analyzed functions | src/pass/analysis/issue/integer_overflow.zig | run |
| CallGraph: extracted N cross-language edges | src/pass/analysis/call_graph.zig | extractCrossLangEdges |
| DangerSurfacePass: N FFI | src/pass/analysis/danger_surface.zig | run |
| FFITypeMismatch: analyzed N calls | src/pass/analysis/ffi_type_mismatch.zig | run |
| PointerOwnership: Source 1 | src/pass/analysis/pointer_ownership.zig | run (line 265) |
| VULNERABILITY OMI-NNN | src/diag/issue.zig | Issue.init |
| Issues detected: N | src/main.zig | emitOutput |
| GlobalAllocTracker: N memory leaks | src/semantics/global_alloc_tracker.zig | confirmLeaks |
Issue Type → Detection Pass (OmniScope Internals)
| Issue Type | Detection Pass |
|---|---|
| memory_leak | pointer_ownership.zig:detectViolations |
| cross_lang_free_mismatch | cpp_fp_reduction.zig:detectCrossLangAllocMismatch |
| use_after_free | cpp_fp_reduction.zig:detectUseAfterFree |
| double_free | cpp_fp_reduction.zig:detectDoubleFree |
| null_dereference | pointer_ownership.zig:detectNullDereferences |
| borrow_escape | pointer_ownership.zig:detectAsPtrBorrowEscape |
| tainted_path_to_sink | taint.zig |
| ffi_unsafe_call | ffi_boundary.zig:checkCallForFFI |
| buffer_overflow | buffer_overflow.zig |
| integer_overflow | issue/integer_overflow.zig |
| format_string | format_string.zig |
Corpus File → Source Code Mapping
| IR File | Source File | Bugs Injected |
|---|---|---|
| red_team_bugs_O0.ll | red_team_bugs.c | Memory leak, UAF, double-free, null deref, format string |
| ffi_boundary_bugs.ll | ffi_boundary_bugs.c | FFI boundary violations, stack escape |
| cross_lang_free_bugs.ll | cross_lang_free_bugs.c | Rust→C, C→C++ free mismatches |
| subtle_ffi_bugs.ll | subtle_ffi_bugs.c | Subtle FFI patterns (realloc, partial cleanup) |
| python_c_api_bugs.ll | python_c_api_bugs.c | Python C API misuse (Py_DECREF, refcount) |
| posix_ffi_bugs.ll | posix_ffi_bugs.c | POSIX API misuse (FILE*, fd leak) |
| cpp_ffi_simple.ll | cpp_ffi_simple.cpp | C++ new/delete mismatch, RAII escape |
| boundary_test.ll | boundary_test.c | Null deref at FFI boundary, circular ownership |
| stress_patterns.ll | stress_patterns.c | 70 bugs: alloc leaks, cross-lang mismatches, chains |
| sqlite_binding.ll | sqlite_binding.c | SQLite resource leaks, dangling pointers |
| openssl_wrapper.ll | openssl_wrapper.c | OpenSSL ctx/bio/key leaks |
| zlib_binding.ll | zlib_binding.c | zlib stream leaks, double-free, UAF |