RustForge | Ecosystem Safety Audit Report

Executive Summary

We audited 10 Rust repositories spanning AI inference, agent runtimes, authentication, search, government services, ethical learning, and emulation — totaling 365 source files with 1,025+ tests. Every production .unwrap() and .expect() was manually classified: 140 total, all either algorithmically invariant or CLI-startup pattern. 8 actionable issues were found and fixed on the spot: 5 NaN-panic risks via partial_cmp().unwrap(), 1 hash-chain verification crash, 1 unguarded error propagation, and 1 ambiguous unwrap upgraded to explicit expect. A custom static analysis scanner was built and improved during the audit, catching 4 bugs in our own tooling along the way. Final result: 0 P0 risks, 0 actionable unwraps remaining.

Risk Findings

Every finding below was detected, classified, fixed, and verified with passing tests during the audit.

RF-2026-001 High

NaN Panic in f32 Ordering — 5 Sites, 2 Repos

What

partial_cmp().unwrap() on f32 values panics when either operand is NaN. Found in LLM token sampling (argmax, top-k sort, top-p sort) and anomaly detection confidence scoring. Any NaN from upstream tensor ops would crash the inference pipeline.

Where

llm-sampler/src/lib.rs — lines 192, 204, 225 (argmax, top-k, top-p)
llm-models/src/llama.rs — line 880 (softmax argmax)
b3-core/detectors.rs — line 330 (confidence sort)

Impact

NaN values are common in neural network inference (for example 0/0 after a division by zero in attention, or ∞/∞ after overflow in softmax). A single NaN logit would crash the entire LLM serving pipeline. In b3-core, corrupted confidence scores would crash the anomaly detector.

Evidence

// Before (panics on NaN):
logits.iter().enumerate()
    .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())

// After (NaN-safe, total ordering):
logits.iter().enumerate()
    .max_by(|(_, a), (_, b)| a.total_cmp(b))

FIXED All 5 sites: partial_cmp().unwrap() → total_cmp(). total_cmp() never panics and orders NaN deterministically (positive NaN above +∞, negative NaN below −∞). 88 + 88 tests pass.
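A minimal sketch of the fixed pattern, using only the standard library. The function names are hypothetical and are not taken from llm-sampler. It also illustrates one consequence of total ordering that callers may want to handle: a positive NaN compares greater than every other value, so a plain argmax will select it. The second function shows one possible way to skip NaN entries; whether the audited code does this is not stated in the report.

```rust
/// Index of the largest logit under IEEE 754 total ordering.
/// Never panics, but a positive NaN ranks above +infinity and will win.
pub fn argmax_total(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
}

/// Variant that filters NaN first, so a poisoned logit cannot be selected.
/// Returns None if the slice is empty or contains only NaN.
pub fn argmax_ignoring_nan(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, v)| !v.is_nan())
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
}
```

With a positive NaN at index 1 of [0.1, NaN, 2.5, -1.0], argmax_total returns index 1, while argmax_ignoring_nan returns index 2.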

RF-2026-002 High

Hash-Chain Verify Crash on Malformed Input

What

The verify() function of an append-only hash-chain log used serde_json::from_str(&line).unwrap() to parse each line. A single malformed line (disk corruption, partial write, manual edit) would panic instead of returning verification failure.

Where

logstore/src/lib.rs — line 158, verify() function

Impact

The logstore provides tamper-evidence for an agent runtime. A panic during verification means corruption is never reported as a verification failure; the verifier simply crashes. An attacker who can append garbage to the log file triggers a denial-of-service on the integrity check.

Evidence

// Before (panics on malformed JSON):
let v: serde_json::Value = serde_json::from_str(&line).unwrap();

// After (returns chain-invalid without crash):
let v: serde_json::Value = match serde_json::from_str(&line) {
    Ok(v) => v,
    Err(_) => return Ok(false),
};

FIXED Explicit match returns Ok(false) on parse failure. Verification never panics. 41 tests pass.
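The same "malformed input is a failed verification, not a panic" shape can be sketched without external crates. Everything below is an assumption made for illustration: the line format ("<16 hex digits> <payload>"), the FNV-1a hash (a dependency-free stand-in for a real cryptographic hash), and the helper names. The actual logstore parses JSON with serde_json, as shown in the evidence above.

```rust
use std::io::{self, BufRead};

/// FNV-1a: illustration only, NOT suitable for real tamper-evidence.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hash that links an entry to its predecessor.
fn link(prev: u64, payload: &str) -> u64 {
    let mut buf = prev.to_le_bytes().to_vec();
    buf.extend_from_slice(payload.as_bytes());
    fnv1a(&buf)
}

/// Parses the hypothetical "<hex hash> <payload>" format; None if malformed.
fn parse_line(line: &str) -> Option<(u64, &str)> {
    let (hex, payload) = line.split_once(' ')?;
    let hash = u64::from_str_radix(hex, 16).ok()?;
    Some((hash, payload))
}

/// Appends one entry and returns its hash (the `prev` for the next entry).
pub fn append(log: &mut String, prev: u64, payload: &str) -> u64 {
    let h = link(prev, payload);
    log.push_str(&format!("{h:016x} {payload}\n"));
    h
}

/// Ok(true) if the chain is intact, Ok(false) if any line is malformed or
/// does not link to its predecessor. Only I/O errors surface as Err.
pub fn verify<R: BufRead>(reader: R) -> io::Result<bool> {
    let mut prev = 0u64;
    for line in reader.lines() {
        let line = line?; // read errors propagate instead of panicking
        let Some((hash, payload)) = parse_line(&line) else {
            return Ok(false); // malformed line: chain invalid, no crash
        };
        if hash != link(prev, payload) {
            return Ok(false);
        }
        prev = hash;
    }
    Ok(true)
}
```

Returning Ok(false) keeps "the log is corrupt" distinct from "the log could not be read", which lets callers decide separately how to react to each case.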

RF-2026-003 Medium

Unguarded Error Propagation in Agent Runtime

What

submit(request).unwrap() in a runtime execution path turned every submit error into a panic. If the intervention channel was closed or full, the runtime crashed instead of propagating the error through the normal error chain.

Where

runtime.rs — line 243, submit() call inside execution loop

Impact

A closed channel during graceful shutdown or backpressure would crash the entire runtime instead of triggering the retry/fallback path. The panic would poison any shared state held at that point.

Evidence

// Before (panic on channel error):
submit(request).unwrap();

// After (propagates via error chain):
submit(request)?;
// Added: Intervention(#[from] InterventionError) to RuntimeError

FIXED .unwrap() → ? with a new Intervention variant (wrapping InterventionError) in the RuntimeError enum. 179 tests pass.

RF-2026-004 Medium

10 Unsafe Pointer Casts Without Bounds Checking

What

LLM quantization and tensor operations used raw pointer casts (as *const f16, std::slice::from_raw_parts) to reinterpret byte buffers. No alignment or bounds validation. Creating a misaligned reference or slice is undefined behavior in Rust regardless of target architecture.

Where

llm-quant/src/*.rs — 5 pointer casts in quantization kernels
llm-core/src/tensor.rs — 3 pointer casts in tensor view operations
Additional 2 in GGUF model loading

Impact

Undefined behavior from misaligned reads. On x86 this often appears to work, although it remains undefined behavior; on ARM (Apple Silicon, mobile) it can segfault. Buffer overread if the element count times the element size exceeds the length of the byte buffer.

Evidence

// Before (raw pointer cast, no validation):
let ptr = bytes.as_ptr() as *const f16;
let slice = unsafe { std::slice::from_raw_parts(ptr, count) };

// After (safe, validated by bytemuck):
let slice: &[f16] = bytemuck::try_cast_slice(bytes)
    .map_err(TensorError::AlignmentError)?;

FIXED 10 pointer casts → bytemuck::from_bytes / try_cast_slice. 9 remaining unsafe blocks are Metal FFI, Send/Sync, mmap — all documented with SAFETY comments. 7 boundary tests added.
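To make explicit what a validated cast has to check, here is a standard-library sketch for f32 (f16 is not a stable std type). It is not bytemuck's implementation and the names are hypothetical; it only illustrates the two preconditions, length and alignment, that the raw cast skipped. A copying decoder is included as an alternative that has no alignment requirement at all.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq)]
pub enum CastError {
    /// Byte length is not a multiple of size_of::<f32>().
    BadLength,
    /// Start address is not aligned for f32.
    Misaligned,
}

/// Zero-copy view of `bytes` as f32 values, rejecting invalid layouts.
pub fn try_cast_f32(bytes: &[u8]) -> Result<&[f32], CastError> {
    if bytes.len() % size_of::<f32>() != 0 {
        return Err(CastError::BadLength);
    }
    if bytes.as_ptr() as usize % align_of::<f32>() != 0 {
        return Err(CastError::Misaligned);
    }
    let count = bytes.len() / size_of::<f32>();
    // SAFETY: the pointer is aligned for f32, `count * 4 == bytes.len()` so
    // the view stays inside the borrowed buffer, every bit pattern is a
    // valid f32, and the returned slice inherits the lifetime of `bytes`.
    Ok(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<f32>(), count) })
}

/// Copying alternative: works at any alignment, fixed little-endian layout.
pub fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The zero-copy path suits hot kernels where the buffer's alignment is under the loader's control; the copying path trades an allocation for independence from alignment and host endianness.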

Scanner Methodology

We built a custom Rust static analysis scanner during the audit. Cross-checking its counts against manual classification exposed 4 bugs in the scanner itself; each fix improved accuracy across all 365 files.

SCANNER Tooling

4 Scanner Bugs Found & Fixed

BUG-1: Test Indirection

#[cfg(test)] mod tests; → src/tests.rs was not followed. Result: +27 false positives in one repo. Fix: collect_test_module_paths() resolves indirection before counting.

BUG-2: Unsafe String Match

\bunsafe\b matched inside string literals, comments, and lint attributes. Fix: count_real_unsafe() strips strings/comments first, then matches only unsafe {/fn/impl/trait.

BUG-3: Method Name Collision

Parser::expect(TokenKind, msg)? matched the .expect( regex. +16 false positives in one repo. Fix: require string literal argument — \.expect\s*\(\s*".

BUG-4: Doc Comment False Positives

/// and //! lines containing .unwrap() counted as production code. +18 false positives across 4 repos. Fix: strip_doc_comments() filters before counting.

15 TESTS Scanner v3 ships with a full regression suite, located at newcool-cognitive-core/crates/extractor-rust/.
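The BUG-3 and BUG-4 fixes can be approximated in a few lines of standard-library Rust. This is a simplified sketch, not the scanner's code: strip_doc_comments borrows its name from the report, but its body and the two counting helpers are assumptions, and plain string search stands in for the regex \.expect\s*\(\s*".

```rust
/// Drops lines that are doc comments (`///` or `//!`) so that example code
/// inside documentation is not counted as production code (BUG-4).
pub fn strip_doc_comments(src: &str) -> String {
    src.lines()
        .filter(|line| {
            let t = line.trim_start();
            !(t.starts_with("///") || t.starts_with("//!"))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Counts literal `.unwrap()` occurrences.
pub fn count_unwraps(src: &str) -> usize {
    src.matches(".unwrap()").count()
}

/// Counts `.expect(` calls whose first argument starts with a string
/// literal, skipping unrelated methods such as `Parser::expect(kind, msg)`
/// (BUG-3). Whitespace is allowed before and after the parenthesis.
pub fn count_string_expects(src: &str) -> usize {
    let mut count = 0;
    let mut rest = src;
    while let Some(pos) = rest.find(".expect") {
        let after = &rest[pos + ".expect".len()..];
        if let Some(args) = after.trim_start().strip_prefix('(') {
            if args.trim_start().starts_with('"') {
                count += 1;
            }
        }
        rest = after;
    }
    count
}
```

A sketch like this still miscounts hits inside string literals and ordinary // comments, which is the class of problem BUG-2 addressed for unsafe.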

Analysis Observations

The ecosystem has strong safety foundations. Zero panic!, todo!, or unimplemented! in production code across all 10 repos. 11 unreachable!() are all guarded by bitwise/enum invariants and documented.

partial_cmp on floats is a systemic blind spot. Found independently in 2 of 10 repos (llm-runtime and b3-core). The pattern looks correct at first glance but panics on NaN, a value that routinely appears in neural network inference. total_cmp should be the default for all f32 ordering in Rust.

Library-level code needs different unwrap discipline than CLI code. 34 of 40 unwraps in the agent-runtime are .expect("msg") in fn main — correct CLI pattern. But 1 unwrap in a library verify() function was a genuine crash risk. The distinction between binary-level and library-level unwrap tolerance is critical.

Static analysis tooling must audit itself. 4 bugs in our own scanner over the course of 10 repos. Each bug was caught because manual classification disagreed with scanner counts. Dogfooding the scanner on known-good code is essential.

Methodology: Unwrap Classification

01 Scan: Custom scanner counts all .unwrap(), .expect(), panic!, unsafe per file, excluding test code (#[cfg(test)], /tests/, /benches/).

02 Context: Read 10 lines around each hit. Classify as: Invariant (algorithmic guarantee), Regex/Static init, CLI/startup expect, or Actionable (library-level crash risk).

03 Fix: Only actionable items get fixes. Invariant unwraps are documented, not removed — removing them would add complexity without reducing risk.

04 Verify: Full test suite runs after every fix. Scanner re-run confirms counts match manual classification.

Scanned: 365 files | Classified: 140 prod unwraps | Result: 0 actionable remaining
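The classification categories from step 02 can be illustrated with three small functions. These are hypothetical examples written for this purpose, not code from the audited repositories.

```rust
use std::num::ParseIntError;

/// Invariant: the mask keeps the value in 0..=15, so `from_digit` with
/// radix 16 cannot return None. The expect message documents the guarantee.
pub fn hex_digit(n: u8) -> char {
    char::from_digit(u32::from(n & 0x0F), 16)
        .expect("value masked to 0..=15 is always a valid hex digit")
}

/// Library-level code: the input is caller-controlled, so an unwrap here
/// would be Actionable. The error is returned instead.
pub fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

/// CLI/startup pattern: code called once from a binary's entry point may
/// abort with a clear message, because no caller could recover anyway.
pub fn startup_port(arg: Option<&str>) -> u16 {
    let raw = arg.expect("usage: server <port>");
    parse_port(raw).expect("port must be an integer in 0..=65535")
}
```

The same .expect() call is acceptable in startup_port and would be a finding in parse_port; what changes the classification is who controls the input and who can handle the failure.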

Project Scoreboard

Each repo scored by: prod unwrap count, test coverage, unsafe usage, invariant documentation quality.

apex-learning

0 prod unwraps. 0 unsafe. 106 tests. DDD architecture with 7 crates. Clean.

gameboy

9.5

0 prod unwraps. 5 unreachable! (all bitmask-guarded, documented). 28 tests.

rcr

9.0

4 prod unwraps (all match-arm/enum invariant). 330 tests. Best test density in ecosystem.

nccr

9.0

4 prod unwraps (UTF-8 from ASCII, self-ref lookup). 5 unreachable! documented. 68 tests.

estado-transparente

8.5

4 prod unwraps (chrono hardcoded month/day, always valid). 45 tests. Government portal.

llm-runtime

8.0

10 prod unwraps post-fix. 9 unsafe (Metal FFI, all documented). 88 tests. 4 NaN fixes applied.

4 FIXED

agent-core

8.0

14 prod unwraps post-fix (JSON read-back, serde, RFC3339). 41 tests. Verify() hardened.

1 FIXED

auth-rust

7.5

16 prod unwraps (7 chrono invariant + 9 startup expect). 29 tests. All classified safe.

agent-runtime

7.5

40 prod unwraps post-fix (34 CLI expect pattern). 179 tests. 2 fixes applied.

2 FIXED

b3-core

7.0

48 prod unwraps (30 RwLock poison + 18 Regex Lazy). 1 NaN fix. 88 tests. Highest density.

1 FIXED

Cross-Project Patterns

Recurring patterns across the 10-repo ecosystem.

Pattern	Repos	Count	Status
`partial_cmp().unwrap()` on f32	2 / 10	5	FIXED
CLI `.expect("msg")` in fn main	3 / 10	44	SAFE
`RwLock` poison `.unwrap()`	1 / 10	30	INVARIANT
`Regex::new(literal).unwrap()` in Lazy	2 / 10	21	INVARIANT
`unreachable!()` with invariant guard	3 / 10	11	DOCUMENTED
`unsafe` with SAFETY comment	1 / 10	9	DOCUMENTED
`serde_json::to_string` infallible	3 / 10	6	INVARIANT
`panic!` / `todo!` in production	0 / 10	0	PASS
