pkg/triage

Crash classification, deduplication, and report generation for ASAN-detected vulnerabilities.

import "github.com/professor-moody/crucible/pkg/triage"

Types

CrashType

type CrashType string

Classified vulnerability type extracted from ASAN output.

Constant          Value                    ASAN Signal
HeapOverflow      "heap-buffer-overflow"   heap-buffer-overflow
StackOverflow     "stack-buffer-overflow"  stack-buffer-overflow
UseAfterFree      "use-after-free"         heap-use-after-free
NullDeref         "null-dereference"       SEGV on address 0x0
IntegerOverflow   "integer-overflow"       runtime error: signed integer overflow
AssertionFailure  "assertion-failure"      Assertion .* failed
Unknown           "unknown"                Unrecognized pattern

Crash

type Crash struct {
    ID             string
    Type           CrashType
    StackHash      string
    StackTrace     string
    InputFile      string
    Function       string
    SourceLocation string
    Reproducer     []byte
    Target         string   // Target surface (e.g. "gguf", "rpc", "grammar").
    MinimizedPath  string   // Path to minimized reproducer (set after --minimize).
    HarnessPath    string   // Path to harness binary used for replay.
    ReplayEnv      []string // Extra environment variables used during replay.
}

A deduplicated crash instance. StackHash is used to identify unique bugs -- crashes with the same hash are considered duplicates.


Triager

type Triager struct {
    // unexported fields
}

Collects and deduplicates crashes across a fuzzing campaign.

Constructor

func NewTriager() *Triager

Creates a new triager with an empty crash database.

Methods

func (t *Triager) AddCrash(asanOutput, inputFile string, reproducer []byte) *Crash

Parses ASAN output, classifies the crash, computes a stack hash, and adds the crash to the database. Returns the *Crash, which is the existing entry if the stack hash matches a previously added crash. The reproducer is the raw input bytes (for example, a GGUF file) that triggered the crash.

func (t *Triager) Crashes() []*Crash

Returns all unique crashes found so far, sorted by type.

func (t *Triager) Stats() string

Returns a human-readable summary of crash counts by type.
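The deduplicating behavior of AddCrash and the ordering of Crashes can be pictured with a small in-memory sketch. Everything here (miniTriager, add, crashes, the crash struct) is hypothetical and only illustrates the idea of keying the database by stack hash; it is not the package's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// crash is a hypothetical, trimmed-down stand-in for triage.Crash.
type crash struct {
	kind      string
	stackHash string
}

// miniTriager keeps one entry per stack hash (assumed storage layout).
type miniTriager struct {
	byHash map[string]*crash
}

func newMiniTriager() *miniTriager {
	return &miniTriager{byHash: make(map[string]*crash)}
}

// add returns the existing entry when the stack hash is already known,
// otherwise it stores and returns a new one.
func (t *miniTriager) add(kind, stackHash string) *crash {
	if c, ok := t.byHash[stackHash]; ok {
		return c
	}
	c := &crash{kind: kind, stackHash: stackHash}
	t.byHash[stackHash] = c
	return c
}

// crashes returns the unique crashes sorted by type, then by hash so the
// order is stable across runs.
func (t *miniTriager) crashes() []*crash {
	out := make([]*crash, 0, len(t.byHash))
	for _, c := range t.byHash {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].kind != out[j].kind {
			return out[i].kind < out[j].kind
		}
		return out[i].stackHash < out[j].stackHash
	})
	return out
}

func main() {
	t := newMiniTriager()
	first := t.add("heap-buffer-overflow", "abc")
	again := t.add("heap-buffer-overflow", "abc") // same hash: duplicate
	t.add("null-dereference", "def")
	fmt.Println(first == again, len(t.crashes())) // true 2
}
```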


Report

type Report struct {
    Crash            *Crash
    CWE              CWE
    CVSSScore        float64
    Severity         string
    Description      string
    AffectedVersions []string
    ReproducerPath   string
}

A structured vulnerability report generated from a classified crash. The CWE field is automatically populated by GenerateReport using the crash type mapping.

Methods

func (r *Report) String() string

Returns the full report as formatted text, including the CWE and, when set, the Target, Harness, Replay Env, and Minimized path lines.

func (r *Report) WriteToFile(path string) error

Writes the report to a file at path.


CWE

type CWE struct {
    ID   string  // e.g., "CWE-122"
    Name string  // e.g., "Heap-based Buffer Overflow"
}

A Common Weakness Enumeration identifier. Attached to every Report and included in SARIF output.


MinimizeSummary

type MinimizeSummary struct {
    Minimized int
    Skipped   int
    Errors    int
}

Aggregate counts from a MinimizeCrashDir run: how many files were successfully minimized, skipped (non-reproducing), or failed with errors.


DedupResult

type DedupResult struct {
    Total        int               // files examined
    Kept         int               // unique crashes retained
    Removed      int               // duplicates removed or moved
    UniqueHashes int               // distinct stack hashes (full mode) or content groups (fast mode)
    ByType       map[CrashType]int // unique crash count per type (full mode only)
    Errors       int               // files that failed replay or read
}

Summary of a crash deduplication pass. Returned by both DeduplicateDir and FastDeduplicateDir.


Functions

DeduplicateDir

func DeduplicateDir(crashDir, harness string, timeout time.Duration, delete bool) (*DedupResult, error)

Full-mode deduplication: replays each crash file through the harness, computes a stack hash from sanitizer output, and keeps the smallest reproducer per unique hash. When delete is true, duplicate files are removed from disk.
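The "keep the smallest reproducer per unique hash" rule can be sketched independently of replay. The candidate type and keepSmallest function below are hypothetical names; the sketch assumes the stack hash and file size for each crash file have already been obtained.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical record for one replayed crash file.
type candidate struct {
	path string
	hash string // stack hash from sanitizer output
	size int64  // file size in bytes
}

// keepSmallest returns the smallest candidate per hash, plus the paths of
// every file that lost to a smaller one (the duplicates).
func keepSmallest(cands []candidate) (kept map[string]candidate, dups []string) {
	kept = make(map[string]candidate)
	for _, c := range cands {
		best, seen := kept[c.hash]
		switch {
		case !seen:
			kept[c.hash] = c
		case c.size < best.size:
			dups = append(dups, best.path)
			kept[c.hash] = c
		default:
			dups = append(dups, c.path)
		}
	}
	sort.Strings(dups) // deterministic order for reporting
	return kept, dups
}

func main() {
	kept, dups := keepSmallest([]candidate{
		{"crash-a", "h1", 900},
		{"crash-b", "h1", 120},
		{"crash-c", "h2", 300},
	})
	fmt.Println(kept["h1"].path, kept["h2"].path, dups) // crash-b crash-c [crash-a]
}
```

In this sketch a tie on size keeps the file seen first; how the package breaks ties is not stated in this reference.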

FastDeduplicateDir

func FastDeduplicateDir(crashDir string, delete bool) (*DedupResult, error)

Fast-mode deduplication: groups crash files by (file_size, SHA-256 of first 64 bytes) without replaying through a harness. No harness binary is required. Keeps the smallest file per group.
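A minimal sketch of the grouping key described above, assuming the file contents are already in memory. The names groupKey and fastKey are hypothetical, and the handling of files shorter than 64 bytes (hash whatever is there) is an assumption of this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// groupKey identifies a fast-mode group: total size plus the SHA-256 of
// the first 64 bytes.
type groupKey struct {
	size   int
	prefix string
}

// fastKey computes the key for one file's contents.
func fastKey(data []byte) groupKey {
	head := data
	if len(head) > 64 {
		head = head[:64]
	}
	sum := sha256.Sum256(head)
	return groupKey{size: len(data), prefix: hex.EncodeToString(sum[:])}
}

func main() {
	a := make([]byte, 100)
	b := make([]byte, 100)
	b[99] = 0xff // differs only after the first 64 bytes
	c := make([]byte, 101)
	fmt.Println(fastKey(a) == fastKey(b), fastKey(a) == fastKey(c)) // true false
}
```

As the example shows, two files of equal size that differ only after byte 64 land in the same group. That is the trade-off of skipping replay: the key is cheap to compute but coarser than a stack hash.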

WalkCrashFiles

func WalkCrashFiles(dir string) ([]string, error)

Recursively collects all crash file paths under dir. Files are identified by the IsCrashFile() name-prefix check (crash-*, oom-*, timeout-*, id:*).
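The name-prefix check can be sketched as follows. The prefixes come from the list above; the function name isCrashFile and the choice to test only the base name of the path are assumptions of this sketch, not the package's IsCrashFile.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// crashPrefixes are the file-name prefixes listed in this reference.
var crashPrefixes = []string{"crash-", "oom-", "timeout-", "id:"}

// isCrashFile reports whether the base name of path starts with one of
// the known prefixes.
func isCrashFile(path string) bool {
	name := filepath.Base(path)
	for _, p := range crashPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isCrashFile("out/gguf/crash-1a2b")) // true
	fmt.Println(isCrashFile("out/README.md"))       // false
}
```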

FormatDedupResult

func FormatDedupResult(r *DedupResult, mode string) string

Returns a human-readable summary of a dedup pass. The mode parameter ("full" or "fast") is included in the output header.

MinimizeCrashDir

func MinimizeCrashDir(crashDir, harness, outputDir string, timeout time.Duration, extraEnv []string) ([]MinimizeResult, *MinimizeSummary, error)

Minimizes all crash reproducers in a directory tree. Recurses into subdirectories using filepath.WalkDir and mirrors the subdirectory structure under outputDir. Files that don't reproduce a classifiable crash are skipped. Returns per-file results and an aggregate summary.
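Mirroring the subdirectory structure amounts to re-rooting each crash file's relative path under outputDir. The helper below (mirrorPath, a hypothetical name) shows one way to do that with path/filepath; the directory names in main are made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mirrorPath maps a crash file under crashDir to the corresponding
// location under outputDir, preserving intermediate directories.
func mirrorPath(crashDir, outputDir, crashFile string) (string, error) {
	rel, err := filepath.Rel(crashDir, crashFile)
	if err != nil {
		return "", err
	}
	return filepath.Join(outputDir, rel), nil
}

func main() {
	src := filepath.Join("crashes", "gguf", "crash-01")
	dst, err := mirrorPath("crashes", "minimized", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(filepath.ToSlash(dst)) // minimized/gguf/crash-01
}
```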

ClassifyCrash

func ClassifyCrash(asanOutput string) CrashType

Parses ASAN output and returns the corresponding CrashType. Matches against known ASAN error patterns. Returns Unknown if no pattern matches.
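A minimal sketch of pattern-based classification, using only the seven signals from the CrashType table. The names classify, signals, and crashType are hypothetical, and first-match-wins ordering is an assumption. The signal strings are copied from the table; real sanitizer output may word some lines differently (the SEGV line in particular), so a production matcher would likely need looser patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// crashType mirrors the string-valued CrashType described in this reference.
type crashType string

// assertionRe corresponds to the "Assertion .* failed" signal.
var assertionRe = regexp.MustCompile(`Assertion .* failed`)

// signals pairs each substring signal from the table with its value.
var signals = []struct {
	needle string
	kind   crashType
}{
	{"heap-buffer-overflow", "heap-buffer-overflow"},
	{"stack-buffer-overflow", "stack-buffer-overflow"},
	{"heap-use-after-free", "use-after-free"},
	{"SEGV on address 0x0", "null-dereference"},
	{"runtime error: signed integer overflow", "integer-overflow"},
}

// classify returns the first matching crash type, or "unknown".
func classify(asanOutput string) crashType {
	for _, s := range signals {
		if strings.Contains(asanOutput, s.needle) {
			return s.kind
		}
	}
	if assertionRe.MatchString(asanOutput) {
		return "assertion-failure"
	}
	return "unknown"
}

func main() {
	out := "==1234==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010"
	fmt.Println(classify(out)) // use-after-free
}
```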

ExtractLocation

func ExtractLocation(asanOutput string) (function, sourceLocation string)

Extracts the faulting function name and source file location (e.g., "gguf_init_from_file", "llama.cpp:1234") from ASAN stack trace output.
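One way to pull these two values out of a trace is to match frame #0. The sketch below assumes frames of the form "#0 0xADDR in function file:line" and reduces the path to its base name; extractLocation and frame0 are hypothetical names, and the real function may choose a different frame (for example, skipping runtime or allocator frames).

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// frame0 captures the function name and source location of frame #0.
var frame0 = regexp.MustCompile(`#0\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s+(\S+)`)

// extractLocation returns empty strings when no frame #0 is found.
func extractLocation(asanOutput string) (function, location string) {
	m := frame0.FindStringSubmatch(asanOutput)
	if m == nil {
		return "", ""
	}
	return m[1], filepath.Base(m[2])
}

func main() {
	trace := "    #0 0x55d1a0001234 in gguf_init_from_file /src/llama.cpp:1234\n" +
		"    #1 0x55d1a0005678 in main /src/main.c:10"
	fn, loc := extractLocation(trace)
	fmt.Println(fn, loc) // gguf_init_from_file llama.cpp:1234
}
```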

HashStack

func HashStack(stackTrace string) string

Computes a stable hash of a stack trace for deduplication. Normalizes addresses and ignores ASLR-affected frames to ensure the same root cause produces the same hash across runs.
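The address-normalization step can be sketched as below. This is an illustration only: hashStack and hexAddr are hypothetical names, SHA-256 is an assumed choice of hash, and the sketch replaces addresses with a placeholder but does not attempt to drop ASLR-affected frames.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// hexAddr matches hexadecimal addresses such as 0x7f3a12345678.
var hexAddr = regexp.MustCompile(`0x[0-9a-fA-F]+`)

// hashStack hashes only lines that look like stack frames ("#N ..."),
// after replacing run-specific addresses with a fixed placeholder.
func hashStack(trace string) string {
	var frames []string
	for _, line := range strings.Split(trace, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "#") {
			continue
		}
		frames = append(frames, hexAddr.ReplaceAllString(line, "ADDR"))
	}
	sum := sha256.Sum256([]byte(strings.Join(frames, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "#0 0x55d1a0001234 in gguf_init_from_file llama.cpp:1234\n#1 0x55d1a0005678 in main main.c:10"
	b := "#0 0x7f00b0001234 in gguf_init_from_file llama.cpp:1234\n#1 0x7f00b0005678 in main main.c:10"
	fmt.Println(hashStack(a) == hashStack(b)) // true
}
```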

GenerateReport

func GenerateReport(crash *Crash) *Report

Generates a Report from a Crash, including CVSS scoring and CWE classification based on the crash type. When Crash.MinimizedPath is set, it is used as the reproducer path in preference to the original InputFile.

CVSS scoring

Heap overflows and use-after-free bugs score highest (8.0+) due to their exploitability. Null dereferences and assertion failures score lower as they typically result in denial-of-service only.

CWEForType

func CWEForType(ct CrashType) CWE

Returns the CWE identifier for a given crash type. Maps all 16 CrashType values to their corresponding CWE; the table below lists 15 of them, and the entry for Unknown is not shown.

CrashType         CWE
HeapOverflow      CWE-122
GlobalOverflow    CWE-120
StackOverflow     CWE-121
UseAfterFree      CWE-416
DoubleFree        CWE-415
NullDeref         CWE-476
IntegerOverflow   CWE-190
DivByZero         CWE-369
AssertionFailure  CWE-617
OOM               CWE-400
AllocTooBig       CWE-789
Timeout           CWE-400
UseAfterPoison    CWE-416
EnumLoad          CWE-843
MisalignedAccess  CWE-188
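The table above is naturally expressed as a lookup map. The sketch below keys the map by constant name because this reference does not give the string values for most constants; cweByType and cweFor are hypothetical names, and only CWE IDs are included since CWE names are not listed here.

```go
package main

import "fmt"

// cweByType reproduces the CrashType-to-CWE table, keyed by constant name.
var cweByType = map[string]string{
	"HeapOverflow":     "CWE-122",
	"GlobalOverflow":   "CWE-120",
	"StackOverflow":    "CWE-121",
	"UseAfterFree":     "CWE-416",
	"DoubleFree":       "CWE-415",
	"NullDeref":        "CWE-476",
	"IntegerOverflow":  "CWE-190",
	"DivByZero":        "CWE-369",
	"AssertionFailure": "CWE-617",
	"OOM":              "CWE-400",
	"AllocTooBig":      "CWE-789",
	"Timeout":          "CWE-400",
	"UseAfterPoison":   "CWE-416",
	"EnumLoad":         "CWE-843",
	"MisalignedAccess": "CWE-188",
}

// cweFor returns the CWE ID for a crash type name and whether it is listed.
func cweFor(name string) (string, bool) {
	id, ok := cweByType[name]
	return id, ok
}

func main() {
	id, ok := cweFor("UseAfterFree")
	fmt.Println(id, ok) // CWE-416 true
}
```

Note that several crash types share a CWE (OOM and Timeout both map to CWE-400; UseAfterFree and UseAfterPoison both map to CWE-416), so the mapping is not invertible.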

WriteSARIF

func WriteSARIF(w io.Writer, reports []*Report) error

Writes a SARIF 2.1.0 JSON document to w containing all provided reports. The SARIF output includes:

  • Tool metadata — Crucible identity and version
  • Rules — One rule per unique crash type, with CWE references and target/<surface> tags
  • Results — One result per report, with source location, severity level, and fingerprints (crucible/stackHash/v1 and crucible/target)

Severity levels are mapped from CVSS scores:

CVSS Range  SARIF Level
≥ 7.0       error
≥ 4.0       warning
< 4.0       note
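The thresholds in the table translate directly into a switch. The function name sarifLevel is hypothetical; the boundaries are taken from the table.

```go
package main

import "fmt"

// sarifLevel maps a CVSS score to a SARIF result level using the
// thresholds from the table: >= 7.0 error, >= 4.0 warning, else note.
func sarifLevel(cvss float64) string {
	switch {
	case cvss >= 7.0:
		return "error"
	case cvss >= 4.0:
		return "warning"
	default:
		return "note"
	}
}

func main() {
	for _, s := range []float64{8.1, 5.0, 2.5} {
		fmt.Printf("%.1f -> %s\n", s, sarifLevel(s))
	}
	// 8.1 -> error
	// 5.0 -> warning
	// 2.5 -> note
}
```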

Usage

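triager := triage.NewTriager()

// After running the target with ASAN; cmdOutput holds the captured output.
asanOutput := string(cmdOutput)
inputFile := "crash-001.gguf"
reproducer, err := os.ReadFile(inputFile)
if err != nil {
    log.Fatalf("read reproducer: %v", err)
}

// AddCrash classifies the crash and returns the existing entry for duplicates.
crash := triager.AddCrash(asanOutput, inputFile, reproducer)
if crash != nil {
    report := triage.GenerateReport(crash)
    if err := report.WriteToFile("reports/" + crash.ID + ".txt"); err != nil {
        log.Fatalf("write report: %v", err)
    }
}

fmt.Println(triager.Stats())
// Example output after a campaign with several crashes:
// heap-buffer-overflow: 3
// null-dereference: 1
// Total unique crashes: 4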