
pkg/corpus

Seed corpus generation, loading, and management for GGUF fuzzing campaigns.

import "github.com/professor-moody/crucible/pkg/corpus"

Functions

GenerateSeeds

func GenerateSeeds() ([]*gguf.File, error)

Returns a set of hand-crafted seed files that exercise different GGUF structural patterns: varying tensor counts, metadata types, quantization formats, alignment values, and edge-case dimensions. These seeds form the starting population for mutation-based fuzzing.

GenerateCorpus

func GenerateCorpus(outputDir string) error

Generates seed files and writes them to outputDir as serialized .gguf files. Creates the directory if it does not exist.

LoadCorpus

func LoadCorpus(dir string) ([]*gguf.File, error)

Reads all .gguf files from dir and returns them as parsed *gguf.File structs. Returns an error if any file fails to unmarshal.

LoadCorpusBytes

func LoadCorpusBytes(dir string) ([][]byte, error)

Reads all .gguf files from dir and returns their raw bytes without parsing. Useful when feeding directly into mutator.MutateBytes.

Minimize

func Minimize(corpus []*gguf.File) []*gguf.File

Reduces a corpus to a minimal set that maintains structural diversity. Removes files that are redundant in terms of the GGUF features they exercise (tensor types, metadata patterns, alignment configurations).
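The exact redundancy test is not spelled out here. One plausible reading works like this:

1. Reduce each seed to a normalized feature signature.
2. Keep the first seed for each signature.

This is only a sketch. `seedFeatures`, `normalize`, `signature`, and `minimizeByFeatures` are hypothetical, and the real Minimize may define redundancy differently (for instance, by hashing the signature).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seedFeatures is a hypothetical summary of the GGUF features one seed exercises.
type seedFeatures struct {
	TensorTypes   []string // quantization formats of the seed's tensors
	MetadataTypes []string // value types used by its metadata pairs
	Alignment     uint32
}

// normalize sorts a copy of xs and drops duplicates, so feature lists that
// differ only in order or repetition produce the same string.
func normalize(xs []string) string {
	c := append([]string(nil), xs...) // copy: the caller's slice is not reordered
	sort.Strings(c)
	var uniq []string
	for _, x := range c {
		if len(uniq) == 0 || x != uniq[len(uniq)-1] {
			uniq = append(uniq, x)
		}
	}
	return strings.Join(uniq, ",")
}

// signature is the key two seeds must share to count as redundant.
func signature(f seedFeatures) string {
	return fmt.Sprintf("t=%s|m=%s|a=%d", normalize(f.TensorTypes), normalize(f.MetadataTypes), f.Alignment)
}

// minimizeByFeatures keeps the first seed seen for each distinct signature
// and returns the indices of the kept seeds.
func minimizeByFeatures(seeds []seedFeatures) []int {
	seen := make(map[string]bool)
	var keep []int
	for i, s := range seeds {
		if sig := signature(s); !seen[sig] {
			seen[sig] = true
			keep = append(keep, i)
		}
	}
	return keep
}
```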

MinimizeWithCoverage

func MinimizeWithCoverage(corpus []*gguf.File, harness string) ([]*gguf.File, error)

Coverage-guided corpus minimization. Replays each seed through an instrumented harness to collect LLVM edge coverage, then uses a greedy set-cover algorithm to select a small set of seeds that together cover all observed edges. Greedy set cover approximates the smallest such set but does not guarantee it.

The harness must be built with -fprofile-instr-generate -fcoverage-mapping. Requires llvm-profdata and llvm-cov in PATH. Falls back to hash-based Minimize if harness is empty or no coverage data can be collected.

Seeds that produce zero coverage edges (e.g., early-exit crashes) are not discarded. They are deduplicated by hash and kept alongside the set-cover selection.
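The selection step can be sketched as the standard greedy approximation for set cover, plus the zero-edge handling. This sketch rests on three assumptions:

- `seedCov` stands in for CoverageInfo, with raw bytes in place of the *gguf.File.
- Breaking ties by `Size` is a guessed use of that field.
- SHA-256 is an assumed choice of hash.

```go
package main

import "crypto/sha256"

// seedCov mirrors CoverageInfo's Edges and Size fields; Data (raw seed bytes)
// stands in for the parsed *gguf.File, an assumption of this sketch.
type seedCov struct {
	Data  []byte
	Edges map[string]struct{}
	Size  int
}

// greedyMinimize returns the indices of the seeds to keep. It repeatedly takes
// the seed that adds the most not-yet-covered edges (a strictly smaller Size
// wins a tie) until no seed adds anything new, then appends the zero-edge
// seeds, deduplicated by the SHA-256 of their bytes.
func greedyMinimize(seeds []seedCov) []int {
	covered := make(map[string]struct{})
	used := make([]bool, len(seeds))
	var keep []int
	for {
		best, bestGain := -1, 0
		for i, s := range seeds {
			if used[i] {
				continue
			}
			gain := 0
			for e := range s.Edges {
				if _, ok := covered[e]; !ok {
					gain++
				}
			}
			if gain > bestGain || (gain > 0 && gain == bestGain && s.Size < seeds[best].Size) {
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			break // every observed edge is covered
		}
		used[best] = true
		keep = append(keep, best)
		for e := range seeds[best].Edges {
			covered[e] = struct{}{}
		}
	}
	seen := make(map[[sha256.Size]byte]bool)
	for i, s := range seeds {
		if len(s.Edges) > 0 {
			continue
		}
		if h := sha256.Sum256(s.Data); !seen[h] {
			seen[h] = true
			keep = append(keep, i)
		}
	}
	return keep
}
```

Each greedy round covers at least one new edge, so the loop terminates. The result is a cover, but not necessarily the smallest one.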


CoverageInfo

type CoverageInfo struct {
    File  *gguf.File
    Edges map[string]struct{} // set of unique edge identifiers
    Size  int                 // serialized size in bytes
}

Coverage data for a single corpus seed, used internally by MinimizeWithCoverage.


Corpus Type

type Corpus struct {
    // unexported fields
}

Thread-safe in-memory corpus for use during fuzzing loops.

Constructor

func NewCorpus(rng *rand.Rand) *Corpus

Creates a new empty corpus with the given PRNG for random selection.

Methods

Add / AddBytes

func (c *Corpus) Add(f *gguf.File)
func (c *Corpus) AddBytes(data []byte)

Appends a file to the corpus: Add takes a parsed *gguf.File, AddBytes takes raw bytes.

Get / GetBytes

func (c *Corpus) Get(index int) *gguf.File
func (c *Corpus) GetBytes(index int) []byte

Returns the file at index: Get returns it in parsed form, GetBytes in raw form.

Pick / PickBytes

func (c *Corpus) Pick() *gguf.File
func (c *Corpus) PickBytes() []byte

Returns a randomly selected file from the corpus using the internal PRNG. Used by the fuzzing loop to choose a base file for mutation.

Seeds

func (c *Corpus) Seeds() []*gguf.File

Returns all files currently in the corpus.

Len

func (c *Corpus) Len() int

Returns the number of files in the corpus.

LoadInto

func (c *Corpus) LoadInto(dir string) error

Loads all .gguf files from dir into the corpus. Equivalent to calling LoadCorpus followed by Add for each file.


Usage

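import (
	"fmt"
	"log"
	"math/rand"
	"github.com/professor-moody/crucible/pkg/corpus"
	"github.com/professor-moody/crucible/pkg/mutator" // import path assumed
)

// Generate a seed corpus on disk, then load it into an in-memory corpus
if err := corpus.GenerateCorpus("./seeds"); err != nil {
	log.Fatal(err)
}
c := corpus.NewCorpus(rand.New(rand.NewSource(42))) // fixed seed: reproducible picks
if err := c.LoadInto("./seeds"); err != nil {
	log.Fatal(err)
}
fmt.Printf("Corpus size: %d\n", c.Len())

// Pick a random seed and mutate it
file := c.Pick()
m := mutator.New(99)
data, names, _ := m.AppliedMutations(file)
fmt.Printf("Applied mutations: %v\n", names)
_ = data // hand the mutated output to the fuzz target here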