pkg/corpus
Seed corpus generation, loading, and management for GGUF fuzzing campaigns.
Functions
GenerateSeeds
Returns a set of hand-crafted seed files that exercise different GGUF structural patterns: varying tensor counts, metadata types, quantization formats, alignment values, and edge-case dimensions. These seeds form the starting population for mutation-based fuzzing.
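As an illustration of the simplest structural pattern a seed can take, the sketch below builds only the fixed GGUF header (magic, version, tensor count, metadata count, all little-endian in the v2+ layout) with zero tensors and zero metadata entries. The helper name `minimalSeed` is hypothetical and is not part of pkg/corpus; the seeds GenerateSeeds actually returns are not reproduced here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// minimalSeed is a hypothetical helper (not part of pkg/corpus). It emits the
// fixed 24-byte GGUF header and nothing else: no metadata, no tensor infos.
func minimalSeed(version uint32, tensorCount, kvCount uint64) []byte {
	var buf bytes.Buffer
	buf.WriteString("GGUF") // 4-byte magic
	// Writes of fixed-size integers into a bytes.Buffer cannot fail,
	// so the returned errors are ignored in this sketch.
	binary.Write(&buf, binary.LittleEndian, version)     // uint32 format version
	binary.Write(&buf, binary.LittleEndian, tensorCount) // uint64 tensor count
	binary.Write(&buf, binary.LittleEndian, kvCount)     // uint64 metadata key/value count
	return buf.Bytes()
}

func main() {
	seed := minimalSeed(3, 0, 0)
	fmt.Printf("%d bytes, magic %q\n", len(seed), seed[:4])
}
```

Varying the three arguments (for example, a tensor count that the body does not back up) is one cheap way to produce the kind of edge-case seeds described above.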
GenerateCorpus
Generates seed files and writes them to outputDir as serialized .gguf files. Creates the directory if it does not exist.
LoadCorpus
Reads all .gguf files from dir and returns them as parsed *gguf.File structs. Returns an error if any file fails to unmarshal.
LoadCorpusBytes
Reads all .gguf files from dir and returns their raw bytes without parsing. Useful when feeding directly into mutator.MutateBytes.
Minimize
Reduces a corpus to a minimal set that maintains structural diversity. Removes files that are redundant in terms of the GGUF features they exercise (tensor types, metadata patterns, alignment configurations).
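One way to picture feature-based redundancy removal is to reduce each seed to a signature of the features it exercises and keep one seed per signature. The sketch below assumes exactly that; the `features` type, its fields, and `minimizeByFeatures` are hypothetical, and the criteria Minimize really uses may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// features is a hypothetical summary of what one seed exercises.
type features struct {
	TensorTypes   []string // e.g. quantization formats present
	MetadataTypes []string // e.g. value types used in metadata
	Alignment     uint32
}

// signature is order-independent: two seeds using the same features in a
// different order map to the same string.
func (f features) signature() string {
	tt := append([]string(nil), f.TensorTypes...)
	mt := append([]string(nil), f.MetadataTypes...)
	sort.Strings(tt)
	sort.Strings(mt)
	return fmt.Sprintf("t=%s|m=%s|a=%d",
		strings.Join(tt, ","), strings.Join(mt, ","), f.Alignment)
}

// minimizeByFeatures returns the indices of the first seed seen for each
// distinct signature; later seeds with the same signature are redundant.
func minimizeByFeatures(seeds []features) []int {
	seen := make(map[string]struct{})
	var keep []int
	for i, s := range seeds {
		sig := s.signature()
		if _, dup := seen[sig]; dup {
			continue
		}
		seen[sig] = struct{}{}
		keep = append(keep, i)
	}
	return keep
}

func main() {
	seeds := []features{
		{TensorTypes: []string{"F32", "Q4_0"}, MetadataTypes: []string{"string"}, Alignment: 32},
		{TensorTypes: []string{"F16"}, MetadataTypes: []string{"uint32"}, Alignment: 64},
		{TensorTypes: []string{"Q4_0", "F32"}, MetadataTypes: []string{"string"}, Alignment: 32},
	}
	fmt.Println(minimizeByFeatures(seeds)) // the third seed duplicates the first
}
```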
MinimizeWithCoverage
Coverage-guided corpus minimization. Replays each seed through an instrumented harness to collect LLVM edge coverage, then uses a greedy set-cover algorithm to select a small set of seeds that still covers every observed edge. Greedy set cover is an approximation, so the result is not guaranteed to be the smallest possible set.
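The greedy set-cover step can be sketched as follows: repeatedly pick the seed that covers the most still-uncovered edges until nothing is left uncovered. This is a generic illustration, not the package's code; `greedyCover` and `edgeSet` are hypothetical names, and ties are broken here by lowest index, whereas the real implementation may use other criteria such as the Size field of CoverageInfo.

```go
package main

import "fmt"

// edgeSet is a small hypothetical helper for building edge sets.
func edgeSet(ids ...string) map[string]struct{} {
	s := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// greedyCover returns indices of seeds whose edge sets together cover every
// edge that appears in any seed. Seeds with no edges are never selected.
func greedyCover(edgeSets []map[string]struct{}) []int {
	// Start with the union of all edges as the set still to be covered.
	uncovered := make(map[string]struct{})
	for _, set := range edgeSets {
		for e := range set {
			uncovered[e] = struct{}{}
		}
	}

	var chosen []int
	for len(uncovered) > 0 {
		best, bestGain := -1, 0
		for i, set := range edgeSets {
			gain := 0
			for e := range set {
				if _, ok := uncovered[e]; ok {
					gain++
				}
			}
			if gain > bestGain { // strict: the earliest seed wins ties
				best, bestGain = i, gain
			}
		}
		// Every uncovered edge belongs to some seed, so best is valid here.
		chosen = append(chosen, best)
		for e := range edgeSets[best] {
			delete(uncovered, e)
		}
	}
	return chosen
}

func main() {
	seeds := []map[string]struct{}{
		edgeSet("a", "b"),
		edgeSet("b", "c", "d"),
		edgeSet("a"),
		edgeSet("d"),
	}
	fmt.Println(greedyCover(seeds))
}
```

In this example seed 1 is chosen first because it covers three edges, and seed 0 then covers the one remaining edge; seeds 2 and 3 add nothing new and are dropped.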
The harness must be built with -fprofile-instr-generate -fcoverage-mapping, and llvm-profdata and llvm-cov must be in PATH. Falls back to hash-based Minimize if the harness argument is empty or no coverage data can be collected.
Seeds that produce zero coverage edges (e.g., early-exit crashes) are deduplicated by hash and kept separately, since set cover would never select a seed with an empty edge set.
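Hash-based deduplication of such seeds can be sketched as below. The choice of SHA-256 over the raw bytes is an assumption made for illustration, as is the name `dedupByHash`; the hash the package actually uses is not stated here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// dedupByHash is a hypothetical helper that keeps the first occurrence of
// each distinct byte sequence, comparing seeds by their SHA-256 digest.
func dedupByHash(seeds [][]byte) [][]byte {
	seen := make(map[[sha256.Size]byte]struct{})
	var out [][]byte
	for _, s := range seeds {
		sum := sha256.Sum256(s) // fixed-size array, usable as a map key
		if _, dup := seen[sum]; dup {
			continue
		}
		seen[sum] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	zeroCoverage := [][]byte{
		[]byte("GGUF\x00"),
		[]byte("GGUX"),
		[]byte("GGUF\x00"), // byte-identical to the first seed
	}
	fmt.Println(len(dedupByHash(zeroCoverage)))
}
```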
CoverageInfo
type CoverageInfo struct {
    File  *gguf.File
    Edges map[string]struct{} // set of unique edge identifiers
    Size  int                 // serialized size in bytes
}
Coverage data for a single corpus seed, used internally by MinimizeWithCoverage.
Corpus Type
Thread-safe in-memory corpus for use during fuzzing loops.
Constructor
Creates a new empty corpus with the given PRNG for random selection.
Methods
Add / AddBytes
Appends a file (parsed or raw) to the corpus.
Get / GetBytes
Returns the file at index in parsed or raw form.
Pick / PickBytes
Returns a randomly selected file from the corpus using the internal PRNG. Used by the fuzzing loop to choose a base file for mutation.
Seeds
Returns all files currently in the corpus.
Len
Returns the number of files in the corpus.
LoadInto
Loads all .gguf files from dir into the corpus. Equivalent to calling LoadCorpus followed by Add for each file.