Concepts#
What is a fuzzer#
A fuzzer is a program that exercises another program by feeding it a large number of varied inputs, stressing the program under test as much as possible, with the goal of uncovering bugs and crashes.
Coverage-guided fuzzers#
A naive fuzzer would just throw random inputs at the program and record any test input that made the program crash or hang.
However, with this approach we would have no way of “guiding” input generation. We could not tell whether one input is “better” than another (better here means: it exercises more of the program) unless there is a crash. Nor could we know whether the entrypoint that the fuzzer is using reaches all the relevant and fuzzable parts of the codebase.
The solution is to measure the code coverage achieved by the fuzzing process. This gives us:
- A way to discover paths that are not touched by the fuzzer, so that we can improve the harness [1].
- A way for the fuzzer to guide, at runtime, the generation/mutation of the inputs, prioritizing inputs that increase the coverage.
A way to obtain this runtime coverage is to instrument the program, adding some custom profiling to the basic blocks that are executed. This can be done by using custom compiler passes, or by using dynamic instrumentation frameworks.
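To make the feedback loop concrete, here is a minimal sketch of a coverage-guided search in plain Rust. Everything in it is hypothetical and for illustration only: `run_target` stands in for an instrumented program and reports how many checks the input passed, `guided_search` is a deliberately simplistic mutator that tries byte values in order instead of mutating randomly, and the seed is assumed to already have the right length. Real fuzzers are far more sophisticated, but the core idea of keeping a mutation only when coverage grows is the same.

```rust
// Hypothetical secret checked by the toy target below.
const SECRET: &[u8] = b"FUZZME123";

/// Stand-in for an instrumented target: returns how many checks the input
/// passed and whether the "crash" was reached. A real fuzzer would read this
/// from counters inserted by the compiler, not from a return value.
fn run_target(input: &[u8]) -> (usize, bool) {
    if input.len() != SECRET.len() {
        return (0, false);
    }
    let mut passed = 1; // the length check passed
    for (a, b) in input.iter().zip(SECRET) {
        if a != b {
            return (passed, false);
        }
        passed += 1;
    }
    (passed, true)
}

/// Toy coverage-guided loop: mutate one byte at a time and keep the mutated
/// input only if it passes more checks than the best input seen so far.
/// Returns the crashing input and the number of target executions, or `None`
/// if a full pass over the input brings no coverage gain.
fn guided_search(seed: &[u8]) -> Option<(Vec<u8>, u64)> {
    let mut best = seed.to_vec();
    let (mut best_cov, crashed) = run_target(&best);
    let mut execs = 1u64;
    if crashed {
        return Some((best, execs));
    }
    loop {
        let mut progress = false;
        for pos in 0..best.len() {
            for candidate in 0..=u8::MAX {
                let mut mutated = best.clone();
                mutated[pos] = candidate;
                let (cov, crashed) = run_target(&mutated);
                execs += 1;
                if crashed {
                    return Some((mutated, execs));
                }
                if cov > best_cov {
                    // New coverage: this input becomes the base for later mutations.
                    best = mutated;
                    best_cov = cov;
                    progress = true;
                    break;
                }
            }
        }
        if !progress {
            return None;
        }
    }
}
```

Calling `guided_search(&[0u8; 9])` walks towards the secret one byte at a time, because every matched byte is rewarded with higher coverage. A seed of the wrong length never gets past the first check in this toy model, so the search gives up.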
Example project#
In order to experiment with fuzzing, I’ve created a toy example on https://github.com/suidpit/rust-fuzz-garden. It contains a simple API function:
const SECRET: [u8; 9] = *b"FUZZME123";
pub fn take_string(input: &str) {
    let bytes = input.as_bytes();
    if bytes.len() != SECRET.len() {
        return;
    }
    if bytes[0] != SECRET[0] {
        return;
    }
    if bytes[1] != SECRET[1] {
        return;
    }
    if bytes[2] != SECRET[2] {
        return;
    }
    if bytes[3] != SECRET[3] {
        return;
    }
    if bytes[4] != SECRET[4] {
        return;
    }
    if bytes[5] != SECRET[5] {
        return;
    }
    if bytes[6] != SECRET[6] {
        return;
    }
    if bytes[7] != SECRET[7] {
        return;
    }
    if bytes[8] != SECRET[8] {
        return;
    }
    panic!("full match reached crash condition");
}

The take_string function compares the input byte by byte against a fixed secret. Since every matched byte reaches a new branch, a good coverage-guided fuzzer should be able to iteratively discover the crashing test case (which we expect to be exactly FUZZME123).
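A rough back-of-the-envelope calculation shows why feedback matters for this target. The sketch below assumes a simplified model that is not taken from the example project: inputs are exactly 9 arbitrary bytes, a blind fuzzer has to guess all of them at once, and a guided fuzzer can settle one byte at a time because each matching byte is visible as new coverage. The function names are made up for this illustration.

```rust
/// Number of distinct inputs of `len` bytes that a blind fuzzer chooses from.
/// For len = 9 this is 256^9 = 2^72, roughly 4.7 * 10^21.
fn blind_search_space(len: u32) -> u128 {
    256u128.pow(len)
}

/// Upper bound on attempts when each byte can be solved independently,
/// trying at most 256 values per position. For len = 9 this is 2304.
fn guided_upper_bound(len: u32) -> u128 {
    256 * len as u128
}
```

Under these assumptions, the gap between `blind_search_space(9)` and `guided_upper_bound(9)` is the difference between an infeasible search and one that finishes almost immediately.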
Fuzzing with libfuzzer via cargo-fuzz#
libFuzzer is a widely used engine for in-process [2] fuzzing. It is part of the LLVM project, which makes it easy to integrate into Rust projects, since rustc uses LLVM as its default code generation backend.
Fuzzing with libFuzzer can be set up easily with the cargo-fuzz utility, which hides much of the complexity of configuring the compilation flags for fuzzing.
Once the requirements (cargo-fuzz itself and a nightly Rust toolchain) are installed and the test repository is cloned, we can initialize a new fuzzer with cargo fuzz init.
Notice that the command created a new directory, named fuzz, in the project. This contains:
- A Cargo.toml with the dependencies and settings of the fuzzing crate
- A fuzz/fuzz_targets/fuzz_target_1.rs file containing the skeleton code of a fuzzing harness:
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
    // fuzzed code goes here
});

fuzz_target! here is the macro that expands into the C ABI entry point that libFuzzer expects. At runtime, libFuzzer runs the body of fuzz_target! repeatedly, each time with a different slice of input bytes (mutated according to the coverage feedback).
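To give an idea of what that C ABI looks like, here is a hand-written approximation of the entry point. The libFuzzer documentation describes the callback as `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`. The real expansion of the macro is an implementation detail of libfuzzer-sys and may differ from this sketch; `test_one_input` and `fuzz_body` are hypothetical names, and the attribute that exports the symbol under its C name is left out so that the snippet compiles as an ordinary function.

```rust
use std::slice;

/// Hypothetical stand-in for the closure body passed to `fuzz_target!`.
fn fuzz_body(data: &[u8]) {
    // A real harness would call the API under test here.
    let _ = std::str::from_utf8(data);
}

/// Approximation of the callback that libFuzzer invokes once per input.
/// In a real fuzzing binary this function is exported as
/// `LLVMFuzzerTestOneInput`; here it is a plain `extern "C"` function.
pub extern "C" fn test_one_input(data: *const u8, size: usize) -> i32 {
    // Defensive guard: never build a slice from a null pointer.
    let bytes: &[u8] = if data.is_null() || size == 0 {
        &[]
    } else {
        // Safety: the caller promises that `data` points to `size` readable bytes.
        unsafe { slice::from_raw_parts(data, size) }
    };
    fuzz_body(bytes);
    0 // 0 tells libFuzzer that the input was processed normally
}
```

The interesting part is the conversion from a raw pointer and length to a `&[u8]`, which is what lets the harness body be written as safe Rust.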
In this example, we can just convert the bytes to UTF-8 and then invoke the target API:
#![no_main]
use libfuzzer_sys::fuzz_target;
extern crate rust_fuzz_garden;
fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        rust_fuzz_garden::take_string(s);
    }
});

We can then build the fuzzer with cargo fuzz build (from the root directory of the project).
Before starting the fuzzer, you can take a moment to explore all the command line arguments supported by the fuzzer by running:
cargo fuzz run <name_of_the_fuzzer_target> -- -help=1

And then we can simply start the fuzzer [3]:
cargo fuzz run <name_of_the_fuzzer_target>
After a few seconds, the fuzzing stops because the program crashed:
Failing input:
fuzz/artifacts/fuzz_take_string/crash-3dd6b4c5a9468adf21b788190d130f58e4558338
Output of `std::fmt::Debug`:
[70, 85, 90, 90, 77, 69, 49, 50, 51]
Reproduce with:
cargo fuzz run fuzz_take_string fuzz/artifacts/fuzz_take_string/crash-3dd6b4c5a9468adf21b788190d130f58e4558338
Minimize test case with:
cargo fuzz tmin fuzz_take_string fuzz/artifacts/fuzz_take_string/crash-3dd6b4c5a9468adf21b788190d130f58e4558338

We can also quickly inspect the body of the crashing test case to confirm that it is what we expected:
➜ rust-fuzz-garden git:(libfuzzer) cat fuzz/artifacts/fuzz_take_string/crash-3dd6b4c5a9468adf21b788190d130f58e4558338
FUZZME123%
➜ rust-fuzz-garden git:(libfuzzer)

(The trailing % is how zsh marks output that does not end with a newline; it is not part of the file.)

This wraps up this very simple note. There are other interesting topics, such as profiling the coverage to improve the harness or minimizing test cases, but this is a simple note, so let’s keep it simple.
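As a small cross-check of the crash report above, the byte list printed by the fuzzer can be decoded and replayed against the target in ordinary Rust. This is only a sketch: `take_string` is a condensed stand-in for the function from the example project (one comparison instead of nine), and `crashes` is a hypothetical helper that mirrors the harness by rejecting non-UTF-8 input.

```rust
use std::panic;

const SECRET: [u8; 9] = *b"FUZZME123";

/// Condensed stand-in for the `take_string` function of the example project.
fn take_string(input: &str) {
    if input.as_bytes() == &SECRET[..] {
        panic!("full match reached crash condition");
    }
}

/// Returns true if feeding `bytes` through the harness logic makes the
/// target panic. Non-UTF-8 input never reaches the target, as in the harness.
fn crashes(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => panic::catch_unwind(|| take_string(s)).is_err(),
        Err(_) => false,
    }
}
```

Passing the reported bytes `[70, 85, 90, 90, 77, 69, 49, 50, 51]` to `crashes` reproduces the panic, and decoding them as UTF-8 gives back FUZZME123. A check like this can be kept as a regression test once the bug is fixed.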
[1] A harness is a program written specifically for fuzzing a library. It exercises the API offered by the library in a way suited to maximize coverage (for instance by relaxing some validity constraints).
[2] In-process fuzzing is a technique that reuses the same process on each iteration, instead of forking, to optimize execs/s.
[3] In a real-world fuzzing campaign, you would probably supply additional arguments, such as the initial corpus of data.