RustPulse

Can AI actually write Rust?

This section will benchmark how well AI models generate, fix, and reason about Rust code, focusing on whether the code compiles, passes its tests, and depends only on crates that actually exist.

Placeholder — sample data

The leaderboard below is illustrative sample data. RustPulse does not yet run real model benchmarks. The columns and categories show what it will measure.

Leaderboard

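| Model | Provider | Compile | Tests | Hallucinated | Unsafe | Avg latency | Cost | Last run |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 94% | 86% | 5% | 13% | 4.8s | $3.80 | Jun 13, 2026 |
| GPT-5 | OpenAI | 92% | 83% | 7% | 13% | 5.1s | $3.40 | Jun 13, 2026 |
| Gemini 2.5 Pro | Google | 90% | 80% | 9% | 13% | 4.2s | $2.10 | Jun 13, 2026 |
| Llama 4 70B | Meta | 81% | 68% | 16% | 13% | 3.7s | $0.30 | Jun 13, 2026 |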
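As a rough illustration of how the Compile, Tests, Hallucinated, and Unsafe percentages could be derived, the sketch below aggregates per-task outcomes into those four columns. The `TaskResult` and `Summary` types, their field names, and the choice to divide every count by the total number of tasks are assumptions made for this example; RustPulse has not published its scoring rules.

```rust
/// Outcome of one benchmark task for one model (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct TaskResult {
    compiled: bool,
    tests_passed: bool,
    hallucinated_crate: bool,
    used_unsafe: bool,
}

/// One leaderboard row's percentage columns (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Summary {
    compile_pct: f64,
    tests_pct: f64,
    hallucinated_pct: f64,
    unsafe_pct: f64,
}

/// Percentage of results for which `pred` holds; 0.0 for an empty slice.
fn pct_where(results: &[TaskResult], pred: impl Fn(&TaskResult) -> bool) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let hits = results.iter().filter(|&r| pred(r)).count();
    100.0 * hits as f64 / results.len() as f64
}

/// Assumption: every column uses the total task count as its denominator.
fn summarize(results: &[TaskResult]) -> Summary {
    Summary {
        compile_pct: pct_where(results, |r| r.compiled),
        tests_pct: pct_where(results, |r| r.tests_passed),
        hallucinated_pct: pct_where(results, |r| r.hallucinated_crate),
        unsafe_pct: pct_where(results, |r| r.used_unsafe),
    }
}

fn main() {
    // Made-up outcomes for four tasks, purely to exercise the function.
    let results = [
        TaskResult { compiled: true, tests_passed: true, hallucinated_crate: false, used_unsafe: false },
        TaskResult { compiled: true, tests_passed: true, hallucinated_crate: false, used_unsafe: false },
        TaskResult { compiled: true, tests_passed: false, hallucinated_crate: false, used_unsafe: false },
        TaskResult { compiled: false, tests_passed: false, hallucinated_crate: true, used_unsafe: false },
    ];
    println!("{:?}", summarize(&results));
}
```

A different denominator, such as counting test passes only among tasks that compiled, would be an equally plausible design and would change the Tests column.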

Benchmark categories

The kinds of task the benchmark will cover.

Generate Rust function from prompt
Fix compiler error
Pass cargo test
Avoid hallucinated crates
Explain borrow checker issue
Refactor sync code to async
Use crates correctly
RAG with docs vs no RAG

How it will work

The planned benchmark pipeline scores Rust that actually builds and runs, not text that merely looks plausible.

  1. Ask a model to generate Rust code for a task.
  2. Write the generated code into a temporary Cargo project.
  3. Run cargo check to see whether it compiles.
  4. Run cargo test to see whether it passes.
  5. Detect hallucinated crates by checking dependencies against crates.io.
  6. Score the model across all tasks.
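Steps 2 through 5 could be sketched roughly as follows, using only the Rust standard library. All function names here are hypothetical. The manifest parsing is a deliberately naive line scan (a real harness would more likely use a TOML parser), and the crates.io lookup is stood in for by a closure, since how RustPulse will query the registry is not specified.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Step 2: write the generated code into a fresh Cargo library project.
fn write_project(root: &Path, manifest: &str, lib_rs: &str) -> io::Result<()> {
    fs::create_dir_all(root.join("src"))?;
    fs::write(root.join("Cargo.toml"), manifest)?;
    fs::write(root.join("src").join("lib.rs"), lib_rs)
}

/// Steps 3 and 4: run `cargo check` or `cargo test` and report success.
fn cargo_succeeds(root: &Path, subcommand: &str) -> io::Result<bool> {
    let status = Command::new("cargo")
        .arg(subcommand)
        .arg("--quiet")
        .current_dir(root)
        .status()?;
    Ok(status.success())
}

/// Step 5a: collect keys of the `[dependencies]` table (naive line scan).
fn dependency_names(manifest: &str) -> Vec<String> {
    let mut in_deps = false;
    let mut names = Vec::new();
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            in_deps = line == "[dependencies]";
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((name, _)) = line.split_once('=') {
            names.push(name.trim().trim_matches('"').to_string());
        }
    }
    names
}

/// Step 5b: keep the names the registry lookup does not recognise.
/// `exists` is a stand-in for a real crates.io query (assumption).
fn hallucinated<'a>(names: &'a [String], exists: impl Fn(&str) -> bool) -> Vec<&'a str> {
    names.iter().map(String::as_str).filter(|&n| !exists(n)).collect()
}

fn main() -> io::Result<()> {
    let manifest = "[package]\nname = \"demo\"\nversion = \"0.1.0\"\nedition = \"2021\"\n";
    let code = "pub fn add(a: i32, b: i32) -> i32 { a + b }\n\n\
                #[cfg(test)]\nmod tests {\n    #[test]\n    fn adds() { assert_eq!(super::add(2, 2), 4); }\n}\n";
    let root: PathBuf = std::env::temp_dir().join(format!("rustpulse_demo_{}", std::process::id()));
    write_project(&root, manifest, code)?;

    for sub in ["check", "test"] {
        match cargo_succeeds(&root, sub) {
            Ok(ok) => println!("cargo {sub}: success = {ok}"),
            Err(e) => println!("could not run cargo {sub}: {e}"),
        }
    }

    // A model-written manifest with one made-up dependency name.
    let claimed = "[dependencies]\nserde = \"1\"\ntokio_magic = \"0.3\"\n";
    let names = dependency_names(claimed);
    let known = |name: &str| ["serde", "tokio"].contains(&name);
    println!("unknown crates: {:?}", hallucinated(&names, known));

    fs::remove_dir_all(&root)
}
```

Keeping the registry check behind a closure means the same detection logic can run against a live lookup, a cached index, or a fixed list in tests. A production harness would also need to sandbox `cargo test`, since it executes model-written code.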