Can AI actually write Rust?

This section will benchmark how well AI models generate, fix, and reason about Rust code, focusing on whether the generated code compiles, passes its tests, and depends only on crates that actually exist.

Placeholder — sample data

The leaderboard below is illustrative sample data. RustPulse does not yet run real model benchmarks. The columns and categories show what it will measure.

Leaderboard

Model	Provider	Compile	Tests	Hallucinated	Unsafe	Avg latency	Cost	Last run
Claude Opus 4.8	Anthropic	94%	86%	5%	13%	4.8s	$3.80	Jun 13, 2026
GPT-5	OpenAI	92%	83%	7%	13%	5.1s	$3.40	Jun 13, 2026
Gemini 2.5 Pro	Google	90%	80%	9%	13%	4.2s	$2.10	Jun 13, 2026
Llama 4 70B	Meta	81%	68%	16%	13%	3.7s	$0.30	Jun 13, 2026
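As a rough illustration of how the percentage and latency columns could be derived from per-task outcomes, here is a minimal sketch. The `TaskResult` and `Row` types and the `aggregate` function are hypothetical names, not part of RustPulse. The sketch assumes each percentage is simply the share of tasks for which a flag is true, which is one plausible definition and not a confirmed one.

```rust
/// Outcome of one benchmark task for one model (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct TaskResult {
    pub compiled: bool,
    pub tests_passed: bool,
    pub hallucinated_crate: bool,
    pub uses_unsafe: bool,
    pub latency_secs: f64,
}

/// One aggregated leaderboard row; percentages are in the range 0.0..=100.0.
#[derive(Debug, PartialEq)]
pub struct Row {
    pub compile_pct: f64,
    pub tests_pct: f64,
    pub hallucinated_pct: f64,
    pub unsafe_pct: f64,
    pub avg_latency_secs: f64,
}

/// Share of results for which `pred` holds, as a percentage.
/// Callers must pass a non-empty slice.
fn pct(results: &[TaskResult], pred: impl Fn(&TaskResult) -> bool) -> f64 {
    let hits = results.iter().filter(|&r| pred(r)).count();
    100.0 * hits as f64 / results.len() as f64
}

/// Aggregates per-task results into one row.
/// Returns `None` for an empty slice so nothing is divided by zero.
pub fn aggregate(results: &[TaskResult]) -> Option<Row> {
    if results.is_empty() {
        return None;
    }
    let total_latency: f64 = results.iter().map(|r| r.latency_secs).sum();
    Some(Row {
        compile_pct: pct(results, |r| r.compiled),
        tests_pct: pct(results, |r| r.tests_passed),
        hallucinated_pct: pct(results, |r| r.hallucinated_crate),
        unsafe_pct: pct(results, |r| r.uses_unsafe),
        avg_latency_secs: total_latency / results.len() as f64,
    })
}
```

Under this assumed definition the test pass rate is measured over all tasks, not only over those that compiled. A real benchmark would need to state which denominator it uses.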

Benchmark categories

The kinds of task the benchmark will cover.

Generate Rust function from prompt

Fix compiler error

Pass cargo test

Avoid hallucinated crates

Explain borrow checker issue

Refactor sync code to async

Use crates correctly

RAG with docs vs no RAG

How it will work

The planned benchmark pipeline checks that generated Rust actually compiles and runs, not just that it looks plausible.

01 Ask a model to generate Rust code for a task.
02 Write the generated code into a temporary Cargo project.
03 Run cargo check to see whether it compiles.
04 Run cargo test to see whether it passes.
05 Detect hallucinated crates by checking dependencies against crates.io.
06 Score the model across all tasks.
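Steps 02 through 05 could be sketched with the standard library alone, as below. All function names, the package name `candidate`, and the choice to put the generated code in `src/lib.rs` are assumptions made for illustration. The `exists` callback stands in for a crates.io lookup, which is deliberately left abstract here because the real check would need a network request.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Step 02: lay out a minimal Cargo project containing the generated code.
/// `deps` holds (crate name, version requirement) pairs; the package name
/// and edition are arbitrary choices for this sketch.
pub fn write_project(dir: &Path, code: &str, deps: &[(&str, &str)]) -> io::Result<()> {
    let mut manifest = String::from(
        "[package]\nname = \"candidate\"\nversion = \"0.1.0\"\nedition = \"2021\"\n\n[dependencies]\n",
    );
    for (name, version) in deps {
        manifest.push_str(&format!("{name} = \"{version}\"\n"));
    }
    fs::create_dir_all(dir.join("src"))?;
    fs::write(dir.join("Cargo.toml"), manifest)?;
    fs::write(dir.join("src").join("lib.rs"), code)
}

/// Steps 03 and 04: run `cargo <subcommand>` (for example "check" or "test")
/// inside `dir`. `Ok(true)` means the process exited with status 0.
pub fn cargo_succeeds(dir: &Path, subcommand: &str) -> io::Result<bool> {
    let status = Command::new("cargo")
        .arg(subcommand)
        .current_dir(dir)
        .status()?;
    Ok(status.success())
}

/// Step 05: return the dependency names that the registry lookup rejects.
/// `exists` is a placeholder for a real crates.io query.
pub fn hallucinated_crates<'a>(
    deps: &[(&'a str, &str)],
    exists: impl Fn(&str) -> bool,
) -> BTreeSet<&'a str> {
    deps.iter()
        .map(|(name, _)| *name)
        .filter(|name| !exists(*name))
        .collect()
}
```

A caller would typically create a fresh directory per task, call `write_project`, then `cargo_succeeds(dir, "check")` followed by `cargo_succeeds(dir, "test")`. Because generated code is untrusted, a real pipeline would likely run these commands inside a sandbox and with a timeout, neither of which this sketch attempts.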