What this prompt does
This prompt builds a concurrent, well-behaved web scraper in Rust and returns working code rather than pseudocode. It frames the model as a senior Rust engineer and asks for six things: a tokio async runtime with a bounded worker pool, reqwest with retry and exponential backoff plus jitter, HTML parsing via the scraper crate into typed structs, per-host rate limiting with a polite delay, a clap-based CLI, and graceful shutdown on Ctrl+C that drains in-flight requests.
Four variables shape the scraper. [site] is the target. [target_data] defines the fields parsed into typed structs. [concurrency] caps the bounded worker pool so you don't overwhelm the origin or your own machine. [output_format] sets how results are emitted, such as newline-delimited JSON. The core discipline here is being polite to the target instead of accidentally DoSing it: tuning rate limiting and backoff together is what lowers your risk of getting IP-banned mid-run. Parsing into typed structs rather than loose strings also means malformed pages surface as explicit parse errors at runtime instead of silently corrupting your output downstream.
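To make the typed-struct point concrete, here is a minimal std-only sketch of turning raw extracted text into a typed record. The `Product` struct, its fields, the `ParseError` enum, and both function names are hypothetical stand-ins for whatever [target_data] describes; the code the prompt generates would get its raw strings from scraper selectors instead of function arguments.

```rust
use std::fmt;

/// Hypothetical record; real fields would come from [target_data].
#[derive(Debug, PartialEq)]
struct Product {
    title: String,
    price_cents: u64,
}

/// Hypothetical error type: one variant per way a page can be malformed.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingField(&'static str),
    BadPrice(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingField(name) => write!(f, "missing field: {}", name),
            ParseError::BadPrice(raw) => write!(f, "unparseable price: {:?}", raw),
        }
    }
}

impl std::error::Error for ParseError {}

/// Converts raw text pulled from a page into a typed record.
/// `title` and `price` stand in for whatever the selectors returned.
fn parse_product(title: Option<&str>, price: Option<&str>) -> Result<Product, ParseError> {
    let title = title
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .ok_or(ParseError::MissingField("title"))?;
    let raw = price.map(str::trim).ok_or(ParseError::MissingField("price"))?;
    let price_cents =
        parse_price_cents(raw).ok_or_else(|| ParseError::BadPrice(raw.to_string()))?;
    Ok(Product { title: title.to_string(), price_cents })
}

/// Parses strings shaped like "$12.50" or "12" into cents; anything else is None.
fn parse_price_cents(raw: &str) -> Option<u64> {
    let digits = raw.strip_prefix('$').unwrap_or(raw);
    let (whole, frac) = match digits.split_once('.') {
        Some((w, f)) => (w, f),
        None => (digits, "00"),
    };
    let is_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !is_digits(whole) || frac.len() != 2 || !is_digits(frac) {
        return None;
    }
    let whole: u64 = whole.parse().ok()?;
    let frac: u64 = frac.parse().ok()?;
    whole.checked_mul(100)?.checked_add(frac)
}
```

A page with a missing title or a price like "free" yields an `Err` the caller can log or count, rather than an empty string that ends up in the output file.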
When to use it
- You need a fast, memory-light scraper that ships as a single binary
- You want concurrency capped so you don't hammer the origin
- You need retry with backoff and jitter for flaky endpoints
- You want extracted data parsed into typed structs, not loose strings
- You need graceful shutdown that drains in-flight requests on Ctrl+C
- You want a CLI to control seed URL, concurrency, and output path
Example output
You get a full Cargo.toml and src/main.rs. The code sets up a tokio runtime with a bounded worker pool capped at your concurrency limit, reqwest with retry and exponential backoff plus jitter on transient errors, scraper-crate parsing of your target data into typed structs, per-host rate limiting with a polite delay, a clap CLI for seed URL, concurrency, and output path, and graceful Ctrl+C shutdown that drains in-flight requests before exiting.
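The "exponential backoff plus jitter" piece of that output can be sketched with the standard library alone. This is one common variant ("full jitter", where the delay is a random fraction of a capped exponential ceiling), not necessarily the one the generated code will choose. The function names and the `base`, `cap`, and `unit_random` parameters are assumptions made for illustration; the random fraction is passed in by the caller so the arithmetic stays deterministic.

```rust
use std::time::Duration;

/// Upper bound for the delay before retry number `attempt` (0-based):
/// base * 2^attempt, saturating on overflow and clamped to `cap`.
fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// "Full jitter": scale the ceiling by a random fraction in [0, 1].
/// The caller supplies `unit_random` (for example from a random number
/// generator); non-finite input is treated as 0.
fn backoff_with_jitter(
    attempt: u32,
    base: Duration,
    cap: Duration,
    unit_random: f64,
) -> Duration {
    let fraction = if unit_random.is_finite() {
        unit_random.clamp(0.0, 1.0)
    } else {
        0.0
    };
    backoff_ceiling(attempt, base, cap).mul_f64(fraction)
}
```

With a 100 ms base and a 5 s cap, the ceiling runs 100 ms, 200 ms, 400 ms, 800 ms and then flattens at 5 s, and the jitter spreads retries from many workers across that window instead of letting them fire in lockstep.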
Pro tips
- Tune [concurrency] and the backoff together; per-host limits plus exponential backoff with jitter are what keep you from getting IP-banned
- Set [concurrency] conservatively at first (e.g. 8 concurrent requests) and raise it only after confirming the target tolerates it
- Make [target_data] specific so the typed structs and selectors match the page's real structure
- Respect the target's robots.txt and terms; this prompt builds a polite scraper, but the legal and ethical call is yours
- Pick [output_format] to match downstream needs; newline-delimited JSON streams well into other tools
- Verify the CSS selectors against the live page, since scraper depends on the exact HTML structure of [site]
- Test the Ctrl+C path under load to confirm in-flight requests actually drain before exit, since a botched shutdown corrupts your output file
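For the per-host limit mentioned in the tips above, one possible shape is a small table of "next allowed" timestamps keyed by host. The sketch below is std-only and single-threaded; `HostLimiter`, `min_gap`, and `reserve` are hypothetical names, and the generated scraper may structure this differently (for instance behind a mutex shared across tasks).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks the earliest time each host may be contacted again.
struct HostLimiter {
    min_gap: Duration,
    next_allowed: HashMap<String, Instant>,
}

impl HostLimiter {
    fn new(min_gap: Duration) -> Self {
        HostLimiter { min_gap, next_allowed: HashMap::new() }
    }

    /// Reserves the next slot for `host` and returns how long the caller
    /// should wait before sending. `now` is a parameter so the logic can
    /// be exercised without real sleeping.
    fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        // The slot is either the stored timestamp (if still in the future)
        // or right now (new host, or the previous slot already passed).
        let slot = match self.next_allowed.get(host) {
            Some(&t) if t > now => t,
            _ => now,
        };
        // Push the host's next slot out by one polite gap.
        self.next_allowed.insert(host.to_string(), slot + self.min_gap);
        slot.duration_since(now)
    }
}
```

A worker would call `reserve` with the request's host and sleep for the returned duration before sending. Because slots are handed out per host, raising [concurrency] increases parallelism across hosts without shortening the gap between requests to any single one.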