Claude/ChatGPT Prompt to Build a Concurrent Web Scraper in Rust

Prompt for an async Rust scraper with tokio, reqwest, and scraper: per-host rate limits, polite backoff, JSON output, and graceful Ctrl+C shutdown.

What this prompt does

This prompt builds a concurrent, well-behaved web scraper in Rust and returns working code rather than pseudocode. It frames the model as a senior Rust engineer and asks for six things: a tokio async runtime with a bounded worker pool, reqwest with retry and exponential backoff plus jitter, HTML parsing via the scraper crate into typed structs, per-host rate limiting with a polite delay, a clap-based CLI, and graceful shutdown on Ctrl+C that drains in-flight requests.
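To make the "exponential backoff plus jitter" item concrete, here is a minimal std-only sketch of the delay calculation. The names `backoff_ceiling` and `backoff_with_jitter`, the choice of "full jitter" (a random delay between zero and the ceiling), and the caller-supplied `random` value are all assumptions for illustration; the code the prompt generates may be structured differently, and would more likely take randomness from a crate such as rand.

```rust
use std::time::Duration;

/// Upper bound on the wait before retry number `attempt` (0-based):
/// base * 2^attempt, never more than `cap`.
pub fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // saturating_pow avoids overflow when `attempt` is large.
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}

/// "Full jitter": a delay somewhere in [0, ceiling], at millisecond resolution.
/// `random` is any random u64 supplied by the caller (hypothetical parameter,
/// injected so this sketch needs no external crate). The modulo introduces a
/// slight bias, which is acceptable for spreading out retries.
pub fn backoff_with_jitter(attempt: u32, base: Duration, cap: Duration, random: u64) -> Duration {
    let ceiling_ms = backoff_ceiling(attempt, base, cap).as_millis() as u64;
    Duration::from_millis(random % ceiling_ms.saturating_add(1))
}
```

With a 100 ms base and a 10 s cap, the ceiling grows 100 ms, 200 ms, 400 ms, 800 ms and then flattens at the cap. The jitter keeps many workers that failed at the same moment from retrying in lockstep.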

Four variables shape the scraper. [site] is the target. [target_data] defines the fields parsed into typed structs. [concurrency] caps the bounded worker pool so you don't overwhelm the origin or your own machine. [output_format] sets how results are emitted, such as newline-delimited JSON. The core discipline is staying polite to the target rather than running an accidental denial-of-service attack against it; tuning rate limiting and backoff together is what keeps you from getting IP-banned mid-run. Parsing into typed structs rather than loose strings also means a page with a missing or malformed field surfaces as an explicit runtime parse error instead of silently corrupting your output downstream.
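As a sketch of what "typed structs rather than loose strings" can look like, assume [target_data] is a product title and price. The `Product` struct, the `ParseError` enum, and the price format handled here are hypothetical; real generated code would extract the raw text with the scraper crate's selectors first, which this std-only example leaves out.

```rust
/// Hypothetical typed record for one scraped item.
#[derive(Debug, PartialEq)]
pub struct Product {
    pub title: String,
    pub price_cents: u64,
}

/// Why a page could not be turned into a `Product`.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingTitle,
    BadPrice(String),
}

impl Product {
    /// Builds a `Product` from raw text as it might come out of the HTML.
    pub fn from_raw(title: &str, price: &str) -> Result<Self, ParseError> {
        let title = title.trim();
        if title.is_empty() {
            return Err(ParseError::MissingTitle);
        }
        let price_cents =
            parse_price_cents(price).ok_or_else(|| ParseError::BadPrice(price.to_string()))?;
        Ok(Product { title: title.to_string(), price_cents })
    }
}

/// Parses a non-empty, digits-only string.
fn digits(s: &str) -> Option<u64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Accepts "19", "19.99", or "$19.99" (assumed format); anything else is None.
fn parse_price_cents(raw: &str) -> Option<u64> {
    let cleaned = raw.trim().trim_start_matches('$');
    let (whole, frac) = cleaned.split_once('.').unwrap_or((cleaned, "00"));
    if frac.len() != 2 {
        return None;
    }
    digits(whole)?.checked_mul(100)?.checked_add(digits(frac)?)
}
```

A price cell containing "N/A" then yields `Err(ParseError::BadPrice(..))`, which the scraper can log and skip, rather than writing a bogus value to the output.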

When to use it

  • You need a fast, memory-light scraper that ships as a single binary
  • You want concurrency capped so you don't hammer the origin
  • You need retry with backoff and jitter for flaky endpoints
  • You want extracted data parsed into typed structs, not loose strings
  • You need graceful shutdown that drains in-flight requests on Ctrl+C
  • You want a CLI to control seed URL, concurrency, and output path

Example output

You get a full Cargo.toml and src/main.rs. The code sets up a tokio runtime with a bounded worker pool capped at your concurrency limit, reqwest with retry and exponential backoff plus jitter on transient errors, scraper-crate parsing of your target data into typed structs, per-host rate limiting with a polite delay, a clap CLI for seed URL, concurrency, and output path, and graceful Ctrl+C shutdown that drains in-flight requests before exiting.
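The generated src/main.rs will differ from run to run, but the bounded-pool-plus-drain idea can be sketched without any dependencies. This example deliberately swaps tokio tasks for plain std threads, and the names `run_pool` and `work` are hypothetical. In tokio-based code the cap would more likely be a `tokio::sync::Semaphore` and the flag would be set from `tokio::signal::ctrl_c()`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `work` over `jobs` on at most `workers` threads.
/// Once `shutdown` is set, no new job is started, but any job already
/// in progress runs to completion and its result is still returned.
pub fn run_pool(
    jobs: Vec<u64>,
    workers: usize,
    shutdown: Arc<AtomicBool>,
    work: fn(u64) -> u64,
) -> Vec<u64> {
    // Shared queue of pending jobs; each worker pulls the next one.
    let queue = Arc::new(Mutex::new(jobs.into_iter()));
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let shutdown = Arc::clone(&shutdown);
        let tx = tx.clone();
        handles.push(thread::spawn(move || loop {
            // The flag is checked only between jobs, never mid-job.
            if shutdown.load(Ordering::SeqCst) {
                break;
            }
            let next = queue.lock().unwrap().next();
            match next {
                Some(job) => tx.send(work(job)).unwrap(),
                None => break,
            }
        }));
    }

    // Drop the original sender so the receiver ends once workers finish.
    drop(tx);
    for handle in handles {
        handle.join().unwrap();
    }
    rx.iter().collect()
}
```

Results arrive in completion order, not input order, so sort them or carry an index if order matters downstream.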

Pro tips

  • Tune [concurrency] and the backoff together; per-host limits plus exponential backoff with jitter are what keep you from getting IP-banned
  • Set [concurrency] conservatively at first (e.g. 8 concurrent requests) and raise it only after confirming the target tolerates it
  • Make [target_data] specific so the typed structs and selectors match the page's real structure
  • Respect the target's robots.txt and terms; this prompt builds a polite scraper, but the legal and ethical call is yours
  • Pick [output_format] to match downstream needs; newline-delimited JSON streams well into other tools
  • Verify the CSS selectors against the live page, since scraper depends on the exact HTML structure of [site]
  • Test the Ctrl+C path under load to confirm in-flight requests actually drain before exit, since a botched shutdown can leave a truncated or corrupt output file
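The per-host limit mentioned in the first tip can be sketched as a small table of "next allowed send time" per host. `HostLimiter` and `reserve` are hypothetical names, and passing `now` in explicitly is an assumption made so the behaviour is easy to test; generated code may use a different structure or a rate-limiting crate.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Enforces a minimum gap between requests to the same host.
pub struct HostLimiter {
    min_delay: Duration,
    next_allowed: HashMap<String, Instant>,
}

impl HostLimiter {
    pub fn new(min_delay: Duration) -> Self {
        HostLimiter { min_delay, next_allowed: HashMap::new() }
    }

    /// Reserves the next free slot for `host` and returns how long the
    /// caller should sleep before sending. Other hosts are unaffected.
    pub fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        // The slot is either right now or the earliest time already promised.
        let slot = match self.next_allowed.get(host) {
            Some(&t) if t > now => t,
            _ => now,
        };
        // The following request to this host must wait at least `min_delay` more.
        self.next_allowed.insert(host.to_string(), slot + self.min_delay);
        slot.duration_since(now)
    }
}
```

With a 500 ms delay, three back-to-back reservations for one host return waits of 0 ms, 500 ms, and 1000 ms, while a first request to a different host goes out immediately. Raising [concurrency] therefore adds parallelism across hosts without increasing the load on any single one.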

Frequently Asked Questions

Will this scraper get me banned?

It is designed to be polite, with per-host rate limiting, a delay, and backoff with jitter. But that depends on a sane [concurrency] setting and the target's tolerance, so start conservatively and respect the site's robots.txt and terms of service.

Why use Rust instead of Python for scraping?

Rust gives you speed, low memory use, and a single binary you can drop on a server with no runtime. It is most worthwhile when the job runs constantly or at scale; for a one-off script, Python is often simpler.

Does it handle flaky or failing requests?

Yes. It uses reqwest with a retry policy and exponential backoff plus jitter on transient errors, so temporary failures are retried sensibly rather than crashing the run or hammering the origin with immediate retries.

Will the CSS selectors work out of the box?

Not necessarily. The scraper crate depends on the exact HTML structure of [site], so you will likely need to inspect the live page and adjust selectors. The generated code gives you the structure; the selectors need verification.

Engr Mejba Ahmed