Which is faster: Scrapy, Playwright, or Selenium?

Scrapy is the fastest of the three for static HTML at scale - benchmarks routinely show 1000 to 4000 pages per minute on a single machine thanks to its Twisted-based async engine. Playwright handles roughly 100 to 300 pages per minute when running with a full browser context, since each page must execute JavaScript and render. Selenium is the slowest, typically 30 to 150 pages per minute, because of its older WebDriver protocol and chattier browser communication. For static sites, Scrapy is the clear winner. For JavaScript-heavy targets, Playwright dominates.

Which tool is best for bypassing anti-bot protection?

Playwright is the strongest of the three in 2026 for modern anti-bot bypass. Combined with playwright-stealth, residential proxies, and realistic browser fingerprints, it defeats most commercial bot detection including Cloudflare, DataDome, and PerimeterX in their default configurations. Selenium can achieve similar results with undetected-chromedriver but tends to lag Playwright by six to twelve months. Scrapy alone cannot handle JavaScript challenges and must be paired with a headless browser or a service like ScrapingBee or ZenRows for protected targets.

Can BeautifulSoup handle JavaScript-rendered websites?

No. BeautifulSoup is a pure HTML and XML parser - it does not execute JavaScript and cannot render single-page applications. If you fetch a modern SPA with Requests and parse with BeautifulSoup, you will see only the initial HTML shell with no meaningful content. To scrape JavaScript-rendered sites you need a real browser (Playwright or Selenium) or you need to inspect the underlying JSON API the page calls and hit that directly with Requests or httpx.

Which framework should I use for large-scale crawling?

Scrapy is purpose-built for large-scale crawling and remains the strongest choice in 2026. Its async engine handles tens of thousands of concurrent requests, the built-in scheduler manages dedupe and politeness across the entire crawl, item pipelines clean and persist data, and middlewares give clean hooks for proxy rotation and headers. For crawls in the hundreds of thousands or millions of pages, Scrapy on Scrapy Cloud or self-hosted with Docker is the standard production setup.
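
As a rough illustration of the middleware hook mentioned above, here is a minimal sketch of a proxy-rotating downloader middleware. The class name and the PROXY_LIST setting are hypothetical names chosen for this example, not Scrapy built-ins; the only Scrapy behaviour relied on is that the downloader honours request.meta["proxy"].

```python
import itertools


class RotatingProxyMiddleware:
    """Downloader middleware sketch: assign proxies round-robin."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._pool = itertools.cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, not a built-in one.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider=None):
        # Leave requests alone if the spider already picked a proxy.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._pool)
        return None  # None tells Scrapy to continue processing the request
```

To try it, the class would be listed in DOWNLOADER_MIDDLEWARES; a priority below 750 is a common choice so that it runs before the built-in HttpProxyMiddleware.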

How do I learn web scraping in 2026?

Start with Python fundamentals, then learn Requests and BeautifulSoup for simple parsing tasks. Move to Playwright next - it has a friendlier API than Selenium and skills transfer directly to modern production work. Tackle Scrapy last because its conceptual model (spiders, items, pipelines, middlewares) is heavier and is best understood after you already have scraping intuition from the simpler libraries. The Scrapy and Playwright official docs are both excellent. For production patterns, study real open-source scrapers on GitHub.

Should I use a scraping API instead of building my own?

Scraping APIs like ScrapingBee, Bright Data Web Scraper, ZenRows, and Oxylabs Web Scraper API handle the proxy rotation, browser rendering, and anti-bot evasion layer for you - you just send a URL and get HTML or JSON back. They are excellent for occasional jobs or when your team has no scraping expertise. For high-volume continuous pipelines, the per-request pricing (typically 0.001 to 0.01 USD per request) becomes more expensive than running your own infrastructure. Most serious projects use a hybrid: scraping APIs for hard targets, custom Playwright or Scrapy for everything else.
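
The pricing argument is easy to check with back-of-envelope arithmetic. The sketch below uses the per-request range quoted above; the 2000 USD monthly infrastructure figure in the usage note is purely an assumption for illustration, not a measured cost.

```python
def monthly_api_cost(requests_per_month: int, price_per_request: float) -> float:
    """Cost in USD of sending every request through a scraping API."""
    return requests_per_month * price_per_request


def break_even_requests(monthly_infra_cost: float, price_per_request: float) -> float:
    """Monthly request volume above which self-hosting becomes cheaper."""
    if price_per_request <= 0:
        raise ValueError("price_per_request must be positive")
    return monthly_infra_cost / price_per_request
```

For example, monthly_api_cost(5_000_000, 0.001) comes to about 5000 USD per month at the cheap end of the range, and break_even_requests(2000, 0.001) suggests that a hypothetical 2000 USD self-hosted stack pays for itself at roughly two million requests per month.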

Python Web Scraping: Scrapy vs Playwright vs Selenium 2026

Every serious Python web scraping project starts with the same decision: which tool to use. Get it right and the project ships in a fraction of the time. Get it wrong and you spend weeks fighting your own framework instead of the target website.

The four candidates that dominate Python scraping in 2026 are Scrapy, Playwright, Selenium, and the BeautifulSoup-plus-Requests combination. Each excels in a different regime - and the wrong choice for the regime will cost you. This article is the practical comparison the SpiderHunts engineering team wishes they could send to every prospective client.

We will cover what each tool actually does, where its sweet spot lies, real benchmark numbers, anti-bot capability in 2026, code samples, and a clear decision tree. By the end you should be able to confidently pick the right tool for any scraping project.

Why Tool Choice Matters More Than Most Teams Realise

A poorly matched tool costs in three ways. First, raw throughput - if you use Selenium for what Scrapy could do, your crawl can take 20 to 30 times longer. Second, infrastructure - Playwright running a full Chromium per worker consumes roughly 200 MB of RAM, versus Scrapy's lean 40 MB. Third, reliability - the wrong tool against an aggressive anti-bot stack will see your crawl die in the first hour and never recover.
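
To make the first two costs concrete, here is a small sizing sketch. The throughput and RAM figures plugged in afterwards come from this article; the 1 GB reserved for the operating system is an assumption.

```python
def crawl_hours(total_pages: int, pages_per_minute: float) -> float:
    """Wall-clock hours needed to fetch total_pages at a steady rate."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return total_pages / pages_per_minute / 60


def max_workers(total_ram_mb: int, ram_per_worker_mb: int, reserved_mb: int = 1024) -> int:
    """How many workers fit in RAM after reserving some for the OS (assumed 1 GB)."""
    return max(0, (total_ram_mb - reserved_mb) // ram_per_worker_mb)
```

With the benchmark rates reported later in this article, crawl_hours(1_000_000, 2800) is about 6 hours for Scrapy against roughly 119 hours for Selenium at 140 pages per minute. On an 8 GB machine, max_workers(8192, 200) allows 35 browser workers, while max_workers(8192, 40) allows 179 Scrapy-sized ones.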

Conversely, the right tool gives compounding returns. A correctly chosen and configured Scrapy crawl can run for months on a single small VPS, quietly producing millions of clean records. A well-tuned Playwright pipeline can defeat anti-bot stacks that look impenetrable to less experienced teams. Tool selection is where senior engineers earn their keep.

Scrapy Deep Dive

Scrapy is a full crawling framework, not just a library. Built on the Twisted async networking engine, it has been the gold standard for high-volume Python crawling since 2008. The 2.x series added asyncio integration (starting with 2.0) and HTTP/2 support (starting with 2.5).

What Scrapy Gets Right

Speed: Async concurrency, configurable from a handful to several thousand simultaneous requests.
Built-in scheduling: Politeness controls, automatic deduplication of URLs across the crawl, request prioritisation.
Item pipelines: Clean separation of crawl, parse, validate, and persist stages.
Middlewares: Hook in proxy rotation, user-agent rotation, headers, retry logic, and custom auth without touching spider code.
Selectors: Native XPath and CSS selectors backed by parsel - faster and more powerful than BeautifulSoup.
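Resume and persist state: Setting a JOBDIR (for example scrapy crawl products -s JOBDIR=crawls/products-1) persists the request queue and dedupe state, so an interrupted crawl can resume where it stopped.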
Ecosystem: Scrapy Cloud, scrapy-splash, scrapy-playwright integration, scrapyd for deployment.
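
To show what the item pipeline stage looks like in practice, here is a minimal sketch of a cleaning pipeline. The class name is hypothetical, and it assumes dict items with "name" and "price" fields and US-style price strings such as "$1,299.00". Scrapy pipelines are plain classes exposing process_item, so nothing here needs to import Scrapy.

```python
import re


class PriceCleaningPipeline:
    """Item pipeline sketch: normalise name and price before persisting."""

    def process_item(self, item, spider=None):
        raw_price = (item.get("price") or "").strip()
        # Drop currency symbols and thousands separators (assumes "1,299.00" style).
        digits = re.sub(r"[^\d.]", "", raw_price)
        try:
            item["price"] = float(digits)
        except ValueError:
            item["price"] = None  # leave a gap rather than guessing a value

        name = (item.get("name") or "").strip()
        item["name"] = name or None
        return item  # returning the item hands it to the next pipeline
```

It would be enabled through the ITEM_PIPELINES setting, for example with a hypothetical path like "myproject.pipelines.PriceCleaningPipeline" mapped to priority 300.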

Where Scrapy Falls Short

JavaScript: Scrapy does not execute JS. You must pair it with scrapy-playwright or detect underlying JSON endpoints.
Learning curve: The spider/item/pipeline/middleware model is more conceptual than the simpler libraries.
Anti-bot evasion alone: Without a paired browser or commercial unblocker service, Scrapy cannot pass JS challenges like Cloudflare's Turnstile.
Awkward debugging: Twisted stack traces can be intimidating for newer developers.
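
Pairing Scrapy with scrapy-playwright is mostly a settings change. The fragment below is a sketch based on the settings the scrapy-playwright project documents; the playwright_meta helper is a hypothetical convenience for this example, not part of either library.

```python
# settings.py fragment: route downloads through scrapy-playwright's handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright needs Twisted to run on top of the asyncio event loop.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def playwright_meta(needs_js: bool) -> dict:
    """Hypothetical helper: request meta that opts a single request into rendering."""
    return {"playwright": True} if needs_js else {}
```

Inside a spider, a request such as scrapy.Request(url, meta=playwright_meta(True)) would then be rendered in a browser, while requests without the flag stay on Scrapy's plain HTTP path.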

Scrapy Code Sample

A minimal Scrapy spider extracting product names and prices:

import scrapy


class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/category/widgets"]
    # Per-spider overrides: be polite and identify the crawler.
    custom_settings = {
        "DOWNLOAD_DELAY": 2,
        "CONCURRENT_REQUESTS": 8,
        "USER_AGENT": "Mozilla/5.0 (compatible; ResearchBot/1.0)",
    }

    def parse(self, response):
        # One item per product card on the listing page.
        for product in response.css("div.product-card"):
            yield {
                "name": product.css("h3::text").get(),
                "price": product.css(".price::text").get(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }
        # Follow pagination until there is no "next" link.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)

That is the complete spider. Inside a Scrapy project, run scrapy crawl products -o products.json and you get a clean JSON dataset; for a standalone file, scrapy runspider does the same job.

Playwright Deep Dive

Playwright is Microsoft's modern browser automation library. Released in 2020, it had become the favourite of professional scrapers by 2024. The Python binding (playwright-python) provides full Chromium, Firefox, and WebKit control with both sync and async APIs.

What Playwright Gets Right

JavaScript execution: Full Chromium engine handles even the most complex SPAs perfectly.
Auto-waiting: Built-in waits for elements to appear, become visible, or stop animating - eliminates flaky selector code.
Anti-detection: playwright-stealth and rebrowser-playwright leak fewer browser fingerprint signals than Selenium.
Network interception: Can capture, modify, or block API calls a page makes - often more efficient than scraping rendered HTML.
Multi-browser: Chromium, Firefox, and WebKit (Safari engine) with one API.
Tracing and screenshots: Built-in debugging tools that make complex failures observable.
Modern syntax: Async/await first, clean Pythonic API.
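
As an example of the network interception point, the sketch below blocks heavy resources that a scraper rarely needs. The set of blocked types is an assumption for illustration. The handler relies only on the route object Playwright passes to route handlers (route.request.resource_type, route.abort(), route.continue_()).

```python
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}  # assumed to be safe to skip


def should_block(resource_type: str) -> bool:
    """Decide whether a request is worth downloading for scraping purposes."""
    return resource_type in BLOCKED_RESOURCE_TYPES


async def block_heavy_resources(route) -> None:
    """Route handler: abort heavy assets, let everything else through."""
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()
```

It would be registered before navigation with await page.route("**/*", block_heavy_resources). Skipping images and fonts usually reduces bandwidth and page load time, though some sites may behave differently when assets fail to load.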

Where Playwright Falls Short

Resource cost: Each browser context uses 150 to 300 MB of RAM. Running hundreds in parallel needs serious infrastructure.
Speed ceiling: Even fast machines top out around 200 to 300 pages per minute per worker due to rendering overhead.
No native crawler framework: You build the scheduler, dedupe, and pipeline yourself or pair with another tool.
Dependency footprint: Each browser install is roughly 300 MB. Docker images bloat quickly.

Playwright Code Sample

import asyncio
import json

from playwright.async_api import async_playwright


async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # A context is an isolated browser profile with its own UA and viewport.
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            viewport={"width": 1366, "height": 768},
        )
        page = await context.new_page()
        await page.goto("https://example.com/category/widgets", wait_until="networkidle")
        await page.wait_for_selector("div.product-card")

        # Extract all cards in one round trip by running JS inside the page.
        items = await page.evaluate("""
            () => Array.from(document.querySelectorAll('div.product-card')).map(el => ({
                name: el.querySelector('h3')?.innerText,
                price: el.querySelector('.price')?.innerText,
                url: el.querySelector('a')?.href,
            }))
        """)
        with open("products.json", "w") as f:
            json.dump(items, f, indent=2)
        await browser.close()


asyncio.run(scrape())

Playwright is verbose compared to Scrapy but vastly more capable against JavaScript-rendered targets.

Selenium Deep Dive

Selenium is the elder statesman - originally a browser test automation tool from 2004, repurposed for scraping by an entire generation of developers. Selenium 4 (released 2021, maintained actively through 2026) modernised the API with the W3C WebDriver protocol.

What Selenium Gets Right

Maturity: 20-plus years of community knowledge. Almost every problem has a documented solution.
Broad browser support: Chrome, Firefox, Edge, Safari, Internet Explorer (still, in 2026), and obscure mobile browser drivers.
undetected-chromedriver: A community-maintained patched ChromeDriver that strips automation fingerprints - the gold standard tool for Selenium-based scraping against anti-bot stacks.
Grid: Selenium Grid for distributing across many machines is battle-tested at enterprise scale.
Cross-language: If your stack is partly Java, .NET, Ruby, or JavaScript, Selenium gives near-identical APIs in all of them.

Where Selenium Falls Short

Speed: The WebDriver protocol has more round trips than Playwright's CDP-based approach.
Manual waits: Without explicit WebDriverWait calls, scripts are brittle.
Anti-detection: Stock Selenium leaks the navigator.webdriver flag and other fingerprints. Workarounds exist but are an ongoing arms race.
Async support: Native async is limited compared to Playwright's first-class asyncio API.

Selenium Code Sample

import json

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("--user-agent=Mozilla/5.0 ResearchBot/1.0")
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/category/widgets")
# Explicit wait: block for up to 10 seconds until the first card is in the DOM.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.product-card"))
)

items = []
for card in driver.find_elements(By.CSS_SELECTOR, "div.product-card"):
    items.append({
        "name": card.find_element(By.CSS_SELECTOR, "h3").text,
        "price": card.find_element(By.CSS_SELECTOR, ".price").text,
        "url": card.find_element(By.CSS_SELECTOR, "a").get_attribute("href"),
    })

with open("products.json", "w") as f:
    json.dump(items, f, indent=2)
driver.quit()

BeautifulSoup and Requests: The Simple Pair

For straightforward scraping jobs against static sites, the BeautifulSoup-plus-Requests combination remains unmatched for simplicity. Requests makes the HTTP call, BeautifulSoup parses the HTML, and you write a few lines of Python. No framework, no scheduler, no browser.

When to Reach for BeautifulSoup

One-off extractions of a handful to a few hundred pages.
Prototyping before committing to a full Scrapy or Playwright project.
Hitting JSON APIs directly where there is no rendering to do.
Learning the basics of HTML parsing before moving to heavier frameworks.
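
For the JSON API case, a sketch might look like the following. The payload shape ({"products": [...]}) and the field names are assumptions for illustration; a real endpoint has to be discovered in the browser's network tab. The session argument is anything with a Requests-style get method, such as requests.Session().

```python
def fetch_products(session, api_url: str) -> list[dict]:
    """Fetch a JSON product listing and keep only the fields we care about."""
    response = session.get(api_url, headers={"Accept": "application/json"}, timeout=15)
    response.raise_for_status()  # fail fast on 4xx/5xx responses
    payload = response.json()
    # Assumed payload shape: {"products": [{"name": ..., "price": ..., "url": ...}]}
    return [
        {"name": p.get("name"), "price": p.get("price"), "url": p.get("url")}
        for p in payload.get("products", [])
    ]
```

Calling fetch_products(requests.Session(), api_url) against such an endpoint skips HTML parsing entirely, which is usually both faster and less brittle than scraping the rendered page.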

When to Skip It

Anything JavaScript-rendered - BeautifulSoup cannot help.
High-volume crawls - you will rebuild Scrapy badly.
Sites with anti-bot protection - Requests offers no fingerprint defence.

BeautifulSoup Code Sample

import json

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://example.com/category/widgets",
    headers={"User-Agent": "Mozilla/5.0 ResearchBot/1.0"},
    timeout=15,
)
r.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(r.text, "lxml")

items = []
for card in soup.select("div.product-card"):
    items.append({
        "name": card.select_one("h3").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
        "url": card.select_one("a")["href"],
    })

with open("products.json", "w") as f:
    json.dump(items, f, indent=2)

Performance Benchmarks (2026)

Numbers below are internal SpiderHunts benchmarks on a 4-vCPU, 8 GB RAM Linux VM crawling a static product catalogue and a JavaScript-heavy SPA. All tools used residential proxies with two-second average delay.

Tool	Static Site (pages/min)	JS SPA (pages/min)	RAM per worker
Scrapy	2800	N/A (cannot render)	40 MB
Playwright	320	240	220 MB
Selenium	140	95	280 MB
Requests + BS4	1100	N/A (cannot render)	25 MB
Scrapy + Playwright	2400 (rendering disabled)	280	240 MB (when rendering)

The pattern is consistent: Scrapy dominates for static, Playwright leads when rendering is needed, Selenium trails Playwright in both regimes, and Requests plus BeautifulSoup is the quickest to get running for tiny jobs but lacks the crawl machinery for serious projects.

Anti-Bot Bypass: Tool by Tool

This is where 2026 looks dramatically different from 2022. Anti-bot vendors (Cloudflare, DataDome, Akamai Bot Manager, PerimeterX) have raised the bar significantly. Here is how each tool stacks up.

Scrapy

Cannot defeat JavaScript challenges alone. Production setups pair Scrapy with scrapy-playwright, scrapy-impersonate (for TLS fingerprint matching), or commercial unblockers like ScrapingBee. For HTTP/2 fingerprint matching, curl_cffi has become the standard underlying client in 2026.

Playwright

The strongest standalone option. Combined with playwright-stealth or rebrowser-playwright, plus residential proxies and realistic browser contexts (viewport, locale, time zone, hardware concurrency), it defeats most commercial bot detection in default configurations. For Cloudflare Turnstile specifically, the combination of patched chromium builds plus interactive mouse movement plus residential IPs is what works.

Selenium

undetected-chromedriver remains effective but lags Playwright's stealth ecosystem by six to twelve months. Selenium-stealth and SeleniumBase UC mode are the other common picks. For very high-protection targets, Selenium is rarely the right tool in 2026.

Requests / httpx

Plain Requests is trivially detectable by modern anti-bot stacks because its TLS fingerprint does not match any real browser. curl_cffi and tls-client offer JA3/JA4 TLS fingerprint impersonation that makes Python HTTP calls look like real Chrome at the network level - a vital tool when you need raw HTTP speed but must pass fingerprint checks.

Decision Framework

Use this checklist to pick the right tool in under two minutes.

Pick Scrapy When

Crawl volume is more than 10,000 pages.
Target site is mostly static HTML or has clean JSON APIs.
You need persistent scheduling, dedupe, and pipelines out of the box.
Multiple spiders will live in the same project.
Infrastructure cost matters.

Pick Playwright When

Target site renders content with JavaScript (React, Vue, Angular, Svelte).
There is anti-bot protection that requires browser-level evasion.
You need to interact with the page (click, type, scroll, hover).
Network interception of XHR or fetch calls would simplify the job.
Volume is moderate (up to a few hundred thousand pages).

Pick Selenium When

Team already has deep Selenium expertise.
You must support an unusual browser (legacy IE, certain mobile drivers).
Cross-language consistency with existing Java or .NET test infrastructure matters.
Project will use Selenium Grid for distributed execution.

Pick BeautifulSoup plus Requests When

Job is one-off and small (under a few hundred pages).
Target is static HTML.
No anti-bot considerations.
You are prototyping before scaling up.

Combine Tools When

The strongest real-world pipelines mix tools. Scrapy plus scrapy-playwright is the most common production combination: Scrapy handles the crawl orchestration and pipeline machinery, while Playwright renders the small percentage of pages that require JS. For maximum throughput against modern protected sites, a hybrid using curl_cffi for HTTP fingerprint matching combined with Playwright for challenge-page passes delivers the best price-performance.
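
The checklist above can be condensed into a small function. This is a deliberately simplified sketch: the 10,000-page threshold comes from the checklist, while the 500-page cut-off for "small" jobs, the 30 percent share for "mixed" crawls, and the field names are assumptions made for this example.

```python
from dataclasses import dataclass

SMALL_JOB_PAGES = 500        # assumed reading of "a few hundred pages"
LARGE_CRAWL_PAGES = 10_000   # threshold taken from the checklist
MIXED_JS_SHARE = 0.3         # assumed cut-off for "a small percentage" of JS pages


@dataclass
class Project:
    pages: int
    js_share: float = 0.0              # fraction of pages needing a real browser
    selenium_constraint: bool = False  # existing Selenium team, Grid, or legacy browser


def pick_tool(project: Project) -> str:
    """Return a default tool choice following the decision framework."""
    if project.selenium_constraint:
        return "Selenium"
    if project.js_share == 0:
        if project.pages < SMALL_JOB_PAGES:
            return "Requests + BeautifulSoup"
        return "Scrapy"
    if project.pages > LARGE_CRAWL_PAGES and project.js_share < MIXED_JS_SHARE:
        return "Scrapy + scrapy-playwright"
    return "Playwright"
```

For instance, pick_tool(Project(pages=50_000, js_share=0.1)) returns "Scrapy + scrapy-playwright", while a 5,000-page SPA with js_share=1.0 returns "Playwright". Real projects weigh more factors than this, so treat the output as a starting point for discussion.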

Real-World Recommendations from SpiderHunts Engineers

After shipping more than a hundred scraping projects since 2015, the SpiderHunts engineering team's defaults look like this:

For lead generation projects on business directories: Scrapy with curl_cffi for TLS impersonation. Fast, reliable, cheap to run.
For e-commerce competitor monitoring: Scrapy plus scrapy-playwright. Most product list pages are server-rendered (Scrapy fast path), variant pickers and reviews are JS (Playwright fallback).
For real estate portals: Playwright direct - these sites are increasingly SPAs and use sophisticated bot detection.
For professional networks within ToS: Playwright with full stealth stack and per-account rate limiting. Treat carefully.
For news and PR monitoring: Scrapy with sitemap-driven URL discovery. RSS feeds where available for efficiency.
For quick client one-offs: Requests plus BeautifulSoup. Two-hour delivery, no framework overhead.

The single biggest engineering lesson: build the pipeline assuming the target site will change selectors every few months. Resilient extractors, structured logging, and quick rollback are worth more than raw scraping speed once a system is in production.
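
One way to build that resilience is to give every field an ordered list of extraction strategies and log which one fired. The sketch below is library-agnostic and uses only the standard library; the function name, the logger name, and the log format are assumptions for illustration.

```python
import logging

logger = logging.getLogger("extractor")


def extract_with_fallbacks(document, field, strategies):
    """Try each (name, callable) strategy in order; return the first non-empty value."""
    for name, strategy in strategies:
        try:
            value = strategy(document)
        except (KeyError, IndexError, AttributeError, TypeError):
            value = None  # a broken selector should not crash the whole crawl
        if value:
            logger.info("field=%s strategy=%s status=ok", field, name)
            return value
    # Nothing matched: surface it loudly so a selector change is noticed quickly.
    logger.warning(
        "field=%s status=missing tried=%s", field, [name for name, _ in strategies]
    )
    return None
```

With a Scrapy response as the document, the strategies could be lambdas such as lambda r: r.css(".price::text").get() followed by an older selector kept as a fallback. A rise in fallback or missing log lines is then an early signal that the target site has changed its markup.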

Final Verdict

There is no single best Python scraping tool - the answer depends on what you are scraping, at what volume, and how aggressive the target's defences are. That said, two tools belong in every serious scraper's toolbox in 2026: Scrapy for high-volume work and Playwright for everything JavaScript-heavy. Selenium remains useful in legacy contexts but is rarely the right new project default. BeautifulSoup plus Requests stays the friend of every quick job.

If you need a custom scraping pipeline built and delivered with the right tool choices baked in, that is exactly what the SpiderHunts Technologies web scraping team does. We pair the tool selection above with battle-tested anti-bot tooling, proxy infrastructure, and CRM delivery pipelines. For the business-side framing of a lead-generation scraping project, see our companion guide on web scraping for lead generation. And for downstream analytics on the data you collect, our data science service turns raw records into business intelligence.
