Is Web Scraping Legal? Compliance Guide for Businesses in 2026

By SpiderHunts Technologies  ·  May 30, 2026  ·  13 min read

TL;DR

Yes, web scraping is generally legal - with caveats. Scraping public data is broadly lawful in the US under hiQ v LinkedIn. In the EU and UK, GDPR adds restrictions when personal data is involved. Terms of Service violations create contractual risk. Bypassing technical controls (logins, paywalls, CAPTCHAs) crosses into computer misuse statutes. Respect robots.txt, rate limit, identify your scraper, avoid personal data unless lawful basis exists, and consult a lawyer before scaling. This guide explains the full picture for 2026.

Web scraping is one of the most common ways businesses extract data in 2026. Price monitoring, lead generation, real estate aggregation, competitor research, market intelligence, news monitoring - every one of these uses scraping at scale. And yet most business owners commissioning scraping projects have no clear idea whether what they are about to do is legal.

The short answer: yes, web scraping is legal in most circumstances, but the boundary lines have shifted considerably between 2020 and 2026. Several landmark court cases have clarified what you can and cannot do, GDPR enforcement has matured, and the rise of AI training has triggered a new wave of legal challenges.

This guide cuts through the conflicting opinions online and gives you a practical framework for scraping legally in 2026 - covering the relevant law, recent court rulings, ethical best practices, and when to actually call a lawyer.

Disclaimer: This article is provided for general information only and does not constitute legal advice. The legal status of web scraping depends on the specific facts of each case, the jurisdictions involved, and how the law is interpreted by courts. Always consult a qualified lawyer before commissioning scraping at scale.

The Short Answer: Yes, With Caveats

Web scraping as such is not banned outright in any of the major jurisdictions covered in this guide. The technical act of fetching public web pages and parsing their HTML is not, by itself, a crime. What can be illegal is what you scrape, how you scrape it, and what you do with the data afterwards.

Four legal layers determine whether your scraping is lawful:

  1. Computer misuse / unauthorised access laws (CFAA in the US, Computer Misuse Act 1990 in the UK, similar statutes worldwide)
  2. Contract law (Terms of Service violations)
  3. Data protection law (GDPR in the EU/UK, CCPA in California, similar regimes elsewhere)
  4. Intellectual property law (copyright, database rights, trade secrets)

A given scraping project might be fine under one layer and problematic under another. The legality is the intersection of all four.

Public Data vs Private Data

The most important distinction in scraping law is between public and private data. Public data is anything accessible without authentication - product listings on an e-commerce site, articles on a news site, company information on a corporate website. Private data is anything behind a login, paywall, or technical access control.

Scraping Public Data

Scraping public data is broadly legal in most jurisdictions. In the landmark US case hiQ Labs v LinkedIn (Ninth Circuit, 2019, reaffirmed 2022), the court held that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act. The ruling was made at the preliminary injunction stage and binds only courts in the Ninth Circuit, but in practice scraping a site that has no login wall is very unlikely to support a CFAA prosecution.

However, "broadly legal" does not mean "no risk." Even for public data, you can face contractual liability under Terms of Service, data protection liability if personal data is involved, and copyright liability if you copy substantial protected content.

Scraping Private Data

Scraping data behind a login or paywall is significantly riskier. If you sign up for an account and then scrape, you are bound by the Terms of Service you agreed to. If you bypass authentication technically (using stolen credentials, exploiting bugs, or circumventing CAPTCHAs in a way the site has clearly prohibited), you risk computer misuse charges. Multiple defendants have been criminally convicted under the CFAA and similar statutes for this conduct.

robots.txt: Does It Have Legal Weight?

The robots.txt protocol was introduced in 1994 as a voluntary standard for websites to communicate which crawlers were welcome and which paths should not be fetched. It is not law - it is a polite request.

However, in 2026 robots.txt carries more legal weight than it used to. Recent rulings have used robots.txt directives as evidence of whether access was authorised. If a site explicitly disallows a path in robots.txt and you scrape that path anyway, courts may treat your access as unauthorised - making CFAA and Computer Misuse Act claims more viable.

The practical rule: always read robots.txt for sites you intend to scrape, and respect its directives unless you have a clear legal basis (and lawyer's advice) to ignore it. Ignoring robots.txt is not just unethical - it actively increases your legal exposure.
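The check described above can be automated before any request is sent. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the bot name `ExampleCorpBot`, the sample robots.txt content, and the helper names `build_parser` and `is_allowed` are assumptions made for illustration, not part of any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical scraper identity; replace with your own bot name.
USER_AGENT = "ExampleCorpBot"

# Made-up robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str) -> bool:
    """Return True if robots.txt permits USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.com/products/widget"))   # path not disallowed
print(is_allowed(parser, "https://example.com/account/settings"))  # under /account/
print(parser.crawl_delay(USER_AGENT))  # Crawl-delay value, or None if not set
```

In a live scraper you would typically call `set_url()` and `read()` on the parser to download the real file, then skip any URL for which `is_allowed` returns False. A passing check shows good faith; it does not by itself settle the Terms of Service or data protection questions.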

Terms of Service Violations

Almost every commercial website's Terms of Service prohibits scraping in some form. Violating these terms is a breach of contract - it does not automatically make scraping illegal, but it creates two real risks.

First, the site can sue you for breach of contract. Damages may be limited (often only nominal), but injunctions can stop your scraping operation immediately. Second, in some jurisdictions, ToS violations can elevate the legal severity of other claims - for example, scraping in clear breach of ToS may push behaviour from "civil dispute" into "unauthorised access" territory.

hiQ Labs v LinkedIn (US)

The most influential scraping case of the past decade. hiQ Labs scraped LinkedIn public profiles to build an employee analytics product. LinkedIn sent a cease-and-desist letter, hiQ sued for declaratory judgment, and the Ninth Circuit ultimately ruled, at the preliminary injunction stage, that scraping publicly available data likely does not violate the CFAA.

The case clarified that "public data" scraping is not criminal computer misuse in the US - but it did NOT clear hiQ of breach of contract or other state-law claims. After remand, hiQ ultimately lost on the contract issues and shut down. The lesson: public data scraping is not automatically a crime, but ToS violations still bite.

Meta v Bright Data (US, 2024)

Meta sued Bright Data, a major scraping infrastructure provider, alleging that Bright Data scraped Facebook and Instagram in breach of Meta's terms. In January 2024, the court ruled largely in favour of Bright Data, finding that scraping public Facebook and Instagram data while not logged in did not breach Meta's terms. Meta dropped its remaining claim shortly afterwards rather than pursue an appeal. As a district court decision the ruling is persuasive rather than binding, but it is a significant reference point for scrapers of public social media data.

New York Times v OpenAI (US, ongoing)

The NYT sued OpenAI and Microsoft in late 2023 alleging that ChatGPT was trained on copyrighted NYT articles scraped without licence. The case is ongoing as of 2026 and will be hugely influential for the AI training landscape. It does not affect ordinary commercial scraping but is a warning sign for scraping copyrighted content for AI training.

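Ryanair v PR Aviation (EU, 2015) and Ryanair v Booking.com (US, 2024)

Two Ryanair cases are often conflated. In Ryanair v PR Aviation (Case C-30/14, 2015), the Court of Justice of the European Union held that where a database is protected by neither copyright nor the sui generis right, the Database Directive does not prevent its owner from restricting use by contract. The ruling is significant for EU scraping practice - it means no-scraping terms can be enforced as a matter of contract, subject to national law, even where databases are not protected by sui generis rights. The separate Ryanair v Booking.com dispute, over access to Ryanair fare data for display on Booking.com's travel platform, was litigated in a US federal court in Delaware under the CFAA, with a jury verdict in 2024. It is not a European Court of Justice ruling, so check the current status of the US proceedings before relying on it.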

GDPR and Personal Data

If your scraping touches personal data (names, email addresses, profile information, anything that can identify an individual), GDPR applies if you are in the EU/UK or scraping data about EU/UK residents.

GDPR requires a lawful basis for processing personal data. The two bases most relevant to scrapers are "legitimate interests" (Article 6(1)(f)) and "consent" (Article 6(1)(a)). Consent is impractical for scraping at scale. Legitimate interests can work, but you must conduct and document a Legitimate Interests Assessment balancing your interests against the data subject's rights.

Other GDPR obligations include providing transparency notices (Article 14 - you must inform data subjects you are processing their data, with limited exceptions), respecting data subject rights (access, deletion, objection), and using appropriate security measures.

The UK Information Commissioner's Office and EU data protection authorities have fined multiple companies for scraping personal data without proper basis. Clearview AI was fined GBP 7.5 million by the ICO in 2022 for scraping faces from social media without consent.

CCPA and US State Privacy Laws

California's CCPA (as amended by the CPRA) applies to businesses scraping personal data about California residents. The bar for "business" coverage is reasonably high (roughly USD 25 million in gross annual revenue, handling personal data on 100,000 or more consumers or households, or deriving 50% or more of annual revenue from selling or sharing personal information), but if you meet any one of those thresholds, CCPA imposes disclosure, access, and deletion obligations similar to (but lighter than) GDPR.

Other US states (Virginia, Colorado, Connecticut, Utah, Texas, Oregon, and a growing list) now have their own privacy laws with varying scraping implications. If you operate in or scrape data about US residents at scale, you need a privacy lawyer to map the requirements across states.

Copyright and Database Rights

Scraping factual data (prices, addresses, basic product information) is generally fine from a copyright standpoint - facts are not copyrightable. Scraping creative content (articles, photos, descriptions, reviews) creates copyright exposure if you republish or use the scraped content beyond fair use / fair dealing.

The EU has an additional protection called "sui generis database rights" (Database Directive 96/9/EC) which can protect the substantial extraction of data from databases even when individual records are not copyrightable. Several EU cases have found that systematic scraping of structured data can infringe these rights.

Practical rule: scrape facts, summarise rather than copy, and never republish substantial protected content without a licence.

Country-Specific Considerations

United Kingdom

The Computer Misuse Act 1990 criminalises unauthorised access to computer systems. Scraping public data is generally not "unauthorised access," but bypassing technical controls or ignoring explicit prohibitions can be. UK GDPR (post-Brexit) applies to personal data scraping. UK courts have shown willingness to enforce ToS-based injunctions against scrapers. The UK government has consulted on a text and data mining exception for AI training with opt-out mechanisms.

United States

The CFAA criminalises "unauthorised access to a protected computer." Post-Van Buren (Supreme Court 2021) and hiQ (Ninth Circuit), the bar for unauthorised access is high - you typically need to bypass a technical access control. Most public-data scraping is not criminal in the US. State law (especially California) adds privacy obligations. Common law claims (trespass to chattels, tortious interference) can still bite.

European Union

GDPR is the dominant concern for personal data. Database rights protect substantial extraction from databases. The Digital Services Act adds obligations for very large platforms but does not directly regulate scraping. The EU AI Act (in force since 2024, with obligations phasing in from 2025) requires providers of general-purpose AI models to publish a summary of the content used for training. National implementations vary across member states.

Germany

Germany has some of the strictest computer misuse laws in Europe. Section 202a StGB criminalises unauthorised access even to data not protected by passwords if the data is "specially protected against unauthorised access." Courts have interpreted this broadly. Scraping in Germany requires extra care - companies have been criminally prosecuted for scraping that would be entirely legal in the US.

Ethical Web Scraping Practices

Beyond what is strictly legal, ethical scraping practices reduce legal risk, protect target sites, and help maintain a sustainable scraping ecosystem.

  • Rate limit aggressively. Send no more than one request every 5 to 10 seconds to any given site unless you have explicit permission. Burst traffic causes operational problems for the target site and triggers blocks.
  • Identify your scraper in the User-Agent header. Include a contact email so site operators can reach you if there is a problem. Anonymous scraping looks like malicious behaviour.
  • Respect robots.txt. Even when not legally required, treating robots.txt as binding is the strongest signal of good faith.
  • Honour rate limits and error responses. If you get 429 (rate limited) or 503 (service unavailable), back off exponentially. Do not pound a struggling site.
  • Cache aggressively. If you have already scraped a page, do not scrape it again unless you need fresh data. Wasted requests are wasted on both sides.
  • Use official APIs when available. If a site offers an API for the data you need, use it even if scraping would be faster or cheaper. APIs come with explicit terms you can comply with.
  • Do not bypass CAPTCHAs or technical controls. If a site has actively deployed countermeasures, that is a clear signal scraping is unwelcome. Pushing past those controls escalates legal risk dramatically.
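Several of the practices above (rate limiting, User-Agent identification, and exponential backoff on 429 and 503) can be combined in one small helper. The following is a minimal sketch rather than a production client: the bot name, contact details, default intervals, and the `fetch` callable are all assumptions, and `fetch` stands in for whatever HTTP library you use.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical identity string; use your own bot name, info URL and contact email.
USER_AGENT = "ExampleCorpBot/1.0 (+https://example.com/bot; scraping@example.com)"
HEADERS = {"User-Agent": USER_AGENT}
RETRY_STATUSES = {429, 503}  # rate limited / service unavailable


class RateLimiter:
    """Enforce a minimum gap between consecutive requests to one site."""

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_request = self._clock()
        return delay


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def fetch_politely(url: str,
                   fetch: Callable[[str, dict], Tuple[int, str]],
                   limiter: RateLimiter,
                   max_attempts: int = 4,
                   sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch(url, headers) -> (status, body), backing off on 429/503."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        limiter.wait()
        status, body = fetch(url, HEADERS)
        if status not in RETRY_STATUSES:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

A caller would pass a thin wrapper around its HTTP client as `fetch` and keep one `RateLimiter` per target site. The clock and sleep functions are injectable so the timing logic can be tested without real waiting. Caching and robots.txt checks are deliberately left out to keep the sketch short.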

Industry-Specific Guidance

E-commerce Price Monitoring

Generally low risk. Prices and product information are factual, public, and not personal data. Most ToS prohibit automated access but enforcement is rare for non-aggressive scrapers. Rate limit, identify yourself, and avoid scraping behind logins.

Real Estate Listings

Moderate risk. Listings are public but often have explicit ToS prohibitions, and aggregator sites are litigious. Database rights claims are credible in the EU. Many real estate platforms now offer licensed APIs - use them where available.

Lead Generation (B2B Contact Data)

Higher risk because of personal data. Even business email addresses (firstname.lastname@company.com) are personal data under GDPR. You need a lawful basis (typically legitimate interests for B2B), a documented LIA, and transparency to data subjects. Licensed providers like Apollo, ZoomInfo, and Cognism are operationally safer than DIY scraping.

News and Media Monitoring

Moderate to high risk because of copyright. Headlines and short summaries are typically fine. Republishing full articles requires a licence. Many publishers offer monitoring API licences - the cost is usually justified versus the litigation risk.

Social Media

High risk. Personal data is everywhere, platforms actively litigate, and post-hiQ rulings (especially Meta v Bright Data) have set important boundaries. Default to using official APIs. If you must scrape, stick to fully public data, do not log in, and consult a lawyer.

Recent Court Rulings 2024-2026

The legal landscape continues to evolve. Key recent rulings:

  • Meta v Bright Data (US, 2024): Public Facebook/Instagram data scraping without login does not breach Meta ToS.
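  • Ryanair v Booking.com (US, District of Delaware, 2024): Jury verdict on a CFAA claim over automated access to Ryanair's site. The EU authority on enforcing no-scraping terms by contract is the CJEU's earlier Ryanair v PR Aviation (2015).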
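  • Clearview AI ICO enforcement (UK, 2022): The ICO treated mass face scraping for biometric identification without lawful basis as a serious data protection breach. The penalty was set aside on jurisdictional grounds by the First-tier Tribunal in 2023 and the ICO appealed, so check the current status before relying on it.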
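  • Getty Images v Stability AI (UK, 2025): High Court case on an image model trained on copyrighted images scraped from Getty. Getty dropped its core training-related copyright claims during trial, and the November 2025 judgment rejected the remaining secondary infringement claim, with only limited trade mark findings in Getty's favour.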
  • EU AI Act enforcement (2025-2026): Training data disclosure obligations now in force for GPAI providers.

Compliance Checklist Before You Scrape

Before commissioning any scraping project, work through this checklist:

  • Is the data public (no login or paywall required)?
  • Have you read the site's robots.txt and Terms of Service?
  • Does the data include personal information? If so, do you have a GDPR/CCPA-compliant lawful basis?
  • Are you republishing or just analysing? Is the content copyrighted?
  • Have you set up rate limiting, User-Agent identification and contact information?
  • Is there a licensed API alternative? Have you priced it against the legal risk of scraping?
  • For scaled scraping projects, have you obtained legal advice for your specific use case and jurisdictions?
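One way to make the checklist above repeatable is to record the answers for each target site and flag anything unresolved. The following is a minimal sketch; the field names and the flagging rules are illustrative assumptions that mirror the bullets, not legal tests, and an empty result does not replace legal advice.

```python
from dataclasses import dataclass


@dataclass
class PreflightCheck:
    """Answers to the checklist for one target site (illustrative fields)."""
    data_is_public: bool
    robots_and_tos_reviewed: bool
    contains_personal_data: bool
    lawful_basis_documented: bool
    republishes_copyrighted_content: bool
    politeness_configured: bool  # rate limiting + User-Agent with contact info
    api_alternative_evaluated: bool
    at_scale: bool
    legal_advice_obtained: bool


def open_items(check: PreflightCheck) -> list[str]:
    """Return checklist items still unresolved; an empty list means none flagged."""
    items = []
    if not check.data_is_public:
        items.append("Data sits behind a login or paywall")
    if not check.robots_and_tos_reviewed:
        items.append("robots.txt and Terms of Service not yet reviewed")
    if check.contains_personal_data and not check.lawful_basis_documented:
        items.append("Personal data without a documented lawful basis")
    if check.republishes_copyrighted_content:
        items.append("Republishing copyrighted content: confirm licence")
    if not check.politeness_configured:
        items.append("Rate limiting and User-Agent identification not configured")
    if not check.api_alternative_evaluated:
        items.append("Licensed API alternative not evaluated")
    if check.at_scale and not check.legal_advice_obtained:
        items.append("Scaled project without legal advice")
    return items


# Example: a hypothetical B2B lead-generation project with a gap to close.
lead_gen = PreflightCheck(
    data_is_public=True, robots_and_tos_reviewed=True,
    contains_personal_data=True, lawful_basis_documented=False,
    republishes_copyrighted_content=False, politeness_configured=True,
    api_alternative_evaluated=True, at_scale=False, legal_advice_obtained=False,
)
for item in open_items(lead_gen):
    print(item)
```

Storing one such record per target alongside the project documentation gives a simple audit trail of what was checked and when.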

When to Consult a Lawyer

Most one-off small scrapes of public business data do not warrant formal legal review. The cost of a lawyer outweighs the risk. However, consult a lawyer before scraping when:

  • The data includes personal information and you plan to use it commercially.
  • The target site has explicit anti-scraping ToS and you intend to scrape at scale (thousands of pages or sustained scraping over weeks).
  • You are building a product that directly competes with the source site and depends on their data.
  • You are scraping content for AI training where copyright exposure is significant.
  • You are scraping sites based in, or data about residents of, Germany, France, or other strict jurisdictions.
  • You are bypassing or considering bypassing technical access controls.

How SpiderHunts Approaches Compliance

Our web scraping service is built around a compliance-first methodology. Before we build anything, we map the four legal layers against the specific project: what data, from which sites, in which jurisdictions, for what purpose.

For every project we deliver, we document the lawful basis for any personal data processing, the rate-limiting and User-Agent identification practices we apply, the robots.txt and ToS posture for each target, and the data retention and security controls we implement. We do not bypass CAPTCHAs or scrape behind logins without explicit client confirmation that they have authorisation.

For clients in regulated industries or with sensitive data needs, we work alongside the client's legal counsel to design scraping operations that survive scrutiny. We have delivered scraping projects in financial services, healthcare, real estate, and B2B SaaS where compliance was a hard requirement, not an afterthought.

Whether you are scraping competitor prices, building a market intelligence platform, or generating B2B leads, we build the operation right - with full documentation of why each step is lawful and how compliance is maintained over time.

Need a Compliance-First Scraping Build?

SpiderHunts Technologies builds production-grade Python scraping systems with compliance, rate limiting, and data protection baked in. Free 30-minute consultation - no commitment required.
