North America

US · 8 metros

US exits for training corpora, eval, and regional benchmarks

Datacenter throughput from Ashburn and San Jose, residential diversity across 50 states, and carrier-anchored mobile for cellular evaluation. Purpose-built for AI teams collecting corpora at scale.

Carrier ASNs we size against

  • Comcast (AS7922)
  • Charter Spectrum (AS20115)
  • AT&T (AS7018)
  • Verizon FiOS (AS701)
  • Cox Communications (AS22773)
  • Amazon AWS (AS16509)
  • Google Cloud (AS15169)
  • Microsoft Azure (AS8075)
Status: markets 10 · exit classes 5 · edge RTT <50ms · uptime 99.9% · ops 24/7

The United States network, up close

What a United States session actually looks like

Why the US matters for AI data collection

A large share of the long-form English content on the open web resolves to a US-hosted origin, and most frontier model providers — OpenAI, Anthropic, Google, Meta, xAI, AWS Bedrock — route production inference through US regions. That makes the United States the default collection surface for three workflows at once: training-corpus scraping, RAG source ingestion, and regional evaluation of commercial LLM APIs.

The SquadProxy US pool is split across three exit classes that share one gateway:

  • Datacenter (primary) — AWS us-east-1 (Ashburn), us-west-2 (Oregon), and GCP us-central1 (Iowa). Used for Common Crawl mirror access, GitHub clones, arXiv scrapes, and HuggingFace dataset downloads where the target does not penalise origin subnet.
  • Residential — real home connections across all 50 states, with city-level routing in the top 30 MSAs. Used for eval workloads where regional IP matters (recommendations, pricing tests, ad rendering) and for RAG sources that block cloud ASNs.
  • Mobile — Verizon, AT&T, T-Mobile LTE/5G exits for cellular-anchored evaluation where mobile-first experiences differ from desktop.
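The split above amounts to a small routing decision. A minimal sketch of that decision in Python, assuming a hypothetical `Workload` description and `choose_exit_class` helper (neither is part of the SquadProxy SDK; they only restate the rules in the list):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a collection job (not a SquadProxy type)."""
    target_blocks_cloud_asns: bool = False  # e.g. a RAG source that rejects cloud ASNs
    needs_regional_ip: bool = False         # e.g. pricing tests, ad rendering
    needs_cellular_origin: bool = False     # e.g. mobile-first experience evals


def choose_exit_class(job: Workload) -> str:
    """Return 'mobile', 'residential', or 'datacenter' for a job."""
    # Cellular-anchored evaluation needs a carrier exit.
    if job.needs_cellular_origin:
        return "mobile"
    # Regional-IP evals and cloud-blocking sources need home connections.
    if job.needs_regional_ip or job.target_blocks_cloud_asns:
        return "residential"
    # Targets that do not penalise origin subnet stay on datacenter.
    return "datacenter"
```

Checking mobile first is a design choice in this sketch: a job that needs a cellular origin cannot be satisfied by either of the other classes, so it takes priority over the regional-IP rule.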

ASN coverage and edge proximity

The carrier ASN list above is the set we actively scale against because those ASNs cover the largest slice of US residential traffic and the three hyperscaler ASNs most commonly whitelisted by AI infrastructure vendors. Latency from our east-coast edge to OpenAI, Anthropic, and Google model endpoints typically lands under 40ms at p50 based on our internal measurements.

What the US pool is good for

  • Pre-training corpus at TB scale against non-hostile targets (arXiv, GitHub, Stack Exchange dumps, Wikimedia, Common Crawl indices)
  • RAG source ingestion where regional IP affects content (local news, state government, US-only knowledge bases)
  • Regional evaluation of model APIs — measuring how GPT, Claude, Gemini respond when the request originates from Kansas vs. New York
  • Safety and red-team workflows that require plausible US-resident framing

What it is not good for

  • Scraping behind authentication on platforms whose terms prohibit it
  • Anything that looks like credential-stuffing, inventory-hoarding, or ticket resale automation — the AUP lists these explicitly
  • PII-heavy collection against US consumer services — HIPAA, GLBA, and state privacy statutes make this risky regardless of proxy class

ASN depth: who actually carries US residential traffic

The ASN list on this page is not a marketing feature list; it is the set we actively scale the residential pool against because those ASNs carry the majority of US broadband subscribers. Why each one matters for AI workloads:

  • Comcast (AS7922) — largest US residential ISP by subscriber count. The Comcast pool resolves most national-news geoblocks and a large share of state-level government sites that filter on cloud ASN.
  • Charter Spectrum (AS20115) — second-largest residential ISP, concentrated in the Northeast, Midwest, and California. Important for regional news sites whose primary audience is a Spectrum footprint metro.
  • AT&T (AS7018) — dominant in the South and parts of California. Also carries a non-trivial share of US business broadband at the lower tiers.
  • Verizon FiOS (AS701) — East Coast fiber footprint, particularly in New York, New Jersey, and Washington DC. Highest residential uplink bandwidth of the majors, which matters for large-file workloads such as Git LFS pulls.
  • Cox Communications (AS22773) — regional strength in the Southwest (Arizona, Nevada, Southern California), Virginia, and New England. Useful for US-regional bias measurement.

On the mobile side, Verizon (AS6167), T-Mobile (AS21928), and AT&T Mobility carry essentially all US 4G/5G carrier traffic. The SquadProxy 4G/5G pools are split across all three carriers to support independent-carrier evaluation workflows.

On the cloud side, the AWS, GCP, and Azure ASNs in the list are where most AI inference endpoints announce from. Datacenter exits under those ASNs are the right path for scraping cloud-hosted AI infrastructure without tripping anti-scrape tiers that block residential or unknown-ASN traffic.
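For log analysis it can help to keep the ASNs named on this page in one lookup table. A minimal sketch, assuming hypothetical helper names `parse_asn` and `classify_asn`; the table contains only the ASN numbers this page states, so AT&T Mobility is left out because no number is given for it here:

```python
# ASN -> (carrier label, category), copied from the lists on this page.
US_ASNS = {
    7922: ("Comcast", "residential"),
    20115: ("Charter Spectrum", "residential"),
    7018: ("AT&T", "residential"),
    701: ("Verizon FiOS", "residential"),
    22773: ("Cox Communications", "residential"),
    6167: ("Verizon", "mobile"),
    21928: ("T-Mobile", "mobile"),
    16509: ("Amazon AWS", "cloud"),
    15169: ("Google Cloud", "cloud"),
    8075: ("Microsoft Azure", "cloud"),
}


def parse_asn(value) -> int:
    """Accept 7922, '7922', 'AS7922', or 'as7922' and return the integer ASN."""
    text = str(value).strip().upper()
    if text.startswith("AS"):
        text = text[2:]
    return int(text)


def classify_asn(value) -> str:
    """Return 'residential', 'mobile', 'cloud', or 'unknown' for an ASN."""
    entry = US_ASNS.get(parse_asn(value))
    return entry[1] if entry else "unknown"
```

Tagging each logged exit with `classify_asn` makes it easy to confirm that a run intended for cloud-hosted targets actually left through a cloud ASN.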

Top US metros: what each one is useful for

The metros below are ordered by pool depth. Each has a distinct use profile for AI workloads:

  • New York — dominant financial services data, strong regional news footprint, heaviest CDN infrastructure. Use for regional news scraping, US East AI inference evaluation, and financial data workloads.
  • San Francisco / Bay Area — the largest concentration of AI infrastructure in the world as of 2026. Local news and regional content in the Bay Area is itself a non-trivial training-corpus source, and San Francisco residential exits are the natural origin for eval workloads that specifically test the "AI industry home region."
  • Seattle — Amazon's headquarters metro and the closest major metro to AWS us-west-2 (Oregon), plus the Microsoft / Azure epicentre. Useful for AI workloads testing how model APIs route when the request origin is in the same region as the inference POP.
  • Boston — academic research anchor (MIT, Harvard, Northeastern), meaningful regional biotech coverage. Specific use for research-grade eval from academic-region IPs.
  • Austin — growing AI / tech concentration, regional Texas content. Useful for Southern US regional eval without the political / demographic confound of the New York–Washington axis.
  • Los Angeles — large regional news, Spanish-language content footprint, media industry data. Useful for bilingual training corpus collection and entertainment-industry competitive intel.
  • Chicago — Midwest anchor, regional business publications, logistics-industry data surface. Good for geographic diversity in evaluation corpora.
  • Washington DC — dense .gov and policy-adjacent content, legislative tracking, regulatory-adjacent AI intel. Specific compliance caveats apply for scraping any .gov or contractor surface.

US legal landscape for AI data collection

The US legal footprint for proxy-routed AI data collection is more settled than it was in 2022-2023 but remains jurisdictionally complex. The frameworks that matter:

  • CFAA (Computer Fraud and Abuse Act) — the long-standing federal anti-hacking statute. In hiQ Labs v. LinkedIn, the Ninth Circuit (2019, reaffirmed in 2022 after Van Buren v. United States) held that scraping publicly accessible content likely does not constitute access "without authorization" under the CFAA; the case settled in late 2022 after the district court found that hiQ had breached LinkedIn's user agreement. Scraping behind authentication remains on the wrong side.
  • DMCA Section 1201 — anti-circumvention of technical measures. Relevant when a target uses specific technical controls (token-gated content, rate-limit bypass mechanisms). SquadProxy's AUP explicitly prohibits Section 1201 circumvention workflows.
  • CCPA / CPRA (California) — consumer-privacy regime. Data-subject rights attach to personal information; collecting PII-heavy content through proxies inherits those rights even when the routing is technically neutral.
  • State-level privacy regimes — Virginia, Colorado, Texas, Connecticut, Utah, and a growing list of states have passed consumer-privacy statutes. AI training corpora that include regional content inherit the regime of the content's origin state.
  • Copyright Office guidance on generative AI — the 2023-2025 Copyright Office review produced guidance on "publicly available" training data use. Publication-grade research should track the Office's current guidance.
  • State AI statutes (2025-2026) — Colorado's SB24-205, California's SB 1047 successors, and similar state-level AI legislation are beginning to attach disclosure and impact-assessment requirements to AI systems deployed in those states. The upstream data-collection layer is increasingly in scope.

Your internal legal team owns the bright lines; SquadProxy's role is to provide infrastructure that supports lawful use cases and declines the unlawful ones via the AUP.

Latency and edge routing from the US pool

Our US East edge routes through AWS us-east-1 (Ashburn, Virginia) and the US West edge through us-west-2 (Oregon). Latency from the edge to major AI inference providers on internal synthetics:

  • OpenAI API (us-east-1): ~15-25ms p50 from Ashburn
  • Anthropic API (us-east-2 and us-west-2): ~20-30ms p50
  • Google Gemini (global): ~30-50ms p50 depending on the specific POP your request lands on
  • AWS Bedrock (us-east-1): ~10-20ms p50 from Ashburn

Residential routing adds 50-200ms depending on carrier and time of day; mobile routing adds 80-300ms. For eval workloads that need consistent latency measurement, pin exit class and time-of-day window in the methodology.
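Pinning exit class and time-of-day window can be done at analysis time as well as at collection time. A minimal sketch, assuming latency samples are logged as `(exit_class, hour_utc, latency_ms)` tuples; that record shape and the `p50_by_exit_class` name are illustrative, not an SDK format:

```python
from collections import defaultdict
from statistics import median


def p50_by_exit_class(samples, hour_window):
    """Median latency per exit class, using only samples inside the window.

    samples: iterable of (exit_class, hour_utc, latency_ms) tuples.
    hour_window: (start_hour, end_hour), start inclusive, end exclusive.
    """
    start, end = hour_window
    grouped = defaultdict(list)
    for exit_class, hour_utc, latency_ms in samples:
        # Drop samples outside the pinned time-of-day window.
        if start <= hour_utc < end:
            grouped[exit_class].append(latency_ms)
    # Report each exit class separately so classes are never mixed in one p50.
    return {cls: median(values) for cls, values in grouped.items()}


samples = [
    ("datacenter", 14, 20.0),
    ("datacenter", 15, 24.0),
    ("datacenter", 3, 90.0),    # outside the window, ignored
    ("residential", 14, 120.0),
]
print(p50_by_exit_class(samples, (14, 16)))
```

Keeping one p50 per exit class avoids averaging datacenter and residential samples together, which would hide the routing overhead described above.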

Pricing

United States proxies — same price, same network

Every plan covers the United States pool and every other market we operate.

Solo

For individual researchers running evaluation scripts and prototype RAG pipelines.

$149/month

or $1,430/year (save 20%)

50 GB residential · unlimited datacenter · 200 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 50 GB residential · unlimited datacenter
  • 5 static ISP IPs · 5 GB 4G mobile
  • 1 seat · 200 concurrent sessions
  • Python + Node SDK + REST API
  • Per-request metering (not time-based)
  • Email support (24h response, business days)
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • Solo researchers
  • Evaluation scripts
  • Prototype RAG

Team

Most popular

For AI startups and mid-size labs splitting capacity between training and evaluation.

$699/month

or $6,710/year (save 20%)

500 GB residential · unlimited datacenter · 1,000 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 500 GB residential · unlimited datacenter
  • 25 static ISP IPs · 25 GB 4G mobile
  • 10 seats ($29/mo per extra seat) · 1,000 concurrent sessions
  • City-level geo-routing + ASN targeting
  • 99.9% uptime SLA
  • Priority Slack support (4h response, business hours)
  • Python + Node SDK + REST API + webhooks
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • AI startups
  • Mid-size labs
  • Model eval teams

Lab

For academic labs, eval consortia, and frontier model companies running sustained workloads.

$2,999/month

or $28,790/year (save 20%)

2 TB residential · unlimited DC · 50 GB 4G + 20 GB 5G · 3,000 concurrent sessions

  • Access to all 5 exit classes · 10 countries on 4 continents
  • 2 TB residential · unlimited datacenter
  • 100 static ISP IPs · 50 GB 4G + 20 GB 5G mobile
  • 50 seats ($19/mo per extra seat) · 3,000 concurrent sessions
  • Dedicated gateway lane (bypasses shared-pool queues on us-east-1 + eu-west-1)
  • 99.95% uptime SLA
  • Dedicated Slack channel (1h response, business hours)
  • Custom BGP prefix on request (additional fees apply)
  • Overage: $2.50/GB residential · $5/GB mobile

Best for

  • Academic labs
  • Large eval consortia
  • Frontier model companies

Enterprise

Custom contracts with dedicated infrastructure, volume pricing, and research-grade SLAs.

Custom pricing

Custom (from 5 TB/mo residential) · unlimited concurrent sessions

  • Volume pricing from 5 TB/mo residential
  • Dedicated BGP prefix + ASN announcement
  • Unlimited concurrent sessions · unlimited seats
  • 99.99% uptime SLA with financial credits
  • Named Technical Account Manager + 24/7 on-call paging
  • Custom AUP, DPA, on-site deployment option
  • Research / academic discount (30–50% off Team or Lab)
  • Annual contract · wire, ACH, USDC/USDT/BTC settlement

Best for

  • Frontier labs
  • Eval consortia
  • Enterprise AI

All plans include 14-day refund, single endpoint with regional failover, HTTP(S) + SOCKS5 on every exit class, access to all 5 exit classes and all 10 focus countries, and Python + Node SDKs. Concurrent sessions = simultaneous TCP sessions through the gateway. Overage warnings fire at 80% and 100%; traffic continues only if overage billing is enabled on your account.
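The overage rules can be checked with a little arithmetic. A minimal sketch covering residential traffic only, using the monthly prices, residential allowances, and residential overage rates listed above; the function names are illustrative, mobile allowances are ignored, and the Lab plan's 2 TB is assumed to mean 2,000 GB:

```python
from decimal import Decimal

# Residential figures from the plan cards above. 2 TB = 2000 GB is an assumption.
PLANS = {
    "solo": {"monthly": Decimal("149"), "quota_gb": Decimal("50"), "overage": Decimal("3")},
    "team": {"monthly": Decimal("699"), "quota_gb": Decimal("500"), "overage": Decimal("3")},
    "lab": {"monthly": Decimal("2999"), "quota_gb": Decimal("2000"), "overage": Decimal("2.50")},
}


def usage_alert(plan, used_gb):
    """Return '80%' or '100%' once that warning threshold is reached, else None."""
    ratio = Decimal(str(used_gb)) / PLANS[plan]["quota_gb"]
    if ratio >= 1:
        return "100%"
    if ratio >= Decimal("0.8"):
        return "80%"
    return None


def residential_bill(plan, used_gb, overage_enabled):
    """Monthly charge in USD for residential traffic on a plan."""
    p = PLANS[plan]
    extra_gb = max(Decimal("0"), Decimal(str(used_gb)) - p["quota_gb"])
    if not overage_enabled:
        # Without overage billing, traffic stops at the allowance.
        extra_gb = Decimal("0")
    return p["monthly"] + extra_gb * p["overage"]


print(residential_bill("solo", 60, overage_enabled=True))
```

For example, a Solo account that uses 60 GB of residential traffic with overage billing enabled pays the $149 base plus 10 GB at $3/GB.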

FAQ

United States proxies FAQ

  • Which ASNs are in your United States residential pool?
    Our United States pool is sized against Comcast (AS7922), Charter Spectrum (AS20115), and AT&T (AS7018), with secondary coverage on the remaining carriers we list. ASN targeting is available on Team and up.
  • Can I target specific cities in the United States?
    Yes. City-level targeting is available via the X-Squad-City header, with a pool size appropriate to each city. Our largest United States pools are in New York, San Francisco, and Seattle.
  • What's the legal footprint of using your United States proxies?
    Scraping through US residential and datacenter exits sits within a largely settled but jurisdictionally complex legal landscape around the CFAA (post-hiQ v. LinkedIn), copyright (Authors Guild v. Google), and state-level consumer privacy regimes (CCPA/CPRA, Virginia, Colorado, Texas). AI training-data use cases, particularly those relying on "publicly available" content, should be reviewed under the Copyright Office's evolving guidance on generative AI. Our AUP prohibits circumvention of technical access controls and scraping behind authentication.
  • What latency should I expect to United States targets?
    Regional edge latency to major United States metros runs 40–90ms on ISP and 150–300ms on rotating residential. Mobile 4G/5G lands around 80–200ms depending on carrier and time of day.
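The X-Squad-City header mentioned above has to reach the gateway rather than the target site. For HTTPS targets behind an HTTP proxy, that usually means attaching it to the CONNECT request. A minimal standard-library sketch, with several assumptions: the gateway host and port are placeholders, the header value format (a plain city name) is a guess, proxy authentication is omitted, and whether the gateway reads the header from the CONNECT request is not stated on this page:

```python
import http.client


def city_headers(city: str) -> dict:
    """Build the city-targeting header; the value format is an assumption."""
    name = city.strip()
    if not name:
        raise ValueError("city must be a non-empty string")
    return {"X-Squad-City": name}


def open_tunnel(proxy_host: str, proxy_port: int, target_host: str, city: str):
    """Prepare an HTTPS connection tunnelled through an HTTP proxy.

    No network traffic happens here; the CONNECT request (carrying the
    city header) is sent when the first request is made on the connection.
    """
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=30)
    conn.set_tunnel(target_host, 443, headers=city_headers(city))
    return conn


# Placeholder gateway address, not a real SquadProxy endpoint.
conn = open_tunnel("gateway.example.invalid", 8080, "example.com", "Seattle")
# conn.request("GET", "/") would open the tunnel and send the request.
```

Headers set on the inner request travel inside the TLS tunnel and are visible only to the target, which is why this sketch puts the routing header on the tunnel setup instead.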

Ready to route through the United States?

Self-serve checkout delivers your US credentials in under a minute.