Datacenter

Datacenter throughput for training corpus work at TB scale

Unmetered bandwidth, deterministic IPs, sub-10ms latency to the nearest edge. The SquadProxy core product for bulk training-corpus collection, open-source dataset downloads, and evaluation against non-hostile AI research infrastructure.

IP allocation: Shared pool or dedicated
Latency: < 10 ms to regional AI edge
Bandwidth: Unmetered (dedicated)
Protocols: HTTP, HTTPS, SOCKS5
Rotation: None (static) or gateway-level
Uptime SLA: 99.99%

Why SquadProxy treats datacenter as a first-class product

Most proxy vendors de-emphasise datacenter because their marketing centres on hostile-target scraping, where residential and mobile win. AI data collection is the opposite. The majority of training-corpus volume comes from sources that are:

  • open by design (Common Crawl, Wikimedia dumps, arXiv bulk downloads, HuggingFace dataset hosting)
  • tolerant of cloud ASN origins (GitHub, Stack Exchange dumps, Reddit public dumps, academic preprint servers)
  • explicitly served from CDN edges optimised for large parallel reads

Common Crawl alone publishes roughly 2–3 billion pages per monthly crawl (around 250–450 TiB uncompressed), and for teams working off that baseline the per-GB economics of residential don't close. Datacenter is the right answer on cost, on latency, and, importantly, on provenance: an AWS Ashburn exit has a clean, auditable identity. For a corpus documented in a model card, that matters.
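To make the bandwidth economics concrete, here is a rough back-of-the-envelope sketch. It assumes the whole volume is billed at the $3/GB residential overage rate listed for the Solo and Team plans, that billing uses decimal gigabytes, and that the uncompressed figure is the volume transferred; all three are illustrative assumptions, and real transfers of compressed archives would be smaller.

```python
# Back-of-the-envelope cost of pulling one crawl through a per-GB metered exit.
# Assumptions (not from the vendor): decimal-GB billing, flat per-GB rate,
# uncompressed volume used as an upper bound.
TIB = 1024 ** 4  # bytes in one tebibyte
GB = 1000 ** 3   # bytes in one decimal gigabyte


def metered_cost_usd(volume_tib: float, usd_per_gb: float) -> float:
    """Return the cost in USD of moving volume_tib at a flat per-GB rate."""
    return volume_tib * TIB / GB * usd_per_gb


low = metered_cost_usd(250, 3.0)   # lower end of the quoted crawl size
high = metered_cost_usd(450, 3.0)  # upper end of the quoted crawl size
print(f"${low:,.0f} to ${high:,.0f} per crawl at $3/GB")
```

Under those assumptions a single crawl lands somewhere around $0.8M to $1.5M on metered residential bandwidth, which is the gap an unmetered datacenter allocation is meant to close.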

The AI-infrastructure-adjacent case

Beyond bulk collection, datacenter is the right tool when you are running AI workloads that talk to other AI infrastructure:

  • Model API evaluation. Calling OpenAI, Anthropic, Bedrock, or Vertex at sustained QPS needs low, deterministic latency, which in practice only a datacenter exit peered directly with the provider's cloud can deliver. Running that traffic through residential adds 100–300ms of jitter and poisons your eval timing.
  • Embeddings and vector DB loads. When ingesting scraped corpus into Pinecone, Weaviate, Qdrant, or pgvector, the outbound side (embedding API) and inbound side (DB write) are both hyperscaler-hosted. Running the collection pipeline on a datacenter exit in the same region keeps the whole loop on fast paths.
  • Internal API and partner allowlist. You may be scraping a partner's API with an IP allowlist. Datacenter is the only class of exit that works for that, because it's the only one with deterministic IPs.
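One way to check whether an exit is adding jitter to eval timing is to collect per-request latencies and compare the median, the tail, and the spread. The sketch below uses only the standard library; `time_call_ms` and `summarize_latencies` are hypothetical helper names, and the sample values are made up for illustration.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call_ms(fn: Callable[[], object]) -> float:
    """Run fn once and return the wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def summarize_latencies(samples_ms: List[float]) -> Dict[str, float]:
    """Summarise latency samples: median, 95th percentile, and spread."""
    cuts = statistics.quantiles(samples_ms, n=20)  # 19 cut points, 5% apart
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": cuts[18],                            # last cut point = 95th percentile
        "jitter_ms": statistics.stdev(samples_ms),     # sample standard deviation
    }


# Illustrative samples only: five steady calls and one slow outlier.
samples = [20.0, 21.0, 22.0, 23.0, 24.0, 120.0]
summary = summarize_latencies(samples)
print(summary)
```

In a real run, the samples would come from wrapping each API call in `time_call_ms` and comparing the summary across exit classes in the same region.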

How SquadProxy structures datacenter

  • Shared pools — multi-tenant /24s across nine regions, HTTP/SOCKS5. Bills on bandwidth. Best for breadth across many small targets.
  • Dedicated — /29 or larger allocations just for you, unmetered bandwidth, priced per IP per month. Best for TB-scale sustained collection on a short list of targets.
  • Private ASN + BGP. For Lab and Enterprise customers running sustained >10 Gbps, we announce a dedicated BGP prefix with a custom ASN. This is the cleanest provenance chain available — any downstream audit of the data source ties back to an ASN exclusively under the customer's use.
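For a sense of what the prefix sizes above mean in address counts, the standard-library `ipaddress` module can do the arithmetic. The prefixes below come from the reserved documentation range 203.0.113.0/24, not from any real allocation, and how many addresses in a block are actually usable as exits depends on the provider.

```python
import ipaddress
from typing import Tuple


def describe(prefix: str) -> Tuple[int, int]:
    """Return (total addresses, conventional host addresses) for a prefix."""
    net = ipaddress.ip_network(prefix)
    # hosts() excludes the network and broadcast addresses for IPv4 prefixes
    # shorter than /31.
    return net.num_addresses, len(list(net.hosts()))


dedicated = describe("203.0.113.0/29")  # smallest dedicated size mentioned above
shared = describe("203.0.113.0/24")     # shared-pool block size mentioned above
print("dedicated /29:", dedicated)
print("shared /24:", shared)
```

A /29 therefore gives a short, fixed list of addresses, which is what makes it practical to hand to a partner for an allowlist.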

Edges

  • US East (Ashburn) — 40 Gbps uplink, AWS us-east-1 peer.
  • US West (Oregon + San Jose) — 20 Gbps, AWS us-west-2 peer.
  • EU (Frankfurt + Amsterdam + London) — 60 Gbps combined, AWS/GCP/Azure peers.
  • APAC (Tokyo + Singapore + Seoul + Sydney) — 30 Gbps combined.
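As a rough planning aid, the uplink figures above can be turned into a lower bound on transfer time. This sketch assumes a single job gets a fixed share of the link, which is an idealisation: the quoted uplinks are edge capacity, and the share available to any one customer is not stated here.

```python
def transfer_hours(volume_tib: float, link_gbps: float, utilisation: float = 1.0) -> float:
    """Hours needed to move volume_tib over a link at the given utilisation.

    utilisation is the assumed fraction of the link this job can use (0 to 1].
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    bits = volume_tib * 1024 ** 4 * 8              # TiB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilisation)
    return seconds / 3600


full_rate = transfer_hours(250, 40)        # 250 TiB at a full 40 Gbps
half_rate = transfer_hours(250, 40, 0.5)   # same volume at half the link
print(f"{full_rate:.1f} h at line rate, {half_rate:.1f} h at 50% utilisation")
```

Real throughput will also depend on the source's own rate limits and on how many parallel connections it tolerates.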

Honesty about subnet reputation

Datacenter IPs do not look residential to platforms that fingerprint subnet origin. For targets that care — a small fraction of AI-relevant collection — residential is the right answer and we'll tell you so. For the large majority of AI corpus volume that doesn't, paying residential rates is waste.

We run continuous reputation sweeps on shared pools; any /24 that takes a hit on Spamhaus or a major commercial list gets quarantined and returned to rotation only after the underlying issue is fixed.
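For readers who want to spot-check a block themselves, DNS-based blocklists are conventionally queried by reversing the IPv4 octets and appending the list's zone. The sketch below only builds the query names and is not a description of SquadProxy's internal sweep; the resolution step is omitted, and the prefix is from the reserved documentation range.

```python
import ipaddress
from typing import List


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL lookup name for an IPv4 address (reversed octets + zone)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def names_for_prefix(prefix: str) -> List[str]:
    """Build one lookup name per address in the prefix."""
    return [dnsbl_query_name(str(a)) for a in ipaddress.ip_network(prefix)]


print(dnsbl_query_name("203.0.113.7"))
names = names_for_prefix("203.0.113.0/24")
print(len(names), "lookup names for the /24")
```

By convention, an A-record answer in 127.0.0.0/8 for such a name indicates a listing and NXDOMAIN indicates none; check the list operator's usage terms before querying at volume.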

Status: markets 10 · exit classes 5 · edge RTT < 50 ms · uptime 99.9% · ops 24/7

Pricing

Pricing for datacenter

Every plan includes the datacenter pool across every country we operate.

Solo

For individual researchers running evaluation scripts and prototype RAG pipelines.

$149/ month

or $1,430/year (save 20%)

50 GB residential · unlimited datacenter · 200 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 50 GB residential · unlimited datacenter
  • 5 static ISP IPs · 5 GB 4G mobile
  • 1 seat · 200 concurrent sessions
  • Python + Node SDK + REST API
  • Per-request metering (not time-based)
  • Email support (24h response, business days)
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • Solo researchers
  • Evaluation scripts
  • Prototype RAG

Team

Most popular

For AI startups and mid-size labs splitting capacity between training and evaluation.

$699/ month

or $6,710/year (save 20%)

500 GB residential · unlimited datacenter · 1,000 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 500 GB residential · unlimited datacenter
  • 25 static ISP IPs · 25 GB 4G mobile
  • 10 seats ($29/mo per extra seat) · 1,000 concurrent sessions
  • City-level geo-routing + ASN targeting
  • 99.9% uptime SLA
  • Priority Slack support (4h response, business hours)
  • Python + Node SDK + REST API + webhooks
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • AI startups
  • Mid-size labs
  • Model eval teams

Lab

For academic labs, eval consortia, and frontier model companies running sustained workloads.

$2,999/ month

or $28,790/year (save 20%)

2 TB residential · unlimited DC · 50 GB 4G + 20 GB 5G · 3,000 concurrent sessions

  • Access to all 5 exit classes · 10 countries on 4 continents
  • 2 TB residential · unlimited datacenter
  • 100 static ISP IPs · 50 GB 4G + 20 GB 5G mobile
  • 50 seats ($19/mo per extra seat) · 3,000 concurrent sessions
  • Dedicated gateway lane (bypasses shared-pool queues on us-east-1 + eu-west-1)
  • 99.95% uptime SLA
  • Dedicated Slack channel (1h response, business hours)
  • Custom BGP prefix on request (additional fees apply)
  • Overage: $2.50/GB residential · $5/GB mobile

Best for

  • Academic labs
  • Large eval consortia
  • Frontier model companies

Enterprise

Custom contracts with dedicated infrastructure, volume pricing, and research-grade SLAs.

Custom pricing

Custom (from 5 TB/mo residential) · unlimited concurrent sessions

  • Volume pricing from 5 TB/mo residential
  • Dedicated BGP prefix + ASN announcement
  • Unlimited concurrent sessions · unlimited seats
  • 99.99% uptime SLA with financial credits
  • Named Technical Account Manager + 24/7 on-call paging
  • Custom AUP, DPA, on-site deployment option
  • Research / academic discount (30–50% off Team or Lab)
  • Annual contract · wire, ACH, USDC/USDT/BTC settlement

Best for

  • Frontier labs
  • Eval consortia
  • Enterprise AI

All plans include a 14-day refund, a single endpoint with regional failover, HTTP(S) + SOCKS5 on every exit class, access to all 5 exit classes and all 10 focus countries, and Python + Node SDKs. Concurrent sessions = simultaneous TCP sessions through the gateway. Overage warnings fire at 80% and 100%; traffic continues only if overage billing is enabled on your account.
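Since plans cap simultaneous sessions, a client may want to enforce that cap locally instead of discovering it at the gateway. The following is a minimal sketch using a bounded semaphore; `SessionLimiter` and `fake_fetch` are hypothetical names, the fetch is a stand-in that makes no network call, and the limit of 4 is chosen only to keep the demo small.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SessionLimiter:
    """Cap the number of calls in flight and record the observed peak."""

    def __init__(self, max_sessions: int) -> None:
        self._sem = threading.BoundedSemaphore(max_sessions)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def run(self, fn, *args):
        with self._sem:  # blocks while max_sessions calls are already in flight
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_fetch(url: str) -> int:
    """Stand-in for a proxied request; sleeps briefly instead of using the network."""
    time.sleep(0.01)
    return len(url)


limiter = SessionLimiter(max_sessions=4)
urls = [f"https://example.invalid/item/{i}" for i in range(32)]
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda u: limiter.run(fake_fetch, u), urls))
print("peak concurrent sessions:", limiter.peak)
```

The worker pool is deliberately larger than the limit so that the semaphore, not the pool size, is what bounds concurrency.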

FAQ

Datacenter Proxies FAQ

  • When should an AI team route through datacenter proxies instead of the other classes?
    Datacenter exits are the right tool for bulk training-corpus collection, arXiv / GitHub / HuggingFace scraping, and Common Crawl mirror access. For other AI workloads, check the residential, ISP, or mobile pages; each exit class sits on a different tradeoff.
  • What rotation and session settings do datacenter proxies support?
    Rotation is either none (static) or gateway-level. Sticky windows are set via the X-Squad-Session header.
  • Which countries are covered in the datacenter pool?
    Datacenter coverage is live across our full country list. The US, GB, DE, FR, JP, NL, CA, SG, KR, AU pools are our most instrumented.
  • Are datacenter proxies suitable for continuous LLM evaluation pipelines?
    Yes — the datacenter pool is used in production for evaluation workloads against GPT, Claude, and Gemini APIs, particularly when the eval depends on bulk training corpus collection. Our Team and Lab plans include the concurrency needed for sustained runs.
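As an illustration of the session answer above, this sketch assembles client-side proxy settings with the X-Squad-Session header. The gateway host, port, and credential format are placeholders and not real SquadProxy values; only the header name comes from this page, and how the gateway reads it (on the CONNECT request or on the proxied request itself) is not specified here.

```python
from typing import Dict, Tuple
from urllib.parse import quote

GATEWAY_HOST = "gateway.example.invalid"  # hypothetical placeholder host
GATEWAY_PORT = 8080                       # hypothetical placeholder port


def build_proxy_settings(
    username: str, password: str, session_id: str, scheme: str = "http"
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (proxies, headers) suitable for passing to an HTTP client."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid in the URL.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{GATEWAY_HOST}:{GATEWAY_PORT}"
    proxies = {"http": proxy_url, "https": proxy_url}
    headers = {"X-Squad-Session": session_id}  # header name taken from the FAQ
    return proxies, headers


proxies, headers = build_proxy_settings("user", "p@ss", "eval-run-01")
print(proxies["https"])
print(headers)
```

With the `requests` library, these could be passed as `requests.get(url, proxies=proxies, headers=headers)`. Note that for HTTPS targets tunnelled via CONNECT, request headers normally travel to the origin, not the proxy, so confirm with the vendor documentation how the session header should be delivered.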

Start routing through datacenter

Real ASNs, real edge capacity, and an engineer who answers your Slack the first time.