Skip to content
AI & Machine Learning

Proxies for safety testing and red-team evaluation of models

Testing model guardrails across jurisdictions, accessing geoblocked content to build adversarial sets, and running safety evaluations that reflect the real geographic distribution of end-users. Used by safety teams at labs, third-party auditors, and compliance review.

Why safety evaluation needs geographic diversity

Model guardrails are not uniform across the globe. A well- documented and public behaviour is that commercial LLM providers apply different content policy to prompts that carry different regional signals — sometimes by provider-attached metadata, sometimes by IP-layer geography, sometimes by user-supplied locale. Any serious safety evaluation has to reflect this.

Three concrete workflows that SquadProxy supports:

1. Jurisdictional guardrail mapping

For a given model and a given prompt, measure the guardrail response from residential IPs in 10+ countries. The resulting map shows which content the model refuses globally, which it refuses only in specific jurisdictions, and which it approves everywhere. This is now a standard input to responsible-disclosure reports and, increasingly, to procurement audits.

2. Adversarial example collection

Red-team corpora — datasets of adversarial prompts and model outputs — are often gated behind access controls on academic research platforms. Some of those platforms geoblock outside their host country's academic network. Residential exits in the hosting country (usually US or UK) unlock collection without resorting to credential abuse.

3. Geoblocked content for adversarial evaluation

Testing whether a model refuses to produce content that is illegal in the evaluator's jurisdiction but legal elsewhere requires access to legal-elsewhere content for the eval dataset construction. This is a research workflow that has specific legal handling — SquadProxy customers doing this work typically have documented approval from their institutional review board or equivalent.

What SquadProxy does not support

The AUP explicitly prohibits:

  • Generating content that would be illegal under UK law (SquadProxy Ltd. is UK-incorporated) regardless of the user's jurisdiction
  • Testing that is framed as "red team" but whose actual purpose is generating illegal content for downstream use
  • Targeting models used to protect vulnerable populations (child safety classifiers, crisis-line triage systems) without explicit coordination with the operator

Safety and red-team customers are reviewed at onboarding. We ask for the institutional context, the evaluation scope, and the disclosure plan. We do not sell proxy access to "red-team" customers whose actual workflow is extraction of policy-violating model output for sale.

A reference setup

# Map guardrail responses for one prompt across 10 countries
from dataclasses import dataclass
@dataclass
class GuardrailResult:
    country: str
    response: str
    refused: bool
    latency_ms: float

COUNTRIES = ["us","gb","de","fr","jp","nl","ca","sg","kr","au"]
results = []
for country in COUNTRIES:
    r = call_model(
        prompt=ADVERSARIAL_PROMPT,
        proxy_class="residential",
        proxy_country=country,
        rotation="per-request",
    )
    results.append(GuardrailResult(country, r.text, classify_refusal(r), r.latency))

Capture per-country results with full timestamps, exit IPs, and response bodies. Publish methodology. Disclose findings to the affected provider through their standard channel before anything else.

The jurisdictional framework

Safety red-team work crosses jurisdictional lines by design — that's the methodological point. The countries SquadProxy operates in each impose their own legal framework on the activity:

  • UK (where SquadProxy Ltd. is registered) — the Online Safety Act, Computer Misuse Act, and the UK's implementation of the GDPR. Our AUP is written against UK law first.
  • EU-member countries (DE, FR, NL) — GDPR plus the Digital Services Act's safety-testing carve-out for "authorised researchers." The DSA carve-out is narrower than most researchers assume; it requires a documented relationship with a regulator or a Digital Services Coordinator.
  • US — CFAA (post-hiQ v. LinkedIn), plus state-level privacy regimes. The CFAA framing around "authorised access" matters: red-team work against a public API is different from red-team work behind authentication, and the legal footprint differs.
  • JP, KR, SG, AU, CA — each has its own content-safety framework and each treats cross-border research differently.

A safety red-team workload that operates across multiple jurisdictions needs to be scoped against the most restrictive applicable framework, not the most permissive. SquadProxy's AUP and DPA document the constraints on the infrastructure side; the research constraints on the customer side are a separate matter.

Reproducibility for safety evaluation

Safety evaluation published as a formal assessment (responsible disclosure, red-team report to a provider, compliance audit) needs a higher reproducibility bar than general model evaluation:

  • Exit class + country + session settings per row (same as LLM evaluation)
  • Full prompt text preserved — safety evals often operate on text that triggers the target's guardrails, so preserving the exact prompt at the time of evaluation is critical. Hash both the prompt and the response and store both hashes alongside the raw text.
  • Run timestamp to the second — providers change safety stacks on deployment-level timelines, so a safety eval run is as much a snapshot of a moment as of a model.
  • Exit IP — captured for forensic purposes, though the pool rotates. Some platforms honor "please route this request to the same IP" for the purpose of disclosure follow-up; sticky sessions serve this purpose.
  • Provider model version pin where the API exposes it.
  • Disclosure timeline — who was notified, when, and with what scope.

Without these, a year-later re-run of the same eval can't be compared to the original, and a regulator asking "did the provider fix this?" can't be answered.

Disclosure and coordination

Responsible disclosure is where safety red-team work differs most from general evaluation. Standard disclosure flow for SquadProxy customers:

  1. Before the evaluation starts: notify the provider of the intended scope if the scope is more than lightly exploratory. Most providers have a designated security contact (e.g., security@ or a VDP on HackerOne).
  2. During the evaluation: keep rate-of-requests within a polite envelope. A safety red-team run that looks like a DoS is not a red-team run; it's abuse.
  3. On finding a material issue: stop the evaluation on that specific vector. Write it up. Submit to the provider through their disclosure channel. Agree on a disclosure window (typically 45-90 days).
  4. After disclosure: publish after the window closes or after provider confirms they're ready. Cite SquadProxy as the infrastructure used; we don't publish a customer list but we support customers who disclose us as their routing vendor.

We do not broker unauthorized red-team access to third-party systems. The infrastructure supports legitimate safety research; the research scope and authorization are the customer's responsibility.

Audit trail requirements

For safety red-team customers, SquadProxy retains the following metadata against the account for the minimum period required under applicable regulation (and no longer than our DPA specifies):

  • Request timestamp
  • Source IP (customer-side)
  • Destination domain (not URL — we do not log URLs by default)
  • Exit class, country, session identifier
  • Response status code (not body)

This allows a later inquiry ("on day X, did customer Y route traffic to provider Z?") to be answered without the proxy operator holding content-level logs. The audit shape is deliberately thin enough to protect customer confidentiality while allowing compliance response when legally compelled.

Customers who need richer audit (research ethics board documentation, institutional audit trails) should run their own request-level logging on their side — our logs are infrastructure metadata, not research artifacts.

Who we work with

SquadProxy safety customers fall into three categories:

  1. Frontier lab internal safety teams — red-team eval of their own models pre-release, or post-release incident response
  2. Third-party safety auditors — organisations like METR, Apollo, Model Evaluation & Threat Research, academic safety labs — running evaluation as part of contracted work
  3. Compliance and regulatory bodies — for evaluations mandated under the EU AI Act, UK AI Safety Institute remit, or equivalent

We do not onboard customers whose stated or implied workflow is generation of policy-violating content for downstream use that is unrelated to disclosure or methodology. The distinction is consistently clear at the scoping call; ambiguity gets flagged and we'd rather decline than be associated with a misclassified workload.

Pool sizing for safety work

Safety red-team evaluations are bandwidth-light (each prompt + response is small) but concurrency-moderate (a jurisdiction- mapping run against 10 countries ×500 prompts = 5,000 calls). The Team plan ceiling of 1,000 concurrent covers most jurisdictional mapping runs in under an hour. Lab plan's BGP dedicated prefix is useful for customers who need to cite infrastructure stability in the published report. See pricing.

Further reading

Pricing

Pricing for safety and red-team testing

Every plan carries every exit class — pick the one whose bandwidth envelope fits your workload.

Solo

For individual researchers running evaluation scripts and prototype RAG pipelines.

$149/ month

or $1,430/year (save 20%)

50 GB residential · unlimited datacenter · 200 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 50 GB residential · unlimited datacenter
  • 5 static ISP IPs · 5 GB 4G mobile
  • 1 seat · 200 concurrent sessions
  • Python + Node SDK + REST API
  • Per-request metering (not time-based)
  • Email support (24h response, business days)
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • Solo researchers
  • Evaluation scripts
  • Prototype RAG

Team

Most popular

For AI startups and mid-size labs splitting capacity between training and evaluation.

$699/ month

or $6,710/year (save 20%)

500 GB residential · unlimited datacenter · 1,000 concurrent sessions

  • Access to all 5 exit classes · 10 focus countries
  • 500 GB residential · unlimited datacenter
  • 25 static ISP IPs · 25 GB 4G mobile
  • 10 seats ($29/mo per extra seat) · 1,000 concurrent sessions
  • City-level geo-routing + ASN targeting
  • 99.9% uptime SLA
  • Priority Slack support (4h response, business hours)
  • Python + Node SDK + REST API + webhooks
  • Overage: $3/GB residential · $6/GB mobile

Best for

  • AI startups
  • Mid-size labs
  • Model eval teams

Lab

For academic labs, eval consortia, and frontier model companies running sustained workloads.

$2,999/ month

or $28,790/year (save 20%)

2 TB residential · unlimited DC · 50 GB 4G + 20 GB 5G · 3,000 concurrent sessions

  • Access to all 5 exit classes · 10 countries on 4 continents
  • 2 TB residential · unlimited datacenter
  • 100 static ISP IPs · 50 GB 4G + 20 GB 5G mobile
  • 50 seats ($19/mo per extra seat) · 3,000 concurrent sessions
  • Dedicated gateway lane (bypasses shared-pool queues on us-east-1 + eu-west-1)
  • 99.95% uptime SLA
  • Dedicated Slack channel (1h response, business hours)
  • Custom BGP prefix on request (additional fees apply)
  • Overage: $2.50/GB residential · $5/GB mobile

Best for

  • Academic labs
  • Large eval consortia
  • Frontier model companies

Enterprise

Custom contracts with dedicated infrastructure, volume pricing, and research-grade SLAs.

Custom pricing

Custom (from 5 TB/mo residential) · unlimited concurrent sessions

  • Volume pricing from 5 TB/mo residential
  • Dedicated BGP prefix + ASN announcement
  • Unlimited concurrent sessions · unlimited seats
  • 99.99% uptime SLA with financial credits
  • Named Technical Account Manager + 24/7 on-call paging
  • Custom AUP, DPA, on-site deployment option
  • Research / academic discount (30–50% off Team or Lab)
  • Annual contract · wire, ACH, USDC/USDT/BTC settlement

Best for

  • Frontier labs
  • Eval consortia
  • Enterprise AI

All plans include 14-day refund, single endpoint with regional failover, HTTP(S) + SOCKS5 on every exit class, access to all 5 exit classes and all 10 focus countries, and Python + Node SDKs. Concurrent sessions = simultaneous TCP sessions through the gateway. Overage warnings fire at 80% and 100%; traffic continues only if overage billing is enabled on your account.

Ship on a proxy network you can actually call your ops team about

Real ASNs, real edge capacity, and an engineer who answers your Slack the first time.