Architecture.

BackPro is a single-tenant deployment that sits inside the client's own cloud account. There is no shared inference plane, no shared vector store, and no cross-tenant data path. The architecture is built so the regulator sees one boundary and the client controls it.

The shape

One boundary. The client controls it.

Every BackPro deployment fits inside a single client tenancy. Read connectors pull data into the tenancy; write surfaces push approved outputs back into the firm's systems of record. The only egress is to the contracted LLM providers, gated by an enterprise contract and scoped by an allowlist of FQDNs. Everything else is internal.

┌────────────────────────────── CLIENT TENANCY ──────────────────────────────┐
│                                                                            │
│   ┌─────────────┐    ┌──────────────────┐    ┌─────────────────────────┐   │
│   │ Read        │    │  BackPro         │    │  Vector store           │   │
│   │ connectors  ├───▶│  inference plane ├───▶│  (per tenancy)          │   │
│   │ (read-only) │    │                  │◀───┤                         │   │
│   └─────────────┘    │  · RAG retriever │    └─────────────────────────┘   │
│         ▲            │  · Reranker      │                                  │
│         │            │  · Model router  │    ┌─────────────────────────┐   │
│   ┌─────┴─────┐      │  · Audit logger  ├───▶│  Cryptographic audit    │   │
│   │ SharePoint│      └────────┬─────────┘    │  log + SIEM export      │   │
│   │ OneDrive  │               │              └─────────────────────────┘   │
│   │ Dropbox   │               │                                            │
│   │ Drive     │               │              ┌─────────────────────────┐   │
│   │ Xero/MYOB │               └─────────────▶│  Write surfaces         │   │
│   │ CRM ...   │                              │  (named-approver gated) │   │
│   └───────────┘                              └─────────────────────────┘   │
│                                                                            │
│                  ▲ identity from client IdP (SAML/OIDC) ▲                  │
│                                                                            │
└────────────────────────────────────┬───────────────────────────────────────┘
                                     │
                                     ▼  enterprise contract egress only
                          ┌──────────────────────┐
                          │  Anthropic / OpenAI  │
                          │  / Google Gemini API │
                          └──────────────────────┘

The boundary

Zero egress, by default.

“Zero egress” is a network-layer claim, not a marketing claim. The production namespaces have a default-deny egress policy. The only outbound FQDNs allowed are the contracted model providers, and even those traverse an enterprise gateway so the client controls the allowlist, observes the calls, and kills the connection at the network layer if a provider is ever withdrawn.

No data is sent to any vendor BackPro has not contracted with. No telemetry leaves the tenancy. The client's own observability stack is the source of truth.

Kubernetes NetworkPolicy, default deny + LLM allowlist

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backpro-default-deny-egress
  namespace: backpro-prod
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    # Intra-cluster DNS only
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }

    # Internal services (vector store, audit log) — same namespace
    - to:
        - podSelector: {}

    # Enterprise LLM gateway — single CIDR, observed + auditable
    - to:
        - ipBlock:
            cidr: 10.42.0.0/24   # Allocated to the egress gateway VPC
      ports:
        - { protocol: TCP, port: 443 }
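
The NetworkPolicy above only pins outbound traffic to the gateway CIDR; the FQDN allowlist itself has to be enforced at the gateway. A minimal sketch of that check follows. The names `ALLOWED_FQDNS` and `is_egress_allowed` are hypothetical, and the provider hostnames are illustrative rather than taken from a BackPro configuration.

```python
from urllib.parse import urlsplit

# Illustrative allowlist (assumption): the real list is whatever the client approves.
ALLOWED_FQDNS = frozenset({
    "api.anthropic.com",
    "api.openai.com",
    "generativelanguage.googleapis.com",
})


def is_egress_allowed(url: str, allowlist: frozenset = ALLOWED_FQDNS) -> bool:
    """Return True only for HTTPS requests whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    # urlsplit lowercases the hostname; strip a trailing dot from absolute FQDNs.
    host = (parts.hostname or "").rstrip(".")
    # Exact match only, no suffix or wildcard matching, so a host such as
    # "api.openai.com.evil.example" is rejected.
    return parts.scheme == "https" and host in allowlist
```

Exact matching is the conservative choice here: removing one entry from the set is enough to cut off a withdrawn provider.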

Inference plane

Model choice happens at the workload.

BackPro holds enterprise contracts with Anthropic, OpenAI, and Google (Gemini), and runs its own proprietary models for the workloads where a smaller, AFSL-tuned model outperforms the generalists on accuracy and cost.

The router picks per workload, not per tenancy. A DDQ response (long-context comprehension over 200-question questionnaires) routes differently from an SoA draft (template adherence and best-interests-duty trail). Routing is configuration, not code; the firm reviews and approves the router map at deployment and can override it.

Router map (excerpt)

# Per-workload model assignment. Override at the per-firm level.
workloads:
  ddq.draft_response:
    primary: anthropic/claude-opus-4-7
    fallback: openai/gpt-5
    max_context: 200000
    temperature: 0.2
    require_citations: true

  soa.draft:
    primary: backpro/soa-tuned-v3
    fallback: anthropic/claude-opus-4-7
    max_context: 80000
    temperature: 0.15
    require_template_adherence: true

  audit.evidence_extract:
    primary: backpro/extract-v2
    fallback: openai/gpt-5
    temperature: 0.0
    enable_chunk_provenance: true
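
As a rough illustration of "configuration, not code", the sketch below resolves a workload to its route and applies a per-firm override on top of the base map. `ROUTER_MAP` is a trimmed copy of the excerpt above; `resolve_route`, `candidate_models`, and the shape of `firm_overrides` are assumptions made for this example, not the BackPro router API.

```python
# Trimmed copy of the router map excerpt, expressed as a Python dict.
ROUTER_MAP = {
    "ddq.draft_response": {
        "primary": "anthropic/claude-opus-4-7",
        "fallback": "openai/gpt-5",
        "temperature": 0.2,
    },
    "soa.draft": {
        "primary": "backpro/soa-tuned-v3",
        "fallback": "anthropic/claude-opus-4-7",
        "temperature": 0.15,
    },
}


def resolve_route(workload, firm_overrides=None, router_map=ROUTER_MAP):
    """Return the route for a workload, with per-firm overrides applied on top."""
    if workload not in router_map:
        raise KeyError(f"no route configured for workload {workload!r}")
    route = dict(router_map[workload])  # copy so the base map is never mutated
    route.update((firm_overrides or {}).get(workload, {}))
    return route


def candidate_models(route):
    """Models to try, in order: primary first, then fallback."""
    return [route["primary"], route["fallback"]]
```

For example, a firm that wants a generalist model for SoA drafts could pass `{"soa.draft": {"primary": "anthropic/claude-opus-4-7"}}` as `firm_overrides` without touching any other workload.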


Retrieval-augmented generation

Every claim traces back to a chunk.

The retriever ingests the firm's source documents (policies, fact-finds, DDQ libraries, audit evidence, board packs). Each document is chunked, embedded, and stored in the per-tenancy vector store. At generation time, the relevant chunks are fetched, passed to the model, and, crucially, the chunk identifiers travel with the output.

Every AI claim BackPro produces carries the chunk reference it was drawn from. The audit trail records the prompt, the retrieval set, the model used, the response, and the human approver. A compliance officer can reproduce any output and verify the source.
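
To make the chunk reference concrete, here is a minimal sketch of a chunker that records where each chunk came from. It assumes fixed-size character windows with overlap and a hash-derived identifier; the `Chunk` type, `chunk_document`, the ID scheme, and the end-exclusive `char_range` are all illustrative assumptions, and the embedding step is omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    char_range: tuple  # (start, end) offsets into the source text, end exclusive
    text: str


def chunk_document(doc_id, text, size=400, overlap=50):
    """Split text into overlapping windows, keeping the source offsets of each."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        # Hypothetical ID scheme: stable for a given document and offset range.
        digest = hashlib.sha256(f"{doc_id}:{start}:{end}".encode()).hexdigest()[:12]
        chunks.append(Chunk(f"ck_{digest}", doc_id, (start, end), text[start:end]))
        if end == len(text):
            break
        start = end - overlap  # step forward, re-reading the overlap region
    return chunks
```

Because each chunk carries its `doc_id` and `char_range`, a reviewer can go from a cited chunk back to the exact span of the source document.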

Citation envelope around an AI response

{
  "response_id": "rsp_01HZX...",
  "workload": "ddq.draft_response",
  "model": "anthropic/claude-opus-4-7",
  "prompt_hash": "sha256:b3a9...",
  "retrieved_chunks": [
    {
      "chunk_id": "ck_01HZX1a2",
      "doc_id": "doc_2024_odd_q3",
      "doc_title": "Q3 2024 Operational Due Diligence Pack",
      "page": 14,
      "char_range": [4012, 4318],
      "embedding_score": 0.847
    },
    { "chunk_id": "ck_01HZX1b4", "doc_id": "doc_2024_bcp", "page": 3, ... }
  ],
  "response_segments": [
    {
      "text": "Yes. Critical operations are reviewed quarterly...",
      "cited_chunk_ids": ["ck_01HZX1a2"]
    },
    {
      "text": "Substitutability assessment is documented in the BCP...",
      "cited_chunk_ids": ["ck_01HZX1b4"]
    }
  ],
  "approver": null,            // populated when a human signs off
  "approved_at": null
}
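
One check a reviewer might run against an envelope like the one above is that every response segment cites at least one chunk, and that every cited chunk is actually in the retrieval set. The sketch below uses only the field names shown in the envelope; the function name `citation_problems` is hypothetical.

```python
def citation_problems(envelope):
    """Return a list of grounding problems; an empty list means every segment is cited."""
    retrieved = {chunk["chunk_id"] for chunk in envelope.get("retrieved_chunks", [])}
    problems = []
    for index, segment in enumerate(envelope.get("response_segments", [])):
        cited = segment.get("cited_chunk_ids", [])
        if not cited:
            problems.append(f"segment {index}: no citation")
        for chunk_id in cited:
            if chunk_id not in retrieved:
                problems.append(f"segment {index}: cites unknown chunk {chunk_id}")
    return problems
```

A non-empty result would be a natural reason to hold the response back for human review rather than show it to the user.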

Vector store

One store per tenancy. No cross-tenant index.

Each client gets a dedicated vector store inside their own tenancy. No cross-tenant index, no shared embeddings, no cross-firm leakage by retrieval. If the firm hosts on Azure, the store sits in Azure (typically a managed pgvector or equivalent). On AWS, Postgres + pgvector or OpenSearch with kNN. On GCP, AlloyDB or Vertex AI vector search.

The store is treated as an information asset under CPS 234. It is backed up under the same RPO/RTO as the rest of the firm's primary datastores, and the BCP pattern covers its failure modes.

Failure modes

Degraded-mode behaviour.

The system is engineered so that every failure mode collapses to manual, never to silent error. Three classes worth knowing about:

When	What happens	How it recovers
LLM provider unreachable	Router fails the primary and falls back to the secondary in under 500 ms. If both are unreachable, the workload queues. There are no silent errors: the user sees a Failed state with the reason.	When connectivity is restored, the queue drains in order. Idempotency keys on the audit log prevent duplicate side effects.
Vector store unavailable	RAG retrieval fails. The model is not called, because generating without retrieval would violate the citation-grounded contract.	Standard pgvector/OpenSearch HA pattern. The BCP runbook covers failover and the integrity check after recovery.
Hallucination signal threshold breached	The response is held in a quarantine bucket and never reaches the user surface. A compliance reviewer is paged.	Human review approves or rejects. The decision becomes training data for the threshold tuner. The original prompt and retrieval set are preserved for reproduction.
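
The first row of the table can be sketched as follows: try the primary, then the fallback, and if both are unreachable queue the job and return an explicit Failed state. All names here (`ProviderUnreachable`, `run_with_fallback`, `drain`, the job dict, and the dict used as an audit log) are assumptions for illustration, and the 500 ms failover budget is not modelled.

```python
from collections import deque


class ProviderUnreachable(Exception):
    """Raised by call_model when a provider cannot be reached."""


def run_with_fallback(job, call_model, queue, audit_log):
    """job is a dict with 'key' (idempotency key), 'route', and 'prompt'."""
    if job["key"] in audit_log:
        # Already recorded: return the stored result instead of repeating side effects.
        return audit_log[job["key"]]
    errors = []
    for model in (job["route"]["primary"], job["route"]["fallback"]):
        try:
            response = call_model(model, job["prompt"])
        except ProviderUnreachable as exc:
            errors.append(f"{model}: {exc}")
            continue
        result = {"state": "ok", "model": model, "response": response}
        audit_log[job["key"]] = result
        return result
    queue.append(job)  # both providers down: park the job, surface the reason
    return {"state": "failed", "reason": "; ".join(errors)}


def drain(queue, call_model, audit_log):
    """Replay queued jobs in FIFO order; jobs that still fail are re-queued."""
    for _ in range(len(queue)):
        run_with_fallback(queue.popleft(), call_model, queue, audit_log)
```

Keying the audit log on an idempotency key means a job replayed after recovery cannot record, or trigger, the same side effect twice.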

Under NDA

The full architecture runbook, specific model versions, the exact router policy, and the threat-modelling write-ups are released under NDA on request: contact@backpro.ai.
