Tenant Usage Quotas¶

Tenant usage quotas let you enforce per-tenant limits on the number of actions dispatched within a rolling time window. This is useful for:

Billing enforcement -- cap actions per tenant to match their subscription tier
Abuse prevention -- block runaway automation that floods the pipeline
Fair usage -- ensure no single tenant monopolizes shared infrastructure
Cost control -- degrade to a cheaper provider when a tenant exceeds their budget

How It Works¶

Quota checks run in the gateway dispatch pipeline after the distributed lock is acquired but before rule evaluation. This means quotas take precedence over all rules -- an action that exceeds its tenant's quota is rejected (or warned/degraded) regardless of which rules would have matched.

Dispatch Pipeline:
  1. Acquire distributed lock
  2. *** Quota check ***  <-- here
  3. Rule evaluation
  4. LLM guardrail
  5. Execute / suppress / reroute / ...

Each quota policy defines:

A tenant and namespace scope
An optional provider scope (None = generic catch-all, Some("slack") = per-provider)
A maximum number of actions per time window
An overage behavior that determines what happens when the limit is exceeded

Usage counters are stored in the state backend with epoch-aligned windows, so all gateway instances agree on window boundaries without coordination.

Generic vs. per-provider policies¶

A single (namespace, tenant) pair can hold one generic catch-all policy plus any number of per-provider policies. This lets operators stack a tenant-wide daily cap with per-provider burst caps — e.g. "10,000 actions/day overall and 50 Slack messages/minute." Every dispatch evaluates all policies whose scope matches the outgoing provider, and the strictest applicable outcome wins (Block > Degrade > Warn > Notify). Each policy also maintains its own counter bucket, so a burst on one provider cannot consume another provider's budget.

Policy (`provider` field)	Matches dispatches to	Counter bucket
`None` (generic)	Any provider for the tenant	`{ns}:{tenant}:*:{window}:{idx}`
`Some("slack")`	Only `slack`	`{ns}:{tenant}:slack:{window}:{idx}`
`Some("email")`	Only `email`	`{ns}:{tenant}:email:{window}:{idx}`

When any applicable policy blocks a dispatch, every counter incremented during that call is rolled back — the blocked request does not consume budget on sibling policies.

Configuration¶

Via the Gateway Builder (Rust)¶

use acteon_core::{QuotaPolicy, QuotaWindow, OverageBehavior};
use acteon_gateway::GatewayBuilder;

let gateway = GatewayBuilder::new()
    .state(state)
    .lock(lock)
    // Generic tenant-wide daily cap.
    .quota_policy(QuotaPolicy {
        id: "q-001".into(),
        namespace: "notifications".into(),
        tenant: "acme".into(),
        provider: None, // generic: counts every dispatch
        max_actions: 1000,
        window: QuotaWindow::Daily,
        overage_behavior: OverageBehavior::Block,
        enabled: true,
        created_at: chrono::Utc::now(),
        updated_at: chrono::Utc::now(),
        description: Some("Acme daily limit".into()),
        labels: Default::default(),
    })
    // Per-provider burst cap on Slack — stacks with the daily cap.
    .quota_policy(QuotaPolicy {
        id: "q-002".into(),
        namespace: "notifications".into(),
        tenant: "acme".into(),
        provider: Some("slack".into()),
        max_actions: 50,
        window: QuotaWindow::Custom { seconds: 60 },
        overage_behavior: OverageBehavior::Block,
        enabled: true,
        created_at: chrono::Utc::now(),
        updated_at: chrono::Utc::now(),
        description: Some("Acme Slack burst cap".into()),
        labels: Default::default(),
    })
    .build()?;

Via the REST API¶

Create, read, update, and delete quota policies through the /v1/quotas endpoints. See the API Reference below.

Via TOML Configuration¶

[[quotas]]
id = "q-acme-daily"
namespace = "notifications"
tenant = "acme"
# provider field omitted → generic tenant-wide policy
max_actions = 1000
window = "daily"
overage_behavior = "block"
enabled = true
description = "Acme daily limit"

[[quotas]]
id = "q-acme-slack-burst"
namespace = "notifications"
tenant = "acme"
provider = "slack"            # per-provider burst cap
max_actions = 50
window = { custom = { seconds = 60 } }
overage_behavior = "block"
enabled = true
description = "Acme Slack burst cap"

Quota Windows¶

Window	Duration	Description
`hourly`	1 hour	Rolling 3,600-second window
`daily`	24 hours	Rolling 86,400-second window
`weekly`	7 days	Rolling 604,800-second window
`monthly`	30 days	Rolling 2,592,000-second window
`custom`	N seconds	Arbitrary window duration

All windows are epoch-aligned, meaning the window start is computed as floor(unix_timestamp / window_seconds) * window_seconds. This ensures that all gateway instances agree on when a window starts and ends without any coordination.

Custom Windows¶

For non-standard billing periods, use the custom window with a duration in seconds:

{
  "window": {"custom": {"seconds": 7200}}
}

This creates a rolling 2-hour window.

Overage Behaviors¶

When a tenant's usage reaches the configured max_actions limit, the overage_behavior determines what happens next.

Block¶

The action is rejected immediately. The gateway returns ActionOutcome::QuotaExceeded and the counter is not incremented (the rejected action does not count toward usage).

{
  "overage_behavior": "block"
}

Outcome: QuotaExceeded { tenant, limit, used, overage_behavior: "block" }

Warn¶

The action is allowed to proceed. The counter is incremented past the limit. The gateway emits a warning log and increments the quota_warned metric.

{
  "overage_behavior": "warn"
}

This is useful for soft limits where you want visibility into overages without disrupting tenants.

Degrade¶

The action's target provider is swapped to the configured fallback_provider and the gateway re-enters its dispatch pipeline through that provider. The caller sees a normal Executed outcome if the fallback succeeds — there is no separate "degraded" success state.

{
  "overage_behavior": {"degrade": {"fallback_provider": "log"}}
}

Fallback re-check semantics: after the provider swap, the gateway re-evaluates only provider-scoped policies targeting the new provider. The generic (tenant-wide) policy that triggered the initial degrade is not double-charged, but any per-provider cap on the fallback is enforced. This closes the "degrade-to-bypass" hole where a fallback provider's rate limit would otherwise be silently ignored.

A misconfigured chain of degrade policies (A → B → C → …) is bounded at 3 hops per dispatch; past that limit the gateway returns QuotaExceeded instead of cascading further.

Notify¶

The action is allowed to proceed. The gateway increments the counter and sends a notification to the configured target (e.g., an email address or webhook URL).

{
  "overage_behavior": {"notify": {"target": "admin@example.com"}}
}

API Reference¶

All endpoints live under /v1/quotas. Namespace and tenant are provided as query parameters.

`POST /v1/quotas` -- Create¶

Create a new quota policy.

Request body:

{
  "namespace": "notifications",
  "tenant": "acme",
  "max_actions": 1000,
  "window": "daily",
  "overage_behavior": "block",
  "enabled": true,
  "description": "Acme daily limit",
  "labels": {"tier": "premium"}
}

Field	Type	Required	Description
`namespace`	string	Yes	Namespace scope
`tenant`	string	Yes	Tenant scope
`max_actions`	integer	Yes	Maximum actions per window
`window`	string/object	Yes	`"hourly"`, `"daily"`, `"weekly"`, `"monthly"`, or `{"custom": {"seconds": N}}`
`overage_behavior`	string/object	Yes	`"block"`, `"warn"`, `{"degrade": {"fallback_provider": "..."}}`, or `{"notify": {"target": "..."}}`
`enabled`	bool	No	Whether the policy is active (default: `true`)
`description`	string	No	Human-readable description
`labels`	object	No	Arbitrary key-value labels

Response (201):

{
  "id": "q-019462a1-...",
  "namespace": "notifications",
  "tenant": "acme",
  "max_actions": 1000,
  "window": "daily",
  "overage_behavior": "block",
  "enabled": true,
  "created_at": "2026-02-10T12:00:00Z",
  "updated_at": "2026-02-10T12:00:00Z"
}

`GET /v1/quotas` -- List¶

List all quota policies, optionally filtered by namespace and tenant.

Query parameters:

Parameter	Type	Required	Description
`namespace`	string	No	Filter by namespace
`tenant`	string	No	Filter by tenant

Response (200):

{
  "quotas": [
    {
      "id": "q-019462a1-...",
      "namespace": "notifications",
      "tenant": "acme",
      "max_actions": 1000,
      "window": "daily",
      "overage_behavior": "block",
      "enabled": true,
      "description": "Acme daily limit",
      "created_at": "2026-02-10T12:00:00Z",
      "labels": {"tier": "premium"}
    }
  ]
}

`GET /v1/quotas/{id}` -- Get Detail¶

Retrieve the full definition of a quota policy.

Query parameters: namespace, tenant

Response (200): Full QuotaPolicy object.

Response (404): {"error": "quota policy not found"}

`PUT /v1/quotas/{id}` -- Update¶

Update an existing quota policy. Only provided fields are changed.

Query parameters: namespace, tenant

Request body (partial):

{
  "max_actions": 2000,
  "description": "Upgraded to premium tier"
}

Updatable fields: max_actions, window, overage_behavior, enabled, description, labels.

Response (200): Updated QuotaPolicy object.

`DELETE /v1/quotas/{id}` -- Delete¶

Permanently delete a quota policy. The usage counter is also cleaned up.

Query parameters: namespace, tenant

Response (204): No content.

`GET /v1/quotas/{id}/usage` -- Get Usage¶

Retrieve the current usage for a quota policy within the active window.

Query parameters: namespace, tenant

Response (200):

{
  "tenant": "acme",
  "namespace": "notifications",
  "used": 742,
  "limit": 1000,
  "remaining": 258,
  "window": "daily",
  "resets_at": "2026-02-11T00:00:00Z",
  "overage_behavior": "block"
}

Usage Examples¶

Create a quota policy¶

curl -X POST http://localhost:8080/v1/quotas \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "notifications",
    "tenant": "acme",
    "max_actions": 1000,
    "window": "daily",
    "overage_behavior": "block",
    "description": "Acme daily notification limit"
  }'

Check current usage¶

curl "http://localhost:8080/v1/quotas/q-001/usage?namespace=notifications&tenant=acme"

Update limit (upgrade tier)¶

curl -X PUT "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme" \
  -H "Content-Type: application/json" \
  -d '{"max_actions": 5000}'

Disable a quota temporarily¶

curl -X PUT "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

Delete a quota¶

curl -X DELETE "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme"

Monitoring¶

Prometheus Metrics¶

Metric	Type	Description
`quota_exceeded`	Counter	Actions blocked by quota (Block behavior)
`quota_warned`	Counter	Actions that exceeded quota but were allowed (Warn behavior)
`quota_degraded`	Counter	Actions degraded to a fallback provider (Degrade behavior)

Structured Logging¶

Event	Level	Description
Quota exceeded (block)	info	`quota exceeded — blocking action` with tenant, limit, used
Quota exceeded (warn)	warn	`quota exceeded — warning, allowing action` with tenant, limit, used
Quota exceeded (degrade)	info	`quota exceeded — degrading to fallback provider` with tenant, fallback
Quota exceeded (notify)	info	`quota exceeded — notifying target` with tenant, target

Best Practices¶

Start with Warn: Use Warn behavior initially to understand usage patterns before switching to Block.
Set meaningful descriptions: Always include a description so quota policies are easy to identify in the UI and logs.
Use labels for tier management: Labels like tier: premium or plan: enterprise make it easy to filter and bulk-update policies.
Monitor the quota_warned metric: A rising warning count may indicate a tenant needs an upgrade or that limits need adjustment.
Prefer daily or hourly windows: These align with natural billing cycles and are easier to reason about than custom windows.
Disable before deleting: Disable a policy first to verify there are no unintended side effects before permanently removing it.

Limitations¶

Per-bucket policy cap: Each (namespace, tenant) bucket may hold at most 32 policies (one generic + up to 31 per-provider caps). Creation attempts past that cap return 409 Conflict.
Identifier character set: namespace, tenant, and provider identifiers must not contain : (the state-key separator) or ASCII control characters, and must be ≤128 bytes. This prevents cross-tenant counter-key collisions.
Counter precision: Counters are stored as strings in the state backend and incremented non-atomically (read + write). In extremely high-throughput scenarios, a small number of actions may slip through just above the limit.
Hot-key contention on the generic counter: Every dispatch for a tenant hits the single * (generic) counter key. In very high-throughput scenarios this can become a bottleneck in the state backend. Sharded counters are a planned follow-up — open an issue if you hit this limit in practice.
list_quotas full scan: The list endpoint currently scans every quota record and filters in-memory. For deployments with thousands of policies, the endpoint will become slow or timeout. A secondary index for filtered lookups is planned.
Ghost consumption on rollback failure: When a Block outcome fires, the gateway rolls back every counter it just incremented using best-effort compensating decrements. If a decrement fails (state store blip), the counter stays slightly inflated — the tenant may be blocked marginally earlier than their actual usage warrants. Reconciliation is handled implicitly by window expiry.
No per-action-type quotas: Quotas apply to all action types within a namespace:tenant scope. Use rules for action-type-level control.
Window granularity: The minimum effective window is determined by the state backend's TTL precision (typically 1 second).

Tenant Usage Quotas¶

How It Works¶

Generic vs. per-provider policies¶

Configuration¶

Via the Gateway Builder (Rust)¶

Via the REST API¶

Via TOML Configuration¶

Quota Windows¶

Custom Windows¶

Overage Behaviors¶

Block¶

Warn¶

Degrade¶

Notify¶

API Reference¶

POST /v1/quotas -- Create¶

GET /v1/quotas -- List¶

GET /v1/quotas/{id} -- Get Detail¶

PUT /v1/quotas/{id} -- Update¶

DELETE /v1/quotas/{id} -- Delete¶

GET /v1/quotas/{id}/usage -- Get Usage¶

Usage Examples¶

Create a quota policy¶

Check current usage¶

Update limit (upgrade tier)¶

Disable a quota temporarily¶

Delete a quota¶

Monitoring¶

Prometheus Metrics¶

Structured Logging¶

Best Practices¶

Limitations¶

`POST /v1/quotas` -- Create¶

`GET /v1/quotas` -- List¶

`GET /v1/quotas/{id}` -- Get Detail¶

`PUT /v1/quotas/{id}` -- Update¶

`DELETE /v1/quotas/{id}` -- Delete¶

`GET /v1/quotas/{id}/usage` -- Get Usage¶