
Tenant Usage Quotas

Tenant usage quotas let you enforce per-tenant limits on the number of actions dispatched within an epoch-aligned time window. This is useful for:

  • Billing enforcement -- cap actions per tenant to match their subscription tier
  • Abuse prevention -- block runaway automation that floods the pipeline
  • Fair usage -- ensure no single tenant monopolizes shared infrastructure
  • Cost control -- degrade to a cheaper provider when a tenant exceeds their budget

How It Works

Quota checks run in the gateway dispatch pipeline after the distributed lock is acquired but before rule evaluation. This means quotas take precedence over all rules -- an action that exceeds its tenant's quota is rejected (or warned/degraded) regardless of which rules would have matched.

Dispatch Pipeline:
  1. Acquire distributed lock
  2. *** Quota check ***  <-- here
  3. Rule evaluation
  4. LLM guardrail
  5. Execute / suppress / reroute / ...

Each quota policy defines:

  • A tenant and namespace scope
  • An optional provider scope (None = generic catch-all, Some("slack") = per-provider)
  • A maximum number of actions per time window
  • An overage behavior that determines what happens when the limit is exceeded

Usage counters are stored in the state backend with epoch-aligned windows, so all gateway instances agree on window boundaries without coordination.

Generic vs. per-provider policies

A single (namespace, tenant) pair can hold one generic catch-all policy plus any number of per-provider policies. This lets operators stack a tenant-wide daily cap with per-provider burst caps — e.g. "10,000 actions/day overall and 50 Slack messages/minute." Every dispatch evaluates all policies whose scope matches the outgoing provider, and the strictest applicable outcome wins (Block > Degrade > Warn > Notify). Each policy also maintains its own counter bucket, so a burst on one provider cannot consume another provider's budget.

Policy (provider field)   Matches dispatches to         Counter bucket
None (generic)            Any provider for the tenant   {ns}:{tenant}:*:{window}:{idx}
Some("slack")             Only slack                    {ns}:{tenant}:slack:{window}:{idx}
Some("email")             Only email                    {ns}:{tenant}:email:{window}:{idx}
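
The counter-bucket keys in the table can be sketched as follows. This is an illustrative sketch, not acteon's actual key-building code; it assumes `idx` is the epoch-aligned window index (`unix_timestamp / window_seconds`).

```rust
/// Build a counter-bucket key in the {ns}:{tenant}:{provider}:{window}:{idx} shape
/// shown in the table above. A generic policy (provider = None) uses the literal
/// "*" segment, giving it a bucket distinct from every per-provider policy.
fn counter_key(ns: &str, tenant: &str, provider: Option<&str>, window: &str, idx: u64) -> String {
    format!("{}:{}:{}:{}:{}", ns, tenant, provider.unwrap_or("*"), window, idx)
}
```

Because each policy writes to its own bucket, a Slack burst increments only the `slack` key (plus the generic `*` key), never the `email` key. The identifier rules in Limitations (no `:` in namespace, tenant, or provider) exist precisely so these segments cannot collide.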

When any applicable policy blocks a dispatch, every counter incremented during that call is rolled back — the blocked request does not consume budget on sibling policies.
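
The strictest-wins resolution described above can be sketched with an ordered enum. The names here are hypothetical, not acteon's actual types; the point is only that severity is a total order and the maximum wins.

```rust
// Illustrative sketch of "strictest applicable outcome wins".
// Declaration order defines severity: later variants are stricter,
// so derived Ord gives Notify < Warn < Degrade < Block.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum OverageOutcome {
    Notify,
    Warn,
    Degrade,
    Block,
}

/// Pick the strictest outcome among all matching policies.
/// Returns None when no policy's limit was exceeded.
fn strictest(outcomes: &[OverageOutcome]) -> Option<OverageOutcome> {
    outcomes.iter().copied().max()
}
```

So if a dispatch trips both a generic Warn policy and a per-provider Block policy, the dispatch is blocked.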

Configuration

Via the Gateway Builder (Rust)

use acteon_core::{QuotaPolicy, QuotaWindow, OverageBehavior};
use acteon_gateway::GatewayBuilder;

let gateway = GatewayBuilder::new()
    .state(state)
    .lock(lock)
    // Generic tenant-wide daily cap.
    .quota_policy(QuotaPolicy {
        id: "q-001".into(),
        namespace: "notifications".into(),
        tenant: "acme".into(),
        provider: None, // generic: counts every dispatch
        max_actions: 1000,
        window: QuotaWindow::Daily,
        overage_behavior: OverageBehavior::Block,
        enabled: true,
        created_at: chrono::Utc::now(),
        updated_at: chrono::Utc::now(),
        description: Some("Acme daily limit".into()),
        labels: Default::default(),
    })
    // Per-provider burst cap on Slack — stacks with the daily cap.
    .quota_policy(QuotaPolicy {
        id: "q-002".into(),
        namespace: "notifications".into(),
        tenant: "acme".into(),
        provider: Some("slack".into()),
        max_actions: 50,
        window: QuotaWindow::Custom { seconds: 60 },
        overage_behavior: OverageBehavior::Block,
        enabled: true,
        created_at: chrono::Utc::now(),
        updated_at: chrono::Utc::now(),
        description: Some("Acme Slack burst cap".into()),
        labels: Default::default(),
    })
    .build()?;

Via the REST API

Create, read, update, and delete quota policies through the /v1/quotas endpoints. See the API Reference below.

Via TOML Configuration

[[quotas]]
id = "q-acme-daily"
namespace = "notifications"
tenant = "acme"
# provider field omitted → generic tenant-wide policy
max_actions = 1000
window = "daily"
overage_behavior = "block"
enabled = true
description = "Acme daily limit"

[[quotas]]
id = "q-acme-slack-burst"
namespace = "notifications"
tenant = "acme"
provider = "slack"            # per-provider burst cap
max_actions = 50
window = { custom = { seconds = 60 } }
overage_behavior = "block"
enabled = true
description = "Acme Slack burst cap"

Quota Windows

Window    Duration    Description
hourly    1 hour      Fixed 3,600-second window, epoch-aligned
daily     24 hours    Fixed 86,400-second window, epoch-aligned
weekly    7 days      Fixed 604,800-second window, epoch-aligned
monthly   30 days     Fixed 2,592,000-second window, epoch-aligned
custom    N seconds   Arbitrary window duration, epoch-aligned

All windows are epoch-aligned, meaning the window start is computed as floor(unix_timestamp / window_seconds) * window_seconds. This ensures that all gateway instances agree on when a window starts and ends without any coordination.
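
The alignment arithmetic can be sketched in a few lines (function names are illustrative, not acteon's API):

```rust
/// Epoch-aligned window start:
/// floor(unix_timestamp / window_seconds) * window_seconds.
/// Integer division performs the floor, so every gateway instance
/// computes the same boundary from the same clock.
fn window_start(unix_timestamp: u64, window_seconds: u64) -> u64 {
    (unix_timestamp / window_seconds) * window_seconds
}

/// The window ends (and the counter resets) one full window later.
fn window_end(unix_timestamp: u64, window_seconds: u64) -> u64 {
    window_start(unix_timestamp, window_seconds) + window_seconds
}
```

For example, with an hourly window every timestamp inside the same UTC hour maps to that hour's top-of-the-hour boundary, regardless of which gateway instance does the math.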

Custom Windows

For non-standard billing periods, use the custom window with a duration in seconds:

{
  "window": {"custom": {"seconds": 7200}}
}

This creates a 2-hour window aligned to epoch boundaries, so the counter resets at every even UTC hour.

Overage Behaviors

When a tenant's usage reaches the configured max_actions limit, the overage_behavior determines what happens next.

Block

The action is rejected immediately. The gateway returns ActionOutcome::QuotaExceeded and the counter is not incremented (the rejected action does not count toward usage).

{
  "overage_behavior": "block"
}

Outcome: QuotaExceeded { tenant, limit, used, overage_behavior: "block" }

Warn

The action is allowed to proceed. The counter is incremented past the limit. The gateway emits a warning log and increments the quota_warned metric.

{
  "overage_behavior": "warn"
}

This is useful for soft limits where you want visibility into overages without disrupting tenants.

Degrade

The action's target provider is swapped to the configured fallback_provider and the gateway re-enters its dispatch pipeline through that provider. The caller sees a normal Executed outcome if the fallback succeeds — there is no separate "degraded" success state.

{
  "overage_behavior": {"degrade": {"fallback_provider": "log"}}
}

Fallback re-check semantics: after the provider swap, the gateway re-evaluates only provider-scoped policies targeting the new provider. The generic (tenant-wide) policy that triggered the initial degrade is not double-charged, but any per-provider cap on the fallback is enforced. This closes the "degrade-to-bypass" hole where a fallback provider's rate limit would otherwise be silently ignored.

A misconfigured chain of degrade policies (A → B → C → …) is bounded at 3 hops per dispatch; past that limit the gateway returns QuotaExceeded instead of cascading further.
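
The hop bound can be sketched as a loop over a provider-to-fallback map. This is a simplified illustration, not the gateway's actual dispatch code: the map stands in for "a Degrade policy fired on this provider", and `None` stands in for the `QuotaExceeded` outcome.

```rust
use std::collections::HashMap;

/// Upper bound on degrade hops per dispatch, as described above.
const MAX_DEGRADE_HOPS: usize = 3;

/// Follow a chain of degrade fallbacks. Returns the provider that finally
/// accepts the dispatch, or None once the hop budget is exhausted
/// (a cycle or an over-long chain), mirroring the QuotaExceeded outcome.
fn resolve_fallback(start: &str, degrades: &HashMap<&str, &str>) -> Option<String> {
    let mut current = start.to_string();
    for _ in 0..MAX_DEGRADE_HOPS {
        match degrades.get(current.as_str()) {
            // No degrade policy fires here: dispatch proceeds on this provider.
            None => return Some(current),
            // A degrade policy fires: swap to its fallback and re-check.
            Some(next) => current = next.to_string(),
        }
    }
    None
}
```

A two-provider cycle (A degrades to B, B degrades to A) burns through the budget and terminates instead of looping forever.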

Notify

The action is allowed to proceed. The gateway increments the counter and sends a notification to the configured target (e.g., an email address or webhook URL).

{
  "overage_behavior": {"notify": {"target": "admin@example.com"}}
}

API Reference

All endpoints live under /v1/quotas. Namespace and tenant are provided as query parameters.

POST /v1/quotas -- Create

Create a new quota policy.

Request body:

{
  "namespace": "notifications",
  "tenant": "acme",
  "max_actions": 1000,
  "window": "daily",
  "overage_behavior": "block",
  "enabled": true,
  "description": "Acme daily limit",
  "labels": {"tier": "premium"}
}
Field             Type           Required  Description
namespace         string         Yes       Namespace scope
tenant            string         Yes       Tenant scope
provider          string         No        Provider scope; omit for a generic tenant-wide policy
max_actions       integer        Yes       Maximum actions per window
window            string/object  Yes       "hourly", "daily", "weekly", "monthly", or {"custom": {"seconds": N}}
overage_behavior  string/object  Yes       "block", "warn", {"degrade": {"fallback_provider": "..."}}, or {"notify": {"target": "..."}}
enabled           bool           No        Whether the policy is active (default: true)
description       string         No        Human-readable description
labels            object         No        Arbitrary key-value labels

Response (201):

{
  "id": "q-019462a1-...",
  "namespace": "notifications",
  "tenant": "acme",
  "max_actions": 1000,
  "window": "daily",
  "overage_behavior": "block",
  "enabled": true,
  "created_at": "2026-02-10T12:00:00Z",
  "updated_at": "2026-02-10T12:00:00Z"
}

GET /v1/quotas -- List

List all quota policies, optionally filtered by namespace and tenant.

Query parameters:

Parameter  Type    Required  Description
namespace  string  No        Filter by namespace
tenant     string  No        Filter by tenant

Response (200):

{
  "quotas": [
    {
      "id": "q-019462a1-...",
      "namespace": "notifications",
      "tenant": "acme",
      "max_actions": 1000,
      "window": "daily",
      "overage_behavior": "block",
      "enabled": true,
      "description": "Acme daily limit",
      "created_at": "2026-02-10T12:00:00Z",
      "labels": {"tier": "premium"}
    }
  ]
}

GET /v1/quotas/{id} -- Get Detail

Retrieve the full definition of a quota policy.

Query parameters: namespace, tenant

Response (200): Full QuotaPolicy object.

Response (404): {"error": "quota policy not found"}

PUT /v1/quotas/{id} -- Update

Update an existing quota policy. Only provided fields are changed.

Query parameters: namespace, tenant

Request body (partial):

{
  "max_actions": 2000,
  "description": "Upgraded to premium tier"
}

Updatable fields: max_actions, window, overage_behavior, enabled, description, labels.

Response (200): Updated QuotaPolicy object.

DELETE /v1/quotas/{id} -- Delete

Permanently delete a quota policy. The usage counter is also cleaned up.

Query parameters: namespace, tenant

Response (204): No content.

GET /v1/quotas/{id}/usage -- Get Usage

Retrieve the current usage for a quota policy within the active window.

Query parameters: namespace, tenant

Response (200):

{
  "tenant": "acme",
  "namespace": "notifications",
  "used": 742,
  "limit": 1000,
  "remaining": 258,
  "window": "daily",
  "resets_at": "2026-02-11T00:00:00Z",
  "overage_behavior": "block"
}
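
The derived fields in this response follow directly from the counter and the policy. As an illustrative sketch (function names hypothetical):

```rust
/// remaining never goes negative, even when Warn or Notify behavior
/// has let the counter run past the limit.
fn remaining(limit: u64, used: u64) -> u64 {
    limit.saturating_sub(used)
}

/// resets_at is the end of the current epoch-aligned window:
/// the next multiple of window_seconds after unix_timestamp.
fn resets_at(unix_timestamp: u64, window_seconds: u64) -> u64 {
    ((unix_timestamp / window_seconds) + 1) * window_seconds
}
```

In the example above, used = 742 against limit = 1000 yields remaining = 258, and the daily window's epoch alignment puts resets_at at the next UTC midnight.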

Usage Examples

Create a quota policy

curl -X POST http://localhost:8080/v1/quotas \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "notifications",
    "tenant": "acme",
    "max_actions": 1000,
    "window": "daily",
    "overage_behavior": "block",
    "description": "Acme daily notification limit"
  }'

Check current usage

curl "http://localhost:8080/v1/quotas/q-001/usage?namespace=notifications&tenant=acme"

Update limit (upgrade tier)

curl -X PUT "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme" \
  -H "Content-Type: application/json" \
  -d '{"max_actions": 5000}'

Disable a quota temporarily

curl -X PUT "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

Delete a quota

curl -X DELETE "http://localhost:8080/v1/quotas/q-001?namespace=notifications&tenant=acme"

Monitoring

Prometheus Metrics

Metric          Type     Description
quota_exceeded  Counter  Actions blocked by quota (Block behavior)
quota_warned    Counter  Actions that exceeded quota but were allowed (Warn behavior)
quota_degraded  Counter  Actions degraded to a fallback provider (Degrade behavior)

Structured Logging

Event                     Level  Message and fields
Quota exceeded (block)    info   "quota exceeded — blocking action" with tenant, limit, used
Quota exceeded (warn)     warn   "quota exceeded — warning, allowing action" with tenant, limit, used
Quota exceeded (degrade)  info   "quota exceeded — degrading to fallback provider" with tenant, fallback
Quota exceeded (notify)   info   "quota exceeded — notifying target" with tenant, target

Best Practices

  • Start with Warn: Use Warn behavior initially to understand usage patterns before switching to Block.
  • Set meaningful descriptions: Always include a description so quota policies are easy to identify in the UI and logs.
  • Use labels for tier management: Labels like tier: premium or plan: enterprise make it easy to filter and bulk-update policies.
  • Monitor the quota_warned metric: A rising warning count may indicate a tenant needs an upgrade or that limits need adjustment.
  • Prefer daily or hourly windows: These align with natural billing cycles and are easier to reason about than custom windows.
  • Disable before deleting: Disable a policy first to verify there are no unintended side effects before permanently removing it.

Limitations

  • Per-bucket policy cap: Each (namespace, tenant) bucket may hold at most 32 policies (one generic + up to 31 per-provider caps). Creation attempts past that cap return 409 Conflict.
  • Identifier character set: namespace, tenant, and provider identifiers must not contain : (the state-key separator) or ASCII control characters, and must be ≤128 bytes. This prevents cross-tenant counter-key collisions.
  • Counter precision: Counters are stored as strings in the state backend and incremented non-atomically (read + write). In extremely high-throughput scenarios, a small number of actions may slip through just above the limit.
  • Hot-key contention on the generic counter: Every dispatch for a tenant hits the single * (generic) counter key. In very high-throughput scenarios this can become a bottleneck in the state backend. Sharded counters are a planned follow-up — open an issue if you hit this limit in practice.
  • list_quotas full scan: The list endpoint currently scans every quota record and filters in-memory. For deployments with thousands of policies, the endpoint will become slow or timeout. A secondary index for filtered lookups is planned.
  • Ghost consumption on rollback failure: When a Block outcome fires, the gateway rolls back every counter it just incremented using best-effort compensating decrements. If a decrement fails (state store blip), the counter stays slightly inflated — the tenant may be blocked marginally earlier than their actual usage warrants. Reconciliation is handled implicitly by window expiry.
  • No per-action-type quotas: Quotas apply to all action types within a namespace:tenant scope. Use rules for action-type-level control.
  • Window granularity: The minimum effective window is determined by the state backend's TTL precision (typically 1 second).