Adaptive Compression Plan
This plan upgrades Aionis from a static compression rollup to a budget-driven adaptive compression engine while preserving auditability and stable production latency.
Why
Current state is solid but limited:
- Compression rollup exists and is non-destructive.
- `recall_text` already prefers summary-first rendering.
- Compression gain is material but modest on baseline docs (4699 -> 4244 context chars, roughly a 10% reduction).
Target state:
- Per-request token budget control.
- Adaptive context compaction under load/large context.
- Measured compression KPI in CI and release gate.
Scope
In scope:
- Open Core API and recall path.
- Context rendering and compaction policy.
- SDK input type alignment (TypeScript + Python).
- Operator/docs/runbook updates.
Out of scope:
- Hosted-only internal policy engine.
- Model-specific semantic summarization service.
- Breaking response schema changes.
Success Metrics
Primary:
- `context_compression_ratio` >= 0.40 on a representative long-memory workload.
- `answer_quality_retain` >= 0.95 versus the non-compressed baseline (task-specific eval set).
- `recall_text` p95 regression <= +10%.
Secondary:
- `rate_limited_recall_text_embed` does not regress in steady traffic.
- `consistency` checks remain clean for compression citations.
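A minimal sketch, in TypeScript, of how the primary KPIs could be computed from a benchmark run; the `CompressionRun` fields and the example scores are illustrative, not the actual benchmark schema.

```ts
// Hypothetical KPI computation over one benchmark run.
// Field names are illustrative; the real benchmark schema may differ.
interface CompressionRun {
  originalChars: number;    // context chars without compaction
  compressedChars: number;  // context chars with budget applied
  baselineScore: number;    // eval score on non-compressed context
  compressedScore: number;  // eval score on compressed context
}

function compressionKpis(run: CompressionRun) {
  const context_compression_ratio = 1 - run.compressedChars / run.originalChars;
  const answer_quality_retain = run.compressedScore / run.baselineScore;
  return { context_compression_ratio, answer_quality_retain };
}

// The 4699 -> 4244 baseline from this plan yields a ratio of ~0.097, well below
// the 0.40 target; the eval scores below are placeholder values.
console.log(compressionKpis({
  originalChars: 4699,
  compressedChars: 4244,
  baselineScore: 1.0,
  compressedScore: 0.97,
}));
```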
Rollout Phases
Phase 1 (Now): Budget-Driven Context Compaction
Status: completed
Deliverables:
- Add request knobs:
  - `context_token_budget?: number`
  - `context_char_budget?: number`
- Add server default: `MEMORY_RECALL_TEXT_CONTEXT_TOKEN_BUDGET_DEFAULT` (0 disables).
- Apply compaction only to `context.text`, preserving `items` and `citations` (see the sketch after this list).
- Add observability fields to recall logs:
  - context length, estimated tokens, budget, compaction applied.
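A minimal sketch of the Phase 1 surface, assuming a TypeScript-style request/response; only the knob names and the env default come from this plan, while the type names, field layout, and `effectiveTokenBudget` helper are illustrative.

```ts
// Sketch of a Phase 1 recall_text request/response; the exact wire shape is an
// assumption, not the shipped contract.
interface RecallTextRequest {
  query: string;
  // New optional knobs; omitting both keeps pre-Phase-1 behavior.
  context_token_budget?: number;
  context_char_budget?: number;
}

interface RecallTextResponse {
  context: {
    text: string;         // the only field the compactor may shrink
    items: unknown[];     // untouched, so clients can still walk raw items
    citations: unknown[]; // untouched, so audit trails stay intact
  };
}

// Illustrative server-side guard: compaction runs only when a budget is set,
// either per request or via MEMORY_RECALL_TEXT_CONTEXT_TOKEN_BUDGET_DEFAULT.
function effectiveTokenBudget(req: RecallTextRequest, envDefault: number): number {
  const budget = req.context_token_budget ?? envDefault;
  return budget > 0 ? budget : 0; // 0 disables compaction
}
```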
Acceptance:
- Existing clients keep working with no request changes.
- With budget set, context text shrinks deterministically.
- Build/contract/docs/sdk checks pass.
Phase 2: Multi-Level Compression Strategy
Status: completed
Deliverables:
- Add section-level policy presets (`balanced`, `aggressive`).
- Prefer topic/concept and rule lines before event fanout under tight budgets (see the preset sketch after this list).
- Add compaction diagnostics in debug block (bounded metadata only).
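A sketch of how the presets might be expressed as section-priority policies; only the `balanced` and `aggressive` names come from this plan, while the `Section` values and `reservedForPriority` weights are assumptions.

```ts
// Illustrative compaction policy presets; section names and weights are
// placeholders, not the shipped configuration.
type Section = "rules" | "concepts" | "topics" | "events";

interface CompactionPolicy {
  // Sections are admitted in this order until the budget is exhausted.
  sectionPriority: Section[];
  // Fraction of the budget reserved for high-priority sections before any
  // event lines are admitted.
  reservedForPriority: number;
}

const PRESETS: Record<"balanced" | "aggressive", CompactionPolicy> = {
  balanced: {
    sectionPriority: ["rules", "concepts", "topics", "events"],
    reservedForPriority: 0.5,
  },
  aggressive: {
    sectionPriority: ["rules", "concepts", "topics", "events"],
    reservedForPriority: 0.8, // events get only the leftover budget
  },
};
```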
Acceptance:
- Compression ratio improves beyond Phase 1 on long contexts.
- No citation traceability regression.
Phase 3: Production Gate and Benchmark Standardization
Status: completed
Deliverables:
- Add a compression KPI check to `gate:core:prod` (non-blocking first, then blocking; see the gate sketch after this list).
- Add a production-style benchmark profile for compression.
- Keep LoCoMo/LongMemEval as auxiliary regression only.
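A hypothetical shape for the KPI gate step: the thresholds mirror the Success Metrics section, while the report path, JSON fields, and `COMPRESSION_GATE_BLOCKING` switch are assumptions for illustration.

```ts
// Hypothetical compression gate script for gate:core:prod.
import { readFileSync } from "node:fs";

interface KpiReport {
  context_compression_ratio: number;
  answer_quality_retain: number;
  recall_text_p95_regression: number; // e.g. 0.08 means +8%
}

// Assumed report location and blocking switch; names are placeholders.
const BLOCKING = process.env.COMPRESSION_GATE_BLOCKING === "1";
const report: KpiReport = JSON.parse(readFileSync("bench/compression-kpis.json", "utf8"));

const failures: string[] = [];
if (report.context_compression_ratio < 0.4) failures.push("compression ratio below 0.40");
if (report.answer_quality_retain < 0.95) failures.push("quality retention below 0.95");
if (report.recall_text_p95_regression > 0.1) failures.push("p95 regression above +10%");

if (failures.length > 0) {
  console.error(`compression gate: ${failures.join("; ")}`);
  // Non-blocking first: report only. Flip the switch to enforce after stabilization.
  process.exit(BLOCKING ? 1 : 0);
}
console.log("compression gate: all KPIs within thresholds");
```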
Acceptance:
- Release evidence includes compression KPIs.
- Gate fails when compression or quality thresholds are breached (after the stabilization period).
Risk and Guardrails
- Risk: over-compression hurts answer quality. Control: section-priority policy + quality-retain metric.
- Risk: unpredictable output shape. Control: compact only `context.text`; keep `items` and `citations` stable.
- Risk: per-model token mismatch. Control: use a conservative token estimator first (sketched below); calibrate by model family later.
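A sketch of a conservative estimator, assuming a rough ~4 chars/token heuristic with padding; the constants are placeholders to be replaced by per-model-family calibration.

```ts
// Conservative token estimate: ~4 chars/token is a common rough rule for English
// text, padded by 10% so budgets err on the side of over-counting tokens.
function estimateTokens(text: string): number {
  const roughTokens = Math.ceil(text.length / 4);
  return Math.ceil(roughTokens * 1.1);
}

// Example: a 4699-char context estimates to ~1293 tokens, so a 1000-token budget
// would trigger compaction while a 2000-token budget would not.
console.log(estimateTokens("x".repeat(4699)));
```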
Execution Checklist
- Implement Phase 1 code path and API schema.
- Update API contract and operator runbook.
- Align SDK input types.
- Run:
  - `npm run -s build`
  - `npm run -s test:contract`
  - `npm run -s docs:check`
  - `npm run -s sdk:release-check`
  - `npm run -s sdk:py:release-check`
- Ship and observe one release cycle before Phase 2.