SmartStore AI — Governance Documentation
Per Volume 4, Chapter 11 of the bootcamp, this document gives written, even if informal, answers to four questions before a real customer, investor, or security review asks them as a surprise.
1. Data retention
- User queries and generated answers: retained in PostgreSQL indefinitely as part of the audit trail (Section 3), but the raw text of a query is not displayed back to any user other than the one who asked it.
- Session state (Redis, Phase 9): expires automatically after 1 hour (ttl_seconds=3600). This is genuinely ephemeral, not a durable record.
- Semantic cache entries (Qdrant, Phase 9): no automatic expiry yet. Action item: add a TTL or catalog-update-triggered invalidation before this goes to production with real, changing inventory (Volume 6, Ch.9's cache-invalidation exercise, not yet implemented).
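The two invalidation options named in the action item (a TTL, or invalidation triggered by a catalog update) can be combined in one freshness check. The sketch below is illustrative only: CacheEntry, catalog_version, and is_fresh are hypothetical names, not part of the Phase 9 code, and the 3600-second default simply mirrors the session TTL rather than a value chosen for the semantic cache.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical semantic-cache record; field names are illustrative."""
    answer: str
    created_at: float      # Unix timestamp when the answer was cached
    catalog_version: int   # catalog version the answer was generated against


def is_fresh(entry: CacheEntry, current_catalog_version: int,
             ttl_seconds: int = 3600, now: Optional[float] = None) -> bool:
    """Return True only if the entry is inside its TTL and the catalog is unchanged."""
    now = time.time() if now is None else now
    within_ttl = (now - entry.created_at) < ttl_seconds
    same_catalog = entry.catalog_version == current_catalog_version
    return within_ttl and same_catalog
```

A cache hit would be served only when is_fresh returns True; otherwise the entry is treated as a miss and regenerated. Bumping a catalog version on every inventory update is one way to get update-triggered invalidation without scanning the cache.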
2. Vendor data handling
- Questions and retrieved context are sent to Anthropic's API (Claude) and OpenAI's API (embeddings) to generate answers. Both are processed under each provider's API terms — action item before production launch: confirm whether the specific account tier in use has zero-data-retention or no-training-on-API-data terms, and document the answer here explicitly, rather than assuming.
- No customer data is sent to any provider beyond what's necessary for that specific request (the question text and retrieved product context) — full user records (email, role) are never included in a prompt.
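One way to make the "question text and retrieved product context only" rule hard to break by accident is to give the prompt builder no access to the user record at all. The following is a minimal sketch under that assumption; UserRecord, build_prompt, and assert_no_email are hypothetical names, and the prompt layout is not the Phase 3 system prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    """Hypothetical full user record; it stays server-side."""
    user_id: str
    email: str
    role: str


def build_prompt(question: str, product_context: List[str]) -> str:
    """Assemble the provider-bound text from the question and context only.

    The function takes no UserRecord, so email and role cannot be
    interpolated into the prompt by mistake.
    """
    context_block = "\n".join(f"- {item}" for item in product_context)
    return f"Context:\n{context_block}\n\nQuestion: {question}"


def assert_no_email(prompt: str, user: UserRecord) -> None:
    """Defensive check before the API call: raise if the user's email appears."""
    if user.email.lower() in prompt.lower():
        raise ValueError("prompt contains the user's email; refusing to send")
```

The check is deliberately narrow: it only catches the email address, and it would also fire if a user typed their own address into the question, which may or may not be the desired behavior.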
3. Audit trail
- Every /ask, /ask/agent, and /ask/image request is traced (Phase 10, OpenTelemetry) with store_id and result metadata attached to spans.
- Action item: Phase 10's tracing currently exports to console (ConsoleSpanExporter). Production needs this pointed at a real backend (Grafana/Tempo, or a managed APM) with retention matching the policy stated in Section 1, and a way to look up "what did the assistant tell user X on date Y" by user ID and timestamp, not just by trace ID.
- Phase 6's RLS policies and RBAC checks mean that, even with full database access, the question "what could user X have seen" is answerable by replaying their role and store scope. This traceability is a direct benefit of Phase 6's design, not an afterthought.
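The "what did the assistant tell user X on date Y" lookup amounts to an index on user ID and timestamp alongside the trace ID. The sketch below shows the shape of that query, using Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere. The audit_log table, its columns, and both function names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3
from datetime import date, timedelta
from typing import List, Tuple


def create_audit_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical audit table keyed by trace ID, indexed by user and time."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            trace_id TEXT PRIMARY KEY,
            user_id  TEXT NOT NULL,
            store_id TEXT NOT NULL,
            asked_at TEXT NOT NULL,  -- ISO 8601 UTC timestamp
            question TEXT NOT NULL,
            answer   TEXT NOT NULL
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_audit_user_time "
        "ON audit_log (user_id, asked_at)"
    )


def answers_for_user_on_date(conn: sqlite3.Connection, user_id: str,
                             day: str) -> List[Tuple[str, str, str]]:
    """Return (asked_at, question, answer) rows for one user on one UTC day (YYYY-MM-DD)."""
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    # ISO 8601 strings sort chronologically, so a half-open text range works.
    cursor = conn.execute(
        "SELECT asked_at, question, answer FROM audit_log "
        "WHERE user_id = ? AND asked_at >= ? AND asked_at < ? "
        "ORDER BY asked_at",
        (user_id, start.isoformat(), end.isoformat()),
    )
    return cursor.fetchall()
```

Storing trace_id in the same row keeps the link back to the OpenTelemetry span, so an auditor can start from a user and date and still reach the full trace.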
4. Usage policy / what the assistant can do
- The assistant answers only from retrieved, grounded product/store data (Phase 3's system prompt) — it does not generate unsupported claims about pricing, promotions, or stock beyond what's in the database.
- Any action with real side effects (Phase 7's agent tools, and any future tool with write access) requires the confirmation/idempotency guardrails from Volume 3, Chapter 11 of the bootcamp. Action item: check_store_hours and get_product_location (Phase 7) are both read-only today; the first tool with a real side effect (e.g., "flag low stock") must not ship without this guardrail explicitly implemented and tested, not just documented as a principle.
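As a rough illustration of what "confirmation/idempotency guardrails" could look like around a write-capable tool, the sketch below wraps a side-effecting function so that it runs only after explicit confirmation and at most once per idempotency key. GuardedTool, ConfirmationRequired, and the in-memory result store are all hypothetical; this is not the bootcamp's reference implementation, and a real deployment would likely need a durable store shared across processes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class ConfirmationRequired(Exception):
    """Raised when a write-capable tool is called without explicit confirmation."""


@dataclass
class GuardedTool:
    """Wraps a side-effecting function with confirmation and idempotency checks."""
    action: Callable[[dict], str]
    # In-memory record of completed calls, keyed by idempotency key (assumption).
    _results: Dict[str, str] = field(default_factory=dict)

    def run(self, args: dict, idempotency_key: str, confirmed: bool = False) -> str:
        # A retry with a known key returns the stored result without re-executing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # New work never runs unless the caller has explicitly confirmed it.
        if not confirmed:
            raise ConfirmationRequired("side-effecting tool needs confirmed=True")
        result = self.action(args)
        self._results[idempotency_key] = result
        return result
```

A future "flag low stock" tool would be registered through a wrapper like this rather than called directly, which also gives the tests a single place to assert that unconfirmed calls and duplicate retries produce no second side effect.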
Standing action items (carried forward, not resolved by writing this document)
- Confirm Anthropic/OpenAI account-level data retention terms — Section 2
- Add semantic cache invalidation — Section 1
- Point OpenTelemetry export at a real, retained backend — Section 3
- Build confirmation/idempotency guardrails before any write-capable tool ships — Section 4
This document should be revisited every time a new tool, data source, or vendor is added — not written once and forgotten.