SmartStore AI — Governance Documentation

Per Volume 4, Chapter 11 of the bootcamp, this document gives written answers, however informal, to four questions, so that they are settled before a real customer, investor, or security reviewer asks them as a surprise.

1. Data retention

  • User queries and generated answers: retained in PostgreSQL indefinitely as part of the audit trail (Section 3), but the raw text of a query is not displayed back to any user other than the one who asked it.
  • Session state (Redis, Phase 9): expires automatically after 1 hour (ttl_seconds=3600) — this is genuinely ephemeral, not a durable record.
  • Semantic cache entries (Qdrant, Phase 9): no automatic expiry yet. Action item: add a TTL or catalog-update-triggered invalidation before this goes to production with real, changing inventory (this is the cache-invalidation exercise from Volume 6, Chapter 9, not yet implemented).
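
The two invalidation options named in the action item (a TTL, or invalidation triggered by a catalog update) could be combined roughly as follows. This is a minimal sketch, not the Phase 9 implementation: CacheEntry, is_fresh, lookup, and the catalog_version counter are hypothetical names, and the 3600-second default only mirrors the session TTL as a placeholder. It is plain Python and does not call Qdrant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical shape of a cached answer (not the real Qdrant payload)."""
    answer: str
    created_at: float      # Unix timestamp when the answer was cached
    catalog_version: int   # catalog version the answer was generated against


def is_fresh(entry: CacheEntry, now: float, current_catalog_version: int,
             ttl_seconds: int = 3600) -> bool:
    """True only if the entry is inside its TTL and matches the current catalog."""
    within_ttl = (now - entry.created_at) < ttl_seconds
    same_catalog = entry.catalog_version == current_catalog_version
    return within_ttl and same_catalog


def lookup(entry: Optional[CacheEntry], now: float, current_catalog_version: int,
           ttl_seconds: int = 3600) -> Optional[str]:
    """Return the cached answer, or None so the caller regenerates it."""
    if entry is None:
        return None
    if not is_fresh(entry, now, current_catalog_version, ttl_seconds):
        return None
    return entry.answer
```

Bumping the assumed catalog_version on every inventory update would invalidate all older entries at once, while the TTL bounds staleness if an update is ever missed.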

2. Vendor data handling

  • Questions and retrieved context are sent to Anthropic's API (Claude) and OpenAI's API (embeddings) to generate answers. Each request is processed under the respective provider's API terms. Action item before production launch: confirm whether the specific account tier in use has zero-data-retention or no-training-on-API-data terms, and document the answer here explicitly rather than assuming.
  • No customer data is sent to any provider beyond what's necessary for that specific request (the question text and retrieved product context) — full user records (email, role) are never included in a prompt.
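
One way to enforce the "only what's necessary" rule in code is to build the outbound payload from an explicit allow-list, so that unexpected fields are dropped rather than forwarded. The sketch below is illustrative only: build_prompt_payload and the field names in ALLOWED_PRODUCT_FIELDS are assumptions, not the project's actual schema.

```python
from typing import Any

# Hypothetical allow-list: only these product fields may reach a provider.
ALLOWED_PRODUCT_FIELDS = ("name", "price", "aisle", "in_stock")


def build_prompt_payload(question: str,
                         products: list[dict[str, Any]]) -> dict[str, Any]:
    """Build the data sent to the model provider for one request.

    The function takes no user record at all, so email and role cannot
    be included by accident. Product fields outside the allow-list are dropped.
    """
    context = [
        {key: product[key] for key in ALLOWED_PRODUCT_FIELDS if key in product}
        for product in products
    ]
    return {"question": question, "context": context}
```

An allow-list fails closed: a column added to the products table later stays out of prompts until someone deliberately adds it to the tuple.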

3. Audit trail

  • Every /ask, /ask/agent, and /ask/image request is traced (Phase 10, OpenTelemetry) with store_id and result metadata attached to spans.
  • Action item: Phase 10's tracing currently exports to the console (ConsoleSpanExporter). Production needs it pointed at a real backend (Grafana/Tempo, or a managed APM) with retention matching the policy stated in Section 1, plus a way to look up "what did the assistant tell user X on date Y" by user ID and timestamp, not just by trace ID.
  • Phase 6's RLS policies and RBAC checks make "what could user X have seen" answerable: an investigator with full database access can reconstruct it by replaying the user's role and store scope. This traceability is a direct benefit of Phase 6's design, not an afterthought.
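
The "user X on date Y" lookup could be served by an audit table indexed on user ID and timestamp, with the trace ID kept as a column for cross-reference. The sketch below uses Python's built-in sqlite3 purely as a stand-in for PostgreSQL so it runs anywhere. The audit_log table, its columns, and the helper names are hypothetical, and timestamps are assumed to be stored as ISO 8601 UTC strings.

```python
import sqlite3
from datetime import date, timedelta


def make_audit_db() -> sqlite3.Connection:
    """Create an in-memory audit table (stand-in for the PostgreSQL one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log ("
        " user_id TEXT NOT NULL, store_id TEXT NOT NULL,"
        " asked_at TEXT NOT NULL, endpoint TEXT NOT NULL,"
        " trace_id TEXT NOT NULL, question TEXT NOT NULL,"
        " answer TEXT NOT NULL)"
    )
    # Index that makes lookups by user and time range cheap.
    conn.execute("CREATE INDEX idx_audit_user_time ON audit_log (user_id, asked_at)")
    return conn


def record_answer(conn: sqlite3.Connection, user_id: str, store_id: str,
                  asked_at: str, endpoint: str, trace_id: str,
                  question: str, answer: str) -> None:
    """Append one question/answer pair to the audit log."""
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (user_id, store_id, asked_at, endpoint, trace_id, question, answer),
    )


def answers_for_user_on_date(conn: sqlite3.Connection, user_id: str,
                             day: date) -> list[tuple]:
    """Return (asked_at, endpoint, trace_id, question, answer) rows for one day."""
    start = day.isoformat()                      # inclusive lower bound
    end = (day + timedelta(days=1)).isoformat()  # exclusive upper bound
    return conn.execute(
        "SELECT asked_at, endpoint, trace_id, question, answer"
        " FROM audit_log"
        " WHERE user_id = ? AND asked_at >= ? AND asked_at < ?"
        " ORDER BY asked_at",
        (user_id, start, end),
    ).fetchall()
```

Storing the trace ID next to each row lets an investigator start from the user and date, then jump to the full trace in whichever backend is eventually chosen.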

4. Usage policy / what the assistant can do

  • The assistant answers only from retrieved, grounded product/store data (Phase 3's system prompt) — it does not generate unsupported claims about pricing, promotions, or stock beyond what's in the database.
  • Any action with real side effects (Phase 7's agent tools, and any future tool with write access) requires the confirmation/idempotency guardrails from Volume 3, Chapter 11 of the bootcamp. Action item: check_store_hours and get_product_location (Phase 7) are both read-only today, so the first tool with a real side effect (e.g., "flag low stock") must not ship until this guardrail is explicitly implemented and tested, not just documented as a principle.
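
A confirmation/idempotency guardrail of the kind required above might look roughly like this. It is a sketch under stated assumptions, not the bootcamp's reference design: WriteToolGuard, ConfirmationRequired, and the idempotency_key parameter are hypothetical names, and results are held in memory, whereas a real deployment would need a durable store.

```python
from dataclasses import dataclass, field
from typing import Callable


class ConfirmationRequired(Exception):
    """Raised when a write-capable tool is invoked without explicit confirmation."""


@dataclass
class WriteToolGuard:
    """Hypothetical wrapper for tools with side effects.

    Results are kept in memory for illustration only.
    """
    _results: dict[str, str] = field(default_factory=dict)

    def run(self, idempotency_key: str, confirmed: bool,
            action: Callable[[], str]) -> str:
        # Guardrail 1: never perform a side effect without confirmation.
        if not confirmed:
            raise ConfirmationRequired(
                f"confirmation needed before running {idempotency_key!r}"
            )
        # Guardrail 2: a retried request returns the stored result
        # instead of repeating the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = action()
        self._results[idempotency_key] = result
        return result
```

With this shape, a retried "flag low stock" call carrying the same key would return the first result rather than flagging the item twice, and an unconfirmed call fails before the action runs.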

Standing action items (carried forward, not resolved by writing this document)

  1. Confirm Anthropic/OpenAI account-level data retention terms — Section 2
  2. Add semantic cache invalidation — Section 1
  3. Point OpenTelemetry export at a real, retained backend — Section 3
  4. Build confirmation/idempotency guardrails before any write-capable tool ships — Section 4

This document should be revisited every time a new tool, data source, or vendor is added — not written once and forgotten.