SmartStore AI — Phase 1 Implementation Guide

Database Design & Core Models

This phase builds the actual schema everything else depends on: Store, Product, User, with the foreign keys and indexes that later phases' RAG filtering (Phase 2-3), RBAC (Phase 6), and auth (Phase 5) all key off of directly.

Everything below was run against a real local PostgreSQL 16 instance — not reviewed for syntax and assumed correct. The migration was autogenerated from the actual models, applied, inspected with \d to confirm the real schema, and tested with real INSERT/SELECT queries. One real bug was caught and fixed during this process — documented below, not hidden, because it's a genuinely useful thing to know before you hit it yourself.


1. What got built

backend/app/models/
├── __init__.py      — registers all models on Base (critical for Alembic, see below)
├── store.py          — Store model + the utc_now() helper every model uses
├── product.py         — Product model, store_id is the multi-tenancy boundary
└── user.py            — User model + UserRole enum (Phase 6's RBAC source of truth)

backend/alembic/
├── env.py             — wired to import app.config.settings and app.models
└── versions/
    └── ..._initial_schema_stores_products_users.py

backend/app/seed.py     — populates realistic sample data for every later phase
backend/tests/test_models.py

2. Key design decisions, and why

Money is stored as price_cents: int, never a float. Floating-point arithmetic on currency accumulates rounding errors over enough operations — this is a well-known footgun, and the fix costs nothing: $8.99 is just price_cents=899, and you divide by 100 only at display time.

store_id on Product is the actual multi-tenancy boundary, not an afterthought bolted on later. Phase 6's Row-Level Security policy and Phase 2's Qdrant metadata filter both key off this exact column — designing it in from the first migration, rather than retrofitting it, is what Volume 4 Ch.4 of the bootcamp called "designed in, not bolted on."

UserRole is a single enum, imported everywhere roles matter. Phase 6 builds the actual permission map against this same enum — there is exactly one place in the entire codebase that defines what a valid role is.

Timestamps are timezone-aware (DateTime(timezone=True) + datetime.now(timezone.utc)), not naive. This is the one real bug this phase caught (see Section 4) — naive UTC timestamps via the now-deprecated datetime.utcnow() are a genuinely common source of subtle, hard-to-debug time bugs once a system has users in multiple timezones, which SmartStore AI will, immediately.

app/models/__init__.py imports every model file explicitly. This looks redundant but isn't: Alembic's autogenerate only sees models that have actually been imported somewhere by the time it inspects Base.metadata. A model file that exists but is never imported is invisible to migrations — this __init__.py is what prevents that exact, easy-to-make mistake.


3. Step-by-step: build this yourself

cd backend

# 1. Set your DATABASE_URL (matches docker-compose.yml's postgres service
#    if you're running everything in Docker; the command below assumes
#    you're connecting to whatever Postgres you have running locally)
export DATABASE_URL="postgresql://smartstore:smartstore@localhost:5432/smartstore"

# 2. Generate the migration from the actual model definitions
python -m alembic revision --autogenerate -m "initial schema: stores, products, users"

# 3. ALWAYS review the generated migration file before applying it —
#    autogenerate is very good, not infallible. Open the new file in
#    alembic/versions/ and confirm it matches what you expect.

# 4. Apply it
python -m alembic upgrade head

# 5. Seed realistic sample data
python -m app.seed

# 6. Run the tests
pytest tests/test_models.py -v

Actual verified output from running this exact sequence:

INFO  [alembic.runtime.migration] Running upgrade  -> 69718ea1e8fa, initial schema: stores, products, users

Seeded 2 stores, 5 products, 3 users.

tests/test_models.py::test_create_store PASSED
tests/test_models.py::test_product_belongs_to_store_and_cascades PASSED
tests/test_models.py::test_user_role_defaults_to_shopper PASSED
tests/test_models.py::test_user_email_must_be_unique PASSED
4 passed in 0.32s

And the real schema, inspected directly in Postgres (\d products), confirms the foreign key and cascade behavior actually match the model definition — ON DELETE CASCADE from products.store_id to stores.id, exactly as Product.store declares it.


4. A real bug this process caught (worth knowing about directly)

The first version of these models used datetime.utcnow() as the default for timestamp columns. Running the test suite produced a real, correct warning:

DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled
for removal in a future version. Use timezone-aware objects...

This isn't cosmetic — datetime.utcnow() returns a naive datetime (no timezone attached at all), which is a genuine source of bugs the moment your app has users or stores in different timezones, or compares timestamps across systems that assume different defaults. The fix: a shared utc_now() helper using datetime.now(timezone.utc), and DateTime(timezone=True) columns instead of plain DateTime.

The follow-on issue this caused, also worth knowing in advance: changing a column's type (naive → timezone-aware) means the old migration no longer matches the models. Running alembic downgrade base to roll back and regenerate did not clean up the custom Postgres ENUM type (user_role) — Alembic's autogenerated downgrade for enum columns is a known rough edge. The fix was a direct, manual DROP TYPE user_role; before regenerating. If you ever change an enum-backed column and get a DuplicateObject: type "..." already exists error on re-upgrade, this is almost certainly why — check for a leftover Postgres type the downgrade didn't clean up.

This kind of thing is exactly why "review the generated migration before applying it" in Step 3 above isn't a throwaway suggestion — autogenerate is good at detecting new tables/columns, and noticeably less reliable at detecting and reverting type-level changes cleanly.


5. What's next

Phase 2 — Product Catalog Ingestion builds the real ingest.py that takes the Product rows seeded here and actually embeds and stores them in Qdrant — this is where the database layer built in this phase starts feeding the RAG pipeline.