Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions .agents/skills/db-migrate/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
name: db-migrate
description: Author or review a Drizzle DB migration for zero-downtime safety — expand/contract phasing, backward-compatibility with the deployed app version, and writing the `-- migration-safe` acknowledgment the check:migrations lint requires. Use when adding/editing files under `packages/db/migrations/` or changing `packages/db/schema.ts`.
---

# DB Migrate Skill

You make schema changes that survive a deploy without downtime. The `check:migrations` lint (`scripts/check-migrations-safety.ts`) is the deterministic gate; you are the judgment that decides whether a flagged change is actually safe and writes the annotation that satisfies it.

## The window (why this matters)

A deploy runs the migration, then rolls out the new app image via blue/green. The two are **not atomic and cannot be** — during cutover the old task set keeps serving against the **already-migrated** schema. So:

> Every migration must be backward-compatible with the app version that is *already deployed*.

If a migration drops a column the old code still reads, renames one, or adds a `NOT NULL` the old inserts don't populate, the old code throws until traffic fully shifts — the downtime we're guarding against. You can't fix this by reordering the pipeline; the only fix is discipline.

## Expand / contract

Split every breaking change across **two deploys**:

1. **Expand** (this PR): additive, backward-compatible schema + code that tolerates *both* the old and new shape.
2. **Contract** (a later PR, after expand is fully deployed): remove the old thing, now that nothing reads it.

Never put expand and contract in the same PR. If this PR both removes the code that used a column *and* drops the column, the old code is still live during cutover — split it.

### Per-operation playbook

| You want to | Do (deploy 1 = expand) | Do (deploy 2 = contract) |
|---|---|---|
| Add a required column | `ADD COLUMN` nullable or `DEFAULT`; code writes it | backfill, then `SET NOT NULL` |
| Rename a column/table | add the new name; code dual-writes / reads new-then-old | drop the old name |
| Drop a column/table | stop all reads/writes in code; ship it | `DROP` (annotate) |
| Change a column type | add a new column of the new type; dual-write | backfill, swap reads, drop old |
| Add FK / CHECK | `ADD CONSTRAINT ... NOT VALID` | `VALIDATE CONSTRAINT` separately |
| Index an existing table | `COMMIT;` breakpoint → `SET lock_timeout = 0` → `CREATE INDEX CONCURRENTLY IF NOT EXISTS` (see `packages/db/scripts/migrate.ts`) | — |
| Backfill data | batched + idempotent `UPDATE` (keyset/`WHERE`, bounded) | — |

A `CREATE INDEX`, `ADD COLUMN`, or `ADD CONSTRAINT` against a table **created in the same migration** is always safe (no rows, no live traffic) — the lint already suppresses those.

## The judgment the lint can't do

The lint flags risky *shapes*; it cannot know whether a given drop is *safe right now*. For each flagged statement, do the work it can't:

1. **Is the dependency gone?** Grep the app for the table/column: search `apps/sim` and `packages` for the column name, the Drizzle field (camelCase), and the table object. If any live read/write remains, it is **not** safe — fix the code first.
2. **Did the expand already ship?** The removal of that read/write must be in a deploy that is *already out*, not this same PR. If it's in this PR, split: land the code change now, do the destructive migration in a follow-up after it deploys.
3. **Backfills:** confirm the `UPDATE`/`DELETE` is batched (bounded `WHERE`/keyset, not a single whole-table statement), idempotent (safe to replay — a failed migration re-runs unjournaled files from the top), and safe under concurrent writes from the still-live old app.

## Workflow

1. Edit `packages/db/schema.ts`, then `cd packages/db && bunx drizzle-kit generate` to produce the SQL.
2. Hand-edit the generated SQL where the playbook requires it: `CONCURRENTLY` + `COMMIT;` breakpoint for indexes on existing tables, `NOT VALID` for constraints, batching for backfills.
3. Run `bun run check:migrations` (or `bun run scripts/check-migrations-safety.ts main` locally).
- **Hard errors** (`add-not-null-no-default`, `rename`, `index-not-concurrent`, `constraint-not-valid`, …): rewrite into expand/contract. Do **not** try to annotate them away — the lint won't accept it.
- **Annotate tier** (`drop-table`, `drop-column`, `drop-default`, `set-not-null`, `alter-type`, `drop-index`): only after you've confirmed steps 1–3 above, add a comment on the line directly above the statement:
```sql
-- migration-safe: `secret` read removed in v0.6.1 (#1234), shipped two deploys ago
ALTER TABLE "webhook" DROP COLUMN "secret";
```
The reason must be specific and name the PR/version that removed the dependency. An empty reason fails the lint.
- **Warnings** (`data-backfill`): non-blocking, but confirm the batching/idempotency before merging.
4. Verify locally: `cd packages/db && bun run db:migrate` against a dev DB.

## Hard rule

Never annotate a destructive statement just to make the lint pass. The annotation is a claim that you verified the old code no longer depends on it. If you can't make that claim truthfully, the change belongs in a later deploy — tell the user to split it.
17 changes: 11 additions & 6 deletions .agents/skills/ship/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: ship
description: Commit, push, and open a PR to staging in one shot
description: Commit, push, and open a PR to staging in one shot — runs the cleanup pass and, when migrations changed, the db-migrate safety review first
---

# Ship Command
Expand All @@ -16,12 +16,17 @@ When the user runs `/ship`:
- Types: `fix`, `feat`, `improvement`, `chore`
- Scope: short identifier (e.g., `undo-redo`, `api`, `ui`)
- Keep it concise
3. **Run pre-ship checks** from the repo root before staging:
3. **Run the cleanup pass** on the current changes before anything else: `/cleanup`
- This runs the six code-quality skills (effects, memo, callbacks, state, React Query, emcn) and applies fixes, so they land in this commit.
4. **Run migration safety** — only if the diff touches `packages/db/migrations/**` or `packages/db/schema.ts`:
- Run `/db-migrate` to review the migration for zero-downtime safety (expand/contract phasing, backward-compatibility with the deployed app version).
- `bun run check:migrations` must pass. Do not silence a flagged statement with a `-- migration-safe:` annotation unless `/db-migrate` confirmed the old code no longer depends on it; otherwise split the destructive change into a later deploy.
5. **Run pre-ship checks** from the repo root before staging:
- `bun run lint` to fix formatting issues
- `bun run check:api-validation:strict` to catch boundary contract failures before CI
4. **Stage and commit** the changes with the generated message
5. **Push to origin** using the current branch name
6. **Create a PR** to staging with a description in the user's voice
6. **Stage and commit** the changes with the generated message
7. **Push to origin** using the current branch name
8. **Create a PR** to staging with a description in the user's voice

## Commit Message Format

Expand Down Expand Up @@ -77,7 +82,7 @@ gh pr create --base staging --title "COMMIT_MESSAGE" --body "PR_BODY"

- Short, direct bullet points
- No unnecessary explanation
- "Tested manually" is acceptable for testing section; include lint and boundary validation results when run
- "Tested manually" is acceptable for testing section; include lint, boundary validation, and (when migrations changed) `check:migrations` results when run
- Checkboxes filled in appropriately
- No screenshots section unless UI changes

10 changes: 10 additions & 0 deletions .github/workflows/test-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,16 @@ jobs:
- name: Verify realtime prune graph
run: bun run check:realtime-prune

- name: Migration safety (zero-downtime) audit
run: |
if [ "${{ github.event_name }}" = "pull_request" ]; then
BASE_REF="origin/${{ github.base_ref }}"
git fetch --depth=1 origin "${{ github.base_ref }}" 2>/dev/null || true
else
BASE_REF="HEAD~1"
fi
bun run check:migrations "$BASE_REF"

- name: Type-check realtime server
run: bunx turbo run type-check --filter=@sim/realtime

Expand Down
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
"check:zustand-v5": "bun run scripts/check-zustand-v5-selectors.ts",
"check:react-query": "bun run scripts/check-react-query-patterns.ts --check",
"check:utils": "bun run scripts/check-utils-enforcement.ts",
"check:migrations": "bun run scripts/check-migrations-safety.ts",
"mship-contracts:generate": "bun run scripts/sync-mothership-stream-contract.ts",
"mship-contracts:check": "bun run scripts/sync-mothership-stream-contract.ts --check",
"mship-tools:generate": "bun run scripts/sync-tool-catalog.ts",
Expand Down
154 changes: 154 additions & 0 deletions scripts/check-migrations-safety.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
/**
* Run with: bun test scripts/check-migrations-safety.test.ts
* (Root scripts are bun-native and not part of the turbo/vitest workspaces.)
*/
import { describe, expect, test } from 'bun:test'
import { lintSql } from './check-migrations-safety.ts'

const rules = (sql: string) => lintSql(sql).map((f) => `${f.tier}:${f.rule}`)

describe('additive / safe', () => {
test('nullable add column passes', () => {
expect(lintSql('ALTER TABLE "webhook" ADD COLUMN "provider_config" json;')).toEqual([])
})

test('NOT NULL with DEFAULT passes', () => {
expect(lintSql('ALTER TABLE "user" ADD COLUMN "flag" boolean DEFAULT false NOT NULL;')).toEqual(
[]
)
})

test('CREATE TABLE plus index and FK on that new table passes', () => {
const sql = `CREATE TABLE "kb" ("id" text PRIMARY KEY NOT NULL, "user_id" text NOT NULL);
--> statement-breakpoint
CREATE INDEX "kb_user_id_idx" ON "kb" USING btree ("user_id");
--> statement-breakpoint
ALTER TABLE "kb" ADD CONSTRAINT "kb_user_fk" FOREIGN KEY ("user_id") REFERENCES "user"("id");`
expect(lintSql(sql)).toEqual([])
})

test('CONCURRENTLY index after a COMMIT breakpoint passes', () => {
const sql = `COMMIT;
--> statement-breakpoint
SET lock_timeout = 0;
--> statement-breakpoint
CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_x" ON "embedding" ("kb_id");`
expect(lintSql(sql)).toEqual([])
})
})

describe('hard errors', () => {
test('ADD COLUMN NOT NULL without default', () => {
expect(rules('ALTER TABLE "user" ADD COLUMN "email" text NOT NULL;')).toEqual([
'error:add-not-null-no-default',
])
})

test('RENAME column', () => {
expect(rules('ALTER TABLE "marketplace" RENAME COLUMN "executions" TO "views";')).toEqual([
'error:rename',
])
})

test('CREATE INDEX on existing table without CONCURRENTLY', () => {
expect(rules('CREATE INDEX "idx_y" ON "embedding" ("kb_id");')).toEqual([
'error:index-not-concurrent',
])
})

test('CONCURRENTLY index without IF NOT EXISTS', () => {
const sql = `COMMIT;
--> statement-breakpoint
CREATE INDEX CONCURRENTLY "idx_z" ON "embedding" ("kb_id");`
expect(rules(sql)).toEqual(['error:concurrent-index-not-idempotent'])
})

test('CONCURRENTLY index without a preceding COMMIT', () => {
expect(
rules('CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_z" ON "embedding" ("kb_id");')
).toEqual(['error:concurrent-index-no-commit'])
})

test('ADD FOREIGN KEY on existing table without NOT VALID', () => {
expect(
rules(
'ALTER TABLE "session" ADD CONSTRAINT "s_fk" FOREIGN KEY ("uid") REFERENCES "user"("id");'
)
).toEqual(['error:constraint-not-valid'])
})
})

describe('annotate tier', () => {
const drop = 'ALTER TABLE "webhook" DROP COLUMN "secret";'

test('DROP COLUMN unannotated fails', () => {
expect(rules(drop)).toEqual(['error:drop-column'])
})

test('DROP COLUMN annotated passes', () => {
const sql = `-- migration-safe: secret read removed in v0.6.1 (#1234), shipped two deploys ago\n${drop}`
expect(lintSql(sql)).toEqual([])
})

test('annotation tolerates an intervening statement-breakpoint line', () => {
const sql = `ALTER TABLE "webhook" ADD COLUMN "provider_config" json;
--> statement-breakpoint
-- migration-safe: secret read removed in v0.6.1 (#1234)
${drop}`
expect(lintSql(sql)).toEqual([])
})

test('dangling annotation with empty reason fails', () => {
const sql = `-- migration-safe:\n${drop}`
const found = lintSql(sql)
expect(found).toHaveLength(1)
expect(found[0].tier).toBe('error')
expect(found[0].message).toContain('no reason')
})

test('annotation on the wrong statement does not bleed', () => {
const sql = `-- migration-safe: removing secret
ALTER TABLE "webhook" ADD COLUMN "x" json;
--> statement-breakpoint
${drop}`
expect(rules(sql)).toEqual(['error:drop-column'])
})

test('type change and DROP TABLE are annotate-tier', () => {
expect(
rules(
'ALTER TABLE "user_table_rows" ALTER COLUMN "order_key" SET DATA TYPE text COLLATE "C";'
)
).toEqual(['error:alter-type'])
expect(rules('DROP TABLE "marketplace_execution" CASCADE;')).toEqual(['error:drop-table'])
})
})

describe('warnings (non-blocking)', () => {
test('UPDATE backfill warns but does not error', () => {
const found = lintSql('UPDATE "user_table_definitions" SET "schema" = \'{}\' WHERE id = \'1\';')
expect(found.map((f) => f.tier)).toEqual(['warn'])
})

test('UPDATE without WHERE flags the whole-table note', () => {
const found = lintSql('UPDATE "user" SET "active" = true;')
expect(found[0].tier).toBe('warn')
expect(found[0].message).toContain('no WHERE')
})
})

describe('parser robustness', () => {
test('semicolon inside a string literal does not split', () => {
expect(lintSql(`ALTER TABLE "x" ADD COLUMN "y" text DEFAULT 'a;b' NOT NULL;`)).toEqual([])
})

test('dollar-quoted DO block is one statement; FK on a new table is suppressed', () => {
const sql = `CREATE TABLE "jobs" ("id" text PRIMARY KEY NOT NULL, "wid" text NOT NULL);
--> statement-breakpoint
DO $$ BEGIN
ALTER TABLE "jobs" ADD CONSTRAINT "jobs_fk" FOREIGN KEY ("wid") REFERENCES "workspace"("id");
EXCEPTION WHEN duplicate_object THEN null;
END $$;`
expect(lintSql(sql)).toEqual([])
})
})
Loading
Loading