feat(db): zero-downtime migration safety lint + db-migrate skill#5041
feat(db): zero-downtime migration safety lint + db-migrate skill#5041TheodoreSpeaks wants to merge 3 commits into
Conversation
Add scripts/check-migrations-safety.ts (check:migrations), a CI gate that classifies statements in newly-added migrations into hard errors (rewrite), annotate-to-acknowledge contract ops (`-- migration-safe: <reason>`), and backfill warnings. Wire it into test-build.yml. Add the /db-migrate skill as the judgment half (expand/contract phasing, app-code cross-ref, annotation authoring). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryLow Risk Overview
Unit tests cover the SQL classifier and annotation parsing ( Reviewed by Cursor Bugbot for commit 0b278d3. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.
|
|
||
| if (files.length === 0) { | ||
| console.log('✓ No new migrations to check.') | ||
| process.exit(0) |
There was a problem hiding this comment.
Git diff failure skips audit
High Severity
When git diff fails, changedMigrationFiles returns an empty list while git rev-parse HEAD can still succeed, so resolveFiles treats the run as “no new migrations” and exits 0. A broken or missing BASE_REF (e.g. shallow fetch) can bypass the migration safety gate in CI.
Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.
| let warnings = 0 | ||
| for (const rel of files) { | ||
| const content = await readFile(path.join(ROOT, rel), 'utf8') | ||
| const findings = lintSql(content) |
There was a problem hiding this comment.
Modified migrations lint entire file
Medium Severity
The gate selects migration files with --diff-filter=AM, but lintSql scans the full SQL file. Any edit to a legacy migration re-applies rules to pre-existing unsafe statements (e.g. unannotated RENAME/DROP), contradicting “existing corpus grandfathered” unless the whole history is annotated.
Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.
| let errors = 0 | ||
| let warnings = 0 | ||
| for (const rel of files) { | ||
| const content = await readFile(path.join(ROOT, rel), 'utf8') |
There was a problem hiding this comment.
Deleted migration path crashes lint
Medium Severity
changedMigrationFiles adds paths from git status --porcelain without checking deletion status, then main always readFiles them. A locally deleted migration under packages/db/migrations/ can throw and abort the check instead of skipping or failing clearly.
Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.
Greptile SummaryThis PR adds a zero-downtime migration safety linter (
Confidence Score: 3/5The linter has two rule gaps in the classify engine that would affect real migrations: DROP INDEX without CONCURRENTLY can be annotated past the gate despite taking the same table-locking ACCESS EXCLUSIVE lock as CREATE INDEX, and RENAME CONSTRAINT triggers the hard rename rule even though it is a metadata-only change that does not break running app code. The DROP INDEX locking gap means the tool could provide false confidence — a developer annotates a non-concurrent DROP INDEX, the lint passes, and the migration locks the table during deployment, which is the exact scenario the tool exists to prevent. The RENAME CONSTRAINT false positive would permanently block a safe migration with no annotation escape hatch. Both issues are in the core classify engine, the most load-bearing part of the PR. scripts/check-migrations-safety.ts — the classify function's DROP INDEX branch and the RENAME regex need attention before this gates production migrations. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[PR opens / push event] --> B{Event type?}
B -- pull_request --> C[git fetch depth=1 origin/base_ref]
B -- push --> D[BASE_REF = HEAD~1]
C --> E[bun run check:migrations BASE_REF]
D --> E
E --> F[changedMigrationFiles: git diff --diff-filter=AM vs merge-base]
F --> G{Any new .sql files?}
G -- No --> H[✓ No new migrations - exit 0]
G -- Yes --> I[lintSql each file]
I --> J[parseStatements: splits on ; respecting strings, block comments, dollar-quoting]
J --> K[Build createdTables set from CREATE TABLE stmts]
K --> L[For each statement: classify sql, createdTables, sawCommit]
L --> M{Match kind?}
M -- error --> N[Push finding tier=error]
M -- warn --> O[Push finding tier=warn]
M -- annotate --> P{readAnnotation: check up to 3 preceding non-empty lines}
P -- annotation found with reason --> Q[Skip / pass]
P -- annotation found no reason --> R[Push error: no reason]
P -- no annotation --> S[Push error: ANNOTATE_GUIDANCE]
L --> T{COMMIT seen?}
T -- yes --> U[sawCommit = true for next stmts]
N & O & R & S --> V{errors > 0?}
V -- Yes --> W[exit 1 - CI fails]
V -- No --> X[✓ Migrations backward-compatible / warnings only - exit 0]
Reviews (1): Last reviewed commit: "feat(skills): run cleanup and db-migrate..." | Re-trigger Greptile |
| if (/\bRENAME\b/i.test(s)) { | ||
| matches.push({ | ||
| kind: 'error', | ||
| rule: 'rename', | ||
| message: | ||
| 'RENAME breaks old code reading the old name. Add the new column/table, dual-write in code, then drop the old one in a later deploy.', | ||
| }) | ||
| } |
There was a problem hiding this comment.
RENAME CONSTRAINT triggers the hard rename error
/\bRENAME\b/i matches ALTER TABLE "x" RENAME CONSTRAINT "old_fk" TO "new_fk". Renaming a constraint is a metadata-only operation — app code references columns and rows, not constraint names — so old code keeps running fine. Drizzle does generate RENAME CONSTRAINT when you change a constraint name in the schema, and because this is a hard error (cannot be annotated away), the CI would permanently block a safe migration with no path to unblock it.
| if (/^DROP INDEX\b/i.test(s)) { | ||
| matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' }) | ||
| } |
There was a problem hiding this comment.
DROP INDEX without CONCURRENTLY passes with just an annotation
CREATE INDEX without CONCURRENTLY on an existing table is a hard error because it takes an ACCESS EXCLUSIVE lock. DROP INDEX acquires the same lock, but the linter only requires an annotation for it — meaning a developer who annotates DROP INDEX "idx"; will lock their table during migration, which is exactly the downtime this linter is meant to prevent. The check should require CONCURRENTLY for DROP INDEX on existing tables (same logic as CREATE INDEX).
| if (/^DROP INDEX\b/i.test(s)) { | |
| matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' }) | |
| } | |
| if (/^DROP INDEX\b/i.test(s)) { | |
| if (!/\bCONCURRENTLY\b/i.test(s)) { | |
| matches.push({ | |
| kind: 'error', | |
| rule: 'drop-index-not-concurrent', | |
| message: | |
| 'DROP INDEX on an existing table write-locks it for the whole build. Use DROP INDEX CONCURRENTLY after a COMMIT; breakpoint (see packages/db/scripts/migrate.ts).', | |
| }) | |
| } else if (!sawCommit) { | |
| matches.push({ | |
| kind: 'error', | |
| rule: 'concurrent-drop-index-no-commit', | |
| message: | |
| 'DROP INDEX CONCURRENTLY cannot run inside the migration transaction. Precede it with a COMMIT; breakpoint.', | |
| }) | |
| } else { | |
| matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' }) | |
| } | |
| } |
| if (/\bSET NOT NULL\b/i.test(s)) { | ||
| matches.push({ kind: 'annotate', rule: 'set-not-null', message: 'SET NOT NULL' }) | ||
| } | ||
| if (/\b(?:SET DATA TYPE|ALTER COLUMN [^;]*\bTYPE)\b/i.test(s)) { |
There was a problem hiding this comment.
ALTER COLUMN [^;]*\bTYPE can false-positive on string default values
The regex ALTER COLUMN [^;]*\bTYPE is matched against s, which retains string literal contents (e.g., SET DEFAULT 'type:foo'). Since 'type' is preceded by ', a non-word character, \b fires before type, and the pattern matches — incorrectly flagging a simple default-change as an alter-type. This is an annotate-tier false positive, but it forces developers to add a misleading migration-safe annotation.
| if (/\b(?:SET DATA TYPE|ALTER COLUMN [^;]*\bTYPE)\b/i.test(s)) { | |
| if (/\bSET DATA TYPE\b/i.test(s) || /\bALTER COLUMN\s+"?\w+"?\s+TYPE\b/i.test(s)) { |
| if (!dir) throw new Error('--dir requires a path') | ||
| return (await listSqlFiles(path.resolve(dir))).map((f) => path.relative(ROOT, f)) | ||
| } | ||
| const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/main' |
There was a problem hiding this comment.
Default
baseRef is origin/main, but the project branches from staging
When developers run bun run check:migrations locally without arguments, the script compares against origin/main. Because PRs target staging (not main), staging may have migration files that aren't in main yet, causing the default diff to flag pre-existing migrations as "new" and surfacing spurious errors. Using origin/staging as the default aligns with the actual base branch developers work against.
| const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/main' | |
| const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/staging' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


Summary
check:migrationslint (scripts/check-migrations-safety.ts) — a CI gate that scans newly-added migrations for backward-incompatible changes that break the previously-deployed app during the blue/green window. Tiers: hard errors (rewrite), annotate-to-acknowledge contract ops (-- migration-safe: <reason>), and backfill warnings. Scoped to new migrations via git-diff; existing corpus grandfathered./db-migrateskill — the judgment half: expand/contract phasing, app-code cross-reference, and authoring the annotation the lint requires.check:migrationsinto test-build.yml, and make/shiprun/cleanup+ the migration-safety review.Type of Change
Testing
bun test scripts/check-migrations-safety.test.ts)DROP COLUMN workspace_idthat caused the SlackTrigger query failure; a context-free agent following the skill independently reached NOT SAFE + the correct expand/contract splitbun run check:api-validation:strictpassed; biome cleanChecklist