Skip to content

feat(db): zero-downtime migration safety lint + db-migrate skill#5041

Open
TheodoreSpeaks wants to merge 3 commits into
stagingfrom
feat/db-migrate-skill
Open

feat(db): zero-downtime migration safety lint + db-migrate skill#5041
TheodoreSpeaks wants to merge 3 commits into
stagingfrom
feat/db-migrate-skill

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Summary

  • Add check:migrations lint (scripts/check-migrations-safety.ts) — a CI gate that scans newly-added migrations for backward-incompatible changes that break the previously-deployed app during the blue/green window. Tiers: hard errors (rewrite), annotate-to-acknowledge contract ops (-- migration-safe: <reason>), and backfill warnings. Scoped to new migrations via git-diff; existing corpus grandfathered.
  • Add /db-migrate skill — the judgment half: expand/contract phasing, app-code cross-reference, and authoring the annotation the lint requires.
  • Wire check:migrations into test-build.yml, and make /ship run /cleanup + the migration-safety review.

Type of Change

  • New feature (tooling / CI)

Testing

  • 20/20 unit tests (bun test scripts/check-migrations-safety.test.ts)
  • Validated against the 0236 org-scope migration (PR improvement(permissions): permission groups scoped to organization level #5035): lint flags 15 blocking issues incl. the DROP COLUMN workspace_id that caused the SlackTrigger query failure; a context-free agent following the skill independently reached NOT SAFE + the correct expand/contract split
  • bun run check:api-validation:strict passed; biome clean

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

TheodoreSpeaks and others added 3 commits June 13, 2026 18:56
Add scripts/check-migrations-safety.ts (check:migrations), a CI gate that
classifies statements in newly-added migrations into hard errors (rewrite),
annotate-to-acknowledge contract ops (`-- migration-safe: <reason>`), and
backfill warnings. Wire it into test-build.yml. Add the /db-migrate skill as
the judgment half (expand/contract phasing, app-code cross-ref, annotation
authoring).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 14, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment Jun 14, 2026 2:12am

Request Review

@cursor

cursor Bot commented Jun 14, 2026

Copy link
Copy Markdown

PR Summary

Low Risk
Tooling and CI only—no runtime schema or application behavior changes in this PR.

Overview
Adds a zero-downtime migration guard for the blue/green deploy window: new migrations must stay backward-compatible with the app version still serving traffic after the schema runs.

check:migrations (scripts/check-migrations-safety.ts) lints new migration SQL (git-diff vs base; existing migrations grandfathered). It enforces hard rules (e.g. NOT NULL without default, renames, non-concurrent indexes on live tables, FK/CHECK without NOT VALID), requires -- migration-safe: <reason> for contract-phase drops/type changes, and warns on backfills. Wired into test-build.yml and package.json.

/db-migrate skill documents expand/contract phasing and when annotations are honest. /ship now runs /cleanup first and, when packages/db/migrations or schema.ts change, runs the db-migrate review and requires check:migrations to pass.

Unit tests cover the SQL classifier and annotation parsing (scripts/check-migrations-safety.test.ts).

Reviewed by Cursor Bugbot for commit 0b278d3. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.


if (files.length === 0) {
console.log('✓ No new migrations to check.')
process.exit(0)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git diff failure skips audit

High Severity

When git diff fails, changedMigrationFiles returns an empty list while git rev-parse HEAD can still succeed, so resolveFiles treats the run as “no new migrations” and exits 0. A broken or missing BASE_REF (e.g. shallow fetch) can bypass the migration safety gate in CI.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.

let warnings = 0
for (const rel of files) {
const content = await readFile(path.join(ROOT, rel), 'utf8')
const findings = lintSql(content)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified migrations lint entire file

Medium Severity

The gate selects migration files with --diff-filter=AM, but lintSql scans the full SQL file. Any edit to a legacy migration re-applies rules to pre-existing unsafe statements (e.g. unannotated RENAME/DROP), contradicting “existing corpus grandfathered” unless the whole history is annotated.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.

let errors = 0
let warnings = 0
for (const rel of files) {
const content = await readFile(path.join(ROOT, rel), 'utf8')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted migration path crashes lint

Medium Severity

changedMigrationFiles adds paths from git status --porcelain without checking deletion status, then main always readFiles them. A locally deleted migration under packages/db/migrations/ can throw and abort the check instead of skipping or failing clearly.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0b278d3. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a zero-downtime migration safety linter (scripts/check-migrations-safety.ts) that classifies SQL statements in new Drizzle migration files as hard errors, annotate-tier ops, or backfill warnings — wired into CI via test-build.yml — plus a /db-migrate agent skill and an updated /ship skill that invokes it.

  • Linter core (check-migrations-safety.ts): custom SQL statement parser (handles strings, block comments, dollar-quoting), classify engine covering CREATE/DROP INDEX, ADD COLUMN, RENAME, ADD CONSTRAINT, DROP TABLE/COLUMN/DEFAULT, SET NOT NULL, ALTER TYPE, and UPDATE/DELETE backfills. Annotation mechanism allows contract-phase ops to be acknowledged with -- migration-safe: <reason>.
  • CI gate (.github/workflows/test-build.yml): new step computes the correct base ref for PR vs. push events and fails the build on blocking findings, with a safe fallback when git fetch fails.
  • Skills (.agents/skills/): new db-migrate skill documents expand/contract phasing and the annotation workflow; ship skill updated to run /cleanup and migration safety review before committing.

Confidence Score: 3/5

The linter has two rule gaps in the classify engine that would affect real migrations: DROP INDEX without CONCURRENTLY can be annotated past the gate despite taking the same table-locking ACCESS EXCLUSIVE lock as CREATE INDEX, and RENAME CONSTRAINT triggers the hard rename rule even though it is a metadata-only change that does not break running app code.

The DROP INDEX locking gap means the tool could provide false confidence — a developer annotates a non-concurrent DROP INDEX, the lint passes, and the migration locks the table during deployment, which is the exact scenario the tool exists to prevent. The RENAME CONSTRAINT false positive would permanently block a safe migration with no annotation escape hatch. Both issues are in the core classify engine, the most load-bearing part of the PR.

scripts/check-migrations-safety.ts — the classify function's DROP INDEX branch and the RENAME regex need attention before this gates production migrations.

Important Files Changed

Filename Overview
scripts/check-migrations-safety.ts Core lint engine: well-structured with good coverage of the main cases, but has three logic gaps — RENAME CONSTRAINT false-positive blocks safe migrations, DROP INDEX locking symmetry with CREATE INDEX is broken, and the alter-type regex can match string literal contents.
scripts/check-migrations-safety.test.ts Good test coverage for core rules and parser edge cases; missing tests for DROP INDEX CONCURRENTLY enforcement and RENAME CONSTRAINT false-positive.
.github/workflows/test-build.yml CI integration is correct: PR path uses the proper base ref, non-PR path falls back to HEAD~1, and fetch failure is non-fatal with
.agents/skills/db-migrate/SKILL.md New agent skill with clear expand/contract guidance, per-operation playbook, and annotation workflow; no issues found.
.agents/skills/ship/SKILL.md Updated to add cleanup pass and migration safety review steps before shipping; step numbering is consistent and instructions are clear.
package.json Adds check:migrations script entry, no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[PR opens / push event] --> B{Event type?}
    B -- pull_request --> C[git fetch depth=1 origin/base_ref]
    B -- push --> D[BASE_REF = HEAD~1]
    C --> E[bun run check:migrations BASE_REF]
    D --> E
    E --> F[changedMigrationFiles: git diff --diff-filter=AM vs merge-base]
    F --> G{Any new .sql files?}
    G -- No --> H[✓ No new migrations - exit 0]
    G -- Yes --> I[lintSql each file]
    I --> J[parseStatements: splits on ; respecting strings, block comments, dollar-quoting]
    J --> K[Build createdTables set from CREATE TABLE stmts]
    K --> L[For each statement: classify sql, createdTables, sawCommit]
    L --> M{Match kind?}
    M -- error --> N[Push finding tier=error]
    M -- warn --> O[Push finding tier=warn]
    M -- annotate --> P{readAnnotation: check up to 3 preceding non-empty lines}
    P -- annotation found with reason --> Q[Skip / pass]
    P -- annotation found no reason --> R[Push error: no reason]
    P -- no annotation --> S[Push error: ANNOTATE_GUIDANCE]
    L --> T{COMMIT seen?}
    T -- yes --> U[sawCommit = true for next stmts]
    N & O & R & S --> V{errors > 0?}
    V -- Yes --> W[exit 1 - CI fails]
    V -- No --> X[✓ Migrations backward-compatible / warnings only - exit 0]
Loading

Reviews (1): Last reviewed commit: "feat(skills): run cleanup and db-migrate..." | Re-trigger Greptile

Comment on lines +246 to +253
if (/\bRENAME\b/i.test(s)) {
matches.push({
kind: 'error',
rule: 'rename',
message:
'RENAME breaks old code reading the old name. Add the new column/table, dual-write in code, then drop the old one in a later deploy.',
})
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 RENAME CONSTRAINT triggers the hard rename error

/\bRENAME\b/i matches ALTER TABLE "x" RENAME CONSTRAINT "old_fk" TO "new_fk". Renaming a constraint is a metadata-only operation — app code references columns and rows, not constraint names — so old code keeps running fine. Drizzle does generate RENAME CONSTRAINT when you change a constraint name in the schema, and because this is a hard error (cannot be annotated away), the CI would permanently block a safe migration with no path to unblock it.

Comment on lines +289 to +291
if (/^DROP INDEX\b/i.test(s)) {
matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' })
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 DROP INDEX without CONCURRENTLY passes with just an annotation

CREATE INDEX without CONCURRENTLY on an existing table is a hard error because it takes an ACCESS EXCLUSIVE lock. DROP INDEX acquires the same lock, but the linter only requires an annotation for it — meaning a developer who annotates DROP INDEX "idx"; will lock their table during migration, which is exactly the downtime this linter is meant to prevent. The check should require CONCURRENTLY for DROP INDEX on existing tables (same logic as CREATE INDEX).

Suggested change
if (/^DROP INDEX\b/i.test(s)) {
matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' })
}
if (/^DROP INDEX\b/i.test(s)) {
if (!/\bCONCURRENTLY\b/i.test(s)) {
matches.push({
kind: 'error',
rule: 'drop-index-not-concurrent',
message:
'DROP INDEX on an existing table write-locks it for the whole build. Use DROP INDEX CONCURRENTLY after a COMMIT; breakpoint (see packages/db/scripts/migrate.ts).',
})
} else if (!sawCommit) {
matches.push({
kind: 'error',
rule: 'concurrent-drop-index-no-commit',
message:
'DROP INDEX CONCURRENTLY cannot run inside the migration transaction. Precede it with a COMMIT; breakpoint.',
})
} else {
matches.push({ kind: 'annotate', rule: 'drop-index', message: 'DROP INDEX' })
}
}

if (/\bSET NOT NULL\b/i.test(s)) {
matches.push({ kind: 'annotate', rule: 'set-not-null', message: 'SET NOT NULL' })
}
if (/\b(?:SET DATA TYPE|ALTER COLUMN [^;]*\bTYPE)\b/i.test(s)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 ALTER COLUMN [^;]*\bTYPE can false-positive on string default values

The regex ALTER COLUMN [^;]*\bTYPE is matched against s, which retains string literal contents (e.g., SET DEFAULT 'type:foo'). Since 'type' is preceded by ', a non-word character, \b fires before type, and the pattern matches — incorrectly flagging a simple default-change as an alter-type. This is an annotate-tier false positive, but it forces developers to add a misleading migration-safe annotation.

Suggested change
if (/\b(?:SET DATA TYPE|ALTER COLUMN [^;]*\bTYPE)\b/i.test(s)) {
if (/\bSET DATA TYPE\b/i.test(s) || /\bALTER COLUMN\s+"?\w+"?\s+TYPE\b/i.test(s)) {

if (!dir) throw new Error('--dir requires a path')
return (await listSqlFiles(path.resolve(dir))).map((f) => path.relative(ROOT, f))
}
const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/main'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Default baseRef is origin/main, but the project branches from staging

When developers run bun run check:migrations locally without arguments, the script compares against origin/main. Because PRs target staging (not main), staging may have migration files that aren't in main yet, causing the default diff to flag pre-existing migrations as "new" and surfacing spurious errors. Using origin/staging as the default aligns with the actual base branch developers work against.

Suggested change
const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/main'
const baseRef = argv.find((a) => !a.startsWith('--')) ?? 'origin/staging'

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant