fix: npm ingest packages deadlock#4234
Conversation
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
|
|
|
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title. |
There was a problem hiding this comment.
Pull request overview
This PR targets reliability issues in the npm packages ingestion pipeline by reducing lock contention (deadlocks), improving handling of edge-case registry responses, and normalizing license data before persistence.
Changes:
- Reworked repo “get-or-create” to avoid always attempting an insert (reduces lock contention in the common “already exists” case).
- Enforced deterministic maintainer processing order to avoid cyclic row-lock acquisition across concurrent ingest lanes.
- Improved npm packument handling for fully-unpublished packages and made “MALFORMED” responses non-poisoning for Temporal lanes; added per-version license normalization.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| services/libs/data-access-layer/src/packages/repos.ts | Avoids insert-first pattern for shared repos; handles race by re-reading after conflict. |
| services/libs/data-access-layer/src/packages/maintainers.ts | Sorts maintainer upserts to prevent deadlocks when multiple packages share maintainers. |
| services/apps/packages_worker/src/npm/workflows.ts | Reduces ingest rounds per workflow run (more frequent continue-as-new). |
| services/apps/packages_worker/src/npm/upsertPackage.ts | Uses new versionLicense() normalization for per-version license persistence. |
| services/apps/packages_worker/src/npm/normalize.ts | Adds versionLicense() and helper for normalizing version license shapes. |
| services/apps/packages_worker/src/npm/fetchPackument.ts | Detects “unpublished stub” responses and converts them into an empty packument. |
| services/apps/packages_worker/src/npm/activities.ts | Treats MALFORMED packuments as permanent (fast-retry then skip) instead of throwing and poisoning a lane. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // A version's license is sometimes an object ({ type, url }) | ||
| // or the legacy array form ([{ type, file }, ...]). Passing those raw | ||
| export function versionLicense(raw: unknown): string | null { |
| function isLicenseObject(v: unknown): v is { type?: string } { | ||
| return typeof v === 'object' && v !== null | ||
| } |
Signed-off-by: anilb <epipav@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.
| for (const [key, value] of Object.entries(t)) { | ||
| if (typeof value === 'string') time[key] = value | ||
| } | ||
| return { name: o.name, 'dist-tags': {}, versions: {}, time, unpublished: t.unpublished } |
There was a problem hiding this comment.
Loose unpublished stub detection
Medium Severity
asUnpublishedStub treats any time object that merely contains an unpublished key as a registry unpublish stub, without checking that the value matches npm’s unpublish metadata (an object with a time field). Values such as null (seen on some third-party registries) or a string timestamp still produce an empty packument, and ingest can mark the package active instead of unpublished or skipping malformed data.
Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.
| isLatest: number === latestVersion, | ||
| isPrerelease: isPrerelease(number), | ||
| license: v.license ?? licenses[0] ?? null, | ||
| license: versionLicense(v.license) ?? licenses[0] ?? null, |
There was a problem hiding this comment.
Unpublished ingest orphans version rows
Medium Severity
When ingest succeeds with an unpublished stub (versions empty, status unpublished), upsertNpmVersions is called with no entries and returns immediately, so existing versions rows for that package are never removed while versions_count on the package row is updated to zero.
Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.


Note
Medium Risk
Changes concurrent DB upsert patterns and ingest error handling across live npm workers; low blast radius but touches shared
repos/maintainerswrites under parallel lanes.Overview
Deadlock / concurrency:
getOrCreateRepoByUrlno longer uses a single CTE upsert; it selects first, then inserts withON CONFLICT DO NOTHING, and re-reads if another lane wins the race—so parallel ingest lanes sharing the same repo URL avoid unique-violation/deadlock patterns. Maintainer upserts now sort by username before insert/delete so lock order is consistent across lanes.Ingest reliability:
MALFORMEDpackuments (bad JSON/shape) follow the 4xx fast-retry-then-skip path instead of throwing into Temporal’s slow retry loop, which was stalling lanes. Unpublished npm stubs (HTTP 200 withtime.unpublishedonly) are normalized to an empty packument so packages getunpublishedstatus instead of shape errors.Data quality: Version
licensefields are collapsed to a text string viaversionLicensebefore persistence (objects/arrays no longer hit the DB).Throughput:
INGEST_ROUNDS_PER_RUNdrops 25 → 5 per workflow continuation (smaller batches percontinueAsNew).Reviewed by Cursor Bugbot for commit ecbc91c. Bugbot is set up for automated code reviews on this repo. Configure here.