Skip to content

fix: npm ingest packages deadlock#4234

Open
epipav wants to merge 7 commits into
mainfrom
fix/npm-ingest-packages-deadlock
Open

fix: npm ingest packages deadlock#4234
epipav wants to merge 7 commits into
mainfrom
fix/npm-ingest-packages-deadlock

Conversation

@epipav

@epipav epipav commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Note

Medium Risk
Changes concurrent DB upsert patterns and ingest error handling across live npm workers; low blast radius but touches shared repos/maintainers writes under parallel lanes.

Overview
Deadlock / concurrency: getOrCreateRepoByUrl no longer uses a single CTE upsert; it selects first, then inserts with ON CONFLICT DO NOTHING, and re-reads if another lane wins the race—so parallel ingest lanes sharing the same repo URL avoid unique-violation/deadlock patterns. Maintainer upserts now sort by username before insert/delete so lock order is consistent across lanes.

Ingest reliability: MALFORMED packuments (bad JSON/shape) follow the 4xx fast-retry-then-skip path instead of throwing into Temporal’s slow retry loop, which was stalling lanes. Unpublished npm stubs (HTTP 200 with time.unpublished only) are normalized to an empty packument so packages get unpublished status instead of shape errors.

Data quality: Version license fields are collapsed to a text string via versionLicense before persistence (objects/arrays no longer hit the DB).

Throughput: INGEST_ROUNDS_PER_RUN drops 25 → 5 per workflow continuation (smaller batches per continueAsNew).

Reviewed by Cursor Bugbot for commit ecbc91c. Bugbot is set up for automated code reviews on this repo. Configure here.

epipav added 5 commits June 9, 2026 18:25
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Copilot AI review requested due to automatic review settings June 18, 2026 13:29
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@epipav epipav changed the title Fix/npm ingest packages deadlock fix: npm ingest packages deadlock Jun 18, 2026
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventional Commits FTW!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets reliability issues in the npm packages ingestion pipeline by reducing lock contention (deadlocks), improving handling of edge-case registry responses, and normalizing license data before persistence.

Changes:

  • Reworked repo “get-or-create” to avoid always attempting an insert (reduces lock contention in the common “already exists” case).
  • Enforced deterministic maintainer processing order to avoid cyclic row-lock acquisition across concurrent ingest lanes.
  • Improved npm packument handling for fully-unpublished packages and made “MALFORMED” responses non-poisoning for Temporal lanes; added per-version license normalization.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
services/libs/data-access-layer/src/packages/repos.ts Avoids insert-first pattern for shared repos; handles race by re-reading after conflict.
services/libs/data-access-layer/src/packages/maintainers.ts Sorts maintainer upserts to prevent deadlocks when multiple packages share maintainers.
services/apps/packages_worker/src/npm/workflows.ts Reduces ingest rounds per workflow run (more frequent continue-as-new).
services/apps/packages_worker/src/npm/upsertPackage.ts Uses new versionLicense() normalization for per-version license persistence.
services/apps/packages_worker/src/npm/normalize.ts Adds versionLicense() and helper for normalizing version license shapes.
services/apps/packages_worker/src/npm/fetchPackument.ts Detects “unpublished stub” responses and converts them into an empty packument.
services/apps/packages_worker/src/npm/activities.ts Treats MALFORMED packuments as permanent (fast-retry then skip) instead of throwing and poisoning a lane.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/libs/data-access-layer/src/packages/maintainers.ts Outdated
Comment on lines +59 to +61
// A version's license is sometimes an object ({ type, url })
// or the legacy array form ([{ type, file }, ...]). Passing those raw
export function versionLicense(raw: unknown): string | null {
Comment on lines +74 to +76
function isLicenseObject(v: unknown): v is { type?: string } {
return typeof v === 'object' && v !== null
}
Signed-off-by: anilb <epipav@gmail.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.

for (const [key, value] of Object.entries(t)) {
if (typeof value === 'string') time[key] = value
}
return { name: o.name, 'dist-tags': {}, versions: {}, time, unpublished: t.unpublished }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loose unpublished stub detection

Medium Severity

asUnpublishedStub treats any time object that merely contains an unpublished key as a registry unpublish stub, without checking that the value matches npm’s unpublish metadata (an object with a time field). Values such as null (seen on some third-party registries) or a string timestamp still produce an empty packument, and ingest can mark the package active instead of unpublished or skipping malformed data.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.

isLatest: number === latestVersion,
isPrerelease: isPrerelease(number),
license: v.license ?? licenses[0] ?? null,
license: versionLicense(v.license) ?? licenses[0] ?? null,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unpublished ingest orphans version rows

Medium Severity

When ingest succeeds with an unpublished stub (versions empty, status unpublished), upsertNpmVersions is called with no entries and returns immediately, so existing versions rows for that package are never removed while versions_count on the package row is updated to zero.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ecbc91c. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants