feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference single & multi-rank (NeighborGraph PR-B) by wanghan-iapcm · Pull Request #5604 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-06-29T14:46:42Z

NeighborGraph PR-B — graph `.pt2` export, compiled training, and C++ inference (single & multi-rank)

This PR spans the full PR-B: B1 (Python: graph .pt2 export + compiled training on the graph lower), B2 (C++ single-rank inference of the graph .pt2, dynamic edge axis), and B3 (C++/LAMMPS multi-rank). Built on the merged PR-A (#5583). Scope: dpa1, attn_layer=0, pt_expt.

B1 — graph `.pt2` export + compiled training (Python)

forward_common_lower_graph_exportable trace target; serialization.py graph export branch (lower_kind="graph", lower_input_kind metadata); _eval_model_graph DeepEval dispatch (parity vs eager dpa1 1e-10 pbc+nopbc).
Compiled training retargeted to the graph lower so eager == compiled (the MUST-FIX) → force_legacy_descriptor deleted. Root cause was a real dpa1 call_graph autograd detach bug (xp.asarray(tebd, device=) drops the tebd-net gradient under torch); fixed.

B2 — C++ graph ingestion (dynamic edge axis, single-rank)

Graph .pt2 uses a dynamic edge axis (Dim("nedge", min=2)) — one artifact evals any system size (proven across 56- and 380-edge systems at 1e-10), no C++ capacity ceiling.
C++ DeepPotPTExpt: lower_input_is_graph_ + run_model_graph (NeighborGraph ABI: atype, n_node, edge_index, edge_vec, edge_mask, …) + buildGraphTensors (mirrors the refactor(pt/dpa4): use edge schema for dpa4 and ignore sel #5562 edge path; node types from atype_ext); remap_graph_outputs_to_dense_keys (single-rank).
gtest: 5 cases × {double,float} = 10/10 (build-nlist parity, dynamic-E 2nd size, ago>0, tiny system, atomic-overload). The review process caught two bugs that would otherwise have shipped: an ago>0 heap-OOB (by inspection) and a public-vs-internal output-key mismatch (at runtime).

B3 — multi-rank C++ / LAMMPS (non-MP)

dpa1 is non-message-passing ⇒ multi-rank needs NO border_op/with-comm artifact (that is a message-passing concern, deferred to PR-G). Multi-rank reuses the same single-rank graph .pt2, fed an extended-region graph (buildGraphTensors(fold_to_local=false), N=nall, ghost node types from atype_ext incl. halo), with owned energy = sum(atom_energy[0:nloc]) and the extended force folded to owners through the existing dense select_map reverse-comm. The fail-fast for graph && multi_rank && has_message_passing is retained.
Validated locally on multi-CPU (no GPU needed for correctness): test_lammps_dpa1_graph_pt2.py — single-rank vs reference, mpirun -n 2 ≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an empty-subdomain (nloc=0) corner. Single-rank gtests stay 10/10 (multi-rank is purely additive). Multi-rank matched single-rank on the first run.

Tests / known limitations

Per-task + whole-phase reviews all Ready-to-merge.
pt_expt-only; dpa1 (non-MP) only. Follow-ons: PR-C vesin/nv O(N) builders (carry-all builders still use nonzero, eager-only), PR-D attention, PR-E angles, PR-F jax graph force, PR-G dpa2/3 message-passing (forward halo + with-comm). CUDA multi-rank unvalidated locally. Carried code-cleanup follow-ups: a ~60-line DRY duplication in training.py; the multi-rank atomic output branch has no direct gtest (covered indirectly by the mpirun per-atom-virial assertion, since a single-process gtest can't set nprocs>1).

Summary by CodeRabbit

Summary

New Features
- Added support for graph-schema (NeighborGraph) model archives with a selectable lower_kind="graph" export path, including CLI support and new graph-form inference handling.
- Added static edge-capacity support during graph construction.
Bug Fixes
- Improved gradient continuity for type embeddings in graph mode.
- Enhanced trace/export stability by preventing out-of-range graph indices/frame IDs and making scatter/frame sizing more consistent.
Tests
- Added/extended parity, export metadata, training, and LAMMPS single-/multi-rank validation for graph-form .pt2, plus metadata checks for lower_input_kind.

… graph .pt2 export - Add make_model.forward_common_lower_graph_exportable: make_fx trace of forward_common_lower_graph with edge_vec as the autograd leaf; symbolic tracing enabled via two data-dependent shape fixes. - Add EnergyModel.forward_lower_graph_exportable: wraps the above with a second make_fx pass that translates internal keys to public names (atom_energy, energy, force, virial, atom_virial). - Fix edge_transform_output.py: replace int(n_node.sum()) with next(iter(fit_ret.values())).shape[0] (static shape attr, trace-safe); thread node_capacity through edge_energy_deriv -> edge_force_virial so that segment_sum sizes are static under make_fx symbolic mode. - Add edge_energy_deriv node_capacity param (None = eager fallback). - Test: source/tests/pt_expt/model/test_graph_export.py — TDD RED->GREEN, verifies traced module reproduces eager energy_redu (rtol=1e-10).

… do_atomic_virial branches), dedup key-translation, drop redundant detach

…r_graph Adds test_graph_static_capacity.py (4 tests): shape (2,64), real-edge count matches dynamic, real prefix identity + masked tail, overflow raises. No source change needed -- builder.py already threads layout.edge_capacity into pad_and_guard_edges.

…raph .pt2 metadata for the C++ hub (B2)

…pbc, 1e-10) Add a graph-form .pt2 DeepEval dispatch and parity test (NeighborGraph PR-B Phase B1.4). DeepEval: route lower_input_kind=="graph" archives through a new _eval_model_graph that builds a carry-all NeighborGraph (padded to the static edge_capacity baked in metadata), feeds the positional schema (atype, n_node, edge_index, edge_vec, edge_mask, fparam, aparam, charge_spin) to the AOTI forward, and reshapes the LOCAL public outputs by category (no communicate_extended_output, since the graph path is ghost-free). The dense (nlist) path is untouched. Fix dynamic-shape generalization of the graph .pt2 export: under symbolic make_fx/torch.export, int() on a SymInt SPECIALIZES the axis, baking the trace-time sample size (N=14, nf=2) into the autograd-derived force/energy/ virial outputs. Drop int() on node_capacity (edge_force_virial) and on n_node.shape[0] (fit_output_to_model_output_graph), and derive nf-1 in frame_id_from_n_node as a runtime 0-d tensor instead of xp.asarray(shape-1). The exported graph now generalizes across arbitrary N and nf. Test: dpa1(attn_layer=0) energy model exported with lower_kind="graph", evaluated through the pt_expt DeepPot and compared to the eager dense (sel-capped, neighbor_graph_method="legacy") reference at rtol=atol=1e-10 for pbc and nopbc. The fixture is a sparse 8-atom cluster and asserts non-binding sel (max neighbors < 30) so the carry-all/sel-capped neighbor sets coincide (anti-vacuity).

…); drop force_legacy_descriptor Retarget _CompiledModel to compile forward_common_lower_graph for graph-eligible descriptors (dpa1 attn_layer==0), gated by the same mixed_types()+uses_graph_lower() predicate the eager default-flip uses; se_e2_a/dpa2/dpa3 keep compiling the dense forward_lower. _trace_and_compile_graph builds a synthetic NeighborGraph with prime-distinct nf/N/E axes (no make_fx duck-shape merge) and edge_vec as the autograd leaf; _forward_graph builds the carry-all graph eagerly and unravels flat (N,*) node outputs to (nf,nloc,*). cpp.simdlen=0 for the graph compile avoids an inductor CPU scatter-vectorizer crash on the per-frame virial atomic_add. Also fixes an eager autograd bug in dpa1 call_graph: xp.asarray(type_embedding, device=dev) DETACHES under torch, so the type-embedding weights never trained in the graph path (grad None despite a real finite-diff dependency). make_fx traced through it, so compiled != eager and the optimizer diverged after step 0. Use type_embedding directly (mirrors the dense path); the tebd net now trains and eager==compiled to 1e-10 across the varying-natoms trajectory. Drops the force_legacy_descriptor workaround + uses_graph_lower monkeypatch.

… + legacy-gate assumption in the graph compile path

for more information, see https://pre-commit.ci

coderabbitai · 2026-06-29T14:49:11Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a graph-native NeighborGraph lowering path across export, inference, training compilation, C++ runtime, and tests. It also updates dpmodel and neighbor-graph helpers to preserve gradients and symbolic shapes during graph tracing.

Changes

Graph-form lower-forward export and runtime

Layer / File(s)	Summary
Trace-safe graph utilities `deepmd/dpmodel/descriptor/dpa1.py`, `deepmd/dpmodel/utils/neighbor_graph/graph.py`	`type_embedding` is used directly in graph descriptor paths, and `frame_id_from_n_node` uses a runtime clamp value for padding-node frame IDs.
Trace-stable edge transforms `deepmd/pt_expt/model/edge_transform_output.py`, `deepmd/pt_expt/model/make_model.py`, `deepmd/pt_expt/model/ener_model.py`	`edge_energy_deriv` and graph fitting accept `node_capacity`, graph export entry points pass the node-axis capacity through the traced lower path, and graph/dense export key translation is centralized.
Graph export metadata and DeepEval dispatch `deepmd/pt_expt/utils/serialization.py`, `deepmd/pt_expt/infer/deep_eval.py`	`deserialize_to_file` accepts `lower_kind`, graph exports build dynamic graph inputs and record `lower_input_kind`, and DeepEval routes graph archives through `_eval_model_graph` with graph-key output mapping.
GRAPH compiled lower `deepmd/pt_expt/train/training.py`	Graph eligibility detection, graph tracing and compilation helpers, a cached graph forward path, and graph-shaped output unraveling are added to compiled training execution.
C++ graph runtime path `source/api_cc/include/commonPT.h`, `source/api_cc/include/DeepPotPTExpt.h`, `source/api_cc/src/DeepPotPTExpt.cc`	C++ support adds graph tensor packing, graph model execution, graph metadata dispatch, dense-key remapping, and graph-path cache handling.
Graph export and runtime tests `source/tests/common/dpmodel/test_graph_static_capacity.py`, `source/tests/pt_expt/model/test_graph_export.py`, `source/tests/pt_expt/infer/test_graph_deepeval.py`, `source/tests/pt_expt/utils/test_graph_pt2_metadata.py`, `source/tests/pt_expt/test_training.py`, `source/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc`, `source/tests/infer/gen_dpa1.py`, `source/lmp/tests/test_lammps_dpa1_graph_pt2.py`, `source/tests/pt_expt/test_dp_freeze.py`	Tests cover static edge capacity, graph export tracing, metadata, DeepEval parity, C++ inference, graph artifact generation, the compiled varying-natoms graph path, LAMMPS graph `.pt2` inference, and freeze-path validation.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

deepmodeling/deepmd-kit#5284: Extends the pt_expt DeepEval inference pipeline that this PR routes through _eval_model_graph for graph-form archives.
deepmodeling/deepmd-kit#5367: Touches the same deepmd/pt_expt/utils/serialization.py export path used here for lower_kind="graph".
deepmodeling/deepmd-kit#5583: Adds the DPA1 graph-native lowering path that this PR builds on in the export and runtime layers.

Suggested labels

Core

Suggested reviewers

OutisLi
iProzd

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 69.70% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: graph-native dpa1 pt2 export, compiled training, and C++ inference across single and multi-rank paths.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

+    DeepPot,
+)
+from deepmd.pt_expt.utils.env import (
+    DEVICE,


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

deepmd/pt_expt/utils/serialization.py (1)
750-789: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject unknown lower_kind values instead of falling back to nlist.

A typo such as lower_kind="grpah" currently reaches the dense path and records lower_input_kind="nlist", producing the wrong artifact without any signal.
Proposed fix
 def deserialize_to_file(
@@
     """
+    if lower_kind not in {"nlist", "graph"}:
+        raise ValueError(
+            f"Unknown lower_kind {lower_kind!r}; expected 'nlist' or 'graph'."
+        )
     if model_file.endswith(".pt2"):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/serialization.py` around lines 750 - 789, The
deserialize entrypoint in serialization.py currently accepts any lower_kind
value and silently falls back to the nlist path, which can produce the wrong
artifact for typos. Update deserialize_to_file and the lower-path selection
logic in _deserialize_to_file_pt2 / _deserialize_to_file_pte to validate
lower_kind explicitly against the supported values ("nlist" and "graph") and
raise a clear error for anything else instead of defaulting to nlist, so the
recorded lower_input_kind always matches the requested schema.
source/tests/pt_expt/test_training.py (1)
1355-1374: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert that the DPA1 no-attn case really takes the graph lower.

These changes now document graph-lower routing, but the test still only proves eager/compiled parity. If _CompiledModel silently falls back to the dense lower and the sampled systems stay non-binding, this case still passes, so the dispatch change introduced by this PR is untested. Please add an explicit assertion for the DPA1 no-attn branch that the compiled model is graph-eligible / using the graph lower, while keeping the dense expectation for the other descriptors.

Also applies to: 1391-1392, 1466-1479
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/test_training.py` around lines 1355 - 1374, The current
test only checks eager/compiled parity in _check_varying_natoms, so the DPA1
no-attn routing to the graph lower is not explicitly verified. Add an assertion
in the DPA1 branch that the compiled model is graph-eligible and actually uses
forward_common_lower_graph, while preserving the dense forward_lower expectation
for the non-eligible descriptors. Extend the same explicit dispatch check in the
related test sections that cover the same descriptor routing.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 968-970: The graph-compiled forward path is dropping the
do_atomic_virial request, so atom_virial cannot be returned from compiled
models. Update forward and _forward_graph in training.py to thread
do_atomic_virial through the graph-eligible dispatch, and make
_trace_and_compile_graph cache/trace separately based on that flag so the
compiled lower path matches forward_common_lower_graph behavior.
- Around line 1283-1293: The reshaping logic in the result loop currently relies
on the `N != nframes` heuristic, which fails for `nloc == 1` and leaves
node-level outputs flat. Update the `training.py` postprocessing in the
`result.items()` loop to use explicit key categories from the graph/lower-level
public keys instead of shape inference, and reshape known node-level outputs
like `atom_energy` and `force` to `(nf, 1, *)` even when `N == nframes`.

In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 447-448: The dynamic-shape spec in _build_graph_dynamic_shapes
still only marks the frame axis for aparam, so its atom axis remains fixed to
the sample nloc. Update the aparam entry in the shape mapping to include both
the frame axis and the atom axis as dynamic, matching the intended (nf, nloc,
nda) behavior. Keep the change localized to the graph shape spec alongside the
existing fparam and aparam handling so exported graphs no longer specialize to a
single atom count.

---

Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 750-789: The deserialize entrypoint in serialization.py currently
accepts any lower_kind value and silently falls back to the nlist path, which
can produce the wrong artifact for typos. Update deserialize_to_file and the
lower-path selection logic in _deserialize_to_file_pt2 /
_deserialize_to_file_pte to validate lower_kind explicitly against the supported
values ("nlist" and "graph") and raise a clear error for anything else instead
of defaulting to nlist, so the recorded lower_input_kind always matches the
requested schema.

In `@source/tests/pt_expt/test_training.py`:
- Around line 1355-1374: The current test only checks eager/compiled parity in
_check_varying_natoms, so the DPA1 no-attn routing to the graph lower is not
explicitly verified. Add an assertion in the DPA1 branch that the compiled model
is graph-eligible and actually uses forward_common_lower_graph, while preserving
the dense forward_lower expectation for the non-eligible descriptors. Extend the
same explicit dispatch check in the related test sections that cover the same
descriptor routing.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 58e2933c-1a05-4d30-bdfa-0cec147b8f6a

📥 Commits

Reviewing files that changed from the base of the PR and between 58bef11 and b046874.

📒 Files selected for processing (14)

deepmd/dpmodel/descriptor/dpa1.py
deepmd/dpmodel/utils/neighbor_graph/derivatives.py
deepmd/dpmodel/utils/neighbor_graph/graph.py
deepmd/pt_expt/infer/deep_eval.py
deepmd/pt_expt/model/edge_transform_output.py
deepmd/pt_expt/model/ener_model.py
deepmd/pt_expt/model/make_model.py
deepmd/pt_expt/train/training.py
deepmd/pt_expt/utils/serialization.py
source/tests/common/dpmodel/test_graph_static_capacity.py
source/tests/pt_expt/infer/test_graph_deepeval.py
source/tests/pt_expt/model/test_graph_export.py
source/tests/pt_expt/test_training.py
source/tests/pt_expt/utils/test_graph_pt2_metadata.py

codecov · 2026-06-29T15:47:54Z

Codecov Report

❌ Patch coverage is 82.12389% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.85%. Comparing base (5082854) to head (70b02fe).
⚠️ Report is 12 commits behind head on master.

Files with missing lines	Patch %	Lines
...rce/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc	74.34%	39 Missing ⚠️
deepmd/pt_expt/train/training.py	80.26%	30 Missing ⚠️
source/api_cc/src/DeepPotPTExpt.cc	65.45%	13 Missing and 6 partials ⚠️
deepmd/pt_expt/utils/serialization.py	90.62%	6 Missing ⚠️
source/api_cc/include/commonPT.h	92.72%	2 Missing and 2 partials ⚠️
deepmd/pt_expt/infer/deep_eval.py	91.17%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5604      +/-   ##
==========================================
- Coverage   82.37%   81.85%   -0.52%     
==========================================
  Files         902      960      +58     
  Lines      101527   106256    +4729     
  Branches     4056     4134      +78     
==========================================
+ Hits        83630    86976    +3346     
- Misses      16432    17787    +1355     
- Partials     1465     1493      +28

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

… static edge_capacity The graph-form .pt2 export now marks the edge axis dynamic (Dim("nedge", min=2)) instead of baking a static E_max=ceil(1.25*nloc*nnei) capacity. The AOTI artifact accepts any system size with no capacity ceiling. - _build_graph_dynamic_shapes: edge_index dim1 / edge_vec dim0 / edge_mask dim0 are now dynamic; mirrors the dense Dim("nnei", min=...) precedent. - _trace_and_export graph branch: drop the metadata["edge_capacity"] write; apply _strip_shape_assertions to neutralise the SIGFPE-prone deferred shape guards on the dynamic E axis (same handling the spin dense path uses). - deep_eval._eval_model_graph: build the carry-all graph at its tight edge count (no edge_capacity padding). - test_graph_deepeval: eval TWO different-size systems (8- and 20-atom, 56 vs 380 real edges) through the SAME exported artifact; both match eager dense dpa1 at 1e-10 pbc+nopbc, both non-binding. The 20-atom system (380 > 263) would have overflowed the B1 static artifact -> the RED. - test_graph_pt2_metadata: graph metadata no longer carries edge_capacity.

…e graph (dynamic-E) caller + edge-axis safety

…) generator Section B of gen_dpa1.py produces deeppot_dpa1_graph.pt2 (lower_kind="graph", do_atomic_virial=True) and the accompanying deeppot_dpa1_graph.expected sidecar. Reference values come from an independent nlist .pt2 eval (NOT the graph .pt2) so the C++ gtest (B2.5) validates the graph AOTI path against a known-good reference. Sanity check: graph .pt2 vs nlist ref force diff 1.3e-18 (machine precision; sel=30 >> actual neighbors, so both paths see identical neighbor sets). Forces non-degenerate: max |F| ~5.3e-4 (PBC). Config: type_map=[O,H], sel=30, rcut=6.0, attn_layer=0, neuron=[2,4,8], axis_neuron=4, fitting neuron=[5,5,5], resnet_dt=True, seed=1 — mirrors DPA1_CONFIG in source/tests/pt_expt/utils/test_graph_pt2_metadata.py.

B2.2: read lower_input_kind="graph" -> lower_input_is_graph_. B2.3: run_model_graph with NeighborGraph AOTI input order (atype, n_node, edge_index, edge_vec, edge_mask, [fparam], [aparam], [charge_spin]); no coord / edge_scatter_index. B2.4: GraphTensorPack + buildGraphTensors (delegates to createEdgeTensors+compactEdgeTensors for the rcut filter, dynamic edge count and 2 masked dummy edges; drops edge_index_ext, adds n_node=[nloc], node types from atype_ext[0:nloc]) + compute_inner & standalone dispatch branches. Multi-rank graph fails fast (PR-B3).

… path Local std::vector<int64_t> mapping was declared in compute_inner and populated only inside if(ago==0). The graph branch called buildGraphTensors with this local vector on every step, causing an OOB heap read on ago>0 (mapping.size()==0). Fix: promote mapping to a member mapping_ (parallel to mapping_tensor) so it persists across steps. Edge-path (createEdgeTensors) and dense-path (mapping_tensor) are unaffected in behavior; only the vector source changes from local to member.

…xtraction Add test_deeppot_dpa1_graph_ptexpt.cc — the first runtime exercise of the NeighborGraph .pt2 C++ ingestion (B2.2-4). Four cases x {double,float}: build-nlist parity vs the committed .expected, a second 12-atom system through the same model (dynamic edge axis), the LAMMPS InputNlist+ago=0/ago=1 path (compute_inner + cached mapping_), and a tiny no-edge system (nedge_min=2 guard). The first run surfaced a B2.2-4 bug: the graph forward emits LOCAL flat-N PUBLIC keys (atom_energy/energy/force/virial/atom_virial) but compute()'s output extraction read the dense INTERNAL keys (energy_redu/energy_derv_r/...), so output_map["energy_redu"].view() threw on an undefined tensor. The graph branch had only ever been compiled, never run. Fix: remap_graph_outputs_to_dense_keys() in commonPT.h, called after extract_outputs in both compute overloads (gated on lower_input_is_graph_). Rewrites the public keys into the dense internal-key layout; per-atom force/atom_virial are local (nloc) and zero-padded up to nall so the existing fold-back is a no-op on ghost rows (local rows already carry the folded ghost contributions). Dense/edge paths untouched. gen_dpa1.py now persists the dense nlist-ref .pt2 (deeppot_dpa1_graph_nlist_ref.pt2) as a live graph-vs-dense oracle for the dynamic-edge cases.

…ring (nlist .pt2, 1e-5)

for more information, see https://pre-commit.ci

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

source/api_cc/src/DeepPotPTExpt.cc (1)

508-539: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Require lmp_list.mapping for graph artifacts with ghosts.

The graph branch uses mapping_ in buildGraphTensors to fold ghost neighbors onto local owners, but the current guard only requires an atom map for has_message_passing_ models. A DPA1 graph artifact can therefore run PBC LAMMPS without atom_modify map yes, fall back to identity mapping, and feed invalid/wrong folded graph indices.

Proposed fix

   if (lower_input_is_graph_ && multi_rank) {
     throw deepmd::deepmd_exception(
         "Multi-rank graph (NeighborGraph) .pt2 inference is not yet "
         "supported (PR-B3). Run single-rank, or use a dense/edge .pt2 for "
         "multi-rank LAMMPS.");
   }
+  if (lower_input_is_graph_ && nghost > 0 && !atom_map_present) {
+    throw deepmd::deepmd_exception(
+        "Graph (NeighborGraph) .pt2 inference with ghost atoms requires "
+        "`atom_modify map yes` so ghost neighbors can be folded onto their "
+        "local owners. Re-run with an atom map enabled, or use a dense .pt2.");
+  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 508 - 539, The
graph-artifact path in DeepPotPTExpt::buildGraphTensors is using mapping_ to
fold ghost neighbors, but the current validation only enforces atom-map presence
for has_message_passing_ models. Add a guard for lower_input_is_graph_ with
ghosts so graph artifacts also require lmp_list.mapping (or atom_modify map yes)
before compute(), and fail fast when it is missing instead of falling back to
identity mapping.

deepmd/pt_expt/utils/serialization.py (1)

793-799: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject unknown lower_kind values before dispatch.

A typo currently falls through to the dense nlist export while recording nlist metadata, producing a valid-looking artifact with the wrong ABI.

Proposed fix

     """
+    if lower_kind not in {"nlist", "graph"}:
+        raise ValueError(
+            f"Unsupported lower_kind={lower_kind!r}; expected 'nlist' or 'graph'."
+        )
     if model_file.endswith(".pt2"):
         _deserialize_to_file_pt2(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/serialization.py` around lines 793 - 799, The dispatch
in serialization.py currently lets unknown lower_kind values fall through to the
dense nlist export path, which can produce a misleading artifact with incorrect
ABI metadata. Add validation before the model_file.endswith(".pt2") branch so
only the expected lower_kind values are accepted, and raise an error for
anything else. Keep the check close to the dispatch logic in the
deserialization/export flow that calls _deserialize_to_file_pt2 and
_deserialize_to_file_pte.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 793-799: The dispatch in serialization.py currently lets unknown
lower_kind values fall through to the dense nlist export path, which can produce
a misleading artifact with incorrect ABI metadata. Add validation before the
model_file.endswith(".pt2") branch so only the expected lower_kind values are
accepted, and raise an error for anything else. Keep the check close to the
dispatch logic in the deserialization/export flow that calls
_deserialize_to_file_pt2 and _deserialize_to_file_pte.

In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 508-539: The graph-artifact path in
DeepPotPTExpt::buildGraphTensors is using mapping_ to fold ghost neighbors, but
the current validation only enforces atom-map presence for has_message_passing_
models. Add a guard for lower_input_is_graph_ with ghosts so graph artifacts
also require lmp_list.mapping (or atom_modify map yes) before compute(), and
fail fast when it is missing instead of falling back to identity mapping.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c4b21156-2b33-4392-b6b0-4819a492ea77

📥 Commits

Reviewing files that changed from the base of the PR and between b046874 and b25fdfc.

📒 Files selected for processing (9)

deepmd/pt_expt/infer/deep_eval.py
deepmd/pt_expt/utils/serialization.py
source/api_cc/include/DeepPotPTExpt.h
source/api_cc/include/commonPT.h
source/api_cc/src/DeepPotPTExpt.cc
source/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc
source/tests/infer/gen_dpa1.py
source/tests/pt_expt/infer/test_graph_deepeval.py
source/tests/pt_expt/utils/test_graph_pt2_metadata.py

🚧 Files skipped from review as they are similar to previous changes (2)

deepmd/pt_expt/infer/deep_eval.py
source/tests/pt_expt/infer/test_graph_deepeval.py

…st (B2.5 follow-ups)

…-comm; no with-comm)

…-assert on graph .pt2 export) The graph forward_common_lower_graph .pt2 AOTI export device-asserted on CUDA ('index out of bounds: 0 <= tmp8 < ks0') in the edge_force_virial scatter. A padding/guard edge's node index reaches the index_add address computation BEFORE edge_mask zeroes its (zero) contribution; CPU does not bounds-check the address (so all dev + CI, which are CPU-only, were green), CUDA device-asserts. Clamp src/dst into [0, n_out) and edge_frame into [0, nf) before the scatters -- padding edges carry w_edge==0, so a clamped out-of-range index scatters zero (numerically harmless). Verified on Tesla T4: graph .pt2 export + DeepEval parity (small_8/large_20) + full graph suite + consistency all pass; parity confirms the clamp does not corrupt (N bound was correct, the stray index was genuine padding). Also bind the per-node scatter to the input atype.shape[0] (node_capacity) rather than the re-derived fit_ret.shape[0] -- hardening.

wanghan-iapcm · 2026-06-30T06:03:36Z

Fix: CUDA device-assert in graph `.pt2` export (surfaced by the first GPU CI run)

The "Test Python and C++ on CUDA" job (skipped on earlier runs, first executed on e2e07f160) hit a CUDA-only failure: the graph forward_common_lower_graph .pt2 AOTI export device-asserts (index out of bounds: 0 <= tmp8 < ks0) in the edge_force_virial scatter, which also aborts gen_dpa1.py and cascade-poisons the CUDA context for the rest of pytest source/tests.

Cause: a padding/guard edge's node index reaches the index_add address computation before edge_mask zeroes its (zero-valued) contribution. CPU does not bounds-check the scatter address, so all prior dev + CI (CPU-only) were green; CUDA device-asserts.

Fix (7658091dd): clamp the scatter indices into range in edge_force_virial — src/dst → [0, n_out), edge_frame → [0, nf). Padding edges carry w_edge == 0, so a clamped out-of-range index scatters zero (numerically harmless). Also binds the per-node scatter bound to the input atype.shape[0] (hardening). Plus 942de1f4f: a small edge_energy_deriv arg-order cleanup.

Validated on a Tesla T4 (same GPU class as the CI runner): graph .pt2 export ✅, DeepEval parity small_8 + large_20 ✅ (the correctness arbiter — confirms the clamp does not corrupt: the N bound was correct and the stray index was genuine padding), full graph Python suite + test_dpa1 consistency ✅, C++ TestInferDpa1GraphPtExpt 10/10 ✅, LAMMPS test_lammps_dpa1_graph_pt2 5/5 (single + virial + mpirun -n 2 + multi≡single + empty-subdomain) ✅.

The xp.clip clamp added in the previous commit breaks under torch.export: the bound n_out is a SymInt and array_api_compat's clip reads .shape on it ('SymInt' object has no attribute 'shape'), failing every graph .pt2 export on CI/local (it only passed on the GPU box's newer array_api_compat). Replace with modulo (src % n_out, dst % n_out, edge_frame % nf) -- pure arithmetic, so torch.export-safe, and a no-op on in-range real indices. Also fixes the case a mask-multiply missed: the out-of-range index occurs on an edge_mask==1 edge (ks0=n_out binds to a smaller symbol than the live node count at AOTI runtime), which only a clamp-every-index (clip or modulo), not a mask-zero, neutralizes. Out-of-range edges carry ~zero w_edge, so wrapping them is numerically harmless. Verified on Tesla T4: graph .pt2 export (gen_dpa1) + DeepEval parity (small_8/large_20) + C++ TestInferDpa1GraphPtExpt 10/10 all pass; CPU export suite 12/12 (the SymInt failure the prior commit would have hit in CI).

…#5604 - training.py: add explanatory comments to four pass-only except clauses (CodeQL empty-except); replace the nloc==1-fragile 'N != nframes' shape heuristic in the compiled-graph unravel with an explicit node-level key set so single-atom-per-frame outputs reshape to (nf, 1, *) (CodeRabbit). - serialization.py: mark aparam's atom axis dynamic in the graph export dynamic-shape spec ({0: nframes, 1: nloc}), matching the dense path, so a dim_aparam>0 graph export no longer specializes nloc (CodeRabbit). - test_lammps_dpa1_graph_pt2.py: skip the two reference-comparison tests when the gen_dpa1 .expected fixture is absent (clean skip vs TypeError on None); force 'processors 2 1 1' in the empty-subdomain MPI test so the empty-rank branch is genuinely exercised (CodeRabbit). Skipped as invalid: CodeQL 'import DEVICE' (house pattern, benign in test); CodeRabbit do_atomic_virial-in-compiled-graph (the dense compiled path drops it identically; training never requests atom_virial). Validated: CPU export 8/8 + compiled-varying-natoms 5/5 (incl. dpa1_no_attn).

…port Add a user-facing entry point to the graph C++ inference path. Before this, the graph lower was reachable only via the internal API / gen_dpa1.py test fixture, so a user-frozen dpa1 .pt2 always used the dense (nlist) lower and the tested C++ graph path was unreachable from the CLI. - main.py: add --lower-kind {nlist,graph} to the freeze subparser (default nlist; PyTorch-Exportable backend only, same convention as --head/--node-names). - entrypoints/main.py: thread lower_kind into freeze() -> deserialize_to_file. Fail fast (ValueError) when 'graph' is requested for a non-graph-eligible model (reuses _model_uses_graph_lower; currently dpa1 attn_layer==0 only). Enable do_atomic_virial for the graph form -- near-free there (one extra scatter off the shared single backward). - test_dp_freeze.py: graph-eligibility rejection (se_e2_a) + a public-CLI graph freeze of dpa1(attn_layer=0) asserting metadata lower_input_kind=graph. Opt-in by design; auto-selecting graph for eligible models (mirroring training) is deferred until the graph path covers attention/angles/MP. Both tests pass.

wanghan-iapcm · 2026-06-30T07:30:37Z

Pushed: CUDA-fix correction + AI-review responses + `freeze --lower-kind graph`

0f4373172 — corrects the prior clamp. The xp.clip I added in 7658091dd broke under torch.export (its SymInt bound trips array_api_compat's clip → 'SymInt' object has no attribute 'shape'); it only passed on the GPU box's newer array_api_compat and would have failed CI's graph-export tests. Replaced with modulo (% n_out/% nf) — pure arithmetic, export-safe, no-op on in-range indices. GPU-revalidated (gen + DeepEval parity small_8/large_20 + C++ TestInferDpa1GraphPtExpt 10/10) and CPU export suite 12/12.

afda4c7b3 — AI review. Addressed: CodeQL empty-except ×4 (comments); CodeRabbit nloc==1 unravel (explicit node-level key set), aparam atom-axis dynamic in graph export (matches dense), LAMMPS test robustness (skip-on-missing-fixture + forced processors 2 1 1 for the empty-subdomain branch). Declined (with reasons): CodeQL import DEVICE (house pattern, benign in test); CodeRabbit do_atomic_virial in compiled-graph (the dense compiled path drops it identically; training never requests atom_virial).

0ea2c3435 — dp --pt_expt freeze --lower-kind {nlist,graph}. Closes the gap that the graph C++ path was tested only via the gen_dpa1.py fixture and unreachable from the CLI. Default nlist; graph fails fast for non-eligible models and enables the near-free per-atom virial. New tests: eligibility rejection + a public-CLI dpa1(attn_layer=0) graph freeze asserting metadata.lower_input_kind=='graph'. Opt-in by design; auto-select (mirroring training) deferred until graph covers attention/angles/MP.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/entrypoints/main.py`:
- Around line 734-741: The output suffix handling in main() still forces
unsuffixed exports to .pte even when freeze() is running with lower_kind set to
graph. Update the logic around the FLAGS.output normalization so it chooses the
default suffix based on the effective lower_kind (graph should default to .pt2,
otherwise .pte), while still preserving explicit user-provided .pte/.pt2
outputs. Use the freeze() call and the lower_kind lookup as the key symbols to
locate the fix.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5a56748f-b853-426d-96fb-a1bb13a56f0f

📥 Commits

Reviewing files that changed from the base of the PR and between 7658091 and 0ea2c34.

📒 Files selected for processing (7)

deepmd/dpmodel/utils/neighbor_graph/derivatives.py
deepmd/main.py
deepmd/pt_expt/entrypoints/main.py
deepmd/pt_expt/train/training.py
deepmd/pt_expt/utils/serialization.py
source/lmp/tests/test_lammps_dpa1_graph_pt2.py
source/tests/pt_expt/test_dp_freeze.py

🚧 Files skipped from review as they are similar to previous changes (3)

deepmd/dpmodel/utils/neighbor_graph/derivatives.py
source/lmp/tests/test_lammps_dpa1_graph_pt2.py
deepmd/pt_expt/utils/serialization.py

Wire the dpa1 graph .pt2 (attn_layer=0, lower_input_kind=graph) into the parametrized VariantDeepPotTest as case dpa1_graph_ptexpt (Backend::PTExpt, ref deeppot_dpa1_graph.expected, pbc/nopbc, tol 1e-10/1e-4). Flags mirror dpa1_pytorch_pt2 exactly; all 19 enabled subtests pass on remote GPU (FiniteDifference, LmpNlist*, CutoffTwice, TypeSel, NoPbc*). The dedicated test_deeppot_dpa1_graph_ptexpt.cc is retained for graph-unique coverage.

… modulo-clamp invariant test - entrypoints/main.py: 'dp --pt_expt freeze --lower-kind graph' now defaults a suffix-less output to .pt2 (the AOTI archive the C++ graph path consumes) instead of .pte; explicit .pte/.pt2 suffixes are preserved for both kinds. New test (freeze mocked) covers all four suffix/kind combinations. - test_edge_force_virial.py: add an invariant test that the in-bounds index clamp (% n_out) NEVER alters a real (edge_mask==True) edge -- a boundary real edge on node n_out-1 scatters unwrapped, and a masked guard edge with deliberately out-of-range indices contributes nothing; the result equals the real-edges-only reference. Directly answers iProzd's ask to prove the modulo cannot silently remap real edges (only zero-weight guard edges). Both new tests pass.

for more information, see https://pre-commit.ci

…t fix + root cause Reframe the comment on the % n_out index guard: it is the PERMANENT fix, not an interim workaround. Records the GPU-confirmed root cause -- the dynamic-edge graph torch.export path traces the node count as several equal-but-distinct symbols tied only by aten._assert_scalar(Eq(...)) nodes, which _strip_shape_assertions neutralises wholesale (needed for export to trace), dropping those equalities so inductor can no longer prove idx < n_out and emits a device_assert. The upstream alternative (making the shared, spin-critical _strip_shape_assertions selective) is deliberately not taken -- it risks the torch.export bugs that helper bypasses and the spin .pt2 path. Comment-only.

OutisLi

One performance issue on the C++ graph LAMMPS path (per-step host-side topology rebuild). Details inline.

OutisLi

One maintainability issue: the graph trace/export sample builders are duplicated across training.py and serialization.py. Please consolidate. Details inline.

…eview) Merge the two near-identical synthetic-graph builders _make_graph_sample_inputs (serialization.py) and _make_graph_trace_inputs (training.py) into one build_synthetic_graph_inputs in serialization.py, parameterized by dtype (export=float64 ABI, training=GLOBAL_PT precision) and device (export passes cpu explicitly instead of mutating env.DEVICE) plus the want_fparam/aparam/charge_spin gating. Also factor the duplicated prime-collision _forbidden scan into _forbidden_dims_from_model, shared by the dense and graph _trace_and_compile. Removes the desync risk between the training and export graph traces flagged in review.

…nk (OutisLi review) Comment 1 (perf): the graph LAMMPS path called buildGraphTensors every timestep, whose createEdgeTensors stage is an O(E) host loop + H2D copy that rebuilds the edge topology from scratch even when LAMMPS has not refreshed the neighbor list. Mirror the edge path: at ago==0 cache the skin topology via createEdgeTensors(with_geometry=false, fold_to_local=!multi_rank) into edge_index_tensor / edge_index_ext_tensor; each step run only the on-device compactEdgeTensors (geometry recompute + rcut filter) and assemble the cheap n_node / node-atype tensors. Topology now rebuilds only on a neighbor-list rebuild, consistent with the edge path. Comment 2 (empty-rank nit): a truly-empty rank (nall_real == 0, no local atoms AND no ghosts) would feed N == 0 into the graph, and edge_force_virial's edge_index % node_capacity would SIGFPE (div-by-zero). Early-return zero outputs for that rank. The tested nloc==0 empty-subdomain case has nall_real>0 (ghosts within rcut) and still runs normally. Validated: 29/33 graph gtests (4 pre-existing NoPBC skips) + 5/5 test_lammps_dpa1_graph_pt2 (single-rank multi-step ref, multi-rank, mpirun -n 2 == single-rank, empty-subdomain).

for more information, see https://pre-commit.ci

iProzd

LGTM now.

Han Wang added 8 commits June 29, 2026 18:42

fix(pt_expt): B1.1 review — test forward_lower_graph_exportable (both…

ee8db1b

… do_atomic_virial branches), dedup key-translation, drop redundant detach

feat(pt_expt): graph .pt2 export branch + lower_input_kind metadata

1051a0d

fix(pt_expt): B1.3 review — persist static edge_capacity (E_max) in g…

148fa0e

…raph .pt2 metadata for the C++ hub (B2)

docs(pt_expt): B1 final-review minors — document nloc==1 unravel-skip…

47fb700

… + legacy-gate assumption in the graph compile path

dosubot Bot added bug enhancement new feature labels Jun 29, 2026

github-actions Bot added the Python label Jun 29, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

b046874

for more information, see https://pre-commit.ci

github-advanced-security AI found potential problems Jun 29, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread deepmd/pt_expt/train/training.py

Comment thread deepmd/pt_expt/train/training.py

Comment thread deepmd/pt_expt/utils/serialization.py Outdated

Han Wang added 7 commits June 30, 2026 01:21

docs(pt_expt): B2.0 review — _strip_shape_assertions now documents th…

40487c4

…e graph (dynamic-E) caller + edge-axis safety

docs(infer): B2 final-review — correct gen_dpa1 graph-reference docst…

af92be1

…ring (nlist .pt2, 1e-5)

github-actions Bot added the C++ label Jun 29, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

b25fdfc

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

wanghan-iapcm changed the title ~~feat(pt_expt): export the graph lower to .pt2 + compiled training on the graph lower (NeighborGraph PR-B1)~~ feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference (NeighborGraph PR-B1+B2) Jun 29, 2026

Han Wang added 2 commits June 30, 2026 08:36

fix(api_cc): guard graph remap single-rank-only + atomic-overload gte…

10d82a6

…st (B2.5 follow-ups)

feat(api_cc): non-MP multi-rank graph path (extended region + reverse…

7d37319

…-comm; no with-comm)

wanghan-iapcm requested review from OutisLi and iProzd June 30, 2026 06:13

Han Wang added 3 commits June 30, 2026 15:24

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread deepmd/pt_expt/entrypoints/main.py

wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Jun 30, 2026

github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 30, 2026

wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Jun 30, 2026

github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 30, 2026

iProzd requested changes Jun 30, 2026

View reviewed changes

Comment thread deepmd/pt_expt/entrypoints/main.py

iProzd requested changes Jun 30, 2026

View reviewed changes

Comment thread deepmd/dpmodel/utils/neighbor_graph/derivatives.py Outdated

Han Wang and others added 2 commits July 1, 2026 08:43

[pre-commit.ci] auto fixes from pre-commit.com hooks

4a76f6d

for more information, see https://pre-commit.ci

wanghan-iapcm requested a review from iProzd July 1, 2026 03:41

OutisLi requested changes Jul 1, 2026

View reviewed changes

Comment thread source/api_cc/src/DeepPotPTExpt.cc Outdated

OutisLi reviewed Jul 1, 2026

View reviewed changes

Comment thread source/api_cc/src/DeepPotPTExpt.cc Outdated

OutisLi requested changes Jul 1, 2026

View reviewed changes

Comment thread deepmd/pt_expt/utils/serialization.py Outdated

Han Wang and others added 3 commits July 1, 2026 14:36

[pre-commit.ci] auto fixes from pre-commit.com hooks

70b02fe

for more information, see https://pre-commit.ci

wanghan-iapcm requested a review from OutisLi July 1, 2026 06:56

OutisLi approved these changes Jul 1, 2026

View reviewed changes

iProzd approved these changes Jul 1, 2026

View reviewed changes

wanghan-iapcm enabled auto-merge July 1, 2026 07:09

Uh oh!

Conversation

wanghan-iapcm commented Jun 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

NeighborGraph PR-B — graph .pt2 export, compiled training, and C++ inference (single & multi-rank)

B1 — graph .pt2 export + compiled training (Python)

B2 — C++ graph ingestion (dynamic edge axis, single-rank)

B3 — multi-rank C++ / LAMMPS (non-MP)

Tests / known limitations

Summary by CodeRabbit

Summary

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

wanghan-iapcm commented Jun 30, 2026

Fix: CUDA device-assert in graph .pt2 export (surfaced by the first GPU CI run)

Uh oh!

wanghan-iapcm commented Jun 30, 2026

Pushed: CUDA-fix correction + AI-review responses + freeze --lower-kind graph

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

OutisLi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

OutisLi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

iProzd left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

wanghan-iapcm commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

NeighborGraph PR-B — graph `.pt2` export, compiled training, and C++ inference (single & multi-rank)

B1 — graph `.pt2` export + compiled training (Python)

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

codecov Bot commented Jun 29, 2026 •

edited

Loading

Fix: CUDA device-assert in graph `.pt2` export (surfaced by the first GPU CI run)

Pushed: CUDA-fix correction + AI-review responses + `freeze --lower-kind graph`