fix(data): preserve global dipole/polarizability in raw_to_set.sh #5696
raw_to_set.sh splits global dipole.raw/polarizability.raw into per-set chunks and the in-set conversion block expects dipole.raw/polarizability.raw inside set.<pi>/, but the move block never moved those chunks in. Datasets with global dipole/polarizability labels silently lost them: the .npy files were never generated and the split chunks were orphaned in the raw dir. Add the missing moves, mirroring the existing split/convert order, and add a test covering the move/convert symmetry for every tensor label the script splits (the script previously had no test at all). Fix deepmodeling#5692
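To make the failure mode concrete, here is a minimal in-memory Python model of the split, move, and convert flow described above. It is a sketch, not the script itself. The helper names (`split_lines`, `raw_to_set`), the `moved_labels` parameter, and the three-digit chunk suffix (taken from the `dipole.raw000` example in the PR description) are assumptions made for illustration.

```python
# Hypothetical in-memory model of raw_to_set.sh's split -> move -> convert flow.
# Names and the 3-digit chunk suffix are illustrative assumptions.


def split_lines(lines, nline_per_set):
    """Chunk one label's frames into per-set pieces, like `split -l`."""
    return [lines[i:i + nline_per_set] for i in range(0, len(lines), nline_per_set)]


def raw_to_set(raw_files, nline_per_set, moved_labels):
    """Model the script on {label: [frame lines]}.

    Returns (sets, leftovers): sets[pi] maps each moved label to its chunk
    (what set.<pi>/<label>.raw would hold before conversion to .npy), and
    leftovers lists chunk names still sitting in the raw directory.
    """
    raw_dir = {}
    nsets = 0
    # Split step: every label is chunked, e.g. dipole.raw -> dipole.raw000, ...
    for label, lines in raw_files.items():
        chunks = split_lines(lines, nline_per_set)
        nsets = max(nsets, len(chunks))
        for pi, chunk in enumerate(chunks):
            raw_dir[f"{label}.raw{pi:03d}"] = chunk
    sets = [{} for _ in range(nsets)]
    # Move step: only labels listed in moved_labels reach set.<pi>/.
    for pi in range(nsets):
        for label in moved_labels:
            name = f"{label}.raw{pi:03d}"
            if name in raw_dir:
                sets[pi][label] = raw_dir.pop(name)
    # Convert step (not modelled): a label that was never moved has no
    # set.<pi>/<label>.raw, so no <label>.npy is produced for it.
    return sets, sorted(raw_dir)


frames = {
    label: [f"{label} frame {i}" for i in range(5)]
    for label in ("coord", "box", "dipole", "polarizability")
}
# Pre-fix: the move block skips the global dipole/polarizability chunks.
sets_before, leftovers_before = raw_to_set(frames, 2, ("coord", "box"))
# Post-fix: the two missing moves are added.
sets_after, leftovers_after = raw_to_set(
    frames, 2, ("coord", "box", "dipole", "polarizability")
)
print(len(leftovers_before))  # 6 orphaned chunks (3 sets x 2 labels)
print(leftovers_after)  # []
```

With 5 frames and 2 frames per set, the model produces three sets. Leaving `dipole` and `polarizability` out of `moved_labels` reproduces the pre-fix behaviour: their chunks stay in the raw directory and never reach `set.<pi>/`. Adding them, which is what the fix does, leaves nothing behind.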
📝 Walkthrough
Updates raw_to_set.sh to move the split global dipole.raw and polarizability.raw chunks into per-set directories, fixing a previously omitted move step. Adds a new pytest test module verifying move/convert symmetry for the dipole, polarizability, atomic_dipole, and atomic_polarizability labels.

Changes: Preserve global dipole/polarizability handling in raw_to_set.sh
Estimated code review effort: 2 (Simple) | ~10 minutes

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as test_raw_to_set_preserves_tensor_labels
    participant TmpDir as tmp_path raw files
    participant Script as raw_to_set.sh
    participant SetDirs as set.$pi directories
    Test->>TmpDir: write box.raw, coord.raw, label.raw
    Test->>Script: run raw_to_set.sh with nline_per_set
    Script->>TmpDir: split label.raw into label.raw$pi chunks
    Script->>SetDirs: move label.raw$pi into set.$pi/label.raw
    Script->>SetDirs: convert set.$pi/label.raw to label.npy
    Test->>SetDirs: read label.npy per set
    Test->>Test: concatenate and compare to original values
    Test->>TmpDir: assert no leftover label.raw[0-9]* chunks
```
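The checks in the diagram's last three steps can be sketched as one reusable helper. This is not the code in `source/tests/common/test_raw_to_set.py`. `check_round_trip` and its `load` parameter are hypothetical, and `load` stands in for a reader that returns plain rows for one file, for example `lambda p: np.load(p).tolist()` in a NumPy-based test.

```python
from pathlib import Path
from typing import Callable, Sequence


def check_round_trip(
    raw_dir: Path,
    label: str,
    original: Sequence,
    load: Callable[[Path], Sequence],
) -> None:
    """Assert that set.*/<label>.npy concatenates back to `original` and
    that no <label>.raw<digits> chunk is left behind in raw_dir.

    `load` is a hypothetical reader returning plain rows for one file.
    """
    # Zero-padded names (set.000, set.001, ...) sort in set order.
    set_dirs = sorted(p for p in raw_dir.glob("set.*") if p.is_dir())
    assert set_dirs, "no set.* directories were produced"
    rows = []
    for set_dir in set_dirs:
        converted = set_dir / f"{label}.npy"
        assert converted.exists(), f"{converted} was not generated"
        rows.extend(load(converted))
    assert list(rows) == list(original), "contents did not round-trip"
    # Anything still matching <label>.raw[0-9]* was split but never moved.
    leftovers = sorted(p.name for p in raw_dir.glob(f"{label}.raw[0-9]*"))
    assert not leftovers, f"orphaned chunks: {leftovers}"
```

In a pytest module, a helper like this would presumably run once per label under `@pytest.mark.parametrize`, after the script has been invoked on a `tmp_path` directory. The glob pattern anchors at the start of the file name, so checking `dipole` does not also pick up `atomic_dipole.raw000` chunks.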
Suggested labels: bug, test
Suggested reviewers: njzjz
🚥 Pre-merge checks: ✅ 5 passed
Codecov Report
✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##           master    #5696      +/-   ##
==========================================
- Coverage   81.97%   81.83%   -0.14%
==========================================
  Files         959      959
  Lines      105748   105747       -1
  Branches     4102     4105       +3
==========================================
- Hits        86684    86541     -143
- Misses      17573    17711     +138
- Partials     1491     1495       +4
```
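The report's totals can be cross-checked by hand. The sketch below assumes that Codecov computes coverage as hits / (hits + misses + partials) and, by default, rounds down to two decimals. Under those assumptions it reproduces both percentages in the table. It also confirms that each Lines total equals hits + misses + partials.

```python
# Assumption: Codecov's ratio is hits / (hits + misses + partials),
# rounded down to two decimals (its default rounding).


def coverage_pct(hits: int, misses: int, partials: int) -> str:
    """Format coverage as a percentage string with two decimals, rounded down."""
    lines = hits + misses + partials
    # Integer arithmetic in hundredths of a percent avoids float rounding.
    hundredths = hits * 10_000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}"


base = coverage_pct(hits=86684, misses=17573, partials=1491)  # master
head = coverage_pct(hits=86541, misses=17711, partials=1495)  # #5696
print(base, head)  # 81.97 81.83
```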
Problem
`data/raw/raw_to_set.sh` splits global `dipole.raw`/`polarizability.raw` into per-set chunks (`dipole.raw000`, …), and the in-set conversion block expects `dipole.raw`/`polarizability.raw` to exist inside `set.<pi>/`. However, the per-set move block moved every other split file except the global `dipole.raw$pi`/`polarizability.raw$pi` chunks. As a result, datasets carrying global dipole/polarizability labels silently lost them during conversion: `dipole.npy`/`polarizability.npy` were never generated, and the split chunks were left orphaned in the raw directory.

Fix
Add the two missing `mv` lines, placed to mirror the existing split/convert order.

Test
The script previously had no test at all, which is why the omission survived. This PR adds `source/tests/common/test_raw_to_set.py`, a parametrized test that runs the script and asserts that every split tensor label (`dipole`, `polarizability`, `atomic_dipole`, `atomic_polarizability`) is converted to `set.<pi>/<label>.npy` with round-tripped contents across multiple sets, and that no split chunks are left orphaned in the raw dir. Verified failing-then-passing: on the unmodified script, the `dipole` and `polarizability` cases fail (`.npy` not generated) while the atomic variants pass; after the fix, all four pass.

Known limitation
The test requires `bash`, `split`, and `python` on PATH (all present in CI).

Fix #5692