Skip to content

rbindlist: unprotect longestLevels inside the loop#7794

Open
aitap wants to merge 6 commits into
masterfrom
fix7793
Open

rbindlist: unprotect longestLevels inside the loop#7794
aitap wants to merge 6 commits into
masterfrom
fix7793

Conversation

@aitap

@aitap aitap commented Jun 18, 2026

Copy link
Copy Markdown
Member

Similar to how other protected objects are managed inside the per-column loop (cn, getAttrib(..., R_ClassSymbol)), marks, thisCol), keep longestLevels at a known position in the protection stack and unprotect at the end of the loop iteration when it reaches the top of the stack. This requires moving coercedForFactor deeper in the protection stack, because it needs to remain protected between the loop iterations. As a result, nprotect remains independent from the input size.

Fixes: #7793

@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.85%. Comparing base (0b57d38) to head (89b3b5d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7794   +/-   ##
=======================================
  Coverage   98.85%   98.85%           
=======================================
  Files          87       87           
  Lines       17137    17140    +3     
=======================================
+ Hits        16941    16944    +3     
  Misses        196      196           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
  • HEAD=fix7793 slower P<0.001 for memrecycle regression fixed in #5463
    Comparison Plot

Generated via commit 89b3b5d

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 41 seconds
Installing different package versions 24 seconds
Running and plotting the test cases 5 minutes and 25 seconds

Comment thread inst/tests/tests.Rraw
test(2375.6, print(data.table(x=rep("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 1e6)), topn=1), output="1000000: ABCDEFGHIJKLM...", options=list(width=25, datatable.prettyprint.char=NULL))

# rbindlist() used to put O(ncol(...)) > R_PPStackSize = 50000 items on the protect stack, #7793
x = as.data.table(as.list(1:50001))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per ?Memory, max-ppsize can be set up to 500,000, and I'm not aware of a way to poll it (at a glance R_PPStackSize appears to be non-API)

Probably best to just point to ?Memory and mention here "this test assumes the default value of 50,000", which has been the default for 15+ years, but if it does change in the future, it would be good to know the reason 50,001 was chosen.

@MichaelChirico

Copy link
Copy Markdown
Member

Does atime find a performance / memory improvement from the change? Does bench::mark() find less gc() thrashing from a loop like for (j in 10000:20000) rbind(<j-col DT>, <j-col DT>)?

@MichaelChirico MichaelChirico left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change LGTM

@aitap

aitap commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

Unless the columns are ordered factors with progressively longer levels attributes, the protection stack is filled with R_NilValues. The effect from having to mark and sweep ~50000 additional R_NilValues can be detected at the margins of the time distributions:

parse('.ci/atime/tests.R') |> _[[6]][[3]][[3]] |> eval() -> pkg.edit.fun
atime_versions(
 '.', seq(5000, 49900, 500),
 { gc(full=TRUE); x <- list(as.list(1:N)) }, # need a gc() between runs
 data.table::rbindlist(list(x)),
 sha.vec=c(fix7793='fix7793', master='master'),
 pkg.edit.fun = pkg.edit.fun, seconds.limit=10, times = 100
) -> res

Co-Authored-By: Michael Chirico <chiricom@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rbindlist throws errors when combining tables with many columns

2 participants