buffer: more copy optimizations by ronag · Pull Request #63892 · nodejs/node

ronag · 2026-06-13T17:02:39Z

config	baseline	view-level API	median Δ
partial, 8 B	42.0 M/s	62.8 M/s	+49%
full, 8 B	42.2 M/s	62.4 M/s	+48%
partial, 128 B	42.0 M/s	63.1 M/s	+50%
full, 128 B	42.2 M/s	61.7 M/s	+46%
partial, 1024 B	37.8 M/s	55.1 M/s	+46%
full, 1024 B	35.1 M/s	47.3 M/s	+35%

Add v8::ArrayBufferView::CopyArrayBufferViewBytes, which copies a byte
range from one ArrayBufferView to another. Unlike the existing
v8::ArrayBuffer::CopyArrayBufferBytes, it operates on the views: it
resolves each view's data pointer directly (JSTypedArray::DataPtr /
DataView::data_pointer) and reads the backing buffer's shared/immutable/
detached flags as plain field loads, without ever materializing or
fetching the views' ArrayBuffers.

This exists because materializing the ArrayBuffer is expensive. Profiling
Buffer.prototype.copy (perf, AMD EPYC 9135, x86-64) showed that for small
copies the dominant native cost is ArrayBufferView::Buffer() ->
JSTypedArray::GetBuffer() -- ~25% of total runtime, paid on every call
(not just first materialization) and incurred twice (source and target).
Add ByteOffset() (~7%) and IsSharedArrayBuffer() (~6%) and roughly 38% of
a small copy is spent turning two views into ArrayBuffers and querying
them piecemeal, while the actual memmove is ~4%. Routing the node binding
through CopyArrayBufferBytes forces all of that onto the embedder side; a
view-level entry point folds it into a single call of cheap field reads.

Byte ranges are clamped to both views' byte lengths. Nothing is copied
when the source is detached/out-of-bounds or the target is detached/
out-of-bounds or backed by an immutable ArrayBuffer; the number of bytes
actually copied is returned. When both views are backed by a
SharedArrayBuffer a relaxed-atomic memmove is used, honoring the
SharedArrayBuffer memory model; any other combination performs a plain
memmove on the backing store (matching CopyArrayBufferBytes).

Carried as a floating patch; v8_embedder_string is bumped to -node.22
accordingly. It is the natural sibling of the CopyArrayBufferBytes API
added in the preceding floating patch and touches nothing but the public
ArrayBuffer/ArrayBufferView API.

Add v8::ArrayBufferView::CopyArrayBufferViewBytes, which copies a byte range from one ArrayBufferView to another. Unlike the existing v8::ArrayBuffer::CopyArrayBufferBytes, it operates on the *views*: it resolves each view's data pointer directly (JSTypedArray::DataPtr / DataView::data_pointer) and reads the backing buffer's shared/immutable/ detached flags as plain field loads, without ever materializing or fetching the views' ArrayBuffers. This exists because materializing the ArrayBuffer is expensive. Profiling Buffer.prototype.copy (perf, AMD EPYC 9135, x86-64) showed that for small copies the dominant native cost is ArrayBufferView::Buffer() -> JSTypedArray::GetBuffer() -- ~25% of total runtime, paid on every call (not just first materialization) and incurred twice (source and target). Add ByteOffset() (~7%) and IsSharedArrayBuffer() (~6%) and roughly 38% of a small copy is spent turning two views into ArrayBuffers and querying them piecemeal, while the actual memmove is ~4%. Routing the node binding through CopyArrayBufferBytes forces all of that onto the embedder side; a view-level entry point folds it into a single call of cheap field reads. Byte ranges are clamped to both views' byte lengths. Nothing is copied when the source is detached/out-of-bounds or the target is detached/ out-of-bounds or backed by an immutable ArrayBuffer; the number of bytes actually copied is returned. When both views are backed by a SharedArrayBuffer a relaxed-atomic memmove is used, honoring the SharedArrayBuffer memory model; any other combination performs a plain memmove on the backing store (matching CopyArrayBufferBytes). Carried as a floating patch; v8_embedder_string is bumped to -node.22 accordingly. It is the natural sibling of the CopyArrayBufferBytes API added in the preceding floating patch and touches nothing but the public ArrayBuffer/ArrayBufferView API. Refs: nodejs#55422 Signed-off-by: Robert Nagy <ronagy@icloud.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

nodejs-github-bot · 2026-06-13T17:02:44Z

Review requested:

@nodejs/gyp
@nodejs/security-wg
@nodejs/v8-update

ronag · 2026-06-13T17:18:58Z

@targos @addaleax @joyeecheung This is a relatively trivial v8 patch. Would you be comfortable with floating this? I've already spent money on funding work to upstream CopyArrayBufferBytes to V8 and not sure I can spend much more.

ronag · 2026-06-13T18:28:07Z

I've tried the CopyArrayBufferBytes implementation with LTO enabled and the compiler is not able to inline things properly.

Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy` binding) through the new v8::ArrayBufferView::CopyArrayBufferViewBytes API instead of v8::ArrayBuffer::CopyArrayBufferBytes. The previous binding had to convert both views to ArrayBuffers (ArrayBufferView::Buffer()), read their byte offsets (ByteOffset()) and test shared-ness (IsSharedArrayBuffer()) before the copy -- around half a dozen separate V8 API calls per copy. The view-level API does all of that internally from the views' own fields in a single call, so the binding now just forwards the two views, the view-relative offsets and the length. Profiling on AMD EPYC 9135 (x86-64) attributed the small-copy cost almost entirely to that view->buffer conversion: ArrayBufferView::Buffer() / JSTypedArray::GetBuffer() alone was ~25% of runtime, paid every call and twice per copy. Resolving the buffer in JS instead (passing source.buffer/target.buffer to the binding) was measured and is worse: the typed-array `.buffer` getter is not JIT-inlined and dispatches through the CEntry trampoline to a C++ builtin, costing ~36%. The view-level copy keeps all existing semantics: byte-range clamping, no-op (0 bytes) on a detached or immutable target, relaxed-atomic memmove when both sides are SharedArrayBuffer-backed, plain memmove otherwise. The JS-side view clamping in copyImpl is retained: V8 clamps to the underlying backing store, which for pooled Buffers is the whole shared pool rather than the individual view. buffer-copy.js, median of 30 interleaved runs, AMD EPYC 9135 x86-64 (all changes p < 0.001, Welch t = 19-50): partial=false bytes=8: 42.2 -> 62.4 Mops/s (+48%) partial=true bytes=8: 42.0 -> 62.8 Mops/s (+49%) partial=false bytes=128: 42.2 -> 61.7 Mops/s (+46%) partial=true bytes=128: 42.0 -> 63.1 Mops/s (+50%) partial=false bytes=1024: 35.1 -> 47.3 Mops/s (+35%) partial=true bytes=1024: 37.8 -> 55.1 Mops/s (+46%) The gain is largest for small/medium copies, where per-call overhead dominates, and tapers for 1024-byte copies as the memmove itself grows. Also inlines the former _copyActual helper (only caller was copyImpl) into copyImpl, folding a redundant target.byteLength read. Refs: nodejs#55422 Signed-off-by: Robert Nagy <ronagy@icloud.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ronag · 2026-06-13T20:12:34Z

So this is basically what we tried to upstream to V8 but was rejected and turned into the slower CopyArrayBufferBytes.

joyeecheung · 2026-06-13T20:30:19Z

Can you explain why the upstream rejected this patch and why the reason wouldn't apply here?

ronag · 2026-06-13T20:57:10Z

https://chromium-review.googlesource.com/c/v8/v8/+/7735151

They preferred an implementation that matched the spec.

nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Jun 13, 2026

ronag requested a review from bnoordhuis June 13, 2026 17:03

ronag force-pushed the buffer-copy-viewbytes branch from f511275 to afe87da Compare June 13, 2026 20:09

ronag marked this pull request as ready for review June 13, 2026 20:09

ronag requested review from joyeecheung, mcollina and targos June 13, 2026 20:11

ronag added the buffer Issues and PRs related to the buffer subsystem. label Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

buffer: more copy optimizations#63892

buffer: more copy optimizations#63892
ronag wants to merge 2 commits into
nodejs:mainfrom
ronag:buffer-copy-viewbytes

ronag commented Jun 13, 2026 •

edited

Loading

Uh oh!

nodejs-github-bot commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026 •

edited

Loading

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

joyeecheung commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ronag commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nodejs-github-bot commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

joyeecheung commented Jun 13, 2026

Uh oh!

ronag commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ronag commented Jun 13, 2026 •

edited

Loading

ronag commented Jun 13, 2026 •

edited

Loading