Perf & mem optimizations #179

Open
OBrezhniev wants to merge 12 commits into master from feature/sharedArrayBuffers

Conversation

@OBrezhniev

Member

No description provided.

…fers to worker threads instead of arrays (make it compatible with SharedArrayBuffer)
Fix nChunks calculation - drastically improve memory usage.
Increase min chunk size to 1<<15 (32k) - speed improvement on smaller circuits.
Serial chunk processing - better mem usage.
Linter fixes
- remove chunking of chunks (avoids copying the same data into multiple worker jobs)
- make nChunks a multiple of tm.concurrency for optimal load balancing
- switch back from awaits to promises (allows chunks to execute in parallel)
- roll back min chunk size
- transfer buffer ownership to worker threads (avoids copying large arrays)
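
For readers following the chunking and transfer commits above, here is a minimal sketch of the two ideas. It is not the code from this PR: `planChunks` and `moveBuffer` are hypothetical helpers, the rounding rule is one plausible reading of "make nChunks multiple of tm.concurrency", and `structuredClone` (Node 17+ or a modern browser) stands in for `worker.postMessage(msg, [buffer])` so the example runs without a worker.

```javascript
// Hypothetical helper: choose a chunk count that respects a minimum chunk
// size and, when possible, is a multiple of the worker count.
function planChunks(nPoints, concurrency, minChunkSize) {
  if (nPoints <= 0) return { nChunks: 0, pointsPerChunk: 0 };
  // Largest chunk count that keeps the average chunk at or above
  // minChunkSize (a single chunk if the input is smaller than that).
  const maxChunks = Math.max(1, Math.floor(nPoints / minChunkSize));
  // Round down to a multiple of concurrency so each worker gets the same
  // number of chunks; small inputs simply use fewer chunks than workers.
  const nChunks = maxChunks < concurrency
    ? maxChunks
    : Math.floor(maxChunks / concurrency) * concurrency;
  return { nChunks, pointsPerChunk: Math.ceil(nPoints / nChunks) };
}

// Hypothetical helper: move a buffer instead of copying it. With a real
// worker this would be worker.postMessage({ buff }, [buff]); structuredClone
// with a transfer list detaches the source buffer in the same way.
function moveBuffer(buff) {
  return structuredClone(buff, { transfer: [buff] });
}

const plan = planChunks(1 << 20, 8, 1 << 15);
console.log(plan); // { nChunks: 32, pointsPerChunk: 32768 }

const src = new Uint8Array(1024).buffer;
const moved = moveBuffer(src);
console.log(src.byteLength, moved.byteLength); // 0 1024
```

After a transfer the sender's buffer is detached (length 0), so the caller must not touch it again. That is the price of skipping the copy.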
…ination

Replace parallel index arrays (workers[], initialized[], working[], etc.) with
a WorkerSlot class that owns all per-worker state. Message handlers close over
the slot reference, so stale messages from replaced workers are detected by a
simple identity check (pool[i] !== slot) rather than generation counters.
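
A rough sketch of the identity check described above, assuming a simplified slot shape. `FakeWorker`, the field names on `WorkerSlot`, and the `log` array are illustrative stand-ins, not the PR's actual code; only the `pool[i] !== slot` comparison comes from the commit message.

```javascript
// Minimal stand-in for a Worker so the sketch runs synchronously.
class FakeWorker {
  constructor() { this.listeners = new Set(); }
  addEventListener(type, fn) { if (type === "message") this.listeners.add(fn); }
  removeEventListener(type, fn) { if (type === "message") this.listeners.delete(fn); }
  emit(data) { for (const fn of [...this.listeners]) fn({ data }); }
}

// One object owns all per-worker state instead of parallel index arrays.
class WorkerSlot {
  constructor(worker) {
    this.worker = worker;
    this.initialized = false;
    this.working = false;
    this.pending = null;
  }
}

function startWorker(pool, i, worker, log) {
  const slot = new WorkerSlot(worker);
  pool[i] = slot;
  // The handler closes over `slot`, so a message from a replaced worker is
  // recognized by identity: the pool no longer points at this slot.
  worker.addEventListener("message", (event) => {
    const stale = pool[i] !== slot;
    log.push(`${stale ? "stale" : "live"}:${event.data}`);
    if (!stale) slot.initialized = true;
  });
  return slot;
}

function demo() {
  const pool = [null];
  const log = [];
  const oldWorker = new FakeWorker();
  startWorker(pool, 0, oldWorker, log);
  oldWorker.emit("ready");
  const newWorker = new FakeWorker();
  startWorker(pool, 0, newWorker, log); // replaces the slot at index 0
  oldWorker.emit("late");               // arrives after replacement
  newWorker.emit("ready");
  return log;
}

console.log(demo()); // [ 'live:ready', 'stale:late', 'live:ready' ]
```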

2-phase termination protocol:
- Worker fires want_to_terminate when idle timer expires (200ms, down from 1s)
- Main thread nulls pool[i] immediately, sends TERMINATE ack, calls processWorks
  so a replacement worker can start filling the slot right away
- Worker's subsequent terminated message arrives stale and only removes event
  listeners to break the slot→worker→closure reference cycle for prompt GC
- Stale task results (want_to_terminate race with in-flight dispatch) are still
  resolved correctly so callers never hang
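
The protocol steps above can be sketched from the main thread's side as follows. This is a simulation under assumptions: the `{ type: ... }` message shape, the `result` message, `FakeWorker`, and the `pending` field are hypothetical; only the message names `want_to_terminate`, `TERMINATE`, and `terminated` come from the commit message.

```javascript
// Minimal stand-in for a Worker that records what the main thread sends.
class FakeWorker {
  constructor() { this.listeners = new Set(); this.sent = []; }
  addEventListener(type, fn) { if (type === "message") this.listeners.add(fn); }
  removeEventListener(type, fn) { if (type === "message") this.listeners.delete(fn); }
  postMessage(msg) { this.sent.push(msg); }
  emit(data) { for (const fn of [...this.listeners]) fn({ data }); }
}

function startWorker(pool, i, makeWorker, processWorks) {
  const worker = makeWorker();
  const slot = { worker, pending: null };
  const onMessage = ({ data }) => {
    const stale = pool[i] !== slot;
    switch (data.type) {
      case "want_to_terminate":
        if (stale) return;
        pool[i] = null;                            // phase 1: free the slot now
        worker.postMessage({ type: "TERMINATE" }); // acknowledge the request
        processWorks();                            // a replacement may start
        break;
      case "terminated":
        // Phase 2: drop the listener to break the reference cycle.
        worker.removeEventListener("message", onMessage);
        break;
      case "result":
        // Resolved even when stale, so an in-flight task never hangs.
        if (slot.pending) { slot.pending.resolve(data.value); slot.pending = null; }
        break;
    }
  };
  worker.addEventListener("message", onMessage);
  pool[i] = slot;
  return slot;
}

function demo() {
  const pool = [null];
  const workers = [];
  const results = [];
  const makeWorker = () => { const w = new FakeWorker(); workers.push(w); return w; };
  const processWorks = () => {
    if (pool[0] === null) startWorker(pool, 0, makeWorker, processWorks);
  };
  processWorks();
  const first = pool[0];
  first.pending = { resolve: (v) => results.push(v) };
  workers[0].emit({ type: "want_to_terminate" });
  workers[0].emit({ type: "result", value: 7 }); // stale but still delivered
  workers[0].emit({ type: "terminated" });
  return {
    replaced: pool[0] !== null && pool[0] !== first,
    ack: workers[0].sent,
    results,
    listenersLeft: workers[0].listeners.size,
    workersStarted: workers.length,
  };
}

console.log(demo());
```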

Additional fixes:
- scheduleTermination() moved inside init().then() so the 200ms idle timer
  never fires during async WASM compilation
- removeEventListener called on both stale and non-stale terminated paths so
  WASM memory held by old slots is released immediately, not GC-deferred
- processWorks start-new-workers loop no longer calls startWorker() on slots
  that are already occupied (working or initializing)
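
As an illustration of the last fix, a start-new-workers loop that skips occupied slots might look like the sketch below. `startMissingWorkers` and its parameters are hypothetical names; the real `processWorks` may be structured differently.

```javascript
// Start at most `queuedJobs` new workers, and only in empty slots.
// A non-null slot is occupied (working or still initializing) and is skipped.
function startMissingWorkers(pool, queuedJobs, startWorker) {
  let started = 0;
  for (let i = 0; i < pool.length && started < queuedJobs; i++) {
    if (pool[i] !== null) continue;
    startWorker(i);
    started++;
  }
  return started;
}

const startedAt = [];
const examplePool = [{ working: true }, null, { initialized: false }, null];
const count = startMissingWorkers(examplePool, 3, (i) => startedAt.push(i));
console.log(count, startedAt); // 2 [ 1, 3 ]
```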
1 participant