Add --chunk-concurrent-size for parallel row-copy #1688
There's a problem with concurrent chunk copying: during the chunk operation, executing INSERT ... SELECT holds the auto-increment lock on the target table, which puts a ceiling on the concurrency gains you can achieve.
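Thanks @shaohk, you're right to flag this, and since most tables do have an … The AUTO-INC ceiling depends on innodb_autoinc_lock_mode: in modes 0 and 1, an INSERT ... SELECT is a bulk insert and holds the table-level AUTO-INC lock until the statement ends, while mode 2 (the MySQL 8.0 default) takes no table-level AUTO-INC lock.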
So I'd frame it as: the hard serialization you're describing applies under modes 0 and 1; under mode 2 it becomes soft contention. That said, even under mode 2 parallel copy doesn't scale linearly, and the dominant cap is actually in gh-ost's own loop rather than in MySQL. The next-chunk boundary calculation (…

I think the right things to do here are (1) document that …

Does that match what you were seeing?
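To make the mode dependence concrete, here is a tiny Go helper encoding the rule as the MySQL reference manual describes it for InnoDB AUTO_INCREMENT handling. It is only a sketch: the type, constant, and function names are hypothetical and not part of gh-ost, and it ignores the lighter mutex contention that remains under mode 2.

```go
package main

// AutoIncLockMode mirrors the values of MySQL's innodb_autoinc_lock_mode.
// (Hypothetical type for illustration; not a gh-ost or driver API.)
type AutoIncLockMode int

const (
	Traditional AutoIncLockMode = 0 // table-level AUTO-INC lock for all INSERT-like statements
	Consecutive AutoIncLockMode = 1 // table-level lock only for "bulk inserts" such as INSERT ... SELECT
	Interleaved AutoIncLockMode = 2 // no table-level AUTO-INC lock (default since MySQL 8.0)
)

// chunkCopiesSerialize reports whether concurrent INSERT ... SELECT chunk
// copies into the same table queue on the table-level AUTO-INC lock, which
// InnoDB holds until the end of each such statement in modes 0 and 1.
func chunkCopiesSerialize(mode AutoIncLockMode, tableHasAutoInc bool) bool {
	if !tableHasAutoInc {
		// Without an AUTO_INCREMENT column there is no AUTO-INC lock to take.
		return false
	}
	return mode == Traditional || mode == Consecutive
}
```

Under this reading, the hard ceiling only shows up when the ghost table has an AUTO_INCREMENT column and the server runs mode 0 or 1.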
MySQL 8 INSERTs scale much better than 20-30% when the number of parallel workers increases from 1 to 4-8.
Port of PR github#1398 by @shaohk: allows multiple row-copy chunks to execute in parallel within each iteration using errgroup.

Key changes:
- Add IterationRangeValues struct for thread-safe range passing
- Serialize range calculation with CalculateNextIterationRangeEndValuesLock
- Rewrite iterateChunks to spawn N goroutines per queue item via errgroup
- Return SQL warnings from ApplyIterationInsertQuery (eliminates race on shared MigrationLastInsertSQLWarnings field)
- Increase DB connection pool when concurrency > default pool size
- Add --chunk-concurrent-size CLI flag (default 1, no behavior change)

Co-authored-by: shaohk <shaohk@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
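The first design (mutex-serialized range calculation, then N goroutines per batch) can be sketched with the Go standard library. This is an illustration under assumptions, not the PR's code: IterationRangeValues is borrowed from the commit message but holds plain int keys instead of column values, rangeCursor and copyBatch are hypothetical, the boundary is computed arithmetically rather than by querying MySQL, and sync.WaitGroup stands in for errgroup to keep the sketch dependency-free.

```go
package main

import "sync"

// IterationRangeValues carries one chunk's bounds. The struct name is taken
// from the commit message; plain int keys stand in for real column values.
type IterationRangeValues struct {
	Min, Max int // inclusive key bounds for this chunk
}

// rangeCursor is a hypothetical stand-in for the migration's iteration
// cursor; mu plays the role of CalculateNextIterationRangeEndValuesLock.
type rangeCursor struct {
	mu        sync.Mutex
	next      int // first key not yet assigned to any chunk
	maxKey    int // last key to copy
	chunkSize int
}

// nextRange returns a fresh, non-overlapping range, or nil when the cursor is
// exhausted. Every caller gets its own struct, so goroutines never share Min/Max.
func (c *rangeCursor) nextRange() *IterationRangeValues {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.next > c.maxKey {
		return nil
	}
	hi := c.next + c.chunkSize - 1
	if hi > c.maxKey {
		hi = c.maxKey
	}
	r := &IterationRangeValues{Min: c.next, Max: hi}
	c.next = hi + 1
	return r
}

// copyBatch starts up to n chunk copies in parallel and waits for all of them
// (a per-batch barrier). It reports whether any range was dispatched and
// returns the first error. sync.WaitGroup stands in for errgroup here.
func copyBatch(c *rangeCursor, n int, copyChunk func(*IterationRangeValues) error) (bool, error) {
	var wg sync.WaitGroup
	errs := make([]error, n)
	dispatched := false
	for i := 0; i < n; i++ {
		r := c.nextRange()
		if r == nil {
			break
		}
		dispatched = true
		wg.Add(1)
		go func(i int, r *IterationRangeValues) {
			defer wg.Done()
			errs[i] = copyChunk(r) // each goroutine writes only its own slot
		}(i, r)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return dispatched, err
		}
	}
	return dispatched, nil
}
```

The wg.Wait() at the end of each batch is the barrier that makes every batch as slow as its slowest chunk.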
…r, cut per-chunk round-trips

The --chunk-concurrent-size parallel row-copy only ran the INSERTs in parallel; the boundary calculation and the per-chunk transaction overhead serialized work and capped the achievable speedup well below the hardware's parallel-insert ceiling. This addresses three of those caps.

Prefetch range producer (overlap serialized boundary calc with INSERTs):
- A single dedicated producer goroutine is the sole caller of CalculateNextIterationRangeEndValues and streams pre-computed ranges into a buffered channel, so boundary scans now overlap the parallel INSERTs of earlier work instead of stalling between batches.
- Split iterateChunks into iterateChunksSingle (unchanged single-threaded semantics) and iterateChunksConcurrent.
- Size the applier pool for concurrentSize + producer + headroom.

#1 Per-chunk round-trips (applier.go):
- ApplyIterationInsertQuery sent BEGIN / SET SESSION / INSERT / COMMIT as four round-trips per chunk. It now sends "SET SESSION ...; INSERT ..." as a single autocommit, multi-statement round-trip on one pinned connection. The applier pool already enables multiStatements + interpolateParams + autocommit; RowsAffected() reports the INSERT (last statement), and the optional SHOW WARNINGS runs on the same pinned connection. 4 round-trips -> 1.

#2 Persistent worker pool (migrator.go):
- Replace the per-batch errgroup+g.Wait barrier (which stalled N workers on the slowest chunk every N chunks) with continuous dispatch to an errgroup bounded by SetLimit(concurrentSize) for a 200ms time quantum. Workers stay saturated; the only barrier is at the quantum boundary. The time bound keeps executeWriteFuncs returning to apply binlog events and re-check throttling, preserving row-copy/event mutual exclusion. Checkpoints record the last contiguous completed range (not the producer's prefetched cursor), so resume restarts from fully-copied data.
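The prefetch producer, the time-bounded worker pool, and the contiguous checkpoint described above can be sketched together with the Go standard library. Assumptions to keep in mind: produceRanges, chunkRange, checkpoint, and runQuantum are hypothetical names; the producer computes boundaries arithmetically instead of querying MySQL; a semaphore channel plus sync.WaitGroup stands in for errgroup's SetLimit; and copy errors are not modeled.

```go
package main

import (
	"sync"
	"time"
)

// chunkRange is one prefetched chunk; Seq records the order it was produced in.
type chunkRange struct{ Seq, Min, Max int }

// produceRanges is the sole producer of chunk boundaries. It runs in its own
// goroutine and fills a buffered channel, so boundary calculation overlaps
// with in-flight copies. Sketch only: assumes the consumer drains the channel.
func produceRanges(minKey, maxKey, chunkSize, buffer int) <-chan chunkRange {
	out := make(chan chunkRange, buffer)
	go func() {
		defer close(out)
		seq := 0
		for lo := minKey; lo <= maxKey; lo += chunkSize {
			hi := lo + chunkSize - 1
			if hi > maxKey {
				hi = maxKey
			}
			out <- chunkRange{Seq: seq, Min: lo, Max: hi}
			seq++
		}
	}()
	return out
}

// checkpoint advances only past ranges whose predecessors have all completed,
// so a resume never skips a chunk that finished out of order.
type checkpoint struct {
	mu      sync.Mutex
	nextSeq int                // lowest Seq not yet known to be complete
	pending map[int]chunkRange // completed out of order, waiting for a gap to fill
	lastMax int                // Max of the last contiguous completed range
	hasLast bool
}

func (c *checkpoint) complete(r chunkRange) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[r.Seq] = r
	for {
		next, ok := c.pending[c.nextSeq]
		if !ok {
			return
		}
		delete(c.pending, c.nextSeq)
		c.lastMax, c.hasLast = next.Max, true
		c.nextSeq++
	}
}

// runQuantum keeps up to limit workers busy until the quantum expires, then
// waits for in-flight chunks and returns so the caller can apply binlog events
// and re-check throttling. It returns false once the producer is drained.
func runQuantum(ranges <-chan chunkRange, limit int, quantum time.Duration,
	copyChunk func(chunkRange), cp *checkpoint) bool {
	deadline := time.Now().Add(quantum)
	sem := make(chan struct{}, limit) // bounds concurrency like SetLimit(limit)
	var wg sync.WaitGroup
	more := true
	for time.Now().Before(deadline) {
		sem <- struct{}{} // wait for a free worker slot
		r, ok := <-ranges
		if !ok {
			<-sem
			more = false
			break
		}
		wg.Add(1)
		go func(r chunkRange) {
			defer wg.Done()
			defer func() { <-sem }()
			copyChunk(r)
			cp.complete(r)
		}(r)
	}
	wg.Wait() // the only barrier: at the quantum boundary
	return more
}
```

The caller would loop on runQuantum, interleaving binlog application between quanta and persisting cp.lastMax rather than the producer's position.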
Benchmarked on MySQL 8.0.46 (innodb_autoinc_lock_mode=2), 2.1M rows: copy time improved by up to 32% versus the prior parallel implementation (chunk=200, conc=4: 22s -> 15s; chunk=1000, conc=8: 8s -> 6s). Data integrity verified by row count + checksum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
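The single-round-trip change (#1 in the commit message above) can be sketched with database/sql. This is a minimal sketch, not gh-ost's ApplyIterationInsertQuery: applyChunk and joinStatements are hypothetical, and it assumes a go-sql-driver/mysql DSN with multiStatements=true and interpolateParams=true (the commit says the applier pool enables both; interpolation matters because MySQL cannot prepare a multi-statement string server-side). A real program would also need the driver's blank import, omitted here so the block compiles with the standard library alone.

```go
package main

import (
	"context"
	"database/sql"
	"strings"
)

// joinStatements builds one multi-statement string such as "SET SESSION ...; INSERT ...".
func joinStatements(stmts ...string) string {
	parts := make([]string, 0, len(stmts))
	for _, s := range stmts {
		s = strings.TrimSuffix(strings.TrimSpace(s), ";")
		if s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, "; ")
}

// applyChunk sends session setup and the chunk INSERT in one autocommit round
// trip, then optionally reads warnings on the same pinned connection. Warnings
// are returned to the caller instead of stored in a shared field, so
// concurrent workers cannot race on them.
func applyChunk(ctx context.Context, db *sql.DB, sessionSQL, insertSQL string,
	collectWarnings bool, args ...any) (int64, []string, error) {
	conn, err := db.Conn(ctx) // pin one connection so SHOW WARNINGS sees this INSERT
	if err != nil {
		return 0, nil, err
	}
	defer conn.Close() // returns the connection to the pool

	res, err := conn.ExecContext(ctx, joinStatements(sessionSQL, insertSQL), args...)
	if err != nil {
		return 0, nil, err
	}
	affected, err := res.RowsAffected() // per the commit message: the last statement (the INSERT)
	if err != nil {
		return 0, nil, err
	}
	if !collectWarnings {
		return affected, nil, nil
	}
	rows, err := conn.QueryContext(ctx, "SHOW WARNINGS")
	if err != nil {
		return affected, nil, err
	}
	defer rows.Close()
	var warnings []string
	for rows.Next() {
		var level, message string
		var code int
		if err := rows.Scan(&level, &code, &message); err != nil {
			return affected, warnings, err
		}
		warnings = append(warnings, level+": "+message)
	}
	return affected, warnings, rows.Err()
}
```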
Summary
Port of #1398 by @shaohk to current master, with correctness improvements.
Adds a --chunk-concurrent-size flag that allows multiple row-copy chunks to execute in parallel within each iteration using errgroup. Default is 1 (no behavior change).

Motivation
On large tables with fast storage (NVMe/SSD), the single-threaded row-copy loop can become a bottleneck. This flag enables parallel chunk copying to improve migration throughput.
Performance Results
1M rows, ADD COLUMN extra_col INT DEFAULT 0, Docker MySQL 8.0, chunk-size=1000:
…
Benefits scale with table size and storage throughput.
Key Design Decisions
- concurrency=1 matches master's retry semantics exactly (range calc inside retry loop for hook-based chunk size reduction); concurrency>1 pre-calculates ranges under mutex for safe parallel execution
- CalculateNextIterationRangeEndValues(advanceCursor bool) protected by mutex, returns *IterationRangeValues struct with isolated Min/Max per goroutine
- … MigrationLastInsertSQLWarnings field)
- … --chunk-concurrent-size exceeds default pool size

Changes from original #1398
- ApplyIterationInsertQuery returns SQL warnings instead of writing to shared field
- … IncludeMinValues handling for first iteration
- … context.Background())

Testing
- TestRetryBatchCopyWithHooks passes (hook-based chunk size reduction works correctly)

Checklist
- … doc/command-line-flags.md)

Based on work by @shaohk in #1398.