PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549
Draft
bewithgaurav wants to merge 9 commits into
Draft
PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549bewithgaurav wants to merge 9 commits into
bewithgaurav wants to merge 9 commits into
Conversation
Move parameter type detection from Python into C++ using raw CPython type checks (PyLong_CheckExact, PyFloat_CheckExact, etc.). Merge the DetectParamTypes → BindParameters → SQLExecute pipeline into a single DDBCSQLExecuteFast call so ParamInfo never crosses the pybind11 boundary. - DetectParamTypes: handles int (range-detected), float, bool, str (unicode + geometry sniffing), bytes, datetime/date/time, Decimal (MONEY range + generic numeric), UUID, None, with fallback to string - SQLExecuteFast_wrap: single pipeline with GIL release, always uses SQLPrepare for parameterized queries - cursor.py: fast path routing when no setinputsizes overrides present; old DDBCSQLExecute path preserved for setinputsizes callers - Named constants: MAX_INLINE_CHAR, MAX_INLINE_BINARY, MAX_NUMERIC_PRECISION, MONEY/SMALLMONEY ranges, PARAM_C_TYPE_TEXT platform macro
- Add complete DAE (Data-At-Execution) loop to SQLExecuteFast_wrap: SQL_NEED_DATA → SQLParamData/SQLPutData for large str/bytes/binary, matching the existing SQLExecute_wrap logic exactly - Fix DAE type assignment: non-unicode DAE strings use SQL_C_CHAR (not PARAM_C_TYPE_TEXT which maps to SQL_C_WCHAR on macOS/Linux) - Fix MONEY range lower bound: use MONEY_MIN not SMALLMONEY_MIN so negative decimals in MONEY range bind as VARCHAR (matches Python path) - Raise TypeError for unknown param types instead of silent str conversion - Add SQLFreeStmt(SQL_RESET_PARAMS) to unbind after execute
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/ddbc_bindings.cppLines 581-593 581 info.paramSQLType = SQL_BIGINT;
582 info.paramCType = SQL_C_SBIGINT;
583 info.columnSize = 19;
584 }
! 585 } else {
! 586 PyErr_Clear();
! 587 info.paramSQLType = SQL_BIGINT;
! 588 info.paramCType = SQL_C_SBIGINT;
! 589 info.columnSize = 19;
590 }
591 info.decimalDigits = 0;
592 continue;
593 }Lines 2359-2374 2359
2360 // ---------------------------------------------------------------------------
2361 // SQLExecuteFast — single C++ pipeline: DetectParamTypes → BindParameters → SQLExecute
2362 // No ParamInfo objects cross the pybind11 boundary.
! 2363 //
! 2364 // Always uses SQLPrepare (not ExecDirect) because parameterized queries
2365 // benefit from prepared plan reuse, and the fast path is only invoked
2366 // when parameters are present. The use_prepare flag from the caller is
! 2367 // acknowledged but overridden — this is a perf-only code path.
! 2368 // ---------------------------------------------------------------------------
! 2369 SQLRETURN SQLExecuteFast_wrap(const SqlHandlePtr statementHandle,
! 2370 const std::u16string& query,
2371 py::list params,
2372 py::list is_stmt_prepared,
2373 bool /*use_prepare*/,
2374 const py::dict& encoding_settings) {Lines 2384-2409 2384 (SQLPOINTER)SQL_CURSOR_FORWARD_ONLY, 0);
2385 SQLSetStmtAttr_ptr(hStmt, SQL_ATTR_CONCURRENCY,
2386 (SQLPOINTER)SQL_CONCUR_READ_ONLY, 0);
2387 }
! 2388
! 2389 // The encoding-settings dict has the form {"encoding": str, "ctype": int}.
! 2390 // Note: the Python layer's SQL_C_CHAR constant is numerically -8, the same
! 2391 // as ODBC's SQL_C_WCHAR. As a result, the only path that genuinely uses
! 2392 // byte-level character encoding is when the user explicitly opts in via
! 2393 // setencoding(..., ctype=mssql_python.SQL_CHAR) (which sends ctype=1, the
! 2394 // real ODBC SQL_CHAR). We default to utf-8 and only honor the dict's
! 2395 // encoding when ctype == 1 (real ODBC SQL_CHAR). Otherwise the user's
! 2396 // "encoding" value is meant for the wide-char path and we leave it alone.
! 2397 std::string charEncoding = "utf-8";
! 2398 if (encoding_settings.contains("ctype") && encoding_settings.contains("encoding")) {
! 2399 int ctype = encoding_settings["ctype"].cast<int>();
! 2400 if (ctype == SQL_C_CHAR /* real ODBC value: 1 */) {
! 2401 charEncoding = encoding_settings["encoding"].cast<std::string>();
! 2402 }
! 2403 }
! 2404
! 2405 // The cursor.py caller always passes a fresh `list(actual_params)` so this
2406 // function is free to mutate slots in place. Even so, every site below uses
2407 // PyList_SetItem (which decrefs the old slot before stealing the new ref),
2408 // so the function is safe regardless of who owns the list.Lines 2419-2428 2419 }
2420 if (!SQL_SUCCEEDED(rc)) return rc;
2421 is_stmt_prepared[0] = py::bool_(true);
2422 }
! 2423
! 2424 // DetectParamTypes + BindParameters in one shot — ParamInfo stays in C++
2425 std::vector<ParamInfo> paramInfos = DetectParamTypes(params);
2426 std::vector<std::shared_ptr<void>> paramBuffers;
2427 rc = BindParameters(*statementHandle, hStmt, params, paramInfos, paramBuffers, charEncoding);
2428 if (!SQL_SUCCEEDED(rc)) return rc;Lines 2448-2458 2448 const ParamInfo* matchedInfo = nullptr;
2449 for (auto& info : paramInfos) {
2450 if (reinterpret_cast<SQLPOINTER>(const_cast<ParamInfo*>(&info)) == paramToken) {
2451 matchedInfo = &info;
! 2452 break;
! 2453 }
! 2454 }
2455 if (!matchedInfo) {
2456 ThrowStdException("SQLExecuteFast: unrecognized paramToken from SQLParamData");
2457 }
2458 const py::object& pyObj = matchedInfo->dataPtr;Lines 2460-2476 2460 py::gil_scoped_release release;
2461 SQLPutData_ptr(hStmt, nullptr, 0);
2462 continue;
2463 }
! 2464
! 2465 if (py::isinstance<py::str>(pyObj)) {
! 2466 if (matchedInfo->paramCType == SQL_C_WCHAR) {
! 2467 std::u16string u16 = pyObj.cast<std::u16string>();
! 2468 const SQLWCHAR* dataPtr = reinterpretU16stringAsSqlWChar(u16);
2469 size_t totalChars = u16.size();
2470 size_t chunkChars = DAE_CHUNK_SIZE / sizeof(SQLWCHAR);
2471 for (size_t offset = 0; offset < totalChars; offset += chunkChars) {
! 2472 size_t len = std::min(chunkChars, totalChars - offset);
2473 {
2474 py::gil_scoped_release release;
2475 rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
2476 static_cast<SQLLEN>(len * sizeof(SQLWCHAR)));Lines 2474-2482 2474 py::gil_scoped_release release;
2475 rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
2476 static_cast<SQLLEN>(len * sizeof(SQLWCHAR)));
2477 }
! 2478 if (!SQL_SUCCEEDED(rc)) return rc;
2479 }
2480 } else if (matchedInfo->paramCType == SQL_C_CHAR) {
2481 std::string encodedStr;
2482 py::object encoded = pyObj.attr("encode")(charEncoding, "strict");Lines 2494-2502 2494 if (!SQL_SUCCEEDED(rc)) return rc;
2495 }
2496 } else {
2497 ThrowStdException("SQLExecuteFast: unsupported C type for str in DAE");
! 2498 }
2499 } else if (py::isinstance<py::bytes>(pyObj) ||
2500 py::isinstance<py::bytearray>(pyObj)) {
2501 py::bytes b = pyObj.cast<py::bytes>();
2502 std::string s = b;Lines 2510-2521 2510 rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
2511 static_cast<SQLLEN>(len));
2512 }
2513 if (!SQL_SUCCEEDED(rc)) return rc;
! 2514 }
! 2515 } else {
! 2516 ThrowStdException("SQLExecuteFast: DAE only supported for str or bytes");
! 2517 }
2518 }
2519 if (!SQL_SUCCEEDED(rc) && rc != SQL_NO_DATA) return rc;
2520 }Lines 2523-2531 2523
2524 // Preserve the execute return code (e.g. SQL_SUCCESS_WITH_INFO) — don't
2525 // let the SQLFreeStmt return value clobber what the caller needs to see.
2526 SQLRETURN exec_rc = rc;
! 2527 SQLFreeStmt_ptr(hStmt, SQL_RESET_PARAMS);
2528 return exec_rc;
2529 }
2530
2531 SQLRETURN BindParameterArray(SqlHandle& handle, SQLHANDLE hStmt, const py::list& columnwise_params,📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.9%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.pybind.ddbc_bindings.cpp: 76.4%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%🔗 Quick Links
|
- Comment out use_prepare parameter name (C4100: unreferenced parameter) - Remove unused catch variable name (C4101: unreferenced local variable)
Add explicit null pointer and zero-length guards before memcpy in build_numeric_data to satisfy DevSkim code scanning rule DS121708.
…or attrs, parity test Six review fixes for SQLExecuteFast_wrap and DetectParamTypes: 1. Encoding key: read 'encoding' from settings dict (was 'charEncoding' which never matched). Only honor when ctype==SQL_C_CHAR so the default utf-16le doesn't corrupt SQL_C_CHAR DAE/inline byte paths. 2. Subclass support: PyLong_Check/PyFloat_Check/PyUnicode_Check/PyBytes_Check instead of *_CheckExact. Fixes user-defined int/str/bytes/float subclasses that were silently rejected with TypeError. Switched PyBytes_GET_SIZE to PyBytes_Size for subclass-safe length. 3. GIL release in DAE loop: SQLParamData and SQLPutData now release the GIL during each ODBC call, matching slow-path concurrency for large blobs/strings. 4. Preserve exec_rc: stash the SQLExecute return code before SQLFreeStmt so SUCCESS_WITH_INFO and other non-success-non-error codes are not clobbered by the unbind call. 5. Shallow-copy params: params = py::list(params) at function entry so DetectParamTypes' in-place PyList_SET_ITEM cannot mutate the caller's list under any future code path that might pass it directly. 6. Cursor attrs: SQLSetStmtAttr(SQL_ATTR_CURSOR_TYPE/CONCURRENCY) at entry to match slow-path semantics regardless of prior hstmt state. Also adds tests/test_023_fast_path_parity.py covering int/str/bytes/float subclasses, caller-list non-mutation, and unsupported-type TypeError.
Eight follow-up fixes after review feedback on c5a827f. 1. Refcount leak (BLOCKER): replace PyList_SET_ITEM (uppercase, no decref of old slot) with PyList_SetItem (decrefs old slot before stealing the new reference) in DetectParamTypes time/Decimal/UUID branches. The previous shallow-copy defense via py::list(params) was a no-op because pybind11s list constructor only inc_refs an already-list argument. 2. Geometry + DAE conflict: gate the geometry-prefix override on the not-DAE branch so a long POLYGON/POINT/LINESTRING string does not end up with isDAE=true, dataPtr set, AND a non-zero columnSize. 3. Decimal NaN/Infinity: throw ValueError instead of silently binding 0 via build_numeric_data on an empty digits tuple. 4. Time format: always emit microseconds (HH:MM:SS.ffffff), matching slow path isoformat(timespec=microseconds). 5. PyObject_IsInstance: explicit equality check so a custom __instancecheck__ that raises (returns -1) does not fall through with a Python error set. 6. Dead code: removed unused SMALLMONEY_MIN/SMALLMONEY_MAX constants and the unused utf16Len assignments in DetectParamTypes. 7. Encoding-key contract: only honor encoding_settings encoding when the user explicitly opted in via setencoding(..., ctype=SQL_C_CHAR=1). The Python layer SQL_C_CHAR constant is numerically -8 (real ODBC SQL_C_WCHAR), so by default the wide-char path is taken and encoding is irrelevant. 8. Parity test rewrite: drop the dead _force_slow_path_roundtrip helper, use the project cursor fixture instead of a hard-coded conn string, and add (a) a real fast-vs-slow parity check via setinputsizes-forced slow path, (b) a refcount-leak regression test using a Decimal subclass + weakref, (c) explicit NaN-rejection coverage.
Resolve conflicts in ddbc_bindings.cpp from main's GH-610 work: - Keep both build_numeric_data (this PR) and ResolveNullParamType (main) - Adopt main's BindParameters/BindParameterArray signatures that take SqlHandle& handle; update the SQLExecuteFast_wrap call site to pass *statementHandle so the fast path uses the per-handle NULL describe cache - Migrate SQLExecuteFast_wrap from std::wstring + WStringToSQLWCHAR to std::u16string + reinterpretU16stringAsSqlWChar (main's uniform 16-bit query/param representation), dropping the platform #ifdef in both the prepare path and the DAE wide-char put-data loop Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| std::memset(&nd.val[0], 0, SQL_MAX_NUMERIC_LEN); | ||
| size_t copy_len = std::min(val_str.size(), static_cast<size_t>(SQL_MAX_NUMERIC_LEN)); | ||
| if (copy_len > 0 && val_str.data() != nullptr) { | ||
| std::memcpy(&nd.val[0], val_str.data(), copy_len); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Work Item / Issue Reference
Summary
This pull request refactors the parameter handling logic in the
executemethod ofmssql_python/cursor.pyto introduce a more efficient "fast path" for parameter binding and execution. The new approach allows the code to skip Python-side type detection and binding when there are noinputsizesoverrides, delegating the entire process to the C++ layer for better performance. The slow path is retained for cases whereinputsizesare set.Performance improvements and code simplification:
DDBCSQLExecuteFastto handle parameter type detection, binding, and execution entirely in C++ when there are noinputsizesoverrides, improving performance for common cases. (F783d0c6L1471)inputsizesare present. [1] [2]inputsizesoverrides are specified, ensuring compatibility with advanced usage. (F783d0c6L1471)