Skip to content

PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549

Draft
bewithgaurav wants to merge 9 commits into
mainfrom
bewithgaurav/insertmany-perf-detect-types
Draft

PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline#549
bewithgaurav wants to merge 9 commits into
mainfrom
bewithgaurav/insertmany-perf-detect-types

Conversation

@bewithgaurav

@bewithgaurav bewithgaurav commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Work Item / Issue Reference

AB#44979

GitHub Issue: #500


Summary

This pull request refactors the parameter handling logic in the execute method of mssql_python/cursor.py to introduce a more efficient "fast path" for parameter binding and execution. The new approach allows the code to skip Python-side type detection and binding when there are no inputsizes overrides, delegating the entire process to the C++ layer for better performance. The slow path is retained for cases where inputsizes are set.

Performance improvements and code simplification:

  • Added a "fast path" execution branch that uses DDBCSQLExecuteFast to handle parameter type detection, binding, and execution entirely in C++ when there are no inputsizes overrides, improving performance for common cases. (F783d0c6L1471)
  • Removed the unconditional creation of the Python-side parameter type list and its associated logic from the main execution path, now only used in the "slow path" when inputsizes are present. [1] [2]
  • Retained the existing Python-side type detection and parameter binding logic as a fallback for cases where inputsizes overrides are specified, ensuring compatibility with advanced usage. (F783d0c6L1471)

Move parameter type detection from Python into C++ using raw CPython
type checks (PyLong_CheckExact, PyFloat_CheckExact, etc.). Merge the
DetectParamTypes → BindParameters → SQLExecute pipeline into a single
DDBCSQLExecuteFast call so ParamInfo never crosses the pybind11 boundary.

- DetectParamTypes: handles int (range-detected), float, bool, str
  (unicode + geometry sniffing), bytes, datetime/date/time, Decimal
  (MONEY range + generic numeric), UUID, None, with fallback to string
- SQLExecuteFast_wrap: single pipeline with GIL release, always uses
  SQLPrepare for parameterized queries
- cursor.py: fast path routing when no setinputsizes overrides present;
  old DDBCSQLExecute path preserved for setinputsizes callers
- Named constants: MAX_INLINE_CHAR, MAX_INLINE_BINARY, MAX_NUMERIC_PRECISION,
  MONEY/SMALLMONEY ranges, PARAM_C_TYPE_TEXT platform macro
Comment thread mssql_python/pybind/ddbc_bindings.cpp Fixed
- Add complete DAE (Data-At-Execution) loop to SQLExecuteFast_wrap:
  SQL_NEED_DATA → SQLParamData/SQLPutData for large str/bytes/binary,
  matching the existing SQLExecute_wrap logic exactly
- Fix DAE type assignment: non-unicode DAE strings use SQL_C_CHAR
  (not PARAM_C_TYPE_TEXT which maps to SQL_C_WCHAR on macOS/Linux)
- Fix MONEY range lower bound: use MONEY_MIN not SMALLMONEY_MIN so
  negative decimals in MONEY range bind as VARCHAR (matches Python path)
- Raise TypeError for unknown param types instead of silent str conversion
- Add SQLFreeStmt(SQL_RESET_PARAMS) to unbind after execute
@github-actions

github-actions Bot commented Apr 29, 2026

Copy link
Copy Markdown

📊 Code Coverage Report

🔥 Diff Coverage

89%


🎯 Overall Coverage

80%


📈 Total Lines Covered: 6967 out of 8640
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/cursor.py (100%)
  • mssql_python/pybind/ddbc_bindings.cpp (88.9%): Missing lines 585-589,2363-2364,2367-2370,2388-2405,2423-2424,2452-2454,2464-2468,2472,2478,2498,2514-2517,2527

Summary

  • Total: 436 lines
  • Missing: 47 lines
  • Coverage: 89%

mssql_python/pybind/ddbc_bindings.cpp

Lines 581-593

  581                     info.paramSQLType = SQL_BIGINT;
  582                     info.paramCType = SQL_C_SBIGINT;
  583                     info.columnSize = 19;
  584                 }
! 585             } else {
! 586                 PyErr_Clear();
! 587                 info.paramSQLType = SQL_BIGINT;
! 588                 info.paramCType = SQL_C_SBIGINT;
! 589                 info.columnSize = 19;
  590             }
  591             info.decimalDigits = 0;
  592             continue;
  593         }

Lines 2359-2374

  2359 
  2360 // ---------------------------------------------------------------------------
  2361 // SQLExecuteFast — single C++ pipeline: DetectParamTypes → BindParameters → SQLExecute
  2362 // No ParamInfo objects cross the pybind11 boundary.
! 2363 //
! 2364 // Always uses SQLPrepare (not ExecDirect) because parameterized queries
  2365 // benefit from prepared plan reuse, and the fast path is only invoked
  2366 // when parameters are present. The use_prepare flag from the caller is
! 2367 // acknowledged but overridden — this is a perf-only code path.
! 2368 // ---------------------------------------------------------------------------
! 2369 SQLRETURN SQLExecuteFast_wrap(const SqlHandlePtr statementHandle,
! 2370                               const std::u16string& query,
  2371                               py::list params,
  2372                               py::list is_stmt_prepared,
  2373                               bool /*use_prepare*/,
  2374                               const py::dict& encoding_settings) {

Lines 2384-2409

  2384                            (SQLPOINTER)SQL_CURSOR_FORWARD_ONLY, 0);
  2385         SQLSetStmtAttr_ptr(hStmt, SQL_ATTR_CONCURRENCY,
  2386                            (SQLPOINTER)SQL_CONCUR_READ_ONLY, 0);
  2387     }
! 2388 
! 2389     // The encoding-settings dict has the form {"encoding": str, "ctype": int}.
! 2390     // Note: the Python layer's SQL_C_CHAR constant is numerically -8, the same
! 2391     // as ODBC's SQL_C_WCHAR. As a result, the only path that genuinely uses
! 2392     // byte-level character encoding is when the user explicitly opts in via
! 2393     // setencoding(..., ctype=mssql_python.SQL_CHAR) (which sends ctype=1, the
! 2394     // real ODBC SQL_CHAR). We default to utf-8 and only honor the dict's
! 2395     // encoding when ctype == 1 (real ODBC SQL_CHAR). Otherwise the user's
! 2396     // "encoding" value is meant for the wide-char path and we leave it alone.
! 2397     std::string charEncoding = "utf-8";
! 2398     if (encoding_settings.contains("ctype") && encoding_settings.contains("encoding")) {
! 2399         int ctype = encoding_settings["ctype"].cast<int>();
! 2400         if (ctype == SQL_C_CHAR /* real ODBC value: 1 */) {
! 2401             charEncoding = encoding_settings["encoding"].cast<std::string>();
! 2402         }
! 2403     }
! 2404 
! 2405     // The cursor.py caller always passes a fresh `list(actual_params)` so this
  2406     // function is free to mutate slots in place. Even so, every site below uses
  2407     // PyList_SetItem (which decrefs the old slot before stealing the new ref),
  2408     // so the function is safe regardless of who owns the list.

Lines 2419-2428

  2419         }
  2420         if (!SQL_SUCCEEDED(rc)) return rc;
  2421         is_stmt_prepared[0] = py::bool_(true);
  2422     }
! 2423 
! 2424     // DetectParamTypes + BindParameters in one shot — ParamInfo stays in C++
  2425     std::vector<ParamInfo> paramInfos = DetectParamTypes(params);
  2426     std::vector<std::shared_ptr<void>> paramBuffers;
  2427     rc = BindParameters(*statementHandle, hStmt, params, paramInfos, paramBuffers, charEncoding);
  2428     if (!SQL_SUCCEEDED(rc)) return rc;

Lines 2448-2458

  2448             const ParamInfo* matchedInfo = nullptr;
  2449             for (auto& info : paramInfos) {
  2450                 if (reinterpret_cast<SQLPOINTER>(const_cast<ParamInfo*>(&info)) == paramToken) {
  2451                     matchedInfo = &info;
! 2452                     break;
! 2453                 }
! 2454             }
  2455             if (!matchedInfo) {
  2456                 ThrowStdException("SQLExecuteFast: unrecognized paramToken from SQLParamData");
  2457             }
  2458             const py::object& pyObj = matchedInfo->dataPtr;

Lines 2460-2476

  2460                 py::gil_scoped_release release;
  2461                 SQLPutData_ptr(hStmt, nullptr, 0);
  2462                 continue;
  2463             }
! 2464 
! 2465             if (py::isinstance<py::str>(pyObj)) {
! 2466                 if (matchedInfo->paramCType == SQL_C_WCHAR) {
! 2467                     std::u16string u16 = pyObj.cast<std::u16string>();
! 2468                     const SQLWCHAR* dataPtr = reinterpretU16stringAsSqlWChar(u16);
  2469                     size_t totalChars = u16.size();
  2470                     size_t chunkChars = DAE_CHUNK_SIZE / sizeof(SQLWCHAR);
  2471                     for (size_t offset = 0; offset < totalChars; offset += chunkChars) {
! 2472                         size_t len = std::min(chunkChars, totalChars - offset);
  2473                         {
  2474                             py::gil_scoped_release release;
  2475                             rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
  2476                                                 static_cast<SQLLEN>(len * sizeof(SQLWCHAR)));

Lines 2474-2482

  2474                             py::gil_scoped_release release;
  2475                             rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
  2476                                                 static_cast<SQLLEN>(len * sizeof(SQLWCHAR)));
  2477                         }
! 2478                         if (!SQL_SUCCEEDED(rc)) return rc;
  2479                     }
  2480                 } else if (matchedInfo->paramCType == SQL_C_CHAR) {
  2481                     std::string encodedStr;
  2482                     py::object encoded = pyObj.attr("encode")(charEncoding, "strict");

Lines 2494-2502

  2494                         if (!SQL_SUCCEEDED(rc)) return rc;
  2495                     }
  2496                 } else {
  2497                     ThrowStdException("SQLExecuteFast: unsupported C type for str in DAE");
! 2498                 }
  2499             } else if (py::isinstance<py::bytes>(pyObj) ||
  2500                        py::isinstance<py::bytearray>(pyObj)) {
  2501                 py::bytes b = pyObj.cast<py::bytes>();
  2502                 std::string s = b;

Lines 2510-2521

  2510                         rc = SQLPutData_ptr(hStmt, (SQLPOINTER)(dataPtr + offset),
  2511                                             static_cast<SQLLEN>(len));
  2512                     }
  2513                     if (!SQL_SUCCEEDED(rc)) return rc;
! 2514                 }
! 2515             } else {
! 2516                 ThrowStdException("SQLExecuteFast: DAE only supported for str or bytes");
! 2517             }
  2518         }
  2519         if (!SQL_SUCCEEDED(rc) && rc != SQL_NO_DATA) return rc;
  2520     }

Lines 2523-2531

  2523 
  2524     // Preserve the execute return code (e.g. SQL_SUCCESS_WITH_INFO) — don't
  2525     // let the SQLFreeStmt return value clobber what the caller needs to see.
  2526     SQLRETURN exec_rc = rc;
! 2527     SQLFreeStmt_ptr(hStmt, SQL_RESET_PARAMS);
  2528     return exec_rc;
  2529 }
  2530 
  2531 SQLRETURN BindParameterArray(SqlHandle& handle, SQLHANDLE hStmt, const py::list& columnwise_params,


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.9%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.connection.connection.cpp: 76.2%
mssql_python.pybind.ddbc_bindings.cpp: 76.4%
mssql_python.row.py: 76.9%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%
mssql_python.connection.py: 85.6%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

- Comment out use_prepare parameter name (C4100: unreferenced parameter)
- Remove unused catch variable name (C4101: unreferenced local variable)
Add explicit null pointer and zero-length guards before memcpy in
build_numeric_data to satisfy DevSkim code scanning rule DS121708.
Comment thread mssql_python/pybind/ddbc_bindings.cpp Fixed
@github-actions github-actions Bot added the pr-size: large Substantial code update label May 7, 2026
…or attrs, parity test

Six review fixes for SQLExecuteFast_wrap and DetectParamTypes:

1. Encoding key: read 'encoding' from settings dict (was 'charEncoding'
   which never matched). Only honor when ctype==SQL_C_CHAR so the default
   utf-16le doesn't corrupt SQL_C_CHAR DAE/inline byte paths.
2. Subclass support: PyLong_Check/PyFloat_Check/PyUnicode_Check/PyBytes_Check
   instead of *_CheckExact. Fixes user-defined int/str/bytes/float
   subclasses that were silently rejected with TypeError. Switched
   PyBytes_GET_SIZE to PyBytes_Size for subclass-safe length.
3. GIL release in DAE loop: SQLParamData and SQLPutData now release the
   GIL during each ODBC call, matching slow-path concurrency for large
   blobs/strings.
4. Preserve exec_rc: stash the SQLExecute return code before SQLFreeStmt
   so SUCCESS_WITH_INFO and other non-success-non-error codes are not
   clobbered by the unbind call.
5. Shallow-copy params: params = py::list(params) at function entry so
   DetectParamTypes' in-place PyList_SET_ITEM cannot mutate the caller's
   list under any future code path that might pass it directly.
6. Cursor attrs: SQLSetStmtAttr(SQL_ATTR_CURSOR_TYPE/CONCURRENCY) at
   entry to match slow-path semantics regardless of prior hstmt state.

Also adds tests/test_023_fast_path_parity.py covering int/str/bytes/float
subclasses, caller-list non-mutation, and unsupported-type TypeError.
Comment thread tests/test_023_fast_path_parity.py Fixed
bewithgaurav and others added 2 commits May 7, 2026 14:46
Eight follow-up fixes after review feedback on c5a827f.

1. Refcount leak (BLOCKER): replace PyList_SET_ITEM (uppercase, no decref of
   old slot) with PyList_SetItem (decrefs old slot before stealing the new
   reference) in DetectParamTypes time/Decimal/UUID branches. The previous
   shallow-copy defense via py::list(params) was a no-op because pybind11s
   list constructor only inc_refs an already-list argument.
2. Geometry + DAE conflict: gate the geometry-prefix override on the not-DAE
   branch so a long POLYGON/POINT/LINESTRING string does not end up with
   isDAE=true, dataPtr set, AND a non-zero columnSize.
3. Decimal NaN/Infinity: throw ValueError instead of silently binding 0 via
   build_numeric_data on an empty digits tuple.
4. Time format: always emit microseconds (HH:MM:SS.ffffff), matching slow
   path isoformat(timespec=microseconds).
5. PyObject_IsInstance: explicit equality check so a custom __instancecheck__
   that raises (returns -1) does not fall through with a Python error set.
6. Dead code: removed unused SMALLMONEY_MIN/SMALLMONEY_MAX constants and the
   unused utf16Len assignments in DetectParamTypes.
7. Encoding-key contract: only honor encoding_settings encoding when the
   user explicitly opted in via setencoding(..., ctype=SQL_C_CHAR=1). The
   Python layer SQL_C_CHAR constant is numerically -8 (real ODBC SQL_C_WCHAR),
   so by default the wide-char path is taken and encoding is irrelevant.
8. Parity test rewrite: drop the dead _force_slow_path_roundtrip helper, use
   the project cursor fixture instead of a hard-coded conn string, and add
   (a) a real fast-vs-slow parity check via setinputsizes-forced slow path,
   (b) a refcount-leak regression test using a Decimal subclass + weakref,
   (c) explicit NaN-rejection coverage.
Resolve conflicts in ddbc_bindings.cpp from main's GH-610 work:
- Keep both build_numeric_data (this PR) and ResolveNullParamType (main)
- Adopt main's BindParameters/BindParameterArray signatures that take
  SqlHandle& handle; update the SQLExecuteFast_wrap call site to pass
  *statementHandle so the fast path uses the per-handle NULL describe cache
- Migrate SQLExecuteFast_wrap from std::wstring + WStringToSQLWCHAR to
  std::u16string + reinterpretU16stringAsSqlWChar (main's uniform 16-bit
  query/param representation), dropping the platform #ifdef in both the
  prepare path and the DAE wide-char put-data loop

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
std::memset(&nd.val[0], 0, SQL_MAX_NUMERIC_LEN);
size_t copy_len = std::min(val_str.size(), static_cast<size_t>(SQL_MAX_NUMERIC_LEN));
if (copy_len > 0 && val_str.data() != nullptr) {
std::memcpy(&nd.val[0], val_str.data(), copy_len);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: large Substantial code update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants