Problem
The SDK currently exposes filter composition with &, |, and repeated .filter() calls, but the behavior for mixed boolean logic does not preserve the semantics users would reasonably expect from the Python expression.
This came up while reviewing #22. That PR correctly enables multiple rules on the same field for flat same-field filters, but it also made the broader filter API easier to trust for range and cohort workflows. The remaining issue is that mixed AND/OR expressions can silently serialize differently from the expression a data scientist wrote.
Repro cases
Repeated .filter() calls after an OR composite
The query docs say multiple .filter() calls are combined with AND logic:
query = (
    model.query()
    .filter((model.dimensions.status == "active") | (model.dimensions.status == "pending"))
    .filter(model.dimensions.status != "deleted")
)
Expected logical shape:
(status = active OR status = pending) AND status != deleted
Current behavior preserves the existing "or" aggregation when appending a single filter, producing the equivalent of:
status = active OR status = pending OR status != deleted
The relevant path is Query.filter() when self._filters already exists and the new filter is a DimensionFilter.
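To make the difference concrete, here is a minimal sketch that evaluates both logical shapes over a few sample status values. It uses plain Python predicates as stand-ins and does not call the SDK; the sample values and function names are illustrative assumptions.

```python
# Plain-Python stand-ins for the two logical shapes; no SDK calls involved.
statuses = ["active", "pending", "deleted", "archived"]


def expected(status):
    # (status = active OR status = pending) AND status != deleted
    return (status == "active" or status == "pending") and status != "deleted"


def current(status):
    # status = active OR status = pending OR status != deleted
    return status == "active" or status == "pending" or status != "deleted"


expected_rows = [s for s in statuses if expected(s)]
current_rows = [s for s in statuses if current(s)]

print(expected_rows)  # ['active', 'pending']
print(current_rows)   # ['active', 'pending', 'archived']
```

Any status other than "deleted" passes the flattened OR form, so the query silently admits rows such as "archived" that the written expression excludes.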
Nested boolean expressions flatten precedence
The docs currently show a complex filter example like:
f = (
    (model.dimensions.country == "USA") &
    ((model.dimensions.amount > 1000) | (model.dimensions.priority == "high"))
)
Expected logical shape:
country = USA AND (amount > 1000 OR priority = high)
Current behavior flattens the filters into a single aggregation, so the nested precedence is not represented in the serialized payload.
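One way to keep the precedence is to build a tree instead of a flat list, so that every & or | creates a new group around its operands. The sketch below is only an illustration of that idea: the Rule, Group, and Composable classes, the operator names, and the payload shape are all hypothetical and are not the SDK's classes or the Lightdash API format.

```python
from dataclasses import dataclass
from typing import Any, Tuple


class Composable:
    """Hypothetical mixin: & and | always build a new group, never flatten."""

    def __and__(self, other):
        return Group("and", (self, other))

    def __or__(self, other):
        return Group("or", (self, other))


@dataclass(frozen=True)
class Rule(Composable):
    field: str
    operator: str  # operator names here are made up for illustration
    value: Any

    def to_dict(self):
        return {"field": self.field, "operator": self.operator, "value": self.value}


@dataclass(frozen=True)
class Group(Composable):
    aggregation: str  # "and" or "or"
    children: Tuple[Any, ...]

    def to_dict(self):
        # Recurse so nested groups appear as nested objects in the payload.
        return {self.aggregation: [child.to_dict() for child in self.children]}


country = Rule("country", "equals", "USA")
amount = Rule("amount", "greaterThan", 1000)
priority = Rule("priority", "equals", "high")

payload = (country & (amount | priority)).to_dict()
print(payload)
```

Here the OR group survives as a nested object inside the AND group. Whether the Lightdash API accepts a payload nested this way is exactly the open question this issue raises.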
Why this matters
For exploratory analysis and notebook workflows, users often write cohort filters, date windows, and exclusion rules in Python expressions. If the SDK silently changes OR/AND semantics, a query can return a materially different population without an obvious failure. That is especially risky for data scientists using this SDK to build analysis datasets.
Possible resolutions
- Implement nested composite serialization if the Lightdash API supports nested boolean filter groups.
- If the API only supports flat "and"/"or" groups, reject mixed nested expressions early with a clear error.
- Update docs to avoid advertising unsupported complex combinations until the behavior is implemented.
- Fix repeated .filter() calls so they are actually AND-ed, or document the exact current behavior if preserving aggregation is intentional.
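If the API turns out to support only flat groups, the early-rejection option could look roughly like the sketch below. It assumes a filter expression is available as a nested tuple of (aggregation, children), which is a simplification for illustration; UnsupportedFilterError and ensure_flat are hypothetical names, not part of the SDK.

```python
class UnsupportedFilterError(ValueError):
    """Hypothetical error for expressions a flat payload cannot represent."""


def ensure_flat(node, parent_aggregation=None):
    """Raise if a group nests a child group that uses a different aggregation.

    A node is either a leaf (any non-tuple value) or a tuple of
    (aggregation, [children]) where aggregation is "and" or "or".
    """
    if not isinstance(node, tuple):
        return  # leaf rule: nothing to check
    aggregation, children = node
    if parent_aggregation is not None and aggregation != parent_aggregation:
        raise UnsupportedFilterError(
            f"cannot nest an {aggregation.upper()} group inside an "
            f"{parent_aggregation.upper()} group; mixed AND/OR filters are not supported"
        )
    for child in children:
        ensure_flat(child, aggregation)


# A flat group passes silently.
ensure_flat(("and", ["a", "b", "c"]))

# a & (b | c) is rejected before anything is serialized.
try:
    ensure_flat(("and", ["a", ("or", ["b", "c"])]))
except UnsupportedFilterError as exc:
    message = str(exc)
    print(message)
```

Nesting a group inside a group with the same aggregation is allowed in this sketch, since AND inside AND (or OR inside OR) can be flattened without changing meaning.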
Acceptance criteria
- Mixed AND/OR filter expressions either serialize with correct precedence or raise an explicit unsupported-operation error.
- Multiple .filter() calls always behave as documented, especially when the first filter is an OR composite.
- Tests cover:
  - (a | b).filter(c) style chaining through Query.filter().
  - a & (b | c) nested expression behavior.
  - The documented complex combination example.
- docs/SDK_GUIDE.md matches the actual supported filter semantics.
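As a starting point for the chaining tests, the sketch below shows the shape such tests might take. The Query class here is a hypothetical stand-in that stores filters as nested tuples and always wraps the existing filters in a new AND group; it is not the SDK's Query, and the real tests would assert on the serialized payload instead.

```python
class Query:
    """Hypothetical stand-in for the SDK's Query; filters are nested tuples."""

    def __init__(self, filters=None):
        # None, a leaf string, or a tuple of (aggregation, [children])
        self._filters = filters

    def filter(self, new_filter):
        if self._filters is None:
            return Query(new_filter)
        # Always wrap in a new AND group so an existing OR composite stays intact.
        return Query(("and", [self._filters, new_filter]))


def test_filter_after_or_composite_is_anded():
    query = Query().filter(("or", ["a", "b"])).filter("c")
    assert query._filters == ("and", [("or", ["a", "b"]), "c"])


def test_repeated_filters_never_become_or():
    query = Query().filter("a").filter("b").filter("c")
    assert query._filters == ("and", [("and", ["a", "b"]), "c"])


test_filter_after_or_composite_is_anded()
test_repeated_filters_never_become_or()
```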