Support readTsFile table function for external TsFiles #17951
shuwenwei wants to merge 43 commits
Caideyipi left a comment
I found two issues that should be fixed before merging:
- External TsFile resources can leak if planning fails after the resource is created.
RelationPlanner.java:1647 creates the ExternalTsFileQueryResource, and ExternalTsFileQueryResource.java:106-108 increments external file-reader references. Those resources are only released through QueryExecution.stopAndCleanup() at QueryExecution.java:395. However, if execution.start() throws during logical planning, distribution planning, or scheduling, Coordinator.java:340-343 only releases frontend memory and schema locks. A timeout or optimizer/distribution exception after planExternalTsFileScan() can therefore leave reader references and temporary directories alive. Please either make QueryExecution.start() transition to the failed state and clean up on thrown planning exceptions, or have Coordinator.execution() call queryContext.releaseExternalTsFileQueryResources() when execution.start() fails before normal cleanup can run.
- read_tsfile silently drops ATTRIBUTE columns.
TsFileSchemaCollector.java:218-229 only collects TAG and FIELD columns, and build() at TsFileSchemaCollector.java:292-307 only emits time, tags, and fields. If an external TsFile table schema contains ColumnCategory.ATTRIBUTE, those columns disappear from the TVF output schema. The execution path also creates AlignedDeviceEntry(deviceID, new Binary[0]) in ExternalTsFileQueryResource.java:125, so attribute values are not available later either. Since the table model supports ATTRIBUTE columns, this should either be implemented end to end or rejected explicitly instead of returning an incomplete schema.
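The second option in the first issue (release external resources when start fails) can be pictured with the sketch below. Every name in it (TrackedResource, SketchQueryContext, StartWithCleanup) is a hypothetical stand-in rather than an IoTDB class; it only illustrates a catch, release, rethrow shape, under the assumption that release is idempotent so the normal cleanup path can still run afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query-scoped resource such as ExternalTsFileQueryResource.
final class TrackedResource implements AutoCloseable {
  private boolean released;

  boolean isReleased() {
    return released;
  }

  @Override
  public void close() {
    released = true; // idempotent: closing twice is harmless
  }
}

// Hypothetical stand-in for the query context that owns external resources.
final class SketchQueryContext {
  private final List<TrackedResource> resources = new ArrayList<>();

  TrackedResource createResource() {
    TrackedResource resource = new TrackedResource();
    resources.add(resource);
    return resource;
  }

  void releaseAll() {
    for (TrackedResource resource : resources) {
      resource.close();
    }
    resources.clear();
  }
}

final class StartWithCleanup {
  private StartWithCleanup() {}

  // Runs the planning/scheduling step. If it throws, the resources created so
  // far are released before the failure is rethrown to the caller.
  static void start(SketchQueryContext context, Runnable planAndSchedule) {
    try {
      planAndSchedule.run();
    } catch (RuntimeException | Error e) {
      context.releaseAll();
      throw e;
    }
  }
}
```

In this shape the caller does not need to know which planning stage failed; anything registered on the context before the exception is released exactly once.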
Codecov Report
Coverage Diff
| Metric | master | #17951 | +/- |
|---|---|---|---|
| Coverage | 41.07% | 41.03% | -0.04% |
| Complexity | 318 | 318 | |
| Files | 5248 | 5269 | +21 |
| Lines | 363998 | 367358 | +3360 |
| Branches | 47026 | 47515 | +489 |
| Hits | 149523 | 150761 | +1238 |
| Misses | 214475 | 216597 | +2122 |
I found one new issue that should be fixed before merging:
- read_tsfile does not appear to have an authorization gate for reading external server-side files.
It is registered as a built-in table function in TableMetadataImpl.java, and StatementAnalyzer.visitTableFunctionInvocation() directly analyzes it. ReadTsFileTableFunction.analyze() then reads the user-supplied PATHS on the DataNode filesystem to collect TsFile schemas. This is a different capability from selecting an IoTDB table: any SQL user who can invoke the table function can ask the DataNode process to inspect external local paths, except for the current data-dir exclusion. Please add an explicit privilege boundary for this function, for example admin / SYSTEM / MAINTAIN or another clearly chosen system privilege, before any path access or schema collection happens.
Also, the two earlier issues still do not look fully addressed: external TsFile resources can still leak if planning/start throws after the resource is created, and ATTRIBUTE columns are still silently dropped instead of being supported or rejected explicitly.
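The privilege boundary requested above could take roughly the following shape: check the caller's privileges first, and only then touch any path. This is a minimal sketch, not IoTDB's permission API; ReadTsFileAccessSketch, SchemaCollector, AccessDeniedException, and the choice of "SYSTEM" as the required privilege are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

final class ReadTsFileAccessSketch {
  private ReadTsFileAccessSketch() {}

  // Hypothetical privilege name; the review leaves the exact privilege open.
  static final String REQUIRED_PRIVILEGE = "SYSTEM";

  // Hypothetical exception type for a rejected call.
  static final class AccessDeniedException extends RuntimeException {
    AccessDeniedException(String message) {
      super(message);
    }
  }

  // Hypothetical hook standing in for the code that reads files on the server.
  interface SchemaCollector {
    List<String> collect(List<String> paths);
  }

  // The privilege check runs before the collector is invoked, so a caller
  // without the privilege never causes any filesystem access.
  static List<String> analyze(
      Set<String> userPrivileges, List<String> paths, SchemaCollector collector) {
    if (!userPrivileges.contains(REQUIRED_PRIVILEGE)) {
      throw new AccessDeniedException(
          "read_tsfile requires the " + REQUIRED_PRIVILEGE + " privilege");
    }
    return collector.collect(paths);
  }
}
```

Placing the check at the top of the analysis step, rather than inside the collector, keeps the ordering guarantee easy to test: a denied call can be asserted to have made zero collector invocations.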
Pull request overview
This PR adds table-model support for querying external TsFiles via a new built-in table function (read_tsfile) and introduces the corresponding planning, distribution, execution operators, and query-scoped resource management required to scan/aggregate external TsFile content.
Changes:
- Add read_tsfile table function analysis/planning plus schema inference/merge from external TsFiles (including directory scanning and validity checks).
- Introduce external TsFile scan/aggregation plan nodes and execution operators, with query-scoped resource management (ExternalTsFileQueryResource) and FileReaderManager support for external readers.
- Integrate predicate/aggregation pushdown, ordering handling, and add UT/IT coverage plus i18n messages.
Reviewed changes
Copilot reviewed 50 out of 50 changed files in this pull request and generated no comments.
Summary per file
| File | Description |
|---|---|
| pom.xml | Bumps TsFile dependency snapshot to pick up required reader/schema APIs. |
| iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/queryengine/execution/MemoryEstimationHelper.java | Exposes ArrayList instance size constant for broader reuse in memory estimation. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/buffer/TimeSeriesMetadataCacheTest.java | Updates mocks for new TsFile reader metadata API signature (offset support). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryResourceTest.java | Adds UTs for run merging/ordering and offset mapping behavior. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/QueryDataSourceType.java | Adds EXTERNAL_TSFILE_SCAN query data source type. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/control/FileReaderManager.java | Adds caching + ref-counting for external TsFile readers. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/TimeSeriesMetadataCache.java | Adds external-scan/offset-aware metadata loading and disables caching for external scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/ChunkCache.java | Disables chunk caching for external TsFile scans and routes reader acquisition accordingly. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/BloomFilterCache.java | Adds external-scan bypass to avoid caching bloom filters for external TsFiles. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/RelationPlanner.java | Plans read_tsfile as an ExternalTsFileScanNode with dynamically registered schema. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/UnaliasSymbolReferences.java | Adds rewrite support for ExternalTsFileScanNode. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/TransformSortToStreamSort.java | Extends sort optimization to recognize external TsFile scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/PushPredicateIntoTableScan.java | Pushes time/device filters into external TsFile scans and constructs schema filters for tag predicates. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/PushAggregationIntoTableScan.java | Enables aggregation pushdown analysis for external TsFile scans using analyzed schema. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/ExternalTsFileScanNode.java | New scan node representing external TsFile scanning with per-query resource linkage. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/ExternalTsFileAggregationScanNode.java | New aggregation scan node variant for external TsFiles. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/AggregationTableScanNode.java | Combines external scan + aggregation into the new external aggregation scan node. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/iterative/rule/PruneTableScanColumns.java | Adds pruning logic for ExternalTsFileScanNode while preserving time predicate symbols. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/distribute/TableDistributedPlanGenerator.java | Splits external TsFile scans/aggregations into local partitions and applies sort properties. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/metadata/TableMetadataImpl.java | Routes table-function lookup to DataNode built-ins before generic factory lookup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/TsFileSchemaCollector.java | New schema collection/merge logic for external TsFiles (file/dir scanning + validation). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ReadTsFileTableFunction.java | New built-in table function: argument parsing, schema analysis, path safety checks, handle creation. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryResource.java | New query-scoped resource managing external readers, device task partitioning, temp runs, memory accounting. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryDataSource.java | QueryDataSource wrapper to carry the external query resource into operators. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileDeviceFilterVisitor.java | Evaluates tag-based SchemaFilter predicates against external TsFile device IDs. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/DataNodeTableBuiltinTableFunction.java | Registers read_tsfile as a DataNode-provided built-in table function. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/StatementAnalyzer.java | Injects MPPQueryContext into ReadTsFileTableFunction during analysis. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/predicate/schema/ConvertSchemaPredicateToFilterVisitor.java | Adds BetweenPredicate conversion into SchemaFilter form (AND of bounds). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanVisitor.java | Adds visitor hooks for external TsFile scan nodes/operators. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanGraphPrinter.java | Adds external TsFile scan details (e.g., TsFile count) to plan printing. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/DataNodeTableOperatorGenerator.java | Generates external TsFile scan and aggregation scan operators and tracks resource lifetime. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/execution/QueryExecution.java | Uses centralized timeout check and releases external TsFile query resources on cleanup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/execution/config/metadata/ShowFunctionsTask.java | Includes DataNode built-in table functions in SHOW FUNCTIONS output. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/SeriesScanUtil.java | Refactors TTL filter update into overridable method (needed by external scan util). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/InformationSchemaContentSupplierFactory.java | Shows combined built-in table functions (commons + DataNode) in information schema iteration. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileTableScanOperator.java | New operator to scan external TsFiles using device task runs and per-device offsets. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileSeriesScanUtil.java | New scan util: disables TTL and loads metadata using per-device offset ranges. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileAggTableScanOperator.java | New operator for aggregation scans over external TsFiles with offset-based metadata loading. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractTableScanOperator.java | Makes selected fields/methods protected to enable external scan operator specialization. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractDefaultAggTableScanOperator.java | Makes instance size constant protected for subclasses’ RAM accounting. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractAggTableScanOperator.java | Refactors “move to next device” flow for reuse/override by external aggregation operator. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/FileLoaderUtils.java | Adds offset-aware aligned metadata load path plumbing. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/AlignedSeriesScanUtil.java | Adapts to new FileLoaderUtils signature by passing null offsets for normal scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/fragment/QueryContext.java | Adds isExternalTsFileScan() hook for cache behavior control. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/fragment/FragmentInstanceContext.java | Tracks/query-inits external TsFile QueryDataSource and wires QueryDataSourceType handling. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java | Adds external query resource creation/cleanup and central timeout checking. |
| iotdb-core/datanode/src/main/i18n/zh/org/apache/iotdb/db/i18n/DataNodeQueryMessages.java | Adds Chinese messages for external TsFile planning/execution/runtime errors. |
| iotdb-core/datanode/src/main/i18n/en/org/apache/iotdb/db/i18n/DataNodeQueryMessages.java | Adds English messages for external TsFile planning/execution/runtime errors. |
| integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBReadTsFileTableFunctionIT.java | Adds IT coverage for reading/aggregating external TsFiles, schema merge, directory scanning, and path rejection. |
| integration-test/src/test/java/org/apache/iotdb/relational/it/db/it/udf/IoTDBSQLFunctionManagementIT.java | Updates expected built-in function counts to include the new DataNode table function. |
Description
This PR adds relational readTsFile table function support for querying external TsFiles.
Main changes:
- readTsFile TVF planning/analyze support and schema collection through TsFileSchemaCollector.
- .tsfile files and skip invalid TsFile contents by magic-number validation.
- ExternalTsFileQueryResource to manage external TsFile readers, task partitioning, run-file merge reads, temporary files, and execution-lifetime memory reservation cleanup.
- readTsFile table function path.
Tests
Not run in this turn.
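The magic-number validation mentioned in the description can be illustrated with a small standalone check. This sketch is not the PR's code: TsFileMagicSketch is a hypothetical class, and it assumes the file header begins with the ASCII string "TsFile" at offset 0, ignoring the version byte and any tail magic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class TsFileMagicSketch {
  private TsFileMagicSketch() {}

  // Assumption: a valid TsFile starts with these six ASCII bytes.
  private static final byte[] MAGIC = "TsFile".getBytes(StandardCharsets.US_ASCII);

  // Returns true when the given bytes begin with the magic string.
  static boolean startsWithMagic(byte[] head) {
    return head.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
  }

  // Reads only the first few bytes of the file, so large files stay cheap to test.
  // Uses InputStream.readNBytes(int), which requires Java 11 or later.
  static boolean hasTsFileMagic(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      return startsWithMagic(in.readNBytes(MAGIC.length));
    }
  }
}
```

A directory scan could call hasTsFileMagic on each candidate with a .tsfile extension and skip the ones that return false, which matches the "skip invalid contents" behavior described above without parsing the whole file.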