Support readTsFile table function for external TsFiles#17951

Open
shuwenwei wants to merge 43 commits into master from read_tsfile_table_function

@shuwenwei

Member

Description

This PR adds relational readTsFile table function support for querying external TsFiles.

Main changes:

  • Add readTsFile TVF planning/analyze support and schema collection through TsFileSchemaCollector.
  • Validate explicit file paths by opening them as TsFiles, while directory inputs recursively scan .tsfile files and skip invalid TsFile contents by magic-number validation.
  • Add external TsFile scan and aggregation scan plan nodes/operators.
  • Add ExternalTsFileQueryResource to manage external TsFile readers, task partitioning, run-file merge reads, temporary files, and execution-lifetime memory reservation cleanup.
  • Route related user-facing messages through DataNode query i18n messages.
  • Add UT/IT coverage for external TsFile query resources and the readTsFile table function path.
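
The directory-scanning rule in the list above (recurse, keep only `.tsfile` names, skip files whose contents fail magic-number validation) could look roughly like the sketch below. It is not the PR's `TsFileSchemaCollector` code: the class and method names are hypothetical, and it assumes a TsFile begins with the ASCII bytes `TsFile`, checking only the head of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of IoTDB.
final class TsFileDirectoryScanSketch {
  // Assumption: a valid TsFile starts with the ASCII magic string "TsFile".
  private static final byte[] MAGIC = "TsFile".getBytes(StandardCharsets.US_ASCII);

  // Returns true only if the first bytes of the file match the assumed magic string.
  static boolean hasTsFileMagic(Path file) {
    byte[] head = new byte[MAGIC.length];
    try (InputStream in = Files.newInputStream(file)) {
      int off = 0;
      while (off < head.length) {
        int n = in.read(head, off, head.length - off);
        if (n < 0) {
          break; // file is shorter than the magic string
        }
        off += n;
      }
      return off == head.length && Arrays.equals(head, MAGIC);
    } catch (IOException e) {
      return false; // unreadable files are skipped, not fatal, for directory inputs
    }
  }

  // Recursively collects *.tsfile files under dir that pass the magic check.
  static List<Path> scan(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      return walk.filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".tsfile"))
          .filter(TsFileDirectoryScanSketch::hasTsFileMagic)
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

Explicit file paths are treated differently in the description: they are opened as TsFiles, so an invalid explicit path would surface as an error rather than being skipped.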

Tests

Not run in this turn.

@Caideyipi Caideyipi left a comment

Collaborator

I found two issues that should be fixed before merging:

  1. External TsFile resources can leak if planning fails after the resource is created.

RelationPlanner.java:1647 creates the ExternalTsFileQueryResource, and ExternalTsFileQueryResource.java:106-108 increments external file-reader references. Those resources are only released through QueryExecution.stopAndCleanup() at QueryExecution.java:395. However, if execution.start() throws during logical planning, distribution planning, or scheduling, Coordinator.java:340-343 only releases frontend memory and schema locks. A timeout or an optimizer/distribution exception after planExternalTsFileScan() can therefore leave reader references and temporary directories alive. Please either make QueryExecution.start() transition to the failed state and clean up when planning throws, or have Coordinator.execution() call queryContext.releaseExternalTsFileQueryResources() when execution.start() fails before normal cleanup can run.

  2. read_tsfile silently drops ATTRIBUTE columns.

TsFileSchemaCollector.java:218-229 only collects TAG and FIELD columns, and build() at TsFileSchemaCollector.java:292-307 only emits time, tags, and fields. If an external TsFile table schema contains ColumnCategory.ATTRIBUTE, those columns disappear from the TVF output schema. The execution path also creates AlignedDeviceEntry(deviceID, new Binary[0]) in ExternalTsFileQueryResource.java:125, so attribute values are not available later either. Since the table model supports ATTRIBUTE columns, this should either be implemented end to end or rejected explicitly instead of returning an incomplete schema.
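
For the second issue, the "reject explicitly" option might be sketched as follows. The enum and class here are hypothetical stand-ins rather than the real ColumnCategory or TsFileSchemaCollector types; the point is only that an ATTRIBUTE column fails schema collection loudly instead of vanishing from the output schema.

```java
import java.util.Map;

// Hypothetical stand-in for the table-model column category enum.
enum SketchColumnCategory { TIME, TAG, FIELD, ATTRIBUTE }

// Hypothetical validator, assumed to run while the TVF schema is being collected.
final class AttributeRejectionSketch {
  // Throws if any column of the external table schema is an ATTRIBUTE column.
  static void rejectAttributeColumns(Map<String, SketchColumnCategory> columns) {
    for (Map.Entry<String, SketchColumnCategory> entry : columns.entrySet()) {
      if (entry.getValue() == SketchColumnCategory.ATTRIBUTE) {
        throw new UnsupportedOperationException(
            "read_tsfile does not support ATTRIBUTE column: " + entry.getKey());
      }
    }
  }
}
```

In the PR itself the message would presumably go through the DataNode query i18n messages rather than a hard-coded string.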

@codecov

codecov Bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 27.77409% with 1087 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.03%. Comparing base (72e72dd) to head (85bbcd3).
⚠️ Report is 28 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...unction/tvf/read_tsfile/TsFileSchemaCollector.java | 0.00% | 180 Missing ⚠️ |
| ...n/tvf/read_tsfile/ExternalTsFileQueryResource.java | 61.80% | 157 Missing ⚠️ |
| ...ction/tvf/read_tsfile/ReadTsFileTableFunction.java | 0.00% | 119 Missing ⚠️ |
| ...nner/distribute/TableDistributedPlanGenerator.java | 22.72% | 102 Missing ⚠️ |
| ...relational/ExternalTsFileAggTableScanOperator.java | 0.00% | 59 Missing ⚠️ |
| ...ce/relational/ExternalTsFileTableScanOperator.java | 0.00% | 53 Missing ⚠️ |
| ...ational/planner/node/AggregationTableScanNode.java | 2.08% | 47 Missing ⚠️ |
| ...e/plan/planner/DataNodeTableOperatorGenerator.java | 2.70% | 36 Missing ⚠️ |
| ...ngine/plan/relational/planner/RelationPlanner.java | 7.89% | 35 Missing ⚠️ |
| ...elational/planner/node/ExternalTsFileScanNode.java | 0.00% | 28 Missing ⚠️ |
| ... and 29 more | | |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17951      +/-   ##
============================================
- Coverage     41.07%   41.03%   -0.04%     
  Complexity      318      318              
============================================
  Files          5248     5269      +21     
  Lines        363998   367358    +3360     
  Branches      47026    47515     +489     
============================================
+ Hits         149523   150761    +1238     
- Misses       214475   216597    +2122     

@Caideyipi

Collaborator

I found one new issue that should be fixed before merging:

read_tsfile does not appear to have an authorization gate for reading external server-side files. It is registered as a built-in table function in TableMetadataImpl.java, and StatementAnalyzer.visitTableFunctionInvocation() directly analyzes it. ReadTsFileTableFunction.analyze() then reads the user-supplied PATHS on the DataNode filesystem to collect TsFile schemas. This is a different capability from selecting an IoTDB table: any SQL user who can invoke the table function can ask the DataNode process to inspect external local paths, except for the current data-dir exclusion.

Please add an explicit privilege boundary for this function, for example admin / SYSTEM / MAINTAIN or another clearly chosen system privilege, before any path access or schema collection happens.
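
A minimal sketch of the ordering being asked for, using hypothetical names throughout (this is not IoTDB's permission API, and `SYSTEM` is just one of the candidate privileges mentioned above): the privilege check runs first, and the code that touches the filesystem is only reached if it passes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types exist in IoTDB.
final class ReadTsFilePrivilegeSketch {
  // Placeholder privilege name; the actual choice is left to the PR author.
  static final String REQUIRED_PRIVILEGE = "SYSTEM";

  // Stand-in for whatever opens the paths and collects TsFile schemas.
  interface PathAccessor {
    List<String> collectSchemas(List<String> paths);
  }

  // Checks the privilege before any path access or schema collection happens.
  static List<String> analyze(
      Set<String> userPrivileges, List<String> paths, PathAccessor accessor) {
    if (!userPrivileges.contains(REQUIRED_PRIVILEGE)) {
      throw new SecurityException(
          "read_tsfile requires the " + REQUIRED_PRIVILEGE + " privilege");
    }
    return accessor.collectSchemas(paths);
  }
}
```

Keeping the check ahead of the accessor call means an unprivileged caller cannot even learn whether a path exists or is a valid TsFile.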

Also, the two earlier issues still do not appear to be fully addressed: external TsFile resources can still leak if planning/start throws after the resource is created, and ATTRIBUTE columns are still silently dropped instead of being supported or rejected explicitly.
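
For the leak on a failed start, one possible shape of the fix is sketched below. The classes are simplified, hypothetical stand-ins for MPPQueryContext, Coordinator, and QueryExecution; only the method name releaseExternalTsFileQueryResources() is taken from the review. The caller releases external resources whenever start() exits abnormally, since the normal cleanup path will not run in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the query context that owns external TsFile resources.
final class SketchQueryContext {
  private final List<AutoCloseable> externalResources = new ArrayList<>();
  int released = 0;

  void register(AutoCloseable resource) {
    externalResources.add(resource);
  }

  // Safe to call more than once: the list is cleared after the first release.
  void releaseExternalTsFileQueryResources() {
    for (AutoCloseable resource : externalResources) {
      try {
        resource.close();
        released++;
      } catch (Exception e) {
        // A real implementation would log this and keep releasing the rest.
      }
    }
    externalResources.clear();
  }
}

// Hypothetical stand-in for the coordinator-side call that starts an execution.
final class PlanningCleanupSketch {
  interface Startable {
    void start() throws Exception;
  }

  static void execute(SketchQueryContext context, Startable execution) throws Exception {
    boolean started = false;
    try {
      execution.start(); // planning, distribution, scheduling
      started = true;
    } finally {
      if (!started) {
        // start() threw, so the normal stop-and-cleanup path will never run.
        context.releaseExternalTsFileQueryResources();
      }
    }
  }
}
```

On the success path the resources stay registered, so the existing end-of-query cleanup remains responsible for them.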

Copilot AI left a comment

Contributor

Pull request overview

This PR adds table-model support for querying external TsFiles via a new built-in table function (read_tsfile) and introduces the corresponding planning, distribution, execution operators, and query-scoped resource management required to scan/aggregate external TsFile content.

Changes:

  • Add read_tsfile table function analysis/planning plus schema inference/merge from external TsFiles (including directory scanning and validity checks).
  • Introduce external TsFile scan/aggregation plan nodes and execution operators, with query-scoped resource management (ExternalTsFileQueryResource) and FileReaderManager support for external readers.
  • Integrate predicate/aggregation pushdown, ordering handling, and add UT/IT coverage plus i18n messages.

Reviewed changes

Copilot reviewed 50 out of 50 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
|---|---|
| pom.xml | Bumps TsFile dependency snapshot to pick up required reader/schema APIs. |
| iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/queryengine/execution/MemoryEstimationHelper.java | Exposes ArrayList instance size constant for broader reuse in memory estimation. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/buffer/TimeSeriesMetadataCacheTest.java | Updates mocks for new TsFile reader metadata API signature (offset support). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryResourceTest.java | Adds UTs for run merging/ordering and offset mapping behavior. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/QueryDataSourceType.java | Adds EXTERNAL_TSFILE_SCAN query data source type. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/control/FileReaderManager.java | Adds caching + ref-counting for external TsFile readers. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/TimeSeriesMetadataCache.java | Adds external-scan/offset-aware metadata loading and disables caching for external scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/ChunkCache.java | Disables chunk caching for external TsFile scans and routes reader acquisition accordingly. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/buffer/BloomFilterCache.java | Adds external-scan bypass to avoid caching bloom filters for external TsFiles. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/RelationPlanner.java | Plans read_tsfile as an ExternalTsFileScanNode with dynamically registered schema. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/UnaliasSymbolReferences.java | Adds rewrite support for ExternalTsFileScanNode. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/TransformSortToStreamSort.java | Extends sort optimization to recognize external TsFile scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/PushPredicateIntoTableScan.java | Pushes time/device filters into external TsFile scans and constructs schema filters for tag predicates. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/PushAggregationIntoTableScan.java | Enables aggregation pushdown analysis for external TsFile scans using analyzed schema. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/ExternalTsFileScanNode.java | New scan node representing external TsFile scanning with per-query resource linkage. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/ExternalTsFileAggregationScanNode.java | New aggregation scan node variant for external TsFiles. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/node/AggregationTableScanNode.java | Combines external scan + aggregation into the new external aggregation scan node. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/iterative/rule/PruneTableScanColumns.java | Adds pruning logic for ExternalTsFileScanNode while preserving time predicate symbols. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/planner/distribute/TableDistributedPlanGenerator.java | Splits external TsFile scans/aggregations into local partitions and applies sort properties. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/metadata/TableMetadataImpl.java | Routes table-function lookup to DataNode built-ins before generic factory lookup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/TsFileSchemaCollector.java | New schema collection/merge logic for external TsFiles (file/dir scanning + validation). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ReadTsFileTableFunction.java | New built-in table function: argument parsing, schema analysis, path safety checks, handle creation. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryResource.java | New query-scoped resource managing external readers, device task partitioning, temp runs, memory accounting. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileQueryDataSource.java | QueryDataSource wrapper to carry the external query resource into operators. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/tvf/readTsFile/ExternalTsFileDeviceFilterVisitor.java | Evaluates tag-based SchemaFilter predicates against external TsFile device IDs. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/function/DataNodeTableBuiltinTableFunction.java | Registers read_tsfile as a DataNode-provided built-in table function. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/StatementAnalyzer.java | Injects MPPQueryContext into ReadTsFileTableFunction during analysis. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/analyzer/predicate/schema/ConvertSchemaPredicateToFilterVisitor.java | Adds BetweenPredicate conversion into SchemaFilter form (AND of bounds). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanVisitor.java | Adds visitor hooks for external TsFile scan nodes/operators. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanGraphPrinter.java | Adds external TsFile scan details (e.g., TsFile count) to plan printing. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/DataNodeTableOperatorGenerator.java | Generates external TsFile scan and aggregation scan operators and tracks resource lifetime. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/execution/QueryExecution.java | Uses centralized timeout check and releases external TsFile query resources on cleanup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/execution/config/metadata/ShowFunctionsTask.java | Includes DataNode built-in table functions in SHOW FUNCTIONS output. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/SeriesScanUtil.java | Refactors TTL filter update into overridable method (needed by external scan util). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/InformationSchemaContentSupplierFactory.java | Shows combined built-in table functions (commons + DataNode) in information schema iteration. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileTableScanOperator.java | New operator to scan external TsFiles using device task runs and per-device offsets. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileSeriesScanUtil.java | New scan util: disables TTL and loads metadata using per-device offset ranges. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/ExternalTsFileAggTableScanOperator.java | New operator for aggregation scans over external TsFiles with offset-based metadata loading. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractTableScanOperator.java | Makes selected fields/methods protected to enable external scan operator specialization. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractDefaultAggTableScanOperator.java | Makes instance size constant protected for subclasses’ RAM accounting. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/AbstractAggTableScanOperator.java | Refactors “move to next device” flow for reuse/override by external aggregation operator. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/FileLoaderUtils.java | Adds offset-aware aligned metadata load path plumbing. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/AlignedSeriesScanUtil.java | Adapts to new FileLoaderUtils signature by passing null offsets for normal scans. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/fragment/QueryContext.java | Adds isExternalTsFileScan() hook for cache behavior control. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/fragment/FragmentInstanceContext.java | Tracks/query-inits external TsFile QueryDataSource and wires QueryDataSourceType handling. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java | Adds external query resource creation/cleanup and central timeout checking. |
| iotdb-core/datanode/src/main/i18n/zh/org/apache/iotdb/db/i18n/DataNodeQueryMessages.java | Adds Chinese messages for external TsFile planning/execution/runtime errors. |
| iotdb-core/datanode/src/main/i18n/en/org/apache/iotdb/db/i18n/DataNodeQueryMessages.java | Adds English messages for external TsFile planning/execution/runtime errors. |
| integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBReadTsFileTableFunctionIT.java | Adds IT coverage for reading/aggregating external TsFiles, schema merge, directory scanning, and path rejection. |
| integration-test/src/test/java/org/apache/iotdb/relational/it/db/it/udf/IoTDBSQLFunctionManagementIT.java | Updates expected built-in function counts to include the new DataNode table function. |

@sonarqubecloud

Quality Gate failed

Failed conditions
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

3 participants