Support fetching very large extracts via CSV/download export (beyond query.maxLimit) #26

@jpetey75

Background

Follow-up to #19. That issue/PR (#25) fixed the SDK's incorrect 50k cap so the query results endpoint can now return up to the instance's configured query.maxLimit (e.g. 100,000 on Cloud).

However, the interactive query results endpoint is itself capped at query.maxLimit. Data-extraction / pipeline workflows that need millions of rows (the use case that motivated #19) can't be served by that endpoint — the Lightdash UI uses a separate CSV/download export path for those, governed by query.csvCellsLimit / query.csvMaxLimit (default ceiling 5,000,000 cells).

Proposed solution

Add an export-based fetch path to the SDK for result sets larger than query.maxLimit, e.g.:

result = query.to_df(via="export")   # or query.export() / result.download()

This would POST to the CSV/scheduler-style export endpoint, poll for the generated file, and stream it back (CSV → DataFrame), rather than paging the interactive results API.
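
A minimal sketch of that submit / poll / stream flow, to make the shape concrete. Everything here is hypothetical: `fetch_via_export`, the three injected callables, and the `state` / `file_url` / `error` keys are placeholders, since the real export endpoint and its response shape still need confirming. The HTTP calls are injected so the loop can be exercised without guessing at URLs, and rows are parsed with the stdlib `csv` module (the SDK would presumably hand the same stream to `pandas.read_csv` to build the DataFrame).

```python
import csv
import time
from typing import Callable, Dict, Iterable, Iterator, Optional


def fetch_via_export(
    start_export: Callable[[], str],
    get_status: Callable[[str], Dict[str, Optional[str]]],
    open_file: Callable[[str], Iterable[str]],
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, str]]:
    """Submit an export job, poll until it finishes, then stream CSV rows.

    All three callables are assumptions standing in for the real API:
      start_export() -> job id                     (the POST)
      get_status(job_id) -> {"state": ..., ...}    (the poll)
      open_file(url) -> iterable of text lines     (the download)
    """
    job_id = start_export()
    deadline = time.monotonic() + timeout

    while True:
        status = get_status(job_id)
        state = status["state"]  # hypothetical key and values
        if state == "completed":
            break
        if state == "error":
            raise RuntimeError(f"export {job_id} failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"export {job_id} not ready after {timeout}s")
        sleep(poll_interval)

    # csv.DictReader consumes lines lazily, so the file is never
    # held in memory as a whole.
    return csv.DictReader(open_file(status["file_url"]))
```

Because the reader is lazy, a caller can iterate rows directly or pass the underlying stream to a DataFrame constructor, whichever the final API surface ends up exposing.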

Notes / open questions

  • Confirm the exact endpoint(s): the download//csv or scheduler export API used by the UI's "Export" / "Download results".
  • Respect query.csvCellsLimit (rows × columns) and surface a clear error when exceeded, consistent with the no-silent-truncation behaviour added in #19 (Support fetching results beyond the 50k row limit).
  • Decide on the surface area: a flag on to_df()/to_records() vs. a dedicated export() method.
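
For the cell-limit point above, a rough sketch of what a fail-loudly check could look like. `ExportLimitExceeded` and `check_cell_limit` are hypothetical names, not existing SDK API, and the 5,000,000 default simply mirrors the ceiling mentioned in Background; a real implementation would presumably use the instance's configured value.

```python
class ExportLimitExceeded(ValueError):
    """Hypothetical error raised instead of silently truncating an export."""


def check_cell_limit(
    row_count: int,
    column_count: int,
    cells_limit: int = 5_000_000,  # assumed default, taken from the issue text
) -> int:
    """Return the requested cell count, or raise if it exceeds the limit."""
    cells = row_count * column_count
    if cells > cells_limit:
        raise ExportLimitExceeded(
            f"export needs {cells:,} cells ({row_count:,} rows x "
            f"{column_count} columns) but the limit is {cells_limit:,}"
        )
    return cells
```

Running this check before submitting the export would let the caller reduce columns or add filters up front rather than discovering a short file afterwards.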
