Skip to content

Conversation

@luoyuxia
Copy link
Contributor

@luoyuxia luoyuxia commented Dec 7, 2025

Purpose

Linked issue: close #36

Brief change log

Tests

API and Format

Documentation

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for subscribing to and scanning remote log segments in Fluss, enabling the client to fetch and read log data that has been tiered to remote storage (filesystem or memory-backed storage). The implementation introduces a new I/O abstraction layer using OpenDAL for storage operations and extends the log scanner to handle both in-memory and remote log segments.

Key Changes:

  • Added I/O module with FileIO and Storage abstractions supporting filesystem and in-memory storage backends via OpenDAL
  • Implemented remote log download and processing capabilities in the scanner, allowing transparent fetching of tiered log segments
  • Modified create_log_scanner() to return Result<LogScanner> for proper error handling during scanner initialization

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
crates/fluss/src/io/mod.rs New module exposing FileIO and Storage abstractions for remote storage operations
crates/fluss/src/io/file_io.rs FileIO implementation providing input file operations and storage URL parsing
crates/fluss/src/io/storage.rs Storage enum supporting multiple backend types (memory, filesystem) with path handling
crates/fluss/src/io/storage_fs.rs Filesystem storage backend configuration using OpenDAL
crates/fluss/src/io/storage_memory.rs In-memory storage backend configuration for testing
crates/fluss/src/client/table/remote_log.rs Core remote log functionality including segment downloading, caching, and record parsing
crates/fluss/src/client/table/scanner.rs Extended log fetcher to handle remote log segments alongside in-memory records
crates/fluss/src/client/table/mod.rs Added remote_log module declaration
crates/fluss/src/error.rs New error variants for I/O operations (IoUnsupported, IoUnexpected)
crates/fluss/src/lib.rs Exposed new io module publicly
crates/fluss/src/metadata/datatype.rs Minor formatting improvements using shorthand syntax
crates/fluss/Cargo.toml Added dependencies (opendal, url, async-trait, uuid) and feature flags for storage backends
crates/fluss/tests/integration/table_remote_scan.rs New integration test verifying remote log scanning functionality
crates/fluss/tests/integration/fluss_cluster.rs Extended test cluster builder to support remote data directory mounting
crates/fluss/tests/integration/table.rs Updated to handle new create_log_scanner() Result return type and import ordering
crates/fluss/tests/test_fluss.rs Added table_remote_scan test module
crates/examples/src/example_table.rs Updated to handle new create_log_scanner() Result return type
bindings/python/src/table.rs Updated Python bindings to handle new create_log_scanner() Result return type

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow read log tiered to remote

1 participant