Skip to content

feat: Impl IncrementalAppendScan#590

Open
WZhuo wants to merge 4 commits intoapache:mainfrom
WZhuo:incremental_append_scan
Open

feat: Impl IncrementalAppendScan#590
WZhuo wants to merge 4 commits intoapache:mainfrom
WZhuo:incremental_append_scan

Conversation

@WZhuo
Copy link
Contributor

@WZhuo WZhuo commented Mar 12, 2026

No description provided.

@WZhuo WZhuo force-pushed the incremental_append_scan branch 3 times, most recently from 9b822d0 to 4c4ffff Compare March 13, 2026 03:45
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was generated by Gemini.

@WZhuo WZhuo force-pushed the incremental_append_scan branch 2 times, most recently from d1bacdf to 5c41a78 Compare March 23, 2026 03:30
@WZhuo WZhuo force-pushed the incremental_append_scan branch from 5c41a78 to 3f97b02 Compare March 24, 2026 03:07
@WZhuo WZhuo force-pushed the incremental_append_scan branch from 3f97b02 to 8e2ad99 Compare March 24, 2026 06:10
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was generated by Gemini.

Summary & Recommendation

  • Request Changes: There are logic bugs in handling empty table snapshots and a potential hashing/equality discrepancy for manifest files that need to be addressed.

// to the two-arg virtual PlanFiles() override in the concrete subclass.
// Defined as a friend to access the protected two-arg PlanFiles().
template <typename ScanTaskType>
Result<std::vector<std::shared_ptr<ScanTaskType>>> ResolvePlanFiles(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic / Parity Issue: scan.metadata()->Snapshot() returns a NotFound error if the table has no current snapshot (e.g., table is empty), rather than a successful result with nullptr. ICEBERG_ASSIGN_OR_RAISE will immediately propagate this NotFound error, making the if (current_snapshot == nullptr) check unreachable. This breaks parity with Java, which handles empty tables gracefully by returning an empty iterable.

Suggested fix: Check the ID directly: if (scan.metadata()->current_snapshot_id == kInvalidSnapshotId) { return ...; } or use scan.snapshot() which properly resolves to nullptr.

(Note: The same issue exists on line 214 in ToSnapshotIdInclusive, where ICEBERG_ASSIGN_OR_RAISE will propagate NotFound before the custom error message can be triggered).


namespace std {
template <>
struct hash<iceberg::ManifestFile> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style / Logic Issue: The added std::hash<iceberg::ManifestFile> specialization only hashes manifest_path. However, ManifestFile::operator== defaults to member-wise equality. If an unordered_set (like the one used in IncrementalAppendScan::PlanFiles) encounters two ManifestFile objects with the same path but slightly different metadata fields, they will have the same hash but fail equality, resulting in duplicates being inserted. In Java, GenericManifestFile uses only manifestPath for both hashCode() and equals().

Suggested fix: Modify ManifestFile::operator== to only compare manifest_path to align with Java, or provide a custom equality functor specifically for the unordered_set<ManifestFile>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants