Impala learned some new tricks while living on Iceberg

Apache Impala is a high-performance distributed query engine designed for lightning-fast read operations. Apache Iceberg is a groundbreaking table format that introduces advanced features such as partition and schema evolution, data partitioning through transform functions, time travel, and row-level deletes. Recognizing the significance of Iceberg, the Impala team has invested substantial development effort in providing comprehensive support for its features. This talk will cover the following key points:

  • A brief overview of how Impala integrates with Iceberg
  • The limitations of traditional table formats
  • How Iceberg addresses these existing challenges and unlocks new possibilities
  • An exploration of Impala’s new features and some insights into their implementation

Iceberg has transformed Impala from a query engine optimized solely for read operations into a robust and versatile data warehouse engine. With Iceberg, Impala now supports ACID write operations and table maintenance functions. Join us for this session to learn how Impala and Iceberg integrate and to discover Impala’s new functionality.

[slides]

Appeared in https://communityovercode.org/past-sessions/community-over-code-na-2023/