Apache Impala (old posts, page 1)

Impala 2.5 performance overview

Impala has been a high-performance analytic query engine from the start. Even in its initial production release in 2013, it demonstrated performance 2x faster than a traditional DBMS, and each subsequent release has continued to demonstrate the wide performance gap between Impala’s analytic-database architecture and SQL-on-Apache Hadoop alternatives. Today, we are excited to continue that track record with some important performance gains in Impala 2.5 (with more to come on the roadmap), summarized below.

Overall, compared to Impala 2.3, in Impala 2.5:

TPC-DS queries run on average 4.3x faster.
TPC-H queries run 2.2x faster on flat tables, and 1.71x faster on nested tables.

Slides: HUG meetup, Impala 2.5 performance overview, from Mostafa Mokhtar

Nested Types in Impala

Alex Behm, Marcel Kornacker, Skye Wanderman-Milne

2015-03-24 23:00

This document discusses nested data types in Impala, including structs, maps, and arrays. It provides an example schema using these types, describes Impala's SQL syntax extensions for querying nested data, and discusses techniques for advanced querying capabilities like correlated subqueries. The execution model materializes minimal nested structures in memory and uses new execution nodes to handle nested data types.
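To make the idea of querying nested collections concrete, here is a minimal Python sketch of the semantics only, not of Impala's execution engine. The `customers` data, the field names, and the helpers `unnest` and `order_count_over` are all hypothetical and exist purely for illustration. The first helper mimics what a query that joins each row with its own nested array would return; the second mimics a correlated subquery evaluated once per parent row.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical sample data (not from the talk): each customer row carries
# a nested array of order structs.
customers: List[Dict[str, Any]] = [
    {"id": 1, "name": "Ann",
     "orders": [{"oid": 10, "total": 25.0}, {"oid": 11, "total": 40.0}]},
    {"id": 2, "name": "Bob", "orders": []},
    {"id": 3, "name": "Cy", "orders": [{"oid": 12, "total": 5.0}]},
]


def unnest(rows: List[Dict[str, Any]], collection: str) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per (parent row, nested element) pair.

    Behaves like an inner join between a row and its own nested
    collection: parents with an empty collection produce no output.
    """
    for row in rows:
        parent = {k: v for k, v in row.items() if k != collection}
        for item in row[collection]:
            yield {**parent, **item}


def order_count_over(rows: List[Dict[str, Any]], threshold: float) -> Dict[int, int]:
    """Correlated-subquery analogue: per parent row, count nested orders
    whose total exceeds the threshold."""
    return {
        row["id"]: sum(1 for o in row["orders"] if o["total"] > threshold)
        for row in rows
    }


flat = list(unnest(customers, "orders"))
counts = order_count_over(customers, 20.0)
```

In this sketch a customer with no orders disappears from the flattened result but still appears, with a count of zero, in the per-row aggregate. That difference is one reason a per-row subquery over a nested collection is useful alongside plain flattening.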

Nested Types in Impala from Cloudera, Inc.

Presented at the Impala Meetup, PA, March 24th, 2015.

Impala: A Modern, Open-Source SQL Engine for Hadoop

Impala Dev

2015-01-05 23:00

Presented at the Conference on Innovative Data Systems Research (CIDR) 2015.

ABSTRACT

Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Apache Hive. This paper presents Impala from a user’s perspective, gives an overview of its architecture and main components and briefly demonstrates its superior performance compared against other popular SQL-on-Hadoop systems.

