Impala 4.1 Change Log
Release Notes - IMPALA - Version Impala 4.1.0
New Feature
- [IMPALA-955] - Implement the BYTES built-in
- [IMPALA-2019] - Proper UTF-8 support in string functions
- [IMPALA-6505] - Min-Max predicate push down in ORC scanner
- [IMPALA-9495] - Allow Struct type in SELECT list for ORC tables
- [IMPALA-9498] - Allow array type in SELECT list for Parquet tables
- [IMPALA-9662] - Add builtin functions for masking UTF-8 strings
- [IMPALA-10166] - ALTER TABLE for Iceberg tables
- [IMPALA-10401] - Enable Ranger Audit logs in minicluster
- [IMPALA-10557] - Allow Impala to use Kudu's multi-row transaction implementation
- [IMPALA-10679] - Create SHA2 builtin function
- [IMPALA-10687] - Implement ds_cpc_union() function
- [IMPALA-10688] - Implement ds_cpc_stringify function
- [IMPALA-10689] - Implement ds_cpc_union_f() function
- [IMPALA-10730] - Create MD5 built-in function
- [IMPALA-10739] - Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables
- [IMPALA-11116] - DESCRIBE HISTORY should be parameterized
Improvement
- [IMPALA-2581] - Push down LIMIT past DISTINCT
- [IMPALA-5569] - Implement UNSET TBLPROPERTIES for ALTER TABLE
- [IMPALA-5628] - Parquet support for additional valid decimal representations
- [IMPALA-6636] - Use async IO in ORC scanner
- [IMPALA-7556] - Clean up ScanRange
- [IMPALA-7635] - Reduce size of hash tables in-memory by packing buckets more densely
- [IMPALA-7954] - Support automatic invalidates using metastore notification events
- [IMPALA-8762] - Track number of running queries on all backends in admission controller
- [IMPALA-9433] - Change FileHandleCache from using a multimap to an unordered_map
- [IMPALA-9822] - Impala does not notify user that row format delimited fields is only logical when using STORED AS TEXTFILE
- [IMPALA-9857] - Batch ALTER_PARTITION events
- [IMPALA-10046] - Compile with a newer version of DWARF
- [IMPALA-10197] - Add KUDU_REPLICA_SELECTION config and query option
- [IMPALA-10429] - Add Support for specifying HDFS path in 'scratch_dirs' startup option
- [IMPALA-10489] - Implement JWT support
- [IMPALA-10650] - Bail out min/max filters in hash join builder early
- [IMPALA-10695] - Impala needs to use an independent disk I/O queue for JindoFS
- [IMPALA-10702] - Add warning logs for slow or large catalogd response
- [IMPALA-10711] - Allow restricting the filesystem permissions check at startup to a particular directory
- [IMPALA-10713] - Use PARTITION-level locking for static partition INSERTs for ACID tables
- [IMPALA-10721] - MetastoreServiceHandler should extend AbstractThriftHiveMetastore
- [IMPALA-10723] - Allow basic querying and computing stats on a materialized view
- [IMPALA-10724] - Add mutable validWriteIdList
- [IMPALA-10742] - CreateColIdx2EqConjunctMap hits DCHECK in exhaustive builds
- [IMPALA-10748] - Remove enable_orc_scanner flag
- [IMPALA-10763] - Min/max filters should be enabled on Z-order sorted columns
- [IMPALA-10777] - Enable min/max filtering for Iceberg partitions.
- [IMPALA-10779] - Print the username closing a session or cancelling a query from the WebUI
- [IMPALA-10784] - Add support for retaining cookies among http requests in impala-shell
- [IMPALA-10790] - Early materialize expressions in ScanNode
- [IMPALA-10799] - Analysis slowdown with inline views and thousands of columns
- [IMPALA-10801] - Check the latest compaction Id before serving request
- [IMPALA-10806] - Create single node plan is slow when hundreds of inline views are joined
- [IMPALA-10817] - Share metastoreHmsDDL lock between CatalogOpExecutor and Catalog metastore server
- [IMPALA-10822] - Allow multiple group returns for LDAP search bind authentication group search
- [IMPALA-10836] - Add simplify cast rule
- [IMPALA-10846] - Skip Authentication for connection with trusted auth header
- [IMPALA-10857] - Add libcurl library to Impala native-toolchain
- [IMPALA-10862] - Optimization of the code structure of TmpDir
- [IMPALA-10873] - Push down EQUALS, IS NULL and IN-list predicate to ORC reader
- [IMPALA-10874] - Upgrade impyla to the latest version
- [IMPALA-10876] - Support to download JWKS from a given URL
- [IMPALA-10879] - Add parquet stats to iceberg manifest
- [IMPALA-10894] - Pushing down predicates in reading "original files" of ACID tables
- [IMPALA-10898] - Runtime IN-list filters for ORC tables
- [IMPALA-10923] - Fine grained table refreshing at partition level events for transactional tables
- [IMPALA-10928] - Upgrade source of Kudu's EasyCurl in kudu/util
- [IMPALA-10931] - Rebase source code under be/src/kudu
- [IMPALA-10934] - Enable table definition over a single file
- [IMPALA-10941] - Revert back the change for thrift_sasl 0.4.3
- [IMPALA-10958] - Decouple getConstraintsInformation from hive.ql.metadata.Table
- [IMPALA-10961] - Implement 3-way quicksort in sorter
- [IMPALA-10967] - Load data should handle AWS NLB-type timeout
- [IMPALA-10975] - Minor refactoring in alter table DDL operation in catalogd
- [IMPALA-10984] - Improve performance of FROM_UNIXTIME function.
- [IMPALA-10994] - Normalize pip package name within 'infra/python/deps' requirements
- [IMPALA-11027] - Support for ShellBasedUnixGroupMapping for Impala's user delegation via groups
- [IMPALA-11031] - Listmap.getIndex() name is misleading
- [IMPALA-11032] - Automatic Refresh of Metadata for Local Catalog after Compaction
- [IMPALA-11037] - Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
- [IMPALA-11038] - Zipping unnest should work on arrays from views
- [IMPALA-11103] - Upgrade CMake to avoid boost warning messages
- [IMPALA-11107] - Allow specifying footer size in HdfsScanner::IssueFooterRanges
- [IMPALA-11110] - Enable basic optimizations for debug builds
- [IMPALA-11123] - Optimize count(*) for ORC scans
- [IMPALA-11124] - testdata loading should reuse TPCH/TPCDS local data if they exist
- [IMPALA-11131] - Replace 'cnd_cwnd' with 'snd_cwnd' in www/rpcz.tmpl
- [IMPALA-11141] - Use exact data types in IN-list filter
- [IMPALA-11152] - impala::CheckLogSize() could log error messages indefinitely
- [IMPALA-11178] - Bump ORC to 1.7.0-p7 to contain the improvement of ORC-1122
- [IMPALA-11181] - Improving performance of compaction checking
- [IMPALA-11185] - Reuse orc::ColumnVectorBatch in the scanner life-cycle
- [IMPALA-11204] - OrcStringColumnReader should be a template class
- [IMPALA-11220] - Bump ORC version to contain the improvement of ORC-1137
- [IMPALA-11264] - Bump ORC to 1.7.0-p12 with more improvements
Bug
- [IMPALA-2272] - Parquet scanner always materializes NULL for empty collections
- [IMPALA-5256] - ERROR log files can get very large
- [IMPALA-5476] - Catalogd restart causes metadata to be out of sync
- [IMPALA-6590] - Disable expr rewrites and codegen for VALUES() statements
- [IMPALA-7560] - Better selectivity estimate for != (not equals) binary predicate
- [IMPALA-9057] - TestEventProcessing.test_insert_events_transactional is flaky
- [IMPALA-9967] - ORC scan fails when table contains a timestamp column
- [IMPALA-10187] - Event processing fails on multiple events + DROP TABLE
- [IMPALA-10272] - LOAD DATA should respect Ranger-HDFS policies
- [IMPALA-10376] - Data loading of a functional-query ORC table fails with "Fail to get checksum"
- [IMPALA-10414] - Retrying failed query may cause memory leak
- [IMPALA-10433] - Use Iceberg's fixed partition transforms
- [IMPALA-10468] - DROP events which are generated while a batch is being processed may add table incorrectly
- [IMPALA-10490] - TRUNCATE TABLE fails with IllegalStateException
- [IMPALA-10502] - Delayed 'Invalidated objects in cache' causes 'Table already exists'
- [IMPALA-10626] - Add support for Iceberg's Catalogs API
- [IMPALA-10627] - Use standard Iceberg table properties
- [IMPALA-10663] - Coordinator might observe stale metadata in local catalog mode
- [IMPALA-10674] - Update toolchain ORC library for better Iceberg support
- [IMPALA-10681] - JOIN cardinality is wrong for INNER joins when combined with aggregations
- [IMPALA-10683] - TestHdfsParquetTableWriter.test_double_precision broken on S3
- [IMPALA-10703] - PrintPath() crashes with ARRAY in ORC format
- [IMPALA-10704] - test_retry_query_result_cacheing_failed and test_retry_query_set_query_in_flight_failed are flaky
- [IMPALA-10714] - Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
- [IMPALA-10732] - Use consistent DDL for specifying Iceberg partitions
- [IMPALA-10733] - TestKuduOperations.test_replica_selection failing
- [IMPALA-10737] - Optimize Iceberg metadata handling
- [IMPALA-10741] - Set engine.hive.enabled=true table property for Iceberg tables
- [IMPALA-10754] - test_overlap_min_max_filters_on_sorted_columns failed during GVO
- [IMPALA-10762] - ASAN tests fail with use-after-poison in HdfsParquetScanner::FindSkipRangesForPagesWithMinMaxFilters
- [IMPALA-10764] - Web UI shows error in the /logs page, if stdout/stderr is not redirected to INFO/ERROR logs
- [IMPALA-10765] - IllegalStateException when inserting empty results to unpartitioned table with event processor enabled
- [IMPALA-10802] - test_show_create_table and test_catalogs fail with Iceberg syntax error
- [IMPALA-10808] - Crash of illegal decimal schema in test_fuzz_decimal_tbl
- [IMPALA-10810] - Bump json-smart from 2.3 to at least 2.4.1
- [IMPALA-10811] - RPC to submit query gets stuck forever with AWS NLB
- [IMPALA-10814] - Hit DCHECK in DecimalUtil::DecodeFromFixedLenByteArray for core-s3 build
- [IMPALA-10815] - Ignore events on non-default hive catalogs
- [IMPALA-10819] - Backend test unifiedbetests failed in ASAN build
- [IMPALA-10820] - TestInsertWideTable.test_insert_wide_table failed due to file size too small
- [IMPALA-10821] - TestTPCHJoinQueries.test_outer_joins failed in s3 build
- [IMPALA-10823] - Output less information when external frontend is used
- [IMPALA-10825] - Impala crashes when canceling a retrying query
- [IMPALA-10843] - NullPointerException when testing runtime filters at TRACE log level
- [IMPALA-10850] - Interpret timestamp predicates in local timezone in IcebergScanNode
- [IMPALA-10886] - TestReusePartitionMetadata.test_reuse_partition_meta fails
- [IMPALA-10896] - Tests in TestImpalaShellInteractive failed in S3 build when strict_hs2_protocol=True
- [IMPALA-10899] - buildall.sh -release_and_debug -codecoverage doesn't work as expected
- [IMPALA-10900] - Some parquet files are missing from Iceberg avro metadata
- [IMPALA-10905] - query_test/test_iceberg.py test_time_travel fails in exhaustive builds
- [IMPALA-10910] - Iceberg scans don't apply runtime filters at Parquet row group level
- [IMPALA-10914] - Iceberg query creates inconsistent scan ranges between consecutive runs
- [IMPALA-10922] - test_orc_stats failing on exhaustive builds
- [IMPALA-10930] - Bump the Java artifacts version to 4.1.0-SNAPSHOT
- [IMPALA-10933] - Impala build finds system libcurl instead of toolchain version
- [IMPALA-10935] - Impala crashes on old Iceberg table property in some cases
- [IMPALA-10936] - StmtMetadataLoader::collectPolicyTables() should handle FailedLoadLocalTable without NPE
- [IMPALA-10937] - shell/make_shell_tarball.sh broken on CentOS
- [IMPALA-10942] - Fix memory leak in admission controller
- [IMPALA-10950] - expr-benchmark.cc needs some updates
- [IMPALA-10955] - query_test/test_iceberg.py test_time_travel fails in S3 builds
- [IMPALA-10956] - datasketches UDFS: memory leak and merge overhead
- [IMPALA-10957] - query_test/test_scanners.py fails in test_iceberg_query with "iceberg_partitioned_orc_external_old_fileformat table not found"
- [IMPALA-10970] - Coordinator-only query detection logic should account for separate join build execution
- [IMPALA-10972] - Adapt Impala's tests to behavior change in HIVE-24920
- [IMPALA-10973] - Empty scan nodes are scheduled to the (exclusive) coordinator
- [IMPALA-10974] - Impala cannot resolve columns of converted Iceberg table
- [IMPALA-10982] - Unable to explain plan for SetOperation statement
- [IMPALA-10989] - TSAN data race during data loading
- [IMPALA-10998] - Backend test scratch-tuple-batch-test failed in ASAN build
- [IMPALA-11000] - DCHECK hit in FillScratchMicroBatches
- [IMPALA-11007] - Webserver should not log errors when handling HTTP HEAD
- [IMPALA-11008] - Invalid to propagate inferred predicates into the nullable side of an outer join
- [IMPALA-11011] - Impala crashes in OrcStructReader::NumElements()
- [IMPALA-11020] - CHECK failure in AttachStdoutStderrLocked
- [IMPALA-11021] - Impala throws IllegalStateException when a predicate hint is used in a query
- [IMPALA-11022] - Impala uses wrong file descriptors for Iceberg tables in local catalog mode
- [IMPALA-11025] - Creation of functional.insert_only_transactional_table fails with 'illegal location for managed table'
- [IMPALA-11028] - Table loading could fail if metastore cleans up old events
- [IMPALA-11029] - DescriptorTable.copyTupleDescriptor throws an exception for Kudu table
- [IMPALA-11030] - Wrong result due to predicate pushdown into inline view with Analytic function
- [IMPALA-11035] - Make x-forwarded-for http header case insensitive
- [IMPALA-11039] - DCHECK_GE(num_buffered_values_, num_rows) fails in parquet-column-readers.cc
- [IMPALA-11042] - Special characters are not escaped during LDAP search bind authentication
- [IMPALA-11047] - Preconditions.checkNotNull(statsTuple_) fail in HdfsScanNode.java if PARQUET_READ_STATISTICS=0
- [IMPALA-11049] - ORDER BY clause containing a cast expr fails to execute after 'SimplifyCastExprRule' rewrite
- [IMPALA-11051] - Add support for 'void' Iceberg partition transform
- [IMPALA-11053] - Impala should be able to read migrated partitioned Iceberg tables
- [IMPALA-11055] - load-functional-query-exhaustive-hbase-generated.create failed to run with newer HBase shell.
- [IMPALA-11072] - TestSpillingDebugActionDimensions.test_spilling is flaky
- [IMPALA-11078] - Webui should return a Content-Security-Policy header
- [IMPALA-11093] - Fine grained table refreshing doesn't refresh table file metadata
- [IMPALA-11105] - Impala crashes in PhjBuilder::Close() when Prepare() fails
- [IMPALA-11106] - Make Impala compatible with Iceberg 0.13
- [IMPALA-11109] - Exhaustive tests fail in custom_cluster.test_permanent_udfs.py
- [IMPALA-11115] - Setting compression to brotli can hit DCHECK
- [IMPALA-11118] - report_benchmark_results.py uses wrong compression codec in the per-query table
- [IMPALA-11120] - load-data.py does not load ORC files with specified codec
- [IMPALA-11133] - compare_branches.py could fail if the author of a commit contains non-ascii characters
- [IMPALA-11134] - Impala returns "Couldn't skip rows in file" error for old Parquet file
- [IMPALA-11135] - TestSpillingDebugActionDimensions.test_spilling seems to be flaky
- [IMPALA-11144] - verifyApproxCardinality() failed in testAggregationNodeGroupByCardinalityCapping
- [IMPALA-11147] - Min/max filtering crashes on Parquet file that contains partition columns
- [IMPALA-11153] - Expose LOCK_RETRIES/LOCK_RETRY_WAIT_SECONDS settings for user
- [IMPALA-11154] - Idle Kudu daemons consume too much CPU
- [IMPALA-11156] - TestHmsIntegration.test_desc_json_table failed in exhaustive build
- [IMPALA-11175] - Iceberg table cannot be loaded when partition value is NULL
- [IMPALA-11176] - Memory leak in ClientCacheHelper
- [IMPALA-11177] - Crash in useAsyncIoForStream due to unknown orc::StreamKind
- [IMPALA-11182] - hdfs-orc-scanner should catch exceptions thrown from the ORC lib
- [IMPALA-11184] - Log rotation can fail if FLAGS_log_filename is set with custom value
- [IMPALA-11186] - Assertion fails in TestShowCreateTable.test_show_create_table
- [IMPALA-11195] - Disable SSL session renegotiation
- [IMPALA-11200] - Redundant additions to ExecOption field in query profile of grouping aggregator node when inside a subplan
- [IMPALA-11203] - Build failure caused by missing ExecutorMembershipSnapshot import
- [IMPALA-11210] - Impala can only handle lowercase schema elements of Iceberg table
- [IMPALA-11214] - Impala reloads Iceberg tables for each data file
- [IMPALA-11216] - test_describe_history_params is flaky
- [IMPALA-11218] - TestIcebergTable.test_table_load_time_for_many_files flaky
- [IMPALA-11227] - FE OOM in TestParquetBloomFilter.test_fallback_from_dict_if_no_bloom_tbl_props
- [IMPALA-11230] - Crash in partitioned top-N codegen'd code
- [IMPALA-11239] - test_parquet_count_star_optimization fails in downstream erasure code build
- [IMPALA-11247] - TestQueries.test_views and PlannerTest.testViews fail
- [IMPALA-11256] - SHOW FILES on Iceberg tables lists all files in table directory
- [IMPALA-11263] - Coordinator hang when cancelling a query
Task
- [IMPALA-10696] - Minor size differences break metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
- [IMPALA-10701] - Switch to use TByteBuffer from thrift
- [IMPALA-10872] - Add a snapshot version of ORC-1.7 to native-toolchain
- [IMPALA-10943] - Add additional test to verify support for multiple executor groups that map to different resource groups
- [IMPALA-10992] - Planner changes for estimate peak memory.
- [IMPALA-11033] - Add support for multiple executor group sets
- [IMPALA-11130] - Postgres JDBC driver should be upgraded to 42.3.3
- [IMPALA-11149] - Upgrade xmlsec to address CVE
- [IMPALA-11197] - Upgrade pac4j to 4.5.5 to address CVEs
- [IMPALA-11198] - Exclude aws-java-sdk-bundle from ranger-plugins-audit
- [IMPALA-11229] - Upgrade spring version to 5.3.18 to address CVEs
Sub-task
- [IMPALA-7087] - Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata
- [IMPALA-8131] - Impala is unable to read Parquet decimal columns with higher scale than table metadata
- [IMPALA-8795] - Enable event polling by default in tests
- [IMPALA-9873] - Skip decoding of non-materialised columns in Parquet
- [IMPALA-10049] - Include RPC call_id in slow RPC logs
- [IMPALA-10212] - Support ofs scheme
- [IMPALA-10485] - Support Iceberg field-id based column resolution in the ORC scanner
- [IMPALA-10640] - Support reading Parquet Bloom filters - most common types
- [IMPALA-10642] - Write support for Parquet Bloom filters - most common types
- [IMPALA-10645] - Expose metrics for catalogd's HMS endpoint
- [IMPALA-10648] - Invalidate catalogd cache for non transactional tables when create/alter/drop HMS apis are accessed
- [IMPALA-10680] - Replace StringToFloatInternal that converts String to Float using fast_double_parser library
- [IMPALA-10720] - Add versioning to admission heartbeats
- [IMPALA-10746] - Create table fails after dropping same table from catalog HMS endpoint
- [IMPALA-10796] - Add Arrow library to Impala native-toolchain
- [IMPALA-10797] - Support statements that do not require reading JSON File
- [IMPALA-10840] - Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables
- [IMPALA-10888] - getPartitionsByNames should return partitions sorted by name
- [IMPALA-10920] - UNNEST function for arrays in the select list
- [IMPALA-10940] - Pick parts of recent gutil changes from Kudu repo
- [IMPALA-10951] - Upgrade protobuf library for Impala
- [IMPALA-11004] - Upgrade glog library for Impala
- [IMPALA-11005] - Upgrade Boost library for Impala
- [IMPALA-11054] - Support resource pool polling for frontend
- [IMPALA-11076] - Re-use FileDescriptors loaded by HdfsTable during IcebergTable load
- [IMPALA-11112] - Impala can't resolve json tables created by Hive
Test
- [IMPALA-10709] - Min/max filters should be enabled for joins on sorted columns in Parquet tables
- [IMPALA-10768] - Deflake CatalogHmsFileMetadataTest
- [IMPALA-11137] - ORC and Avro testdata on date_tbl are unusable
- [IMPALA-11192] - test_scanner_fuzz.py runs super slow on ORC format
- [IMPALA-11236] - Upgrade ehcache sizeof library from 0.3.0 to 0.4.0
Documentation
- [IMPALA-10788] - Statestore Scalability document should mention statestore_subscriber_timeout_secs and statestore_heartbeat_tcp_timeout_seconds
- [IMPALA-11043] - Create documentation about Impala's Iceberg support
- [IMPALA-11119] - Document BYTES builtin function
- [IMPALA-11127] - Document the UTF8_MODE query option and relevant string functions
Question
- [IMPALA-11040] - Query gets stuck when it contains multiple nested unions