Impala 2.10 Change Log
New Feature
- [IMPALA-992] - rerun past queries from history
- [IMPALA-2373] - Extrapolate the number of rows in a scan based on the rows/byte ratio
- [IMPALA-2525] - Impala CREATE TABLE LIKE PARQUET breaks on ENUM logical type
- [IMPALA-3504] - function for current timestamp in UTC, i.e. utc_timestamp()
- [IMPALA-4622] - Support changing Kudu default and storage attributes
- [IMPALA-5489] - Improve Sentry authorization for Kudu tables
- [IMPALA-5529] - Add additional function signatures for TRUNC()
- [IMPALA-5546] - Add syntax for creating an unpartitioned Kudu table
- [IMPALA-5600] - Small cleanups left over from IMPALA-5344
Improvement
- [IMPALA-1382] - Wasted space in buffered-tuple-stream in presence of many NULL tuples
- [IMPALA-2167] - Remove the old (unpartitioned) HJ and AGG nodes
- [IMPALA-2689] - Log every time codegen is disabled due to NYI
- [IMPALA-3200] - Replace BufferedBlockMgr with new buffer pool
- [IMPALA-3937] - Deprecate --be_service_threads
- [IMPALA-4086] - Write micro-benchmark for SimpleScheduler
- [IMPALA-4407] - Incorporate impala-setup repo into main Impala repo
- [IMPALA-4623] - Parquet Scanner - reduce NN RPC
- [IMPALA-4794] - Impala's count(distinct ...) plans are not robust to data skew
- [IMPALA-4833] - Use scheduling information to make per-node memory reservation tight
- [IMPALA-5009] - Clean up test_insert_parquet.py
- [IMPALA-5016] - Missed opportunities for static partition pruning with COALESCE()
- [IMPALA-5061] - Populate null_count in parquet::statistics in the parquet table writer
- [IMPALA-5109] - Increase plan fragment startup histogram max latency to > 20000ms
- [IMPALA-5167] - Reduce number of Kudu clients that get created
- [IMPALA-5240] - Allow configuration of # of disk I/O threads independently for solid-state and spinning disks
- [IMPALA-5263] - support CA bundles when running stress test against SSL'd Impala
- [IMPALA-5280] - Coalesce chains of OR conditions to an IN predicate
- [IMPALA-5350] - Build threads should include fragment ID in their names
- [IMPALA-5389] - Clarify lifetime of DiskIoMgr::BufferDescriptor objects
- [IMPALA-5433] - Mark Status c'tors as explicit
- [IMPALA-5480] - Missing filters message isn't great
- [IMPALA-5481] - RowDescriptors should be shared, rather than copied
- [IMPALA-5483] - Consider automatically disabling codegen for entire query based on planner estimates
- [IMPALA-5495] - Improve error message if neither --is_coordinator nor --is_executor is set
- [IMPALA-5498] - Support for partial sorts
- [IMPALA-5500] - Reduce catalog topic size when --compact_catalog_topic is enabled
- [IMPALA-5506] - Help information of query_file option in impala-shell misses stdin description
- [IMPALA-5507] - Help information of KEYVAL option in impala-shell is not clear enough
- [IMPALA-5511] - Add process start time to debug web page
- [IMPALA-5547] - Improve join cardinality estimation with a more robust FK/PK detection
- [IMPALA-5560] - Consider always storing CHAR() slots inline
- [IMPALA-5572] - Support timestamp codegen for text scanner
- [IMPALA-5573] - Support decimal codegen in text scanner
- [IMPALA-5612] - Join inversion should avoid reducing the degree of parallelism
- [IMPALA-5616] - Add a new flag --enable_minidumps
- [IMPALA-5643] - Report the number of currently running and total number of started threads per group in /threadz
- [IMPALA-5644] - Fail queries early when their minimum reservation is too high to execute within the given mem_limit
- [IMPALA-5658] - Report details of process memory maps via metrics
- [IMPALA-5659] - glog / gflags should be dynamically linked if Impala is
- [IMPALA-5666] - Use manual poisoning for ASAN with new buffer pool
- [IMPALA-5670] - Remove redundant c'tor code from ExecEnv
- [IMPALA-5688] - Speed up a couple of heavy-hitting expr-tests
- [IMPALA-5696] - Enable cipher configuration when using TLS w/Thrift
- [IMPALA-5709] - Remove mini-impala-cluster
- [IMPALA-5716] - Switching to / from distcc can delete cmake_modules/*
- [IMPALA-5743] - Allow for configuration of TLS / SSL versions
- [IMPALA-5745] - Make breakpad dump_syms handle more than 4096 memory regions
- [IMPALA-5852] - Improve MINIMUM_RESERVATION_UNAVAILABLE message
Bug
- [IMPALA-1470] - Client gets NullPointerException when catalog service is down
- [IMPALA-1478] - Improve error message when subquery is used in the ON clause
- [IMPALA-1882] - Remove ORDER BY restriction from first_value()/last_value()
- [IMPALA-1891] - Statestore sends deletions with initial non-delta topic
- [IMPALA-2418] - Increase the number of characters dedicated to the Detail column in query profile
- [IMPALA-2826] - Outer join w/ old HJ returns wrong results if 0 outer tbl cols referenced
- [IMPALA-3487] - stress test didn't fail on hash mismatch errors
- [IMPALA-3496] - stress test output doesn't indicate the impala version
- [IMPALA-3894] - unix_timestamp date conversion for 2-digit years is broken
- [IMPALA-3931] - Support aggregate functions with arbitrary fixed-size intermediate type
- [IMPALA-4039] - Increase width of Operator column in query summary to match the widest entry
- [IMPALA-4162] - Extensive logging in HDFS NameNode during metadata load when dfs.namenode.acls.enabled=false
- [IMPALA-4226] - Make breakpad dump_syms handle more than 4096 threads and memory regions
- [IMPALA-4276] - Non-default query options not always populated in runtime profile
- [IMPALA-4418] - Extra blank lines in query result
- [IMPALA-4483] - run-backend-tests.sh does not work with ninja
- [IMPALA-4666] - Remove thirdparty from search dir for toolchain deps
- [IMPALA-4737] - If minidumps are disabled, SIGUSR1 goes unhandled and will crash the process
- [IMPALA-4795] - TCatalogObjectFromObjectName is broken for functions
- [IMPALA-4861] - READ_WRITE warning thrown for source URI on CREATE TABLE LIKE PARQUET
- [IMPALA-4862] - Planner's peak resource estimates do not accurately reflect the behaviour of joins and unions in the backend
- [IMPALA-4866] - Hash join node does not apply limits correctly
- [IMPALA-4892] - Include the session ID in the "Invalid session ID" error message
- [IMPALA-4965] - EXPLAIN output blocked by Sentry but appears in query profile
- [IMPALA-4990] - run-tests.py --update_results doesn't work
- [IMPALA-5056] - Impala fails to recover from statestore connection loss while waiting for metadata
- [IMPALA-5104] - admission control memory check shouldn't fail queries with estimates equal to MEM_LIMIT
- [IMPALA-5108] - idle_session_timeout kicks in later than expected
- [IMPALA-5116] - ex/hash_map in gutil is deprecated
- [IMPALA-5221] - Fix TSaslTransport negotiation order
- [IMPALA-5223] - HBase/Zookeeper continues to be flaky on RHEL7
- [IMPALA-5236] - Negative byte values are not printed with the correct unit
- [IMPALA-5275] - Avoid printing Status stack trace on hot paths
- [IMPALA-5281] - concurrent_select.py still not failing when there is a results mismatch
- [IMPALA-5282] - Insert into a table partitioned by two columns with clustered hint from a partitioned source table fails with IllegalStateException: null
- [IMPALA-5283] - Handle case sensitivity naming conflicts in Kudu tables
- [IMPALA-5286] - Query with Kudu col name w/ different casing from 'order by' fails
- [IMPALA-5327] - Handle return value and exception of JNI GetStringUTFChars()
- [IMPALA-5336] - Inconsistent results when comparing string and timestamp fields
- [IMPALA-5344] - Frontend tests do not work with Java 8
- [IMPALA-5352] - File handle cache needs timeout based eviction
- [IMPALA-5355] - Sentry Privileges and roles updated in the wrong order on impala restart
- [IMPALA-5363] - DCHECK hit in BlockingJoinNode: DCHECK_EQ(probe_batch_->num_rows(), 0);
- [IMPALA-5364] - Number of fragments reported in the web-ui is incorrect
- [IMPALA-5369] - Old pom parent in testdata module
- [IMPALA-5377] - impalad crashes when a large number of JDBC clients start accessing Impala
- [IMPALA-5386] - disk-io-mgr-handle-cache.inline.h:124] Check failed: release_it != range.second
- [IMPALA-5400] - Tests added to subplans.test never get executed
- [IMPALA-5407] - Crash SIGSEGV in DeepCopyVarlenData while inserting into sequencefile
- [IMPALA-5412] - Scan returns wrong partition-column values when scanning multiple partitions pointing to the same filesystem location
- [IMPALA-5420] - Check If HDFS ACLs are enabled before trying to get the ACLs
- [IMPALA-5423] - test_file_modifications fails on local filesystem and isilon
- [IMPALA-5424] - test_breakpad.py can fail if /tmp/minidumps is missing
- [IMPALA-5427] - beeswax get_state() can return EXCEPTION before error is visible via get_log()
- [IMPALA-5431] - Calling FileSystem.Exists() twice in a row for the same partition adds unnecessary latency to metadata loading
- [IMPALA-5432] - SetMemLimitExceeded DCHECK no longer valid
- [IMPALA-5435] - test_basic_filters failed on ASAN
- [IMPALA-5437] - Codegen for Trunc() of timestamp takes far too long
- [IMPALA-5438] - Union with constant exprs inside a subplan returns inconsistent results
- [IMPALA-5446] - Return Status from Sorter::Reset() is dropped
- [IMPALA-5452] - Nested subplans with non-trivial plan tree return inconsistent results
- [IMPALA-5453] - TestCreateTableLikeFile fails on 'enum.parquet'
- [IMPALA-5454] - JVM metrics don't show up on /memz sometimes
- [IMPALA-5455] - test infra cannot always contact TLS-enabled CM
- [IMPALA-5457] - OOM during profile serialization leads to crash
- [IMPALA-5462] - Unhandled exception in RuntimeProfile::ToThrift() leads to abort()
- [IMPALA-5469] - IllegalStateException while processing catalog update in the Impalad
- [IMPALA-5477] - PPC port broke minidump-2-core in the toolchain
- [IMPALA-5479] - Propagate the argument 'type' for RawValue::Compare()
- [IMPALA-5482] - single_node_perf_run.py can fail to checkout when testing a patch that modifies testdata/workloads
- [IMPALA-5484] - LICENCE issues discovered in IPMC vote
- [IMPALA-5487] - Race in runtime-profile.cc::toThrift() can lead to corrupt profiles being generated while query is running
- [IMPALA-5488] - TestValidateMetrics fails sometimes when running tests on Isilon
- [IMPALA-5492] - There is an error in impala-shell introduction when using LDAP
- [IMPALA-5494] - NOT IN predicate shares the same selectivity as IN predicate
- [IMPALA-5497] - Right anti, right outer and full outer hash joins sometimes do not flush resources early enough
- [IMPALA-5499] - session-expiry-test failed because of conflicting ephemeral ports
- [IMPALA-5504] - wrong results with LEFT JOIN, inline view, and COALESCE()
- [IMPALA-5513] - impala-shell shows an abnormal message when given an invalid KEYVAL
- [IMPALA-5514] - impala-shell runs successfully with an invalid parameter when only the ldap_password_cmd option is given
- [IMPALA-5520] - TopN node does not reuse string memory
- [IMPALA-5524] - NullPointerException during planning with DISABLE_UNSAFE_SPILLS=TRUE
- [IMPALA-5527] - Create a nested testdata flattener for the query generator
- [IMPALA-5530] - Sentry broke Impala compilation
- [IMPALA-5531] - Scalar subquery with correlated inequality predicate returns wrong results
- [IMPALA-5532] - Don't heap-allocate compressor objects in RowBatch
- [IMPALA-5536] - TCLIService thrift compilation is broken on Hive 2
- [IMPALA-5537] - Impala does not retry RPCs that fail in SSL_read()
- [IMPALA-5539] - Timestamps read from Kudu are wrong with -use_local_tz_for_unix_ts
- [IMPALA-5540] - Latest version of Sentry fails to connect
- [IMPALA-5549] - Remove deprecated fields from catalog's thrift API return types
- [IMPALA-5551] - test_failpoint.py fails in legacy join and agg tests
- [IMPALA-5553] - expr-test fails in release builds
- [IMPALA-5558] - Query hang after coordinator crash because DoRpc(ReportExecStatus) fails and is not retried
- [IMPALA-5562] - Query involving nested array and limit 0 hits IllegalStateException
- [IMPALA-5567] - Race in fragment instance teardown can lead to use-after-free in MemTracker::AnyLimitExceeded()
- [IMPALA-5571] - numerous test_grant_revoke failures
- [IMPALA-5576] - Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang
- [IMPALA-5579] - GetSchemas throws IndexArrayOutOfBoundsException if a table can't be loaded
- [IMPALA-5580] - Java UDF: return null STRING incorrectly converted to empty string
- [IMPALA-5582] - Sentry privileges assigned to objects defined in upper case can get deleted from the catalog
- [IMPALA-5585] - BE tests for last_day() not being called
- [IMPALA-5586] - Null-aware anti-join can take a long time to cancel
- [IMPALA-5588] - test_rpc_secure_recv_timed_out: TypeError
- [IMPALA-5591] - Set statement handling in frontend can't handle negative numbers
- [IMPALA-5592] - DataStreamSender doesn't appear to be compressing the payload
- [IMPALA-5594] - Impala should not reference shaded classes from Kudu jar
- [IMPALA-5595] - Impala shouldn't set KuduScanner timestamp feature flag unless necessary
- [IMPALA-5598] - ExecQueryFInstances RPC recv side timeouts (observed in stress test in insecure+release build)
- [IMPALA-5602] - All predicates pushed to Kudu with limit runs incorrectly as 'small query'
- [IMPALA-5611] - KuduPartitionExpr holds onto memory unnecessarily
- [IMPALA-5615] - Compute Incremental stats is broken for general partition expressions
- [IMPALA-5623] - lag() on STRING cols may hold memory until query end
- [IMPALA-5627] - Various dropped statuses in HDFS writers
- [IMPALA-5630] - Add a string metric to expose the Kudu client version
- [IMPALA-5636] - Impala writer claims that file uses BIT_PACKED encoding when it doesn't
- [IMPALA-5638] - Alter table set tblproperties inconsistency for 'external'
- [IMPALA-5640] - Test coverage for Parquet gzip inserts was disabled
- [IMPALA-5641] - mem-estimate should never be less than mem-reservation
- [IMPALA-5648] - Count star optimisation regressed Parquet memory estimate accuracy
- [IMPALA-5650] - COUNT(*) optimization causes a crash when legacy aggregation is enabled
- [IMPALA-5657] - FunctionCallExpr.toSql() and clone() ignore "IGNORE NULLS" case
- [IMPALA-5679] - Count star optimization gives incorrect result for parquet table partitioned by STRING column
- [IMPALA-5686] - Update to Sentry causes build failures
- [IMPALA-5689] - Query with no right join fails with an error that mentions a right join
- [IMPALA-5691] - test_low_mem_limit_q18 is flaky
- [IMPALA-5708] - Test failure with invalid GetExecSummary; potential coord. race
- [IMPALA-5722] - Converting a string decimal with a large negative exponent causes a crash
- [IMPALA-5725] - coalesce() not being fully applied with outer joins on kudu tables
- [IMPALA-5733] - Kudu tservers seem to be unresponsive after TestKuduMemLimits
- [IMPALA-5739] - SLES12 SP2 is not correctly detected by bootstrap_toolchain.py
- [IMPALA-5742] - Memory leak in parquet-reader
- [IMPALA-5749] - Race in coordinator hits DCHECK on 'num_remaining_backends_ > 0'
- [IMPALA-5751] - Impala query against Kudu gets response: authentication token signing key expired
- [IMPALA-5756] - Impala crashes on startup in impala::SpinLock::lock()
- [IMPALA-5759] - Switch to long key ids in KEYS file
- [IMPALA-5760] - Flaky test: query_test/test_udfs.py
- [IMPALA-5769] - Minidumps need periodic cleanup when triggered by SIGUSR1
- [IMPALA-5772] - Expected failure in test_scratch_disk.TestScratchDir didn't occur
- [IMPALA-5773] - Memory limit exceeded on test_spilling.py
- [IMPALA-5774] - StringFunctions::FindInSet() may read one byte beyond a string's extent
- [IMPALA-5775] - Impala shell only supports TLSv1
- [IMPALA-5776] - HdfsTextScanner::WritePartialTuple() writes the varlen data to an incorrect memory pool
- [IMPALA-5778] - Clarify logging around and usage of read_size startup option
- [IMPALA-5781] - thrift-server-test failed
- [IMPALA-5784] - Separate planner-set query options from user set ones in the profile
- [IMPALA-5787] - Dropped Status in KuduTableSink::Send()
- [IMPALA-5788] - Spilling aggregation crashes when grouping by nondeterministic expression
- [IMPALA-5795] - BackendConfig::LookUpBackendDescriptor() may fail to look up non-executor coordinator
- [IMPALA-5796] - CTAS for Kudu table and expr rewrite analysis exception
- [IMPALA-5797] - Expected failure in test_scratch_disk.TestScratchLimit didn't occur
- [IMPALA-5798] - ASAN use-after-poison in Parquet decoder
- [IMPALA-5799] - INSERTs into Kudu can crash if a column is dropped concurrently
- [IMPALA-5800] - Configure Squeasel's TLS version / ciphers
- [IMPALA-5809] - test_breakpad.py failing on exhaustive builds
- [IMPALA-5815] - Right outer join returns reference to unpinned memory
- [IMPALA-5819] - HdfsTextScanner::Close hitting DCHECK(boundary_column_.IsEmpty())
- [IMPALA-5824] - TestSpilling.test_spilling_aggs failing: SpilledPartitions: 0 (0)
- [IMPALA-5825] - TSSLSocket factory may throw uncaught exception
- [IMPALA-5829] - Insert into Kudu table fails with ERROR: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
- [IMPALA-5838] - Suggested MEM_LIMIT in rejected query error may be too low
- [IMPALA-5840] - Don't write page level statistics in Parquet files in anticipation of page indexes
- [IMPALA-5850] - Partitioned hash join inside union may return wrong results
- [IMPALA-5855] - Preaggregation crashes - unable to initialise hash table
- [IMPALA-5857] - Crash in impala::DiskIoMgr::ScanRange::Close-> tc_free while running concurrent TPC-DS
- [IMPALA-5866] - Min reservation error message is slightly inaccurate
Sub-task
- [IMPALA-2708] - Partitioned aggregation node repartitions when spilled partition could fit in memory
- [IMPALA-3205] - Validate and fix spilling performance and memory usage of new buffer pool
- [IMPALA-3208] - Backend support for large rows
- [IMPALA-3748] - Compute memory reservation in planner and claim atomically in Prepare()
- [IMPALA-3905] - Single-threaded scan node
- [IMPALA-4174] - Planner incorrectly estimates cardinality for many to many joins
- [IMPALA-4669] - Add Kudu's RPC, util and security libraries
- [IMPALA-4674] - Port spilling ExecNodes to new buffer pool
- [IMPALA-4687] - Get Impala working against HBase 2.0 APIs
- [IMPALA-4695] - Planner incorrectly estimates cardinality for multi column joins
- [IMPALA-4703] - Add reservation stress option for test coverage
- [IMPALA-4905] - Fragments always report insert status, even if not insert query
- [IMPALA-4925] - Coordinator does not cancel fragments if query completes w/limit
- [IMPALA-5036] - Improve COUNT(*) performance of Parquet scans
- [IMPALA-5085] - Backend support for large rows in BufferedTupleStream
- [IMPALA-5093] - Rare failure to decode LZ4 batch
- [IMPALA-5136] - Running 48 concurrent Q17 queries against TPC-DS 1TB fails with "Cannot process row that is bigger than the IO size (row_size=1.55 GB, null_indicators_size=0)"
- [IMPALA-5138] - Running 32 concurrent queries from TPC-DS Q31 caused a crash in "impala::BufferedTupleStream::CopyStrings (this=0x7f182c9b4440, tuple=0x7f15aa008000, string_slots=...) buffered-tuple-stream.cc:840"
- [IMPALA-5158] - Account for difference between process memory consumption and memory used by queries
- [IMPALA-5160] - Queries with a large number of small joins regress in terms of memory usage due to memory reservation
- [IMPALA-5345] - Under stress, some TransmitData() RPCs are not responded to
- [IMPALA-5554] - crash in impala::Sorter::Run::ConvertOffsetsToPtrs
- [IMPALA-5566] - Crash in impala::DataStreamSender::DataStreamSender
- [IMPALA-5570] - Ensure that NAAJ works with spilling enabled and disabled
- [IMPALA-5575] - BufferPoolTest ConcurrentRegistration is racy
- [IMPALA-5618] - Performance regresses on buffer pool dev branch for high-ndv aggregations
- [IMPALA-5622] - Ensure test coverage for spilling disabled for all spilling operators
- [IMPALA-5629] - list::size() in BufferedTupleStreamV2::AdvanceWritePage() is expensive
- [IMPALA-5661] - Consider allowing configuration of buffer pool size
- [IMPALA-5667] - Race in DataStreamSender could cause TransmitData sidecar corruption
- [IMPALA-5676] - BufferedTupleStreamV2::CheckConsistency() is too slow for large streams with small pages in Debug build
- [IMPALA-5677] - Consider ways to reduce the accumulation of clean pages when executing large spilling queries
- [IMPALA-5681] - Eagerly release reservation in blocking nodes
- [IMPALA-5713] - Consider always reserving memory for grouping pre-aggregations
- [IMPALA-5714] - OpenSSL 1.0.0 shared library support for legacy platforms
- [IMPALA-5757] - Order-dependent comparison fails in query_test/test_kudu.py::TestShowCreateTable::()::test_properties
- [IMPALA-5810] - Consider reducing RESERVATION_MIN_MEM_REMAINING
- [IMPALA-5823] - SET_DENY_RESERVATION_PROBABILITY debug action is not always effective
Task
- [IMPALA-3265] - Create a metric to track spilling per operator
- [IMPALA-5428] - Update external Hadoop ecosystem versions
- [IMPALA-5652] - Add deprecation warning for "unlimited" process mem_limit
- [IMPALA-5744] - Add dummy 'use_krpc' flag and create DataStream interface
Test
- [IMPALA-5390] - Add test that file handle cache is disabled for S3,ADLS,Isilon
- [IMPALA-5779] - Add test with spillable buffer size > --read_size
- [IMPALA-5780] - Add missing test coverage for disable_unsafe_spills
- [IMPALA-5830] - Add regression test for IMPALA-5823: fix SET_DENY_RESERVATION_PROBABILITY