Impala 2.12 Change Log
New Feature
Improvement
- [IMPALA-2782] - Allow Impala Shell to connect directly to impalad when configured with a proxy load balancer and Kerberos
- [IMPALA-3651] - Implement murmur_hash() UDF
- [IMPALA-3998] - Remove refresh_after_connect option from impala-shell
- [IMPALA-4132] - Consider using -fno-omit-frame-pointer in release builds
- [IMPALA-4168] - Adopt Oracle-style hint placement for INSERT statements
- [IMPALA-4456] - Address scalability issues of qs_map_lock_ and client_request_state_map_lock_
- [IMPALA-4953] - Prevent large statestore updates from head-of-line blocking subsequent updates to different topics
- [IMPALA-4993] - Add support for dictionary filtering on nested fields
- [IMPALA-5052] - Read and write signed integer logical type metadata in Parquet
- [IMPALA-5058] - Improve concurrency of DDL/DML operations during catalog updates
- [IMPALA-5478] - run test_tpcds_queries with DECIMAL_V2=true
- [IMPALA-5519] - Allocate fragment instance runtime filter memory from BufferPool
- [IMPALA-5654] - Disallow managed Kudu table to explicitly set Kudu tbl name in CREATE TABLE
- [IMPALA-5721] - stress test: save profiles during binary search
- [IMPALA-5801] - Clean up codegen GetType() interface
- [IMPALA-5848] - Account for TCMalloc overhead and client cache buffers in MemTracker
- [IMPALA-5990] - End-to-end compression of metadata
- [IMPALA-6059] - Enhance ltrim() and rtrim() functions to trim any set of characters
- [IMPALA-6075] - Add Impala daemon metric for catalog version
- [IMPALA-6113] - Skip row groups with predicates on NULL columns
- [IMPALA-6128] - Spill-to-disk encryption (AES-CFB + SHA256) can be a performance bottleneck while IO is getting faster
- [IMPALA-6177] - Clean up handcrafted IRs if they encounter error during creation
- [IMPALA-6222] - Make it easier to root-cause "failed to get minimum memory reservation" error
- [IMPALA-6387] - test_breakpad.py::test_sigusr1_writes_minidump fails on exhaustive build
- [IMPALA-6424] - REFRESH right after invalidate metadata <table> loads file metadata twice
- [IMPALA-6437] - increase frequency of admission control topic updates
- [IMPALA-6497] - Impala should expose when the last row is fetched
- [IMPALA-6519] - Allow atomic allocation of an unreserved buffer
- [IMPALA-6627] - Document Hive-incompatible behavior with the serialization.null.format table property
- [IMPALA-6629] - Clearer and more concise logging during catalog topic updates
- [IMPALA-6641] - Support more separators between date and time in default timestamp format
- [IMPALA-6655] - Set owner information on database creation
- [IMPALA-6675] - Change default configuration to --compact_catalog_topic=true
- [IMPALA-6682] - Support hash function other than md5 in pypi download script
- [IMPALA-6747] - Add a diagnostics collection script
- [IMPALA-6779] - Impala Doc: Improve REPLICA_PREFERENCE doc
- [IMPALA-6805] - Show current database in Impala shell prompt
- [IMPALA-6809] - bootstrap_system.sh does an unconditional git clone to ~/Impala
- [IMPALA-6820] - Remove builtins db from catalogd
- [IMPALA-6822] - Provide a query option to not shuffle on distinct exprs
- [IMPALA-6850] - Print the actual error message to the console when Sentry fails
- [IMPALA-6878] - SentryServicePinger should not print stacktrace at every retry
Sub-task
- [IMPALA-2397] - Use atomic operations for simple numeric metrics
- [IMPALA-3562] - Extend "compute stats" syntax to support a list of columns
- [IMPALA-4874] - Increase maximum KRPC message size
- [IMPALA-5054] - Enable KRPC TLS in Impala
- [IMPALA-5310] - Implement TABLESAMPLE for COMPUTE STATS
- [IMPALA-5518] - Allocate KrpcDataStreamRecvr RowBatch tuples from BufferPool
- [IMPALA-5528] - tcmalloc contention much higher with concurrency after KRPC patch
- [IMPALA-5557] - Disable rpc_default_keepalive_time_ms
- [IMPALA-5948] - Conflicting port 29000 with Sentry
- [IMPALA-6024] - Add minimum sample size for COMPUTE STATS TABLESAMPLE
- [IMPALA-6116] - Bound memory usage of KRPC service queue
- [IMPALA-6190] - Add a debug webpage to show fragment instances of a query
- [IMPALA-6193] - Track RPC allocated memory in a memtracker
- [IMPALA-6219] - Use AES-GCM for spill-to-disk encryption when CLMUL instruction is present and performant
- [IMPALA-6228] - More flexible configuration of stats extrapolation
- [IMPALA-6246] - Add timeline information to fragment instances in profile
- [IMPALA-6268] - KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
- [IMPALA-6269] - [observability] Add KRPC metrics to /rpcz and /metrics debug webpages
- [IMPALA-6290] - Simplify ScannerContext buffer management to only use one I/O buffer at a time.
- [IMPALA-6346] - Potential deadlock in KrpcDataStreamMgr
- [IMPALA-6347] - Monitor queue depth size for outgoing RPCs for Reactor threads
- [IMPALA-6356] - Excessive synchronous logging in RpczStore::LogTrace causes severe slowdown for exchange operators spanning 2-3 minutes
- [IMPALA-6416] - Extend Thread::Create to track fragment instance id automatically based on parent's fid
- [IMPALA-6430] - Log a detailed error message on failure of MetricVerifier
- [IMPALA-6432] - Default rpc_negotiation_timeout_ms may cause queries to fail on large clusters
- [IMPALA-6448] - Re-enable kerberized testing with KRPC
- [IMPALA-6457] - Impala 2.12 Doc: DECIMAL support for Kudu tables
- [IMPALA-6459] - Doc: TABLESAMPLE for COMPUTE STATS
- [IMPALA-6464] - Impala 2.12 & 3.0 Docs: Extend "compute stats" syntax to support a list of columns
- [IMPALA-6477] - rpc-mgr-kerberized-test fails on CentOS 6.4
- [IMPALA-6482] - Add query option for query time limit
- [IMPALA-6483] - Impala 3.0 & 2.12 Docs: Describe the query option for query time limit
- [IMPALA-6508] - Allow tests to run with Krpc
- [IMPALA-6510] - Impala 2.12 Doc: Remove the refresh_after_connect option from INVALIDATE METADATA statement
- [IMPALA-6512] - test_exchange_delays does not work with KRPC enabled
- [IMPALA-6514] - Impala 2.12 & 3.0 Docs: Allow Impala Shell to connect directly to impalad when configured with a proxy load balancer and Kerberos
- [IMPALA-6538] - Fix read path when Parquet min(_value)/max(_value) statistics contain NaN
- [IMPALA-6542] - Fix inconsistent write path of Parquet min/max statistics
- [IMPALA-6546] - Impala 2.12 & 3.0 Docs: Document the new ODBC scalar functions
- [IMPALA-6554] - Check failed: consumption_->current_value() == 0 (126 vs. 0) KrpcDataStreamRecvr
- [IMPALA-6565] - Stress test with KRPC enabled shows inconsistent results for some queries
- [IMPALA-6576] - Add metric for Data Stream Service Queue memory consumption
- [IMPALA-6592] - Fix test gap - no test of handling for invalid/unsupported Parquet codec
- [IMPALA-6594] - Various tests failing (some flaky) with memory limit exceeded
- [IMPALA-6609] - Some COUNTER_ADD() in KrpcDataStreamRecvr may lead to use-after-free
- [IMPALA-6623] - Impala 2.12 Doc: Update ltrim and rtrim functions
- [IMPALA-6624] - Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c
- [IMPALA-6651] - Impala 2.13 & 3.0 Docs: Fine-grained privileges
- [IMPALA-6652] - KRPC : Data Stream Manager Deferred RPCs in memz page should be renamed
- [IMPALA-6667] - Impala 2.12 Doc: Enable file handle cache by default
- [IMPALA-6669] - Remove NeedsSeedingForBatchedReading()
- [IMPALA-6685] - Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender
- [IMPALA-6688] - Impala 2.12 & 3.0 Docs: Change default configuration to --compact_catalog_topic=true
- [IMPALA-6691] - KRPC w/ kerberos fails on SLES11
- [IMPALA-6723] - Impala 2.12 & 3.0 Docs: Support insert plan hints for CREATE TABLE AS SELECT
- [IMPALA-6726] - Catalog server's kerberos ticket gets deleted after 'ticket_lifetime' on SLES11
- [IMPALA-6728] - The combination use_kudu_kinit=false and use_krpc=true crashes Impalad
- [IMPALA-6748] - Impala 2.12 & 3.0 Docs: Support more separators between date and time in default timestamp format
- [IMPALA-6807] - Known Issues needs to be updated to reflect resolution of HDFS-12528
- [IMPALA-6867] - Impala 2.12 & 3.0 Docs: Provide a query option to not shuffle on distinct exprs
- [IMPALA-6868] - Impala 3.0 Doc: Remove old kinit code for Impala 3
Bug
- [IMPALA-2567] - KRPC milestone 1
- [IMPALA-2642] - Potential deadlock in statestore error path
- [IMPALA-3887] - Planner tests failing due to metadata loading race with HDFS, fewer #hosts than expected
- [IMPALA-3942] - After creating a view that uses regexp_replace we are getting the following error: ERROR: AnalysisException: Failed to load metadata for table:
- [IMPALA-4315] - USE <db> statement throws auth error if user only has column privileges
- [IMPALA-4323] - It is not possible to set row format through alter table
- [IMPALA-4664] - Impala shell can accidentally convert certain literal strings to lowercase
- [IMPALA-5014] - DECIMAL V2 round when casting to/from DECIMAL, part 2
- [IMPALA-5017] - Error on decimal overflow (rather than warn)
- [IMPALA-5139] - Make mvn-quiet.sh write its output to a logfile
- [IMPALA-5152] - Frontend requests metadata for one table at a time in the query
- [IMPALA-5269] - Comment on Final Line of Query Breaks Parsing
- [IMPALA-5270] - Crash with ORDER BY in OVER clause with RANDOM
- [IMPALA-5754] - rand() algorithm is very non-random
- [IMPALA-5909] - File handle cache causes HDFS to log excessive errors when trying to unbuffer files
- [IMPALA-5929] - Remove useless explicit casts to string
- [IMPALA-6008] - Creating a UDF from a shared library with a .ll extension crashes Impala
- [IMPALA-6081] - TestRuntimeFilters fails due to runtime profile missing portions
- [IMPALA-6092] - Flaky test: query_test/test_udfs.py still happening
- [IMPALA-6215] - Race between lib_cache and java udf class loading
- [IMPALA-6231] - Do some fuzz testing of decimal v2 operations
- [IMPALA-6242] - Flaky test: TimerCounterTest.CountersTestOneThread
- [IMPALA-6258] - Uninitialized tuple pointers in row batch for empty rows
- [IMPALA-6275] - Successful CTAS logs warning
- [IMPALA-6295] - Inconsistent handling of 'nan' and 'inf' with min/max analytic fns
- [IMPALA-6296] - DCheck in CodegenSymbolEmitter when --asm_module_dir is set on debug build
- [IMPALA-6297] - Remove partition/sort from Kudu INSERT for unpartitioned tables
- [IMPALA-6300] - Decimal modulo sometimes returns incorrect results
- [IMPALA-6307] - A CTAS query fails with error: AnalysisException: Duplicate column name: <columnName>
- [IMPALA-6318] - Test suite may hang on test_query_cancellation_during_fetch
- [IMPALA-6319] - Allocation/Dealloc mismatch unique_ptr param seems to be overwritten (mem-leak)
- [IMPALA-6332] - Impala webserver should return HTTP error code for missing query profiles
- [IMPALA-6334] - test_compute_stats_tablesample failing on Isilon builds
- [IMPALA-6348] - Redact only sensitive fields in runtime profile
- [IMPALA-6353] - Crash in snappy decompressor
- [IMPALA-6355] - dcheck failure for decimal asan tests
- [IMPALA-6362] - Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests
- [IMPALA-6363] - cscope build step seems racy, breaks compilation
- [IMPALA-6364] - Lock contention in FileHandleCache results in >2x slowdown for remote HDFS reads
- [IMPALA-6368] - test_chars.py races with itself when run in parallel
- [IMPALA-6370] - Crash when querying nested data in partitioned Parquet table
- [IMPALA-6371] - Not correctly validating unicode delimiters.
- [IMPALA-6377] - Bump breakpad version to include the fix for Breakpad #752
- [IMPALA-6381] - test_exchange_delays.py failed on isilon build due to sender timeout
- [IMPALA-6382] - Impalad crashes on SELECT query when spill buffer is set on certain values
- [IMPALA-6383] - Memory from previous row groups can accumulate in Parquet scanner
- [IMPALA-6384] - RequestPoolService doesn't honor custom user -> group mapping overrides in HDFS config
- [IMPALA-6386] - Dataload can fail due to "invalidate metadata" concurrent with DDLs
- [IMPALA-6388] - UnionNode sets the number of nodes incorrectly
- [IMPALA-6392] - Explain format for parquet predicate statistics should be consistent with predicates
- [IMPALA-6394] - GVO failed in "Waiting for HDFS replication"
- [IMPALA-6399] - test_query_profile_thrift_timestamps failure on exhaustive runs
- [IMPALA-6405] - There is no error under Decimal v2 when there is an overflow when casting String to Decimal
- [IMPALA-6414] - Impalad binary failed to start with SIGSEGV with GPerfTools 2.6.3 on certain platforms
- [IMPALA-6418] - Find a reliable way to detect supported TLS versions
- [IMPALA-6419] - hdfs-parquet-scanner.cc:624] Check failed: 0 == context_->NumStreams() (0 vs. 11)
- [IMPALA-6427] - Planner test expected output drops QUERYOPTIONS sections
- [IMPALA-6435] - Codegen crash when UNIONing CHAR(n) literals
- [IMPALA-6440] - Impala cannot read / write HBase tables when metadata is created with newer versions of Hive
- [IMPALA-6449] - Use CLOCK_MONOTONIC in ConditionVariable
- [IMPALA-6450] - Hit DCHECK in RuntimeProfile::EventSequence::Start()
- [IMPALA-6451] - Creating a Kudu table with CTAS fails with AuthorizationException: User 'username' does not have privileges to access: server1
- [IMPALA-6454] - CTAS into Kudu fails with mixed-case partition and/or primary key column names
- [IMPALA-6455] - test_partition_metadata_compatibility flaky failures
- [IMPALA-6466] - Impala 2.12 Doc: Document clustered/noclustered hint for inserts into Kudu table
- [IMPALA-6476] - TestKrpcMemUsage.test_krpc_deferred_memory_usage fails on release build
- [IMPALA-6478] - NativeAddPendingTopicItem prints garbage into log
- [IMPALA-6484] - Crash in impala::RuntimeProfile::SortChildren
- [IMPALA-6485] - BE compilation failure: error: ‘EVP_CTRL_GCM_SET_IVLEN’ was not declared in this scope
- [IMPALA-6486] - INVALIDATE METADATA may hang after statestore restart
- [IMPALA-6488] - Crash in LibCache::GetCacheEntryInternal()
- [IMPALA-6489] - ASAN use-after-poison in impala::HdfsScanner::InitTupleFromTemplate
- [IMPALA-6495] - targeted-perf tests broken by column alias change
- [IMPALA-6498] - test_query_profile_thrift_timestamps causes following tests to fail
- [IMPALA-6500] - Impala crashes randomly on different queries with GROUP BY
- [IMPALA-6511] - Fix event "Open Finished" in state machine in FragmentInstanceState::UpdateState()
- [IMPALA-6515] - Impala Doc: HAproxy with sticky session requires the check option
- [IMPALA-6516] - Avoid logging during catalog update if the catalog version didn't change
- [IMPALA-6517] - bootstrap_toolchain.py fails to recognize lsb_release output from RHEL
- [IMPALA-6526] - Regression: query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling
- [IMPALA-6527] - NaN values lead to incorrect filtering under certain circumstances
- [IMPALA-6530] - Track time spent opening HDFS file handles
- [IMPALA-6543] - Limit RowBatch serialization size to INT_MAX
- [IMPALA-6549] - Enable file handle cache by default
- [IMPALA-6553] - Impala Doc: load_catalog_in_background default changed to false
- [IMPALA-6567] - Dataload performance regression due to slow invalidate metadata
- [IMPALA-6571] - NullPointerException in SHOW CREATE TABLE for HBase tables
- [IMPALA-6577] - TestQueryExpiration::test_concurrent_query_expiration failing
- [IMPALA-6579] - Data load failing with Error opening Kudu table 'impala::tpch_kudu.lineitem', Kudu error: The table does not exist: table_name: "impala::tpch_kudu.lineitem"
- [IMPALA-6580] - Cannot load Kudu functional data on localfs build
- [IMPALA-6582] - Flaky test: TestImpalaShellInteractive::test_multiline_queries_in_history
- [IMPALA-6583] - Various tests fail with missing database or table from catalog
- [IMPALA-6584] - TestKuduOperations::test_column_storage_attributes broken on exhaustive build
- [IMPALA-6585] - test_low_mem_limit_q21 flaky under ASAN
- [IMPALA-6586] - FrontendTest.TestGetTablesTypeTable failing on some builds
- [IMPALA-6588] - test_compute_stats_tablesample failing with "Cancelled"
- [IMPALA-6589] - Fuzz test on parquet table crashes impala
- [IMPALA-6595] - Hit crash freeing buffer in NljBuilder::Close()
- [IMPALA-6599] - Log spam: ImpaladCatalog.java:525] NativeLibCacheSetNeedsRefresh(hdfs://localhost:20500/test-warehouse/test-udfs.ll) failed.
- [IMPALA-6601] - ASAN memcpy-param-overlap in impala::RawValue::Write during RowBatchSerializeTest.RowBatchLZ4Success
- [IMPALA-6602] - TestQueryExpiration.test_query_expiration fails on Isilon with FINISHED rather than EXCEPTION state
- [IMPALA-6603] - Fix output for cherry-picking job
- [IMPALA-6606] - date_trunc with MILLENNIUM returns 2000 instead of 2001
- [IMPALA-6613] - Change TEST_KRPC to DISABLE_KRPC
- [IMPALA-6619] - Alter table recover partitions creates unneeded partitions when it encounters a percent sign
- [IMPALA-6635] - IllegalStateException when applying an "in" predicate on a Kudu DECIMAL col
- [IMPALA-6638] - File handle cache shows contention when cold
- [IMPALA-6654] - [DOCS] Kudu/Sentry docs are out of date
- [IMPALA-6670] - Executor-only impalads do not refresh their lib-cache entries
- [IMPALA-6683] - Restarting the Catalog without restarting Impalad and SS can block topic updates
- [IMPALA-6687] - Upsert fails on Kudu table with upper case primary key and default value
- [IMPALA-6690] - Invalid syntax in pip_download.py due to a recent patch
- [IMPALA-6694] - BufferPool appears misaligned in query profile
- [IMPALA-6695] - Builds fail with pkg_resources.VersionConflict
- [IMPALA-6697] - Setuptools 39.0.0 does not work with Python 2.6
- [IMPALA-6710] - Docs around INSERT into partitioned tables are misleading
- [IMPALA-6715] - stress test is double-counting TPCDS queries
- [IMPALA-6717] - common_query_options are not used in binary search phase of stress test
- [IMPALA-6719] - refresh function case sensitivity
- [IMPALA-6722] - Local build failing due to missing libTestUdfs.so
- [IMPALA-6731] - Build failed with distutils.errors.DistutilsError: Could not find suitable distribution
- [IMPALA-6739] - Exception in ALTER TABLE SET statements
- [IMPALA-6743] - bump 2.x's version
- [IMPALA-6752] - import kudu fails in python on Ubuntu 16.04
- [IMPALA-6759] - stress test can't parse explain string with petabyte memory estimates (really)
- [IMPALA-6774] - SyntaxError: invalid syntax diagnostics/collect_diagnostics.py
- [IMPALA-6785] - Starting an Impalad on an already running cluster may result in inconsistent cluster subscription
- [IMPALA-6792] - Appears to be a memory leak in orphaned fragments
- [IMPALA-6793] - Metadata doesn't recover after restarting statestore
- [IMPALA-6806] - TLS certificate with Intermediate CA in server cert file fails with KRPC
- [IMPALA-6811] - test_exchange_delays failed
- [IMPALA-6824] - Crash in RuntimeProfile::EventSequence::AddNewerEvents() when events_ is empty
Test
- [IMPALA-5440] - Add planner tests with extreme statistics values.
Task
- [IMPALA-6509] - Impala Doc: Enabling haproxy for secure impala restricts connection to individual nodes
- [IMPALA-6622] - Backport parts of IMPALA-4924 to 2.x
- [IMPALA-6732] - Impala 2.12 Doc: Release Notes
- [IMPALA-6736] - stress test --filter-query-mem-ratio doesn't work
- [IMPALA-6886] - Impala Doc: Remove Impala Cluster Sizing doc
Documentation
- [IMPALA-6523] - There is misinformation in the PARQUET_FALLBACK_SCHEMA_RESOLUTION