Impala 3.2.0 Change Log
New Feature
- [IMPALA-5050] - Add support for reading TIMESTAMP_MILLIS and TIMESTAMP_MICROS in the Parquet scanner
- [IMPALA-6503] - Support reading complex types from ORC format files
- [IMPALA-7127] - Fetch-on-demand metadata for the impalad-side catalog
- [IMPALA-7645] - Allow configuring default file format via query option
- [IMPALA-7759] - Add Levenshtein edit distance built-in function
- [IMPALA-7795] - Add a command to refresh authorization data
- [IMPALA-7832] - Support IF NOT EXISTS in ALTER TABLE ADD COLUMNS
- [IMPALA-7941] - Determine process memory limit from cgroups limit, if set
Improvement
- [IMPALA-2343] - Capture operator timing information covering open/close & first/last batch close
- [IMPALA-3819] - "Block locality metadata may be stale" message is misleading
- [IMPALA-4123] - Columnar decoding in Parquet scanner
- [IMPALA-5043] - Admission control error messages don't hint that information is stale when disconnected from statestore
- [IMPALA-5847] - Some query options do not work as expected in .test files
- [IMPALA-5872] - Implement a SQL test case builder for gathering query diagnostics
- [IMPALA-6533] - Support DECIMAL for min-max runtime filters
- [IMPALA-6656] - Metrics for time spent in BufferAllocator
- [IMPALA-6662] - Make stress test resilient to hangs due to client crashes
- [IMPALA-6664] - Tag log statements with query-ids
- [IMPALA-6741] - Profiles of running queries should show the last update time of counters
- [IMPALA-6742] - Profiles of running queries should include execution summary
- [IMPALA-6897] - Catalog server web UI should expose the top-n tables with the most files
- [IMPALA-6924] - Compute stats profiles should include reference to child queries
- [IMPALA-6964] - Track stats about column and page sizes in Parquet reader
- [IMPALA-7183] - We should print the sender name when logging a report for an unknown status report on the coordinator
- [IMPALA-7265] - Cache remote file handles
- [IMPALA-7367] - Pack StringValue, CollectionValue and TimestampValue slots
- [IMPALA-7497] - Consider reintroducing numNulls count in compute stats
- [IMPALA-7565] - Extend TAcceptQueueServer connection_setup_pool to be multi-threaded
- [IMPALA-7568] - Implement timezone aware parquet stat filtering for timestamp columns
- [IMPALA-7657] - Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId
- [IMPALA-7679] - Improve error message and add test for creating table with NULL column
- [IMPALA-7694] - Add CPU resource utilization (user, system, iowait) timelines to profiles
- [IMPALA-7731] - Add ratio between scanned and transmitted bytes to fragment instances
- [IMPALA-7738] - Implement timeouts for hdfsOpenFile() calls
- [IMPALA-7761] - Add multiple count distinct to targeted stress and targeted perf
- [IMPALA-7764] - Add test coverage for SentryProxy
- [IMPALA-7807] - Analysis test fixture to enable deeper testing
- [IMPALA-7819] - Scanners should include "storage wait time" per scan node
- [IMPALA-7839] - Refactor Catalog::toCatalogObjectKey and CatalogObject::getUniqueName to reduce code repetition
- [IMPALA-7842] - Refactor FrontEnd to make plan fragments available for testing
- [IMPALA-7853] - Add support for reading int64 NANO timestamps in the Parquet scanner
- [IMPALA-7869] - Split up parquet-column-readers.cc for readability and compile time
- [IMPALA-7871] - Don't load Hive builtin jars for dataload
- [IMPALA-7881] - Visualize AST for easier debugging
- [IMPALA-7889] - Write new logical types in Parquet
- [IMPALA-7902] - Revise NumericLiteral to avoid analysis, fix multiple issues
- [IMPALA-7903] - DCHECK in RawValue::PrintValue() hit with VLOG level 3
- [IMPALA-7914] - Introduce AST base class/interface for statement-like nodes
- [IMPALA-7915] - Wrap SQL parser to avoid redundant code
- [IMPALA-7919] - Add predicates line in plan output for partition key predicates
- [IMPALA-8021] - Add estimated cardinality to EXPLAIN output
- [IMPALA-8023] - Fix PlannerTest to handle error lines consistently
- [IMPALA-8034] - PlannerTest cardinality tests are not realistic
- [IMPALA-8047] - Add support for the .proto file extension to .clang-format
- [IMPALA-8092] - Add a debug page to provide better observability for admission control
- [IMPALA-8095] - Detailed expression cardinality tests
- [IMPALA-8135] - Bump maven-surefire-plugin version to at least 2.19 to support running a single parameterized test
- [IMPALA-8147] - Merge make_impala.sh into CMake and buildall.sh
- [IMPALA-8148] - Misc. FE code cleanup
- [IMPALA-8162] - Add memory reserved and memory admitted per backend to the /backends debug page
- [IMPALA-8170] - Impala Doc: Add the Special Considerations for running SSL/TLS and a proxy
- [IMPALA-8177] - Log DDL exceptions in the coordinator log [supportability]
- [IMPALA-8181] - Show abbreviated row counts in DESCRIBE output
- [IMPALA-8187] - UDF/UDA samples should explicitly export entry points
- [IMPALA-8203] - Revisit disable_codegen docs
- [IMPALA-8223] - Remove mem-pool.total-bytes and hash-table.total-bytes metrics
- [IMPALA-8259] - Rerun bin/create-test-configuration.sh automatically if files are missing
- [IMPALA-8261] - create-test-configuration.sh should not fail when FE has not been built
- [IMPALA-8272] - test_catalog_tablesfilesusage failing
Bug
- [IMPALA-341] - Remote profiles may be ignored by coordinator if query has a limit
- [IMPALA-941] - Impala Parser issue when using fully qualified table names that start with a number.
- [IMPALA-1048] - Data sinks do not show up in the exec summary
- [IMPALA-3323] - impala-shell --ldap_password_cmd has no config file equivalent
- [IMPALA-5397] - Set "End Time" earlier rather than on unregistration.
- [IMPALA-5474] - Adding a trivial subquery turns error into warning
- [IMPALA-5861] - HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts
- [IMPALA-6293] - Shell commands run by Impala can fail when using the Java debugger
- [IMPALA-6521] - Hidden flags should show up in /varz and should be printed in the logs during startup
- [IMPALA-6591] - TestClientSsl hung for a long time
- [IMPALA-6900] - Invalidate metadata operation is ignored at a coordinator if catalog is empty
- [IMPALA-6910] - Multiple tests failing on S3 build: error reading from HDFS file
- [IMPALA-6955] - Debug webpage request for unknown query ID crashes Impala in GetClientRequestState
- [IMPALA-7107] - [DOCS] Review docs for storage formats impala cannot insert into
- [IMPALA-7214] - Update Impala docs to reflect coordinator/executor separation and decoupling from DataNodes.
- [IMPALA-7446] - Queries can spill earlier than necessary because of accumulation of free buffers and clean pages
- [IMPALA-7473] - RawValue::PrintValue() hits general protection fault with VLOG level 3
- [IMPALA-7659] - Collect count of nulls when collecting stats
- [IMPALA-7790] - Kudu tests fail with use_hybrid_clock=false
- [IMPALA-7804] - Various scanner tests intermittently failing on S3 on different runs
- [IMPALA-7809] - test_concurrent_schema_change incompatible with Kudu 1.9
- [IMPALA-7810] - query-state.cc:295] Check failed: profile_buf == nullptr
- [IMPALA-7828] - test_mem_leak() is flaky
- [IMPALA-7829] - Send the final profile only after all fragment instances have been closed
- [IMPALA-7837] - SCAN_BYTES_LIMIT="100M" test failing to raise exception in release build
- [IMPALA-7848] - Enable ParserTest.TestAdminFns
- [IMPALA-7851] - TestFetchFirst::test_query_stmts_v6 hang during core asan build
- [IMPALA-7852] - test_hash_join_timer flakiness on s3 testing build
- [IMPALA-7857] - Log more information about statestore failure detector
- [IMPALA-7863] - To solve Gmail Server Error 007
- [IMPALA-7864] - TestLocalCatalogRetries::test_replan_limit is flaky
- [IMPALA-7870] - TestAutomaticCatalogInvalidation.test_v1_catalog intermittently fails
- [IMPALA-7873] - TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
- [IMPALA-7882] - ASAN failure in llvm-codegen-test
- [IMPALA-7893] - Impala shell does not handle Ctrl+C correctly for a non-running query
- [IMPALA-7895] - Incorrect expected results for spillable-buffer-sizing.test
- [IMPALA-7905] - ToSqlUtils does not correctly quote lower-case Hive keywords
- [IMPALA-7907] - Multiple toSql() bugs in ScalarFunction
- [IMPALA-7913] - test-with-docker can sometimes fail with a ccache fatal error when using CentOS 6 within the containers
- [IMPALA-7925] - test_bloom_filters and test_hdfs_scanner_profile running out of memory during exhaustive tests
- [IMPALA-7926] - test_reconnect failing
- [IMPALA-7928] - Investigate consistent placement of remote scan ranges
- [IMPALA-7929] - Impala query on HBase table failing with InternalException: Required field*
- [IMPALA-7931] - test_shutdown_executor fails with timeout waiting for query target state
- [IMPALA-7934] - Switch to using Java 8's Base64 impl for incremental stats encoding
- [IMPALA-7939] - Impala shell not displaying results for a CTE query.
- [IMPALA-7943] - Bump up the default timeout set on impala-shell
- [IMPALA-7945] - test_hdfs_timeout.py fails on Centos6/python2.6
- [IMPALA-7946] - SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit
- [IMPALA-7960] - Wrong results when comparing a timestamp cast to a varchar of smaller length with a string literal in a binary predicate
- [IMPALA-7961] - Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast
- [IMPALA-7963] - test_empty_build_joins failed with hdfs timeout
- [IMPALA-7978] - Impala Doc: Clarify Impala Memory requirements
- [IMPALA-7989] - Impala cluster kill failing with python ImportError
- [IMPALA-7990] - Failing assert in TestFailpoints.test_lifecycle_failures
- [IMPALA-7992] - test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
- [IMPALA-7994] - Queries hitting memory limit issues in release builds
- [IMPALA-8007] - test_slow_subscriber is flaky
- [IMPALA-8008] - InvocationTargetException when making HMS RPCs
- [IMPALA-8026] - Actual row counts for nested loop join are way too high while the query is executing
- [IMPALA-8043] - ExprTest fails on Ubuntu 16 when the timezone is America/Los_Angeles
- [IMPALA-8058] - HBase scan cardinality division-by-zero leads to bogus cardinality
- [IMPALA-8061] - S3_ACCESS_VALIDATED unbound variable when using TARGET_FILESYSTEM=s3
- [IMPALA-8062] - single_node_perf_run.py doesn't re-source impala-config.sh on different branches
- [IMPALA-8063] - Excessive logging from BeeswaxConnection::get_state() bloats JUnitXML output
- [IMPALA-8064] - test_min_max_filters is flaky
- [IMPALA-8069] - crash in impala::Sorter::Run::Run
- [IMPALA-8073] - SentryProxy.testAddCatalog() failed in private build because of socket error
- [IMPALA-8078] - test_corrupt_stats failing on exhaustive builds
- [IMPALA-8089] - Sporadic upstream jenkins failures with "ERROR in bin/run-all-tests.sh at line 237: pkill -P $TIMEOUT_PID"
- [IMPALA-8090] - DiskIoMgrTest.SyncReadTest hits file_ != nullptr DCHECK in LocalFileReader::ReadFromPos()
- [IMPALA-8091] - minicluster kudu failure: Cannot initialize clock: failed to wait for clock sync
- [IMPALA-8093] - Profiles prefix counters inconsistently
- [IMPALA-8103] - Plan hints show up as "--" comments in analysed query
- [IMPALA-8113] - test_aggregation and test_avro_primitive_in_list fail in S3
- [IMPALA-8114] - Build test failure in test_breakpad.py
- [IMPALA-8118] - ASAN build failure: query_test/test_scanners.py
- [IMPALA-8129] - Build failure: query_test/test_observability.py
- [IMPALA-8137] - Order by docs incorrectly state that order by happens on one node
- [IMPALA-8140] - Grouping aggregation with limit breaks asan build
- [IMPALA-8142] - ASAN build failure in query_test/test_nested_types.py
- [IMPALA-8150] - AuditingTest.TestAccessEventsOnAuthFailure
- [IMPALA-8151] - HiveUdfCall assumes StringValue is 16 bytes
- [IMPALA-8154] - Disable auth_to_local by default
- [IMPALA-8163] - Local catalog mode needs to be visible on catalogd web UI if turned on
- [IMPALA-8168] - On S3 Sentry HDFS sync should be disabled
- [IMPALA-8169] - Update some random query generator infra settings
- [IMPALA-8171] - stress test doesn't work against minicluster
- [IMPALA-8173] - run-workload.py KeyError on 'query_id'
- [IMPALA-8175] - centos6: tests_minicluster_obj fails with pgrep usage error
- [IMPALA-8178] - Tests failing with “Could not allocate memory while trying to increase reservation” on EC filesystem
- [IMPALA-8183] - TestRPCTimeout.test_reportexecstatus_retry times out
- [IMPALA-8188] - Some SSDs are not properly detected as non-rotational
- [IMPALA-8189] - TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails
- [IMPALA-8191] - TestBreakpadExhaustive.test_minidump_creation fails to kill cluster
- [IMPALA-8193] - junitxml_prune_notrun.py fails on Centos6 passing xml_declaration=True to ElementTree
- [IMPALA-8194] - TestPauseMonitor.test_jvm_pause_monitor_logs_entries needs to wait longer to see output
- [IMPALA-8195] - Impala Doc: Impala does support Cartesian join
- [IMPALA-8199] - stress test fails: "No module named RuntimeProfile.ttypes"
- [IMPALA-8200] - Builds fail using wrong branch of impala-lzo
- [IMPALA-8207] - Fix query loading in run-workload.py
- [IMPALA-8209] - Fragment instance ID no longer displayed on /memz
- [IMPALA-8212] - Crash during startup in kudu::security::CanonicalizeKrb5Principal()
- [IMPALA-8214] - Bad plan in load_nested.py
- [IMPALA-8222] - Timeout calculation in stress test doesn't make sense
- [IMPALA-8234] - TUnit enum (part of profile format) was reordered
- [IMPALA-8235] - AdmissionControlTimeSinceLastUpdate TIME_MS counter breaks some profile consumers
- [IMPALA-8239] - Check failed: deferred_rpcs_.empty() || (num_deserialize_tasks_pending_ + num_pending_enqueue_) > 0
- [IMPALA-8243] - ConcurrentModificationException in Catalog stress tests
- [IMPALA-8244] - Toolchain build fails to publish binaries even if asked to do so
- [IMPALA-8245] - Add hostname to timeout error message in HdfsMonitoredOps
- [IMPALA-8247] - Backend tests from bit-stream-utils-test.cc and system-state-info-test.cc are not running
- [IMPALA-8249] - End-to-end test framework doesn't read aggregated counters properly
- [IMPALA-8251] - Impala fails to start for TestExchangeDeferredBatches.test_exchange_small_buffer() under release build
- [IMPALA-8252] - Impala writes malformed thrift profiles
- [IMPALA-8254] - Compute stats fails if COMPRESSION_CODEC is not default
- [IMPALA-8256] - ImpalaServicePool::RejectTooBusy() should print more meaningful message
- [IMPALA-8257] - Parquet writer sometimes hits DCHECK when handling empty string
- [IMPALA-8264] - system-state-info.cc:102] Check failed: total_tics > 0 (-4294962910 vs. 0)
- [IMPALA-8274] - Missing update to index into profiles vector in Coordinator::BackendState::ApplyExecStatusReport()
- [IMPALA-8299] - GroupingAggregator::Partition::Close() may access an uninitialized hash table
- [IMPALA-8300] - Build failed on S3: test_max_nesting_depth (table_format: orc/def/block) timeouts consistently
Task
- [IMPALA-5605] - document how to increase thread resource limits
- [IMPALA-6932] - Simple LIMIT 1 query can be really slow on many-file sequence datasets
- [IMPALA-7728] - Impala Doc: Add the Changing Privileges section in Impala Sentry doc
- [IMPALA-7924] - Generate Thrift 11 Python Code
- [IMPALA-7980] - High system CPU time usage (and waste) when runtime filters filter out files
- [IMPALA-8060] - Impala Doc: Clean up and re-org resource management and admission control docs
- [IMPALA-8098] - [Docs] Document incompatible changes to :shutdown command
- [IMPALA-8102] - Impala/HBase recommendations need update
- [IMPALA-8111] - Document workaround for some authentication issues with KRPC
- [IMPALA-8133] - Impala Doc: Review the list of Known Issues and update for 3.2
- [IMPALA-8250] - Impala crashes with -Xcheck:jni
- [IMPALA-8298] - Update docs of ORC about complex types support
- [IMPALA-8308] - Impala 3.2 Release Notes
Sub-task
- [IMPALA-4063] - Make fragment instance reports per-query (or per-host) instead of per-fragment instance.
- [IMPALA-4555] - Don't cancel query for failed ReportExecStatus (done=false) RPC
- [IMPALA-4889] - Use sidecars for Thrift-wrapped RPC payloads
- [IMPALA-7213] - Port ReportExecStatus() RPCs to KRPC
- [IMPALA-7353] - Fix bogus too-high memory estimates
- [IMPALA-7468] - Port CancelQueryFInstances() to KRPC
- [IMPALA-7477] - Improve QueryResultSet interface to allow appending a batch of rows at a time
- [IMPALA-7718] - [DOCS] document changes to explain output from IMPALA-5821
- [IMPALA-7725] - Impala Doc: Add support for reading TIMESTAMP_MILLIS and TIMESTAMP_MICROS in the Parquet scanner
- [IMPALA-7811] - Add flag to count JVM memory against process limit
- [IMPALA-7920] - Impala 3.2 Doc: Doc Levenshtein edit distance built-in function
- [IMPALA-7948] - Create docker container for impalad/statestored/catalogd
- [IMPALA-7970] - Add support for automatic invalidates by polling metastore events
- [IMPALA-7972] - Detect self-events to avoid unnecessary invalidates
- [IMPALA-7974] - Impala Doc: Doc the options to enable automatic invalidates using metastore notification events
- [IMPALA-7975] - Improve supportability of the automatic invalidate feature
- [IMPALA-7976] - Add a flag to disable sync using events at a table level
- [IMPALA-7979] - Enhance decoders to support value-skipping
- [IMPALA-7985] - Port RemoteShutdown() to KRPC
- [IMPALA-7986] - Extend start-impala-cluster.py to start and stop daemon docker containers
- [IMPALA-7987] - Get start-impala-cluster.py to start up a usable minicluster
- [IMPALA-7988] - Support loading data into a dockerised minicluster
- [IMPALA-7999] - Sort out what to do with bin/start-*d.sh functionality
- [IMPALA-8044] - Impala Doc: Document the command to refresh authorization data
- [IMPALA-8066] - Create coordinator and executor containers
- [IMPALA-8067] - Impala Doc: Doc CPU resource utilization (user, system, iowait) timelines in Profile
- [IMPALA-8071] - Change an initial set of backend tests to use a unified executable
- [IMPALA-8096] - Limit on #rows returned from query
- [IMPALA-8099] - Update Impala build infrastructure to support Apache Ranger
- [IMPALA-8104] - Impala Doc: Doc IF NOT EXISTS in ALTER TABLE
- [IMPALA-8105] - Impala Doc: Document remote file handle cache
- [IMPALA-8134] - Update docs to reflect CGroups memory limit changes
- [IMPALA-8153] - Impala Doc: Add a section on Admission Debug page to Web UI doc
- [IMPALA-8172] - Impala Doc: Doc the query option to limit the number of rows returned from a query
- [IMPALA-8186] - Automate setup of docker bridge network for dockerised minicluster
- [IMPALA-8233] - Do not re-download Ranger if it is already downloaded
- [IMPALA-8240] - Event processor should keep trying if metastore is unavailable
- [IMPALA-8255] - Impala Doc: Document the query option for default file format
- [IMPALA-8266] - Event filtering logic may not filter all the events
- [IMPALA-8273] - Change metastore configuration template so that table parameters do not exclude impala specific properties
- [IMPALA-8278] - Fix MetastoreEventsProcessorTest flakiness
- [IMPALA-8296] - Impala Doc: Doc the supportability metrics of the automatic metadata invalidate feature
- [IMPALA-8297] - Impala Doc: Doc the flag to disable sync using events at the table level
Test
- [IMPALA-7148] - test_profile_fragment_instances is flaky
- [IMPALA-7625] - test_web_pages.py backend tests are failing
- [IMPALA-7648] - Add tests for all cases where OOM is expected