The Impala version covered by this documentation library contains the following incompatible changes. These are things such as file format changes, removed features, or changes to implementation, default configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.
Even added SQL statements or clauses can produce incompatibilities, if you have databases, tables, or columns whose names conflict with the new keywords. See Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.
For the full list of incompatible changes introduced in this release, see the release notes for Impala 4.0.
Previously, the Impala planner used the fs.s3a.block.size startup flag when calculating the split size on non-block based stores, e.g. S3, ADLS, etc. Starting in this release, the Impala planner uses the PARQUET_OBJECT_STORE_SPLIT_SIZE query option to get the Parquet file format specific split size. For Parquet files, the fs.s3a.block.size startup flag is no longer used.
The default value of the PARQUET_OBJECT_STORE_SPLIT_SIZE query option is 256 MB.
When you create a table, the default format for that table data is now Parquet.
For backward compatibility, you can use the DEFAULT_FILE_FORMAT query option to set the default file format to the previous default, text, or other formats.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 3.2.
The SHUTDOWN command for shutting down a remote server used the backend port in Impala 3.1. Starting in Impala 3.2, the command uses the KRPC port, e.g. :shutdown('host100:27000').
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 3.1.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 3.0.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.12.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.11.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.10.
For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for Impala 2.9.
Llama support is removed completely from Impala. Related flags (--enable_rm) and query options (such as V_CPU_CORES) remain but do not have any effect.
If --enable_rm is passed to Impala, a warning is printed to the log on startup.
The syntax related to Kudu tables includes a number of new reserved words, such as COMPRESSION, DEFAULT, and ENCODING, that might conflict with names of existing tables, columns, or other identifiers from older Impala versions. See Impala Reserved Words for the full list of reserved words.
The DDL syntax for Kudu tables, particularly in the CREATE TABLE statement, is different from the special impala_next fork that was previously used for accessing Kudu tables from Impala:
The DISTRIBUTE BY clause is now PARTITION BY.
The INTO N BUCKETS clause is now PARTITIONS N.
The SPLIT ROWS clause is replaced by different syntax for specifying the ranges covered by each partition.
The DESCRIBE output for Kudu tables includes several extra columns.
Non-primary-key columns can contain NULL values by default. The SHOW CREATE TABLE output for these columns displays the NULL attribute. There was a period during early experimental versions of Impala + Kudu where non-primary-key columns had the NOT NULL attribute by default.
The IGNORE keyword that was present in early experimental versions of Impala + Kudu is no longer present. The behavior of the IGNORE keyword is now the default: DML statements continue with warnings, instead of failing with errors, if they encounter conditions such as "primary key already exists" for an INSERT statement or "primary key already deleted" for a DELETE statement.
The replication factor for Kudu tables must be an odd number.
A UDF compiled into an LLVM IR bitcode module (.bc) might encounter a runtime error when native code generation is turned off by setting the query option DISABLE_CODEGEN=1. This issue also applies when running a built-in or native UDF with more than 20 arguments. See IMPALA-4432 for details. As a workaround, either turn native code generation back on with the query option DISABLE_CODEGEN=0, or use the regular UDF compilation path that does not produce an IR module.
Bug fixes related to parsing of floating-point values (IMPALA-1731 and IMPALA-3868) can change the results of casting strings that represent invalid floating-point values. For example, string values beginning or ending with inf, such as 1.23inf or infinite, are now converted to NULL when interpreted as floating-point values. Formerly, they were interpreted as the special "infinity" value when converting from string to floating-point. Similarly, now only the string NaN (case-sensitive) is interpreted as the special "not a number" value. String values containing multiple dots, such as 3..141 or 3.1.4.1, are now interpreted as NULL rather than being converted to valid floating-point values.
The default for the RUNTIME_FILTER_MODE query option is changed to GLOBAL (the highest setting).
The RUNTIME_BLOOM_FILTER_SIZE setting is now only used as a fallback if statistics are not available; otherwise, Impala uses the statistics to estimate the appropriate size to use for each filter.
Admission control and dynamic resource pools are enabled by default. When upgrading from an earlier release, you must turn on these settings yourself if they are not already enabled. See Admission Control and Query Queuing for details about admission control.
Impala reserves some new keywords, in preparation for support for Kudu syntax: buckets, delete, distribute, hash, ignore, split, and update.
For Kerberized clusters, the Catalog service now uses the Kerberos principal instead of the operating system user that runs the catalogd daemon. This eliminates the requirement to configure a hadoop.user.group.static.mapping.overrides setting to put the OS user into the Sentry administrative group, on clusters where the principal and the OS user name for this user are different.
The mechanism for interpreting DECIMAL literals is improved, no longer going through an intermediate conversion step to DOUBLE:
Casting a DECIMAL value to TIMESTAMP produces a more precise value for the TIMESTAMP than formerly.
Certain function calls involving DECIMAL literals now succeed, when formerly they failed due to lack of a function signature with a DOUBLE argument.
Improved type accuracy for CASE return values. If all WHEN clauses of the CASE expression are of CHAR type, the final result is also CHAR instead of being converted to STRING.
set num_scanner_threads=30
set batch_size=512
set mem_limit=64g
The S3_SKIP_INSERT_STAGING query option, which is enabled by default, increases the speed of INSERT operations for S3 tables. The speedup applies to regular INSERT, but not INSERT OVERWRITE. The tradeoff is the possibility of inconsistent output files left behind if a node fails during INSERT execution. See S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only) for details.
Certain features are turned off by default, to avoid regressions or unexpected behavior following an upgrade. Consider turning on these features after suitable testing:
Impala now recognizes the auth_to_local setting, specified through the HDFS configuration setting hadoop.security.auth_to_local. This feature is disabled by default; to enable it, specify --load_auth_to_local_rules=true in the impalad configuration settings.
A new query option, PARQUET_ANNOTATE_STRINGS_UTF8, makes Impala include the UTF-8 annotation metadata for STRING, CHAR, and VARCHAR columns in Parquet files created by INSERT or CREATE TABLE AS SELECT statements.
A new query option, PARQUET_FALLBACK_SCHEMA_RESOLUTION, lets Impala locate columns within Parquet files based on column name rather than ordinal position. This enhancement improves interoperability with applications that write Parquet files with a different order or subset of columns than are used in the Impala table.
The admission control default limit for concurrent queries (the max requests setting) is now unlimited instead of 200.
Multiplying a mixture of DECIMAL and FLOAT or DOUBLE values now returns DOUBLE rather than DECIMAL. This change avoids some cases where an intermediate value would underflow or overflow and become NULL unexpectedly. The results of multiplying DECIMAL and FLOAT or DOUBLE might now be slightly less precise than before. Previously, the intermediate types and thus the final result depended on the exact order of the values of different types being multiplied, which made the final result values difficult to reason about.
Previously, the _ and % wildcard characters for the LIKE operator would not match characters on the second or subsequent lines of multi-line string values. The fix for issue IMPALA-2204 causes the wildcard matching to apply to the entire string for values containing embedded \n characters. This could cause different results than in previous Impala releases for identical queries on identical data.
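The difference in LIKE wildcard matching can be modeled outside Impala. The following is a minimal Python sketch, not Impala's implementation: the helper names like_to_regex and like_matches are hypothetical, escape sequences in LIKE patterns are ignored, and Python's re module stands in for whatever matcher Impala uses internally.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like_matches(value: str, pattern: str, span_lines: bool) -> bool:
    # span_lines=True models the behavior after the fix: wildcards also
    # consume embedded newline characters.
    flags = re.DOTALL if span_lines else 0
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


value = "first line\nsecond line"
print(like_matches(value, "first%", span_lines=False))  # models the old behavior
print(like_matches(value, "first%", span_lines=True))   # models the new behavior
```

Running the same pattern against a multi-line test value in both modes is one way to spot queries whose results could change after the upgrade.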
Formerly, all Impala UDFs and UDAs required running the CREATE FUNCTION statements to re-create them after each catalogd restart. In Impala 2.5 and higher, functions written in C++ are persisted across restarts, and the requirement to re-create functions only applies to functions written in Java. Adapt any function-reloading logic that you have added to your Impala environment.
CREATE TABLE LIKE no longer inherits HDFS caching settings from the source table.
The SHOW DATABASES statement now returns two columns rather than one. The second column includes the associated comment string, if any, for each database. Adjust any application code that examines the list of databases and assumes the result set contains only a single column.
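One way to make client code tolerant of both result shapes is to read only the first column of each row. The sketch below assumes the rows have already been fetched as sequences of strings by some client library; the function name database_names and the sample rows are hypothetical.

```python
from typing import Iterable, List, Sequence


def database_names(rows: Iterable[Sequence[str]]) -> List[str]:
    """Return database names from SHOW DATABASES rows.

    Only the first column is read, so the function works whether each
    row has one column (name) or two columns (name, comment).
    """
    return [row[0] for row in rows]


# Hypothetical result sets in the old and new shapes.
old_rows = [("default",), ("sales",)]
new_rows = [("default", "Default Hive database"), ("sales", "")]

print(database_names(old_rows))
print(database_names(new_rows))
```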
The output of the SHOW FUNCTIONS statement includes two new columns, showing the kind of the function (for example, BUILTIN) and whether or not the function persists across catalog server restarts. For example, the SHOW FUNCTIONS output for the _impala_builtins database starts with:
+--------------+-------------------------------------------------+-------------+---------------+
| return type  | signature                                       | binary type | is persistent |
+--------------+-------------------------------------------------+-------------+---------------+
| BIGINT       | abs(BIGINT)                                     | BUILTIN     | true          |
| DECIMAL(*,*) | abs(DECIMAL(*,*))                               | BUILTIN     | true          |
| DOUBLE       | abs(DOUBLE)                                     | BUILTIN     | true          |
...
Other than support for DSSD storage, the Impala feature set for Impala 2.4 is the same as for Impala 2.3. Therefore, there are no incompatible changes for Impala introduced in Impala 2.4.
The use of the Llama component for integrated resource management within YARN is no longer supported with Impala 2.3 and higher. The Llama support code is removed entirely in Impala 2.8 and higher.
For clusters running Impala alongside other data management components, you define static service pools to define the resources available to Impala and other components. Then within the area allocated for Impala, you can create dynamic service pools, each with its own settings for the Impala admission control feature.
If Impala encounters a Parquet file that is invalid because of an incorrect magic number, the query skips the file. This change is caused by the fix for issue IMPALA-2130. Previously, Impala would attempt to read the file despite the possibility that the file was corrupted.
Previously, calls to overloaded built-in functions could treat parameters as DOUBLE or FLOAT when no overload had a signature that matched the exact argument types. Now Impala prefers the function signature with DECIMAL parameters in this case. This change avoids a possible loss of precision in function calls such as greatest(0, 99999.8888); now both parameters are treated as DECIMAL rather than DOUBLE, avoiding any loss of precision in the fractional value. This could cause slightly different results than in previous Impala releases for certain function calls.
Formerly, adding or subtracting a large interval value to a TIMESTAMP could produce a nonsensical result. Now when the result goes outside the range of TIMESTAMP values, Impala returns NULL.
Formerly, it was possible to accidentally create a table with identical row and column delimiters. This could happen unintentionally, when specifying one of the delimiters and using the default value for the other. Now an attempt to use identical delimiters still succeeds, but displays a warning message.
Formerly, Impala could include snippets of table data in log files by default, for example when reporting conversion errors for data values. Now any such log messages are only produced at higher logging levels that you would enable only during debugging.
Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any files with extensions .tmp or .copying are not considered part of the Impala table. The suffix matching is case-insensitive, so for example Impala ignores both .copying and .COPYING suffixes.
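A script that inventories data files can apply the same rule to predict which files Impala will skip. This is a small sketch of the suffix rule described above, not Impala's own code; the function name and the sample file names are hypothetical.

```python
# Suffixes that the text above says Impala treats as temporary work files.
IGNORED_SUFFIXES = (".tmp", ".copying")


def is_ignored_work_file(file_name: str) -> bool:
    """Return True if the name ends in an ignored suffix, in any letter case."""
    return file_name.lower().endswith(IGNORED_SUFFIXES)


for name in ("part-0001.parq", "part-0002.COPYING", "staging.Tmp"):
    print(name, is_ignored_work_file(name))
```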
The log rotation feature in Impala 2.2.0 and higher means that older log files are now removed by default. The default is to preserve the latest 10 log files for each severity level, for each Impala-related daemon. If you have set up your own log rotation processes that expect older files to be present, either adjust your procedures or change the Impala -max_log_files setting. See Rotating Impala Logs for details.
The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also worked on SSSE3-enabled processors.
Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to Impala 2.1.
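On Linux hosts, the CPU level can be checked by looking for the sse4_1 flag in /proc/cpuinfo. The following sketch parses text in that format; the function names and the sample text are made up for illustration, and reading the real file on each host is left to the caller.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def meets_sse4_1_requirement(cpuinfo_text: str) -> bool:
    """Return True if the sse4_1 flag is present."""
    return "sse4_1" in cpu_flags(cpuinfo_text)


# Shortened, made-up sample of what a host might report.
sample = "model name : Example CPU\nflags : fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(meets_sse4_1_requirement(sample))
```

To check a real host, pass the contents of /proc/cpuinfo to meets_sse4_1_requirement().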
The "small query" optimization feature introduces some new information in the EXPLAIN plan, which you might need to account for if you parse the text of the plan output.
New SQL syntax introduces additional reserved words: FOR, GRANT, REVOKE, ROLE, ROLES, INCREMENTAL. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.
No incompatible changes.
No incompatible changes.
No incompatible changes.
The INSERT statement has always left behind a hidden work directory inside the data directory of the table. Formerly, this hidden work directory was named .impala_insert_staging. In Impala 2.0.1 and later, this directory name is changed to _impala_insert_staging. (While HDFS tools are expected to treat names beginning with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory, adjust them to use the new name.
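During a transition period, a cleanup script might need to recognize both directory names. The following is a minimal sketch of such a check; the function name and the example paths are hypothetical.

```python
import posixpath

# Old (dot-prefixed) and new (underscore-prefixed) work directory names.
STAGING_DIR_NAMES = {".impala_insert_staging", "_impala_insert_staging"}


def is_insert_staging_dir(path: str) -> bool:
    """Return True if the last path component is an INSERT work directory."""
    name = posixpath.basename(path.rstrip("/"))
    return name in STAGING_DIR_NAMES


# Hypothetical HDFS paths.
print(is_insert_staging_dir("/warehouse/sales.db/orders/_impala_insert_staging"))
print(is_insert_staging_dir("/warehouse/sales.db/orders/year=2014"))
```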
The abs() function now takes a broader range of numeric types as arguments, and the return type is the same as the argument type.
Shorthand notation for character classes in regular expressions, such as \d for digit, is now available again in regular expression operators and functions such as regexp_extract() and regexp_replace(). Some other differences in regular expression behavior remain between Impala 1.x and Impala 2.x releases. See Incompatible Changes Introduced in Impala 2.0.0 for details.
Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to Impala 2.0.
The new syntax where query hints are allowed in comments causes some changes in the way comments are parsed in the impala-shell interpreter. Previously, you could end a -- comment line with a semicolon and impala-shell would treat that as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to the Impala daemon, where it is flagged as an error.
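Existing SQL script files can be scanned for comment lines of this kind before an upgrade. The sketch below is a simple line-based check and does not parse SQL; the function name and the sample script are hypothetical.

```python
from typing import List


def comment_lines_ending_with_semicolon(script_text: str) -> List[int]:
    """Return 1-based line numbers of -- comment lines that end with a semicolon."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("--") and stripped.endswith(";"):
            hits.append(number)
    return hits


script = "-- load the staging table;\nselect count(*) from t1;\n  -- done\n"
print(comment_lines_ending_with_semicolon(script))
```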
Impala 2.0 and later uses a different support library for regular expression parsing than in earlier Impala versions. Now, Impala uses the Google RE2 library rather than Boost for evaluating regular expressions. This implementation change causes some differences in the allowed regular expression syntax, and in the way certain regex operators are interpreted. The following are some of the major differences (not necessarily a complete list):
.*? notation for non-greedy matches is now supported, where it was not in earlier Impala releases.
By default, ^ and $ now match only begin/end of buffer, not begin/end of each line. This behavior can be overridden in the regex itself using the m flag.
By default, . does not match newline. This behavior can be overridden in the regex itself using the s flag.
\Z is not supported.
\< and \> for start of word and end of word are not supported.
Lookahead and lookbehind are not supported.
Shorthand notation for character classes, such as \d for digit, is not recognized. (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
In Impala 2.0 and later, user() returns the full Kerberos principal string, such as user@example.com, in a Kerberized environment.
The changed format for the user name in secure environments is also reflected where the user name is displayed in the output of the PROFILE command.
In the output from SHOW FUNCTIONS, SHOW AGGREGATE FUNCTIONS, and SHOW ANALYTIC FUNCTIONS, arguments and return types of arbitrary DECIMAL scale and precision are represented as DECIMAL(*,*). Formerly, these items were displayed as DECIMAL(-1,-1).
The PARQUET_COMPRESSION_CODEC query option has been replaced by the COMPRESSION_CODEC query option. See COMPRESSION_CODEC Query Option (Impala 2.0 or higher only) for details.
The meaning of the --idle_query_timeout configuration option is changed, to accommodate the new QUERY_TIMEOUT_S query option. Rather than setting an absolute timeout period that applies to all queries, it now sets a maximum timeout period, which can be adjusted downward for individual queries by specifying a value for the QUERY_TIMEOUT_S query option. In sessions where no QUERY_TIMEOUT_S query option is specified, the --idle_query_timeout timeout period applies the same as in earlier versions.
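The interaction between the two settings can be summarized as "the query option can only lower the timeout, never raise it". The following sketch models that rule under two assumptions that the text above does not state: a value of 0 means the setting is not specified, and a session value applies as-is when no server-wide maximum is set. The function name is hypothetical.

```python
def effective_idle_timeout(idle_query_timeout: int, query_timeout_s: int = 0) -> int:
    """Model the timeout, in seconds, that applies to a query.

    Assumption: 0 means "not specified" for either setting.
    """
    if query_timeout_s <= 0:
        # No per-query value: the server-wide setting applies as before.
        return idle_query_timeout
    if idle_query_timeout <= 0:
        # Assumption: with no server-wide maximum, the per-query value applies.
        return query_timeout_s
    # The per-query value can only adjust the timeout downward.
    return min(idle_query_timeout, query_timeout_s)


print(effective_idle_timeout(600))        # server-wide value only
print(effective_idle_timeout(600, 60))    # lowered for this query
print(effective_idle_timeout(600, 3600))  # capped at the server-wide maximum
```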
The --strict_unicode option of impala-shell was removed. To avoid problems with Unicode values in impala-shell, define the following locale setting before running impala-shell:
export LC_CTYPE=en_US.UTF-8
Some new SQL syntax requires the addition of new reserved words: ANTI, ANALYTIC, OVER, PRECEDING, UNBOUNDED, FOLLOWING, CURRENT, ROWS, RANGE, CHAR, VARCHAR. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.
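Code that generates SQL text can guard against these conflicts by wrapping affected identifiers in backticks. The sketch below covers only the words listed in this item, not the full reserved word list; the function name is hypothetical, and identifiers that themselves contain backticks are not handled.

```python
# Reserved words named in the item above (not the complete Impala list).
NEW_RESERVED_WORDS = {
    "ANTI", "ANALYTIC", "OVER", "PRECEDING", "UNBOUNDED", "FOLLOWING",
    "CURRENT", "ROWS", "RANGE", "CHAR", "VARCHAR",
}


def quote_if_reserved(identifier: str) -> str:
    """Wrap the identifier in backticks if it matches a newly reserved word."""
    if identifier.upper() in NEW_RESERVED_WORDS:
        return f"`{identifier}`"
    return identifier


columns = ["id", "range", "rows"]
print("select " + ", ".join(quote_if_reserved(c) for c in columns) + " from t1")
```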
The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have implications for the sizes of Parquet files produced by INSERT and CREATE TABLE AS SELECT statements.
Although older Impala releases typically produced files that were smaller than the old default size of 1 GB, now the file size matches more closely whatever value is specified for the PARQUET_FILE_SIZE query option. Thus, if you use a non-default value for this setting, the output files could be larger than before. They still might be somewhat smaller than the specified value, because Impala makes conservative estimates about the space needed to represent each column as it encodes the data.
When you do not specify an explicit value for the PARQUET_FILE_SIZE query option, Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file size to be somewhat larger if needed to accommodate the layout for wide tables, that is, tables with hundreds or thousands of columns.
This change is unlikely to affect memory usage while writing Parquet files, because Impala does not pre-allocate the memory needed to hold the entire Parquet block.
No incompatible changes.
No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.
None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.
None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.
There is a slight change to required security privileges in the Sentry framework. To create a new object, now you need the ALL privilege on the parent object. For example, to create a new table, view, or function requires having the ALL privilege on the database containing the new object. See Impala Authorization for a full list of operations and associated privileges.
With the ability of ORDER BY queries to process unlimited amounts of data with no LIMIT clause, the query options DEFAULT_ORDER_BY_LIMIT and ABORT_ON_DEFAULT_LIMIT_EXCEEDED are now deprecated and have no effect. See ORDER BY Clause for details about improvements to the ORDER BY clause.
There are some changes to the list of reserved words. See Impala Reserved Words for the most current list. The following keywords are new:
API_VERSION
BINARY
CACHED
CLASS
PARTITIONS
PRODUCED
UNCACHED
The following were formerly reserved keywords, but are no longer reserved:
COUNT
GROUP_CONCAT
NDV
SUM
The fix for issue IMPALA-973 changes the behavior of the INVALIDATE METADATA statement regarding nonexistent tables. In Impala 1.4.0 and higher, the statement returns an error if the specified table is not in the metastore database at all. It completes successfully if the specified table is in the metastore database but not yet recognized by Impala, for example if the table was created through Hive. Formerly, you could issue this statement for a completely nonexistent table, with no error.
No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.
With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.
In Impala 1.3.1 and higher, the REGEXP and RLIKE operators now match a regular expression string that occurs anywhere inside the target string, the same as if the regular expression was enclosed on each side by .*. See REGEXP Operator for examples. Previously, these operators only succeeded when the regular expression matched the entire target string. This change improves compatibility with the regular expression support for popular database systems. There is no change to the behavior of the regexp_extract() and regexp_replace() built-in functions.
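The change from whole-string matching to match-anywhere semantics resembles the difference between full matching and searching in other regular expression libraries. The following sketch uses Python's re module as a stand-in; it is not the library Impala uses, and the function names are hypothetical.

```python
import re


def regexp_whole_string(value: str, pattern: str) -> bool:
    """Model of the earlier behavior: the pattern must match the entire string."""
    return re.fullmatch(pattern, value) is not None


def regexp_anywhere(value: str, pattern: str) -> bool:
    """Model of the 1.3.1 behavior: the pattern can match anywhere in the string."""
    return re.search(pattern, value) is not None


print(regexp_whole_string("impala", "pal"))    # no match under the old rule
print(regexp_anywhere("impala", "pal"))        # match under the new rule
print(regexp_whole_string("impala", ".*pal.*"))  # explicit .* gives the same result
```

To keep whole-string semantics under the new rule, anchor the pattern explicitly with ^ and $.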
The result set for the SHOW FUNCTIONS statement includes a new first column, with the data type of the return value. See SHOW Statement for examples.
The EXPLAIN_LEVEL query option now accepts numeric options from 0 (most concise) to 3 (most verbose), rather than only 0 or 1. If you formerly used SET EXPLAIN_LEVEL=1 to get detailed explain plans, switch to SET EXPLAIN_LEVEL=3. If you used the mnemonic keyword (SET EXPLAIN_LEVEL=verbose), you do not need to change your code because now level 3 corresponds to verbose. See EXPLAIN_LEVEL Query Option for details about the allowed explain levels, and Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles for usage information.
DECIMAL is now a reserved word. If you have any databases, tables, columns, or other objects already named DECIMAL, quote any references to them using backticks (``) to avoid name conflicts with the keyword. Although the DECIMAL keyword is a reserved word, currently Impala does not support DECIMAL as a data type for columns.
The query option formerly named YARN_POOL is now named REQUEST_POOL to reflect its broader use with the Impala admission control feature. See REQUEST_POOL Query Option for information about the option, and Admission Control and Query Queuing for details about its use with the admission control feature.
There are some changes to the list of reserved words. See Impala Reserved Words for the most current list.
The names of aggregate functions are no longer reserved words, so you can have databases, tables, columns, or other objects named AVG, MIN, and so on without any name conflicts.
The internal function names DISTINCTPC and DISTINCTPCSA are no longer reserved words, although DISTINCT is still a reserved word.
The keywords CLOSE_FN and PREPARE_FN are now reserved words. See CREATE FUNCTION Statement for their role in the CREATE FUNCTION statement, and Thread-Safe Work Area for UDFs for usage information.
The HDFS property dfs.client.file-block-storage-locations.timeout was renamed to dfs.client.file-block-storage-locations.timeout.millis, to emphasize that the unit of measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the minimum value for this setting 10000. If you are not using cluster management software, you might need to edit the hdfs-site.xml file in the Impala configuration directory for the new name and minimum value.
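The rename and the minimum value can be applied together when migrating settings by script. The sketch below works on a plain dictionary of property names and string values, which is an assumed in-memory form of hdfs-site.xml; parsing and writing the XML file is out of scope. Raising a too-small value to the minimum is a policy choice made for this example, and the function name is hypothetical.

```python
OLD_KEY = "dfs.client.file-block-storage-locations.timeout"
NEW_KEY = "dfs.client.file-block-storage-locations.timeout.millis"
MIN_TIMEOUT_MS = 10000  # 10 seconds, the minimum stated above


def migrate_timeout_property(props: dict) -> dict:
    """Return a copy of props that uses the new name and respects the minimum."""
    updated = dict(props)
    if OLD_KEY in updated:
        old_value = updated.pop(OLD_KEY)
        # Keep an explicitly configured new-style value if one already exists.
        updated.setdefault(NEW_KEY, old_value)
    if NEW_KEY in updated and int(updated[NEW_KEY]) < MIN_TIMEOUT_MS:
        updated[NEW_KEY] = str(MIN_TIMEOUT_MS)
    return updated


print(migrate_timeout_property({OLD_KEY: "3000"}))
```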
There are no incompatible changes introduced in Impala 1.2.4.
Previously, after creating a table in Hive, you had to issue the INVALIDATE METADATA statement with no table name, a potentially expensive operation on clusters with many databases, tables, and partitions. Starting in Impala 1.2.4, you can issue the statement INVALIDATE METADATA table_name for a table newly created through Hive. Loading the metadata for only this one table is faster and involves less network overhead. Therefore, you might revisit your setup DDL scripts to add the table name to INVALIDATE METADATA statements, in cases where you create and populate the tables through Hive before querying them through Impala.
Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible changes. See Incompatible Changes Introduced in Impala 1.2.2 if you are upgrading from Impala 1.2.1 or 1.1.x.
The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code, or schema objects such as tables or views:
With the addition of the CROSS JOIN keyword, you might need to rewrite any queries that refer to a table named CROSS or use the name CROSS as a table alias:
-- Formerly, 'cross' in this query was an alias for t1
-- and it was a normal join query.
-- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
-- is not interpreted as a table alias, and the query
-- uses the special CROSS JOIN processing rather than a
-- regular join.
select * from t1 cross join t2...
-- Now if CROSS is used in other context such as a table or column name,
-- use backticks to escape it.
create table `cross` (x int);
select * from `cross`;
Formerly, a DROP DATABASE statement in Impala would not remove the top-level HDFS directory for that database. The DROP DATABASE has been enhanced to remove that directory. (You still need to drop all the tables inside the database first; this change only applies to the top-level directory for the entire database.)
PARQUET is introduced as a synonym for PARQUETFILE in the CREATE TABLE and ALTER TABLE statements, because that is the common name for the file format. (As opposed to SequenceFile and RCFile where the "File" suffix is part of the name.) Documentation examples have been changed to prefer the new shorter keyword. The PARQUETFILE keyword is still available for backward compatibility with older Impala versions.
INT, SMALLINT, TINYINT, and FLOAT without using a CAST() call. If you remove the CAST() calls from INSERT statements, those statements might not work with earlier versions of Impala.
Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read Incompatible Changes Introduced in Impala 1.2.1 for things to note about upgrading to Impala 1.2.x in general.
The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code, or schema objects such as tables or views:
In Impala 1.2.1 and higher, all NULL values come at the end of the result set for ORDER BY ... ASC queries, and at the beginning of the result set for ORDER BY ... DESC queries. In effect, NULL is considered greater than all other values for sorting purposes. The original Impala behavior always put NULL values at the end, even for ORDER BY ... DESC queries. The new behavior in Impala 1.2.1 makes Impala more compatible with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting behavior for NULL by adding the clause NULLS FIRST or NULLS LAST at the end of the ORDER BY clause.
See NULL for more information.
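The ordering rule can be modeled in a few lines to check how an application that post-processes sorted results might be affected. This is an illustrative sketch only: None stands in for NULL, the legacy flag models the pre-1.2.1 behavior, and the function name is hypothetical.

```python
from typing import List, Optional


def order_by(values: List[Optional[int]], descending: bool = False,
             legacy: bool = False) -> List[Optional[int]]:
    """Sort values, placing None (NULL) as described in the text above."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    if descending and not legacy:
        # 1.2.1 and higher: NULL sorts as the greatest value, so it comes
        # first in descending order.
        return nulls + non_null
    # Ascending order, or the original behavior: NULL values at the end.
    return non_null + nulls


data = [3, None, 1]
print(order_by(data))                                 # ASC
print(order_by(data, descending=True))                # DESC, new behavior
print(order_by(data, descending=True, legacy=True))   # DESC, original behavior
```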
The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:
See Installing Impala, Upgrading Impala and Starting Impala, for usage information for the catalogd daemon.
The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.
See The Impala Catalog Service for background information on the catalogd service.
There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).
The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements:
See Installing Impala, Upgrading Impala and Starting Impala, for usage information for the catalogd daemon.
The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.
See The Impala Catalog Service for background information on the catalogd service.
The new resource management feature interacts with both YARN and Llama services. See Resource Management for usage information for Impala resource management.
There are no incompatible changes in Impala 1.1.1.
Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now that Parquet support is available for Hive 10, reusing existing Impala Parquet data files in Hive requires updating the table metadata. Use the following command if you are already running Impala 1.1.1:
ALTER TABLE table_name SET FILEFORMAT PARQUETFILE;
If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:
ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
As usual, make sure to upgrade the Impala LZO package to the latest level at the same time as you upgrade the Impala server.
The REFRESH statement now requires a table name; in Impala 1.0, the table name was optional. This syntax change is part of the internal rework to make REFRESH a true Impala SQL statement so that it can be called through the JDBC and ODBC APIs. REFRESH now reloads the metadata immediately, rather than marking it for update the next time any affected table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire Impala metadata catalog, is available through the new INVALIDATE METADATA statement. INVALIDATE METADATA can be specified with a table name to affect a single table, or without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next time it is requested during the processing for a SQL statement. See REFRESH Statement and INVALIDATE METADATA Statement for the latest details about these statements.