Each of the Impala data types has corresponding notation for literal values of that type.
You specify literal values in SQL statements, such as in the SELECT
list
or WHERE
clause of a query, or as an argument to a function call. See
Data Types for a complete list of types,
ranges, and conversion rules.
To write literals for the integer types (TINYINT
,
SMALLINT
, INT
, and BIGINT
), use a
sequence of digits with optional leading zeros.
To write literals for the floating-point types (DECIMAL
,
FLOAT
, and DOUBLE
), use a sequence of digits with an
optional decimal point (.
character). To preserve accuracy during
arithmetic expressions, Impala interprets floating-point literals as the
DECIMAL
type with the smallest appropriate precision and scale, until
required by the context to convert the result to FLOAT
or
DOUBLE
.
Integer values are promoted to floating-point when necessary, based on the context.
You can also use exponential notation by including an e
character. For
example, 1e6
is 1 times 10 to the power of 6 (1 million). A number in
exponential notation is always interpreted as floating-point.
When Impala encounters a numeric literal, it considers the type to be the "smallest" that can accurately represent the value. The type is promoted to larger or more accurate types if necessary, based on subsequent parts of an expression.
For example, you can see by the types Impala defines for the following table columns how it interprets the corresponding numeric literals:
[localhost:21000] > create table ten as select 10 as x;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc ten;
+------+---------+---------+
| name | type | comment |
+------+---------+---------+
| x | tinyint | |
+------+---------+---------+
[localhost:21000] > create table four_k as select 4096 as x;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc four_k;
+------+----------+---------+
| name | type | comment |
+------+----------+---------+
| x | smallint | |
+------+----------+---------+
[localhost:21000] > create table one_point_five as select 1.5 as x;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc one_point_five;
+------+--------------+---------+
| name | type | comment |
+------+--------------+---------+
| x | decimal(2,1) | |
+------+--------------+---------+
[localhost:21000] > create table one_point_three_three_three as select 1.333 as x;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc one_point_three_three_three;
+------+--------------+---------+
| name | type | comment |
+------+--------------+---------+
| x | decimal(4,3) | |
+------+--------------+---------+
String literals are quoted using either single or double quotation marks. You can use either kind of quotes for string literals, even both kinds for different literals within the same statement.
Quoted literals are considered to be of type STRING
. To use quoted
literals in contexts requiring a CHAR
or VARCHAR
value, CAST()
the literal to a CHAR
or
VARCHAR
of the appropriate length.
Escaping special characters:
To encode special characters within a string literal, precede them with the backslash
(\
) escape character:
\t
represents a tab.
\n
represents a newline or linefeed. This might cause extra line
breaks in impala-shell output.
\r
represents a carriage return. This might cause unusual formatting
(making it appear that some content is overwritten) in impala-shell
output.
\b
represents a backspace. This might cause unusual formatting
(making it appear that some content is overwritten) in impala-shell
output.
\0
represents an ASCII nul
character (not the same
as a SQL NULL
). This might not be visible in
impala-shell output.
\Z
represents a DOS end-of-file character. This might not be visible
in impala-shell output.
\%
and \_
can be used to escape wildcard characters
within the string passed to the LIKE
operator.
\
followed by 3 octal digits represents the ASCII code of a single
character; for example, \101
is ASCII 65, the character
A
.
\\
) to prevent the backslash from
being interpreted as an escape character.
\
does not represent the start of a
recognized escape sequence, the character is passed through unchanged.
Quotes within quotes:
To include a single quotation character within a string value, enclose the literal with
either single or double quotation marks, and optionally escape the single quote as a
\'
sequence. Earlier releases required escaping a single quote inside
double quotes. Continue using escape sequences in this case if you also need to run your
SQL code on older versions of Impala.
To include a double quotation character within a string value, enclose the literal with
single quotation marks, no escaping is necessary in this case. Or, enclose the literal
with double quotation marks and escape the double quote as a \"
sequence.
[localhost:21000] > select "What\'s happening?" as single_within_double,
> 'I\'m not sure.' as single_within_single,
> "Homer wrote \"The Iliad\"." as double_within_double,
> 'Homer also wrote "The Odyssey".' as double_within_single;
+----------------------+----------------------+--------------------------+---------------------------------+
| single_within_double | single_within_single | double_within_double | double_within_single |
+----------------------+----------------------+--------------------------+---------------------------------+
| What's happening? | I'm not sure. | Homer wrote "The Iliad". | Homer also wrote "The Odyssey". |
+----------------------+----------------------+--------------------------+---------------------------------+
Field terminator character in CREATE TABLE:
CREATE TABLE
clauses FIELDS TERMINATED BY
,
ESCAPED BY
, and LINES TERMINATED BY
have special rules
for the string literal used for their argument, because they all require a single
character. You can use a regular character surrounded by single or double quotation
marks, an octal sequence such as '\054'
(representing a comma), or an
integer in the range '-127'..'128' (with quotation marks but no backslash), which is
interpreted as a single-byte ASCII character. Negative values are subtracted from 256;
for example, FIELDS TERMINATED BY '-2'
sets the field delimiter to
ASCII code 254, the "Icelandic Thorn" character used as a delimiter by some data
formats.
impala-shell considerations:
When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and backspaces, use the impala-shell options to save to a file, turn off pretty printing, or both rather than relying on how the output appears visually. See impala-shell Configuration Options for a list of impala-shell options.
For BOOLEAN
values, the literals are TRUE
and
FALSE
, with no quotation marks and case-insensitive.
Examples:
select true;
select * from t1 where assertion = false;
select case bool_col when true then 'yes' when false 'no' else 'null' end from t1;
Impala automatically converts STRING
literals of the correct format
into TIMESTAMP
values. Timestamp values are accepted in the format
'yyyy‑MM‑dd HH:mm:ss.SSSSSS'
, and can consist of just the date, or
just the time, with or without the fractional second portion. For example, you can
specify TIMESTAMP
values such as '1966‑07‑30'
,
'08:30:00'
, or '1985‑09‑25 17:45:30.005'
.
Leading zeroes are not required in the numbers representing the date component, such as
month and date, or the time component, such as hour, minute, and second. For example,
Impala accepts both '2018‑1‑1 01:02:03'
and
'2018‑01‑01 1:2:3'
as valid.
In STRING
to TIMESTAMP
conversions, leading and
trailing white spaces, such as a space, a tab, a newline, or a carriage return, are
ignored. For example, Impala treats the following as equivalent:
'1999‑12‑01 01:02:03 ', ' 1999‑12‑01 01:02:03',
'1999‑12‑01 01:02:03\r\n\t'.
STRING
literal to
TIMESTAMP
, you can use the following separators between the date part
and the time part:
One or more space characters
Example: CAST('2001-01-09 01:05:01' AS TIMESTAMP)
The character “T”
Example: CAST('2001-01-09T01:05:01' AS TIMESTAMP)
You can also use INTERVAL
expressions to add or subtract from timestamp
literal values, such as CAST('1966‑07‑30' AS
TIMESTAMP) + INTERVAL 5 YEARS + INTERVAL 3 DAYS
. See
TIMESTAMP Data Type for details.
Depending on your data pipeline, you might receive date and time data as text, in
notation that does not exactly match the format for Impala TIMESTAMP
literals. See Impala Date and Time Functions for
functions that can convert between a variety of string literals (including different
field order, separators, and timezone notation) and equivalent
TIMESTAMP
or numeric values.
The DATE
literals are in the form of DATE'YYYY-MM-DD'
.
For example, DATE '2013-01-01'
The notion of NULL
values is familiar from all kinds of database
systems, but each SQL dialect can have its own behavior and restrictions on
NULL
values. For Big Data processing, the precise semantics of
NULL
values are significant: any misunderstanding could lead to
inaccurate results or misformatted data, that could be time-consuming to correct for
large data sets.
NULL
is a different value than an empty string. The empty string is
represented by a string literal with nothing inside, ""
or
''
.
NULL
value is represented by the
special token \N
.
NULL
or the empty string, the data is placed
in a special partition that holds only these two kinds of values. When these values
are returned in a query, the result is NULL
whether the value was
originally NULL
or an empty string. This behavior is compatible with
the way Hive treats NULL
values in partitioned tables. Hive does not
allow empty strings as partition keys, and it returns a string value such as
__HIVE_DEFAULT_PARTITION__
instead of NULL
when such
values are returned from a query. For example:
create table t1 (i int) partitioned by (x int, y string);
-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
-- are null, this special directory name occurs at different levels of the physical data directory
-- for the table.
insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table;
insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table;
insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;
NOT NULL
clause when defining a column to prevent
NULL
values in that column.
DEFAULT
clause to specify a non-NULL
default value.
INSERT
operation mentions some columns but not others, the
unmentioned columns contain NULL
for all inserted rows.
In Impala 1.2.1 and higher, all NULL
values come at the end of the
result set for ORDER BY ... ASC
queries, and at the beginning of the
result set for ORDER BY ... DESC
queries. In effect,
NULL
is considered greater than all other values for sorting purposes.
The original Impala behavior always put NULL
values at the end, even
for ORDER BY ... DESC
queries. The new behavior in Impala 1.2.1 makes
Impala more compatible with other popular database systems. In Impala 1.2.1 and higher,
you can override or specify the sorting behavior for NULL
by adding the
clause NULLS FIRST
or NULLS LAST
at the end of the
ORDER BY
clause.
NULLS FIRST
and NULLS LAST
keywords
are not currently available in Hive queries, any views you create using those
keywords will not be available through Hive.
ORDER BY
, comparing a
NULL
to anything else returns NULL
, making the
comparison meaningless. For example, 10 > NULL
produces
NULL
, 10 < NULL
also produces
NULL
, 5 BETWEEN 1 AND NULL
produces
NULL
, and so on.
Several built-in functions serve as shorthand for evaluating expressions and returning
NULL
, 0, or some other substitution value depending on the expression
result: ifnull()
, isnull()
, nvl()
,
nullif()
, nullifzero()
, and
zeroifnull()
. See
Impala Conditional Functions for details.
Kudu considerations:
Columns in Kudu tables have an attribute that specifies whether or not they can contain
NULL
values. A column with a NULL
attribute can
contain nulls. A column with a NOT NULL
attribute cannot contain any
nulls, and an INSERT
, UPDATE
, or
UPSERT
statement will skip any row that attempts to store a null in a
column designated as NOT NULL
. Kudu tables default to the
NULL
setting for each column, except columns that are part of the
primary key.
In addition to columns with the NOT NULL
attribute, Kudu tables also
have restrictions on NULL
values in columns that are part of the
primary key for a table. No column that is part of the primary key in a Kudu table can
contain any NULL
values.