SQL operators are a class of comparison functions that are widely used within the WHERE clauses of
SELECT statements.
The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument.
Syntax:
left_hand_arg binary_operator right_hand_arg
unary_operator single_arg
+ and -: Can be used either as unary or binary operators.
With unary notation, such as +5, -2.5, or -col_name,
they multiply their single numeric argument by +1 or -1. Therefore, unary
+ returns its argument unchanged, while unary - flips the sign of its argument. Although
you can double up these operators in expressions such as ++5 (always positive) or -+2 or
+-2 (both always negative), you cannot double the unary minus operator because -- is
interpreted as the start of a comment. (You can use a double unary minus operator if you separate the -
characters, for example with a space or parentheses.)
With binary notation, such as 2+2, 5-2.5, or col1 +
col2, they add or subtract respectively the right-hand argument to (or from) the left-hand
argument. Both arguments must be of numeric types.
* and /: Multiplication and division respectively. Both arguments must be of numeric types.
When multiplying, the shorter argument is promoted if necessary (such as SMALLINT to INT or
BIGINT, or FLOAT to DOUBLE), and then the result is promoted again to the
next larger type. Thus, multiplying a TINYINT and an INT produces a BIGINT
result. Multiplying a FLOAT and a FLOAT produces a DOUBLE result. Multiplying
a FLOAT and a DOUBLE or a DOUBLE and a DOUBLE produces a
DECIMAL(38,17), because DECIMAL values can represent much larger and more precise values than
DOUBLE.
When dividing, Impala always treats the arguments and result as DOUBLE values to avoid losing precision. If you
need to insert the results of a division operation into a FLOAT column, use the CAST()
function to convert the result to the correct type.
DIV: Integer division. Arguments are not promoted to a floating-point type, and any fractional result
is discarded. For example, 13 DIV 7 returns 1, 14 DIV 7 returns 2, and
15 DIV 7 returns 2. This operator is the same as the QUOTIENT() function.
%: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both
arguments must be of one of the integer types.
&, |, ~, and ^: Bitwise operators that return the
logical AND, logical OR, NOT, or logical XOR (exclusive OR) of their argument values. Both arguments must be of
one of the integer types. If the arguments are of different type, the argument with the smaller type is implicitly extended to
match the argument with the longer type.
You can chain a sequence of arithmetic expressions, optionally grouping them with parentheses.
The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to
Impala 2.2, there is no MOD() function equivalent to the % modulo operator.
Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use
the POW() function, but there is no ** exponentiation operator. See
Impala Mathematical Functions for the arithmetic functions you can use.
Complex type considerations:
To access a column with a complex type (ARRAY, STRUCT,
or MAP) in an aggregation function, you unpack the individual elements
using join notation in the query, and then apply the function to the final scalar item,
field, key, or value at the bottom of any nested type hierarchy in the column. See
Complex Types (Impala 2.3 or higher only) for details about using
complex types in Impala.
ARRAY of
STRUCT items). The array is unpacked inside the query using join
notation. The array elements are referenced using the ITEM
pseudocolumn, and the structure fields inside the array elements are referenced using
dot notation. Numeric values such as SUM() and AVG()
are computed using the numeric R_NATIONKEY field, and the
general-purpose MAX() and MIN() values are computed
from the string N_NAME field.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
select r_name, r_nations.item.n_nationkey
from region, region.r_nations as r_nations
order by r_name, r_nations.item.n_nationkey;
+-------------+------------------+
| r_name | item.n_nationkey |
+-------------+------------------+
| AFRICA | 0 |
| AFRICA | 5 |
| AFRICA | 14 |
| AFRICA | 15 |
| AFRICA | 16 |
| AMERICA | 1 |
| AMERICA | 2 |
| AMERICA | 3 |
| AMERICA | 17 |
| AMERICA | 24 |
| ASIA | 8 |
| ASIA | 9 |
| ASIA | 12 |
| ASIA | 18 |
| ASIA | 21 |
| EUROPE | 6 |
| EUROPE | 7 |
| EUROPE | 19 |
| EUROPE | 22 |
| EUROPE | 23 |
| MIDDLE EAST | 4 |
| MIDDLE EAST | 10 |
| MIDDLE EAST | 11 |
| MIDDLE EAST | 13 |
| MIDDLE EAST | 20 |
+-------------+------------------+
select
r_name,
count(r_nations.item.n_nationkey) as count,
sum(r_nations.item.n_nationkey) as sum,
avg(r_nations.item.n_nationkey) as avg,
min(r_nations.item.n_name) as minimum,
max(r_nations.item.n_name) as maximum,
ndv(r_nations.item.n_nationkey) as distinct_vals
from
region, region.r_nations as r_nations
group by r_name
order by r_name;
+-------------+-------+-----+------+-----------+----------------+---------------+
| r_name | count | sum | avg | minimum | maximum | distinct_vals |
+-------------+-------+-----+------+-----------+----------------+---------------+
| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
+-------------+-------+-----+------+-----------+----------------+---------------+
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
The following example shows how to do an arithmetic operation using a numeric field of a STRUCT type that is an
item within an ARRAY column. Once the scalar numeric value R_NATIONKEY is extracted, it can be
used in an arithmetic expression, such as multiplying by 10:
-- The SMALLINT is a field within an array of structs.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
-- When we refer to the scalar value using dot notation,
-- we can use arithmetic and comparison operators on it
-- like any other number.
select r_name, nation.item.n_name, nation.item.n_nationkey * 10
from region, region.r_nations as nation
where nation.item.n_nationkey < 5;
+-------------+-------------+------------------------------+
| r_name | item.n_name | nation.item.n_nationkey * 10 |
+-------------+-------------+------------------------------+
| AMERICA | CANADA | 30 |
| AMERICA | BRAZIL | 20 |
| AMERICA | ARGENTINA | 10 |
| MIDDLE EAST | EGYPT | 40 |
| AFRICA | ALGERIA | 0 |
+-------------+-------------+------------------------------+
In a WHERE clause, compares an expression to both a lower and upper bound. The comparison is successful is the
expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched,
so the lower bound is greater than the upper bound, does not match any values.
Syntax:
expression BETWEEN lower_bound AND upper_bound
Data types: Typically used with numeric data types. Works with any data type, although not very practical for
BOOLEAN values. (BETWEEN false AND true will match all BOOLEAN values.) Use
CAST() if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time
functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a
number.
Usage notes:
Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it
is considered greater than the upper bound. For example, BETWEEN 'A' and 'M' would not match the string value
'Midway'. Use functions such as upper(), lower(), substr(),
trim(), and so on if necessary to ensure the comparison works as expected.
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
-- Retrieve data for January through June, inclusive.
select c1 from t1 where month between 1 and 6;
-- Retrieve data for names beginning with 'A' through 'M' inclusive.
-- Only test the first letter to ensure all the values starting with 'M' are matched.
-- Do a case-insensitive comparison to match names with various capitalization conventions.
select last_name from customers where upper(substr(last_name,1,1)) between 'A' and 'M';
-- Retrieve data for only the first week of each month.
select count(distinct visitor_id)) from web_traffic where dayofmonth(when_viewed) between 1 and 7;
The following example shows how to do a BETWEEN comparison using a numeric field of a STRUCT type
that is an item within an ARRAY column. Once the scalar numeric value R_NATIONKEY is extracted, it
can be used in a comparison operator:
-- The SMALLINT is a field within an array of structs.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
-- When we refer to the scalar value using dot notation,
-- we can use arithmetic and comparison operators on it
-- like any other number.
select r_name, nation.item.n_name, nation.item.n_nationkey
from region, region.r_nations as nation
where nation.item.n_nationkey between 3 and 5
+-------------+-------------+------------------+
| r_name | item.n_name | item.n_nationkey |
+-------------+-------------+------------------+
| AMERICA | CANADA | 3 |
| MIDDLE EAST | EGYPT | 4 |
| AFRICA | ETHIOPIA | 5 |
+-------------+-------------+------------------+
Impala supports the familiar comparison operators for checking equality and sort order for the column data types:
Syntax:
left_hand_expression comparison_operator right_hand_expression
=, !=, <>: apply to all types.
<, <=, >, >=: apply to all types; for
BOOLEAN, TRUE is considered greater than FALSE.
Alternatives:
The IN and BETWEEN operators provide shorthand notation for expressing combinations of equality,
less than, and greater than comparisons with a single operator.
Because comparing any value to NULL produces NULL rather than TRUE or
FALSE, use the IS NULL and IS NOT NULL operators to check if a value is
NULL or not.
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
The following example shows how to do an arithmetic operation using a numeric field of a STRUCT type that is an
item within an ARRAY column. Once the scalar numeric value R_NATIONKEY is extracted, it can be
used with a comparison operator such as <:
-- The SMALLINT is a field within an array of structs.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
-- When we refer to the scalar value using dot notation,
-- we can use arithmetic and comparison operators on it
-- like any other number.
select r_name, nation.item.n_name, nation.item.n_nationkey
from region, region.r_nations as nation
where nation.item.n_nationkey < 5
+-------------+-------------+------------------+
| r_name | item.n_name | item.n_nationkey |
+-------------+-------------+------------------+
| AMERICA | CANADA | 3 |
| AMERICA | BRAZIL | 2 |
| AMERICA | ARGENTINA | 1 |
| MIDDLE EAST | EGYPT | 4 |
| AFRICA | ALGERIA | 0 |
+-------------+-------------+------------------+
The EXISTS operator tests whether a subquery returns any results. You typically use it to find values from one
table that have corresponding values in another table.
The converse, NOT EXISTS, helps to find all the values from one table that do not have any corresponding values in
another table.
Syntax:
EXISTS (subquery)
NOT EXISTS (subquery)
Usage notes:
The subquery can refer to a different table than the outer query block, or the same table. For example, you might use
EXISTS or NOT EXISTS to check the existence of parent/child relationships between two columns of
the same table.
You can also use operators and function calls within the subquery to test for other kinds of relationships other than strict
equality. For example, you might use a call to COUNT() in the subquery to check whether the number of matching
values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table matches a
hashed representation of those same values in a different table.
NULL considerations:
If the subquery returns any value at all (even NULL), EXISTS returns TRUE and
NOT EXISTS returns false.
The following example shows how even when the subquery returns only NULL values, EXISTS still
returns TRUE and thus matches all the rows from the table in the outer query block.
[localhost:21000] > create table all_nulls (x int);
[localhost:21000] > insert into all_nulls values (null), (null), (null);
[localhost:21000] > select y from t2 where exists (select x from all_nulls);
+---+
| y |
+---+
| 2 |
| 4 |
| 6 |
+---+
However, if the table in the subquery is empty and so the subquery returns an empty result set, EXISTS returns
FALSE:
[localhost:21000] > create table empty (x int);
[localhost:21000] > select y from t2 where exists (select x from empty);
[localhost:21000] >
Added in: Impala 2.0.0
Restrictions:
Correlated subqueries used in EXISTS and IN operators
cannot include a LIMIT clause.
Prior to Impala 2.6,
the NOT EXISTS operator required a correlated subquery.
In Impala 2.6 and higher, NOT EXISTS works with
uncorrelated queries also.
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
[localhost:21000] > create table t1 (x int);
[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);
[localhost:21000] > create table t2 (y int);
[localhost:21000] > insert into t2 values (2), (4), (6);
[localhost:21000] > create table t3 (z int);
[localhost:21000] > insert into t3 values (1), (3), (5);
[localhost:21000] > create table month_names (m string);
[localhost:21000] > insert into month_names values
> ('January'), ('February'), ('March'),
> ('April'), ('May'), ('June'), ('July'),
> ('August'), ('September'), ('October'),
> ('November'), ('December');
The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value
X from T1, the query checks if the Y column of T2 contains an
identical value, and the EXISTS operator returns TRUE or FALSE as appropriate in
each case.
localhost:21000] > select x from t1 where exists (select y from t2 where t1.x = y);
+---+
| x |
+---+
| 2 |
| 4 |
| 6 |
+---+
An uncorrelated query is less interesting in this case. Because the subquery always returns TRUE, all rows from
T1 are returned. If the table contents where changed so that the subquery did not match any rows, none of the rows
from T1 would be returned.
[localhost:21000] > select x from t1 where exists (select y from t2 where y > 5);
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+---+
The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using
LIMIT 1 or an aggregate function, the query returns a single result or no result based on whether the subquery
matches any rows. Here, we know that T1 and T2 contain some even numbers, but T3
does not.
[localhost:21000] > select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
+---------------------------+
| 'contains an even number' |
+---------------------------+
| contains an even number |
+---------------------------+
[localhost:21000] > select "contains an even number" as assertion from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
+-------------------------+
| assertion |
+-------------------------+
| contains an even number |
+-------------------------+
[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select x from t2 where y % 2 = 0) limit 1;
ERROR: AnalysisException: couldn't resolve column reference: 'x'
[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select y from t2 where y % 2 = 0) limit 1;
+-------------------------+
| assertion |
+-------------------------+
| contains an even number |
+-------------------------+
[localhost:21000] > select "contains an even number" as assertion from t3 where exists (select z from t3 where z % 2 = 0) limit 1;
[localhost:21000] >
The following example finds numbers in one table that are 1 greater than numbers from another table. The EXISTS
notation is simpler than an equivalent CROSS JOIN between the tables. (The example then also illustrates how the
same test could be performed using an IN operator.)
[localhost:21000] > select x from t1 where exists (select y from t2 where x = y + 1);
+---+
| x |
+---+
| 3 |
| 5 |
+---+
[localhost:21000] > select x from t1 where x in (select y + 1 from t2);
+---+
| x |
+---+
| 3 |
| 5 |
+---+
The following example finds values from one table that do not exist in another table.
[localhost:21000] > select x from t1 where not exists (select y from t2 where x = y);
+---+
| x |
+---+
| 1 |
| 3 |
| 5 |
+---+
The following example uses the NOT EXISTS operator to find all the leaf nodes in tree-structured data. This
simplified "tree of life" has multiple levels (class, order, family, and so on), with each item pointing upward through a
PARENT pointer. The example runs an outer query and a subquery on the same table, returning only those items whose
ID value is not referenced by the PARENT of any other item.
[localhost:21000] > create table tree (id int, parent int, name string);
[localhost:21000] > insert overwrite tree values
> (0, null, "animals"),
> (1, 0, "placentals"),
> (2, 0, "marsupials"),
> (3, 1, "bats"),
> (4, 1, "cats"),
> (5, 2, "kangaroos"),
> (6, 4, "lions"),
> (7, 4, "tigers"),
> (8, 5, "red kangaroo"),
> (9, 2, "wallabies");
[localhost:21000] > select name as "leaf node" from tree one
> where not exists (select parent from tree two where one.id = two.parent);
+--------------+
| leaf node |
+--------------+
| bats |
| lions |
| tigers |
| red kangaroo |
| wallabies |
+--------------+
Related information:
A case-insensitive comparison operator for STRING data, with basic wildcard capability using _ to match a single
character and % to match multiple characters. The argument expression must match the entire string value.
Typically, it is more efficient to put any % wildcard match at the end of the string.
This operator, available in Impala 2.5 and higher, is the equivalent of the LIKE operator,
but with case-insensitive comparisons.
Syntax:
string_expression ILIKE wildcard_expression
string_expression NOT ILIKE wildcard_expression
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
In the following examples, strings that are the same except for differences in uppercase
and lowercase match successfully with ILIKE, but do not match
with LIKE:
select 'fooBar' ilike 'FOOBAR';
+-------------------------+
| 'foobar' ilike 'foobar' |
+-------------------------+
| true |
+-------------------------+
select 'fooBar' like 'FOOBAR';
+------------------------+
| 'foobar' like 'foobar' |
+------------------------+
| false |
+------------------------+
select 'FOOBAR' ilike 'f%';
+---------------------+
| 'foobar' ilike 'f%' |
+---------------------+
| true |
+---------------------+
select 'FOOBAR' like 'f%';
+--------------------+
| 'foobar' like 'f%' |
+--------------------+
| false |
+--------------------+
select 'ABCXYZ' not ilike 'ab_xyz';
+-----------------------------+
| not 'abcxyz' ilike 'ab_xyz' |
+-----------------------------+
| false |
+-----------------------------+
select 'ABCXYZ' not like 'ab_xyz';
+----------------------------+
| not 'abcxyz' like 'ab_xyz' |
+----------------------------+
| true |
+----------------------------+
Related information:
For case-sensitive comparisons, see LIKE Operator. For a more general kind of search operator using regular expressions, see REGEXP Operator or its case-insensitive counterpart IREGEXP Operator.
The IN operator compares an argument value to a set of values, and returns TRUE if the argument
matches any value in the set. The NOT IN operator reverses the comparison, and checks if the argument value is not
part of a set of values.
Syntax:
expression IN (expression [, expression])
expression IN (subquery)
expression NOT IN (expression [, expression])
expression NOT IN (subquery)
The left-hand expression and the set of comparison values must be of compatible types.
The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column
name, it could also be some other value. For example, the WHERE clauses WHERE id IN (5) and
WHERE 5 IN (id) produce the same results.
The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query
text. The maximum number of expressions in the IN list is 9999. (The maximum number of elements of
a single expression is 10,000 items, and the IN operator itself counts as one.)
In Impala 2.0 and higher, the set of values can also be generated by a subquery. IN can evaluate an unlimited
number of results using a subquery.
Usage notes:
Any expression using the IN operator could be rewritten as a series of equality tests connected with
OR, but the IN syntax is often clearer, more concise, and easier for Impala to optimize. For
example, with partitioned tables, queries frequently use IN clauses to filter data by comparing the partition key
columns to specific values.
NULL considerations:
If there really is a matching non-null value, IN returns TRUE:
[localhost:21000] > select 1 in (1,null,2,3);
+----------------------+
| 1 in (1, null, 2, 3) |
+----------------------+
| true |
+----------------------+
[localhost:21000] > select 1 not in (1,null,2,3);
+--------------------------+
| 1 not in (1, null, 2, 3) |
+--------------------------+
| false |
+--------------------------+
If the searched value is not found in the comparison values, and the comparison values include NULL, the result is
NULL:
[localhost:21000] > select 5 in (1,null,2,3);
+----------------------+
| 5 in (1, null, 2, 3) |
+----------------------+
| NULL |
+----------------------+
[localhost:21000] > select 5 not in (1,null,2,3);
+--------------------------+
| 5 not in (1, null, 2, 3) |
+--------------------------+
| NULL |
+--------------------------+
[localhost:21000] > select 1 in (null);
+-------------+
| 1 in (null) |
+-------------+
| NULL |
+-------------+
[localhost:21000] > select 1 not in (null);
+-----------------+
| 1 not in (null) |
+-----------------+
| NULL |
+-----------------+
If the left-hand argument is NULL, IN always returns NULL. This rule applies even
if the comparison values include NULL.
[localhost:21000] > select null in (1,2,3);
+-------------------+
| null in (1, 2, 3) |
+-------------------+
| NULL |
+-------------------+
[localhost:21000] > select null not in (1,2,3);
+-----------------------+
| null not in (1, 2, 3) |
+-----------------------+
| NULL |
+-----------------------+
[localhost:21000] > select null in (null);
+----------------+
| null in (null) |
+----------------+
| NULL |
+----------------+
[localhost:21000] > select null not in (null);
+--------------------+
| null not in (null) |
+--------------------+
| NULL |
+--------------------+
Added in: Available in earlier Impala releases, but new capabilities were added in Impala 2.0.0
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
The following example shows how to do an arithmetic operation using a numeric field of a STRUCT type that is an
item within an ARRAY column. Once the scalar numeric value R_NATIONKEY is extracted, it can be
used in an arithmetic expression, such as multiplying by 10:
-- The SMALLINT is a field within an array of structs.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
-- When we refer to the scalar value using dot notation,
-- we can use arithmetic and comparison operators on it
-- like any other number.
select r_name, nation.item.n_name, nation.item.n_nationkey
from region, region.r_nations as nation
where nation.item.n_nationkey in (1,3,5)
+---------+-------------+------------------+
| r_name | item.n_name | item.n_nationkey |
+---------+-------------+------------------+
| AMERICA | CANADA | 3 |
| AMERICA | ARGENTINA | 1 |
| AFRICA | ETHIOPIA | 5 |
+---------+-------------+------------------+
Restrictions:
Correlated subqueries used in EXISTS and IN operators
cannot include a LIMIT clause.
Examples:
-- Using IN is concise and self-documenting.
SELECT * FROM t1 WHERE c1 IN (1,2,10);
-- Equivalent to series of = comparisons ORed together.
SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10;
SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN ('A','E','I','O','U');
SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','June','July');
Related information:
Tests whether a value matches a regular expression, using case-insensitive string comparisons.
Uses the POSIX regular expression syntax where ^ and
$ match the beginning and end of the string, . represents any single character, *
represents a sequence of zero or more items, + represents a sequence of one or more items, ?
produces a non-greedy match, and so on.
This operator, available in Impala 2.5 and higher, is the equivalent of the REGEXP operator,
but with case-insensitive comparisons.
Syntax:
string_expression IREGEXP regular_expression
Usage notes:
The | symbol is the alternation operator, typically used within () to match different sequences.
The () groups do not allow backreferences. To retrieve the part of a value matched within a ()
section, use the regexp_extract()
built-in function. (Currently, there is not any case-insensitive equivalent for the regexp_extract() function.)
In Impala 1.3.1 and higher, the REGEXP and RLIKE
operators now match a regular expression string that occurs anywhere inside the target
string, the same as if the regular expression was enclosed on each side by
.*. See REGEXP Operator for
examples. Previously, these operators only succeeded when the regular expression matched
the entire target string. This change improves compatibility with the regular expression
support for popular database systems. There is no change to the behavior of the
regexp_extract() and regexp_replace() built-in
functions.
In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX
Extended Regular Expression syntax used by the Google RE2 library. For details, see
the RE2
documentation. It has most idioms familiar from regular expressions in Perl,
Python, and so on, including .*? for non-greedy matches.
In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the way regular expressions are interpreted by this function. Test any queries that use regular expressions and adjust the expression patterns if necessary. See Incompatible Changes Introduced in Impala 2.0.0 for details.
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
The following examples demonstrate the syntax for the IREGEXP operator.
select 'abcABCaabbcc' iregexp '^[a-c]+$';
+---------------------------------+
| 'abcabcaabbcc' iregexp '[a-c]+' |
+---------------------------------+
| true |
+---------------------------------+
Related information:
The IS DISTINCT FROM operator, and its converse the
IS NOT DISTINCT FROM operator, test whether or not
values are identical. IS NOT DISTINCT FROM is similar
to the = operator, and IS DISTINCT
FROM is similar to the != operator, except
that NULL values are treated as identical. Therefore,
IS NOT DISTINCT FROM returns true
rather than NULL, and IS DISTINCT FROM
returns false rather than NULL, when
comparing two NULL values. If one of the values being
compared is NULL and the other is not, IS
DISTINCT FROM returns true and IS
NOT DISTINCT FROM returns false, again
instead of returning NULL in both cases.
Syntax:
expression1 IS DISTINCT FROM expression2
expression1 IS NOT DISTINCT FROM expression2
expression1 <=> expression2
The operator <=> is an alias for IS NOT DISTINCT FROM.
It is typically used as a NULL-safe equality operator in join queries.
That is, A <=> B is true if A equals B
or if both A and B are NULL.
Usage notes:
This operator provides concise notation for comparing two values and always producing a true or
false result, without treating NULL as a special case. Otherwise, to unambiguously distinguish
between two values requires a compound expression involving IS [NOT] NULL tests of both operands in addition to the
= or != operator.
The <=> operator, used like an equality
operator in a join query, is more efficient than the equivalent clause:
IF (A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A =
B). The <=> operator can use a hash
join, while the IF expression cannot.
Examples:
The following examples show how IS DISTINCT FROM gives output similar to
the != operator, and IS NOT DISTINCT FROM gives output
similar to the = operator. The exception is when the expression involves
a NULL value on one side or both sides, where != and
= return NULL but the IS [NOT] DISTINCT FROM
operators still return true or false.
select 1 is distinct from 0, 1 != 0;
+----------------------+--------+
| 1 is distinct from 0 | 1 != 0 |
+----------------------+--------+
| true | true |
+----------------------+--------+
select 1 is distinct from 1, 1 != 1;
+----------------------+--------+
| 1 is distinct from 1 | 1 != 1 |
+----------------------+--------+
| false | false |
+----------------------+--------+
select 1 is distinct from null, 1 != null;
+-------------------------+-----------+
| 1 is distinct from null | 1 != null |
+-------------------------+-----------+
| true | NULL |
+-------------------------+-----------+
select null is distinct from null, null != null;
+----------------------------+--------------+
| null is distinct from null | null != null |
+----------------------------+--------------+
| false | NULL |
+----------------------------+--------------+
select 1 is not distinct from 0, 1 = 0;
+--------------------------+-------+
| 1 is not distinct from 0 | 1 = 0 |
+--------------------------+-------+
| false | false |
+--------------------------+-------+
select 1 is not distinct from 1, 1 = 1;
+--------------------------+-------+
| 1 is not distinct from 1 | 1 = 1 |
+--------------------------+-------+
| true | true |
+--------------------------+-------+
select 1 is not distinct from null, 1 = null;
+-----------------------------+----------+
| 1 is not distinct from null | 1 = null |
+-----------------------------+----------+
| false | NULL |
+-----------------------------+----------+
select null is not distinct from null, null = null;
+--------------------------------+-------------+
| null is not distinct from null | null = null |
+--------------------------------+-------------+
| true | NULL |
+--------------------------------+-------------+
The following example shows how IS DISTINCT FROM considers
CHAR values to be the same (not distinct from each other)
if they only differ in the number of trailing spaces. Therefore, sometimes
the result of an IS [NOT] DISTINCT FROM operator differs
depending on whether the values are STRING/VARCHAR
or CHAR.
select
'x' is distinct from 'x ' as string_with_trailing_spaces,
cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces;
+-----------------------------+---------------------------+
| string_with_trailing_spaces | char_with_trailing_spaces |
+-----------------------------+---------------------------+
| true | false |
+-----------------------------+---------------------------+
The IS NULL operator, and its converse the IS NOT NULL operator, test whether a specified value is
NULL. Because using NULL with any of the other
comparison operators such as = or != also returns NULL rather than
TRUE or FALSE, you use a special-purpose comparison operator to check for this special condition.
In Impala 2.11 and higher, you can use
the operators IS UNKNOWN and
IS NOT UNKNOWN as synonyms for
IS NULL and IS NOT NULL,
respectively.
Syntax:
expression IS NULL
expression IS NOT NULL
expression IS UNKNOWN
expression IS NOT UNKNOWN
Usage notes:
In many cases, NULL values indicate some incorrect or incomplete processing during data ingestion or conversion.
You might check whether any values in a column are NULL, and if so take some followup action to fill them in.
With sparse data, often represented in "wide" tables, it is common for most values to be NULL with only an
occasional non-NULL value. In those cases, you can use the IS NOT NULL operator to identify the
rows containing any data at all for a particular column, regardless of the actual value.
With a well-designed database schema, effective use of NULL values and IS NULL and IS NOT
NULL operators can save having to design custom logic around special values such as 0, -1, 'N/A', empty
string, and so on. NULL lets you distinguish between a value that is known to be 0, false, or empty, and a truly
unknown value.
Complex type considerations:
The IS [NOT] UNKNOWN operator, as with the IS [NOT] NULL
operator, is not applicable to complex type columns (STRUCT,
ARRAY, or MAP). Using a complex type column with this
operator causes a query error.
Examples:
-- If this value is non-zero, something is wrong.
select count(*) from employees where employee_id is null;
-- With data from disparate sources, some fields might be blank.
-- Not necessarily an error condition.
select count(*) from census where household_income is null;
-- Sometimes we expect fields to be null, and followup action
-- is needed when they are not.
select count(*) from web_traffic where weird_http_code is not null;
This variation of the IS operator tests for truth
or falsity, with right-hand arguments [NOT] TRUE,
[NOT] FALSE, and [NOT] UNKNOWN.
Syntax:
expression IS TRUE
expression IS NOT TRUE
expression IS FALSE
expression IS NOT FALSE
Usage notes:
This IS TRUE and IS FALSE forms are
similar to doing equality comparisons with the Boolean values
TRUE and FALSE, except that
IS TRUE and IS FALSE
always return either TRUE or FALSE,
even if the left-hand side expression returns NULL
These operators let you simplify Boolean comparisons that must also
check for NULL, for example
X != 10 AND X IS NOT NULL is equivalent to
(X != 10) IS TRUE.
In Impala 2.11 and higher, you can use the operators IS
[NOT] TRUE and IS [NOT] FALSE as equivalents for the built-in
functions ISTRUE(), ISNOTTRUE(),
ISFALSE(), and ISNOTFALSE().
Complex type considerations:
The IS [NOT] TRUE and IS [NOT] FALSE operators are not
applicable to complex type columns (STRUCT, ARRAY, or
MAP). Using a complex type column with these operators causes a query error.
Added in: Impala 2.11.0
Examples:
select assertion, b, b is true, b is false, b is unknown
from boolean_test;
+-------------+-------+-----------+------------+-----------+
| assertion | b | istrue(b) | isfalse(b) | b is null |
+-------------+-------+-----------+------------+-----------+
| 2 + 2 = 4 | true | true | false | false |
| 2 + 2 = 5 | false | false | true | false |
| 1 = null | NULL | false | false | true |
| null = null | NULL | false | false | true |
+-------------+-------+-----------+------------+-----------+
A comparison operator for STRING data, with basic wildcard capability using the underscore
(_) to match a single character and the percent sign (%) to match multiple
characters. The argument expression must match the entire string value.
Typically, it is more efficient to put any % wildcard match at the end of the string.
Syntax:
string_expression LIKE wildcard_expression
string_expression NOT LIKE wildcard_expression
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
select distinct c_last_name from customer where c_last_name like 'Mc%' or c_last_name like 'Mac%';
select count(c_last_name) from customer where c_last_name like 'M%';
select c_email_address from customer where c_email_address like '%.edu';
-- We can find 4-letter names beginning with 'M' by calling functions...
select distinct c_last_name from customer where length(c_last_name) = 4 and substr(c_last_name,1,1) = 'M';
-- ...or in a more readable way by matching M followed by exactly 3 characters.
select distinct c_last_name from customer where c_last_name like 'M___';
For case-insensitive comparisons, see ILIKE Operator. For a more general kind of search operator using regular expressions, see REGEXP Operator or its case-insensitive counterpart IREGEXP Operator.
Logical operators return a BOOLEAN value, based on a binary or unary logical operation between arguments that are
also Booleans. Typically, the argument expressions use comparison
operators.
Syntax:
boolean_expression binary_logical_operator boolean_expression
unary_logical_operator boolean_expression
The Impala logical operators are:
AND: A binary operator that returns true if its left-hand and right-hand arguments both evaluate
to true, NULL if either argument is NULL, and false otherwise.
OR: A binary operator that returns true if either of its left-hand and right-hand arguments
evaluate to true, NULL if one argument is NULL and the other is either
NULL or false, and false otherwise.
NOT: A unary operator that flips the state of a Boolean expression from true to
false, or false to true. If the argument expression is NULL,
the result remains NULL. (When NOT is used this way as a unary logical operator, it works
differently than the IS NOT NULL comparison operator, which returns true when applied to a
NULL.)
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
The following example shows how to do an arithmetic operation using a numeric field of a STRUCT type that is an
item within an ARRAY column. Once the scalar numeric value R_NATIONKEY is extracted, it can be
used in an arithmetic expression, such as multiplying by 10:
-- The SMALLINT is a field within an array of structs.
describe region;
+-------------+-------------------------+---------+
| name | type | comment |
+-------------+-------------------------+---------+
| r_regionkey | smallint | |
| r_name | string | |
| r_comment | string | |
| r_nations | array<struct< | |
| | n_nationkey:smallint, | |
| | n_name:string, | |
| | n_comment:string | |
| | >> | |
+-------------+-------------------------+---------+
-- When we refer to the scalar value using dot notation,
-- we can use arithmetic and comparison operators on it
-- like any other number.
select r_name, nation.item.n_name, nation.item.n_nationkey
from region, region.r_nations as nation
where
nation.item.n_nationkey between 3 and 5
or nation.item.n_nationkey < 15;
+-------------+----------------+------------------+
| r_name | item.n_name | item.n_nationkey |
+-------------+----------------+------------------+
| EUROPE | UNITED KINGDOM | 23 |
| EUROPE | RUSSIA | 22 |
| EUROPE | ROMANIA | 19 |
| ASIA | VIETNAM | 21 |
| ASIA | CHINA | 18 |
| AMERICA | UNITED STATES | 24 |
| AMERICA | PERU | 17 |
| AMERICA | CANADA | 3 |
| MIDDLE EAST | SAUDI ARABIA | 20 |
| MIDDLE EAST | EGYPT | 4 |
| AFRICA | MOZAMBIQUE | 16 |
| AFRICA | ETHIOPIA | 5 |
+-------------+----------------+------------------+
Examples:
These examples demonstrate the AND operator:
[localhost:21000] > select true and true;
+---------------+
| true and true |
+---------------+
| true |
+---------------+
[localhost:21000] > select true and false;
+----------------+
| true and false |
+----------------+
| false |
+----------------+
[localhost:21000] > select false and false;
+-----------------+
| false and false |
+-----------------+
| false |
+-----------------+
[localhost:21000] > select true and null;
+---------------+
| true and null |
+---------------+
| NULL |
+---------------+
[localhost:21000] > select (10 > 2) and (6 != 9);
+-----------------------+
| (10 > 2) and (6 != 9) |
+-----------------------+
| true |
+-----------------------+
These examples demonstrate the OR operator:
[localhost:21000] > select true or true;
+--------------+
| true or true |
+--------------+
| true |
+--------------+
[localhost:21000] > select true or false;
+---------------+
| true or false |
+---------------+
| true |
+---------------+
[localhost:21000] > select false or false;
+----------------+
| false or false |
+----------------+
| false |
+----------------+
[localhost:21000] > select true or null;
+--------------+
| true or null |
+--------------+
| true |
+--------------+
[localhost:21000] > select null or true;
+--------------+
| null or true |
+--------------+
| true |
+--------------+
[localhost:21000] > select false or null;
+---------------+
| false or null |
+---------------+
| NULL |
+---------------+
[localhost:21000] > select (1 = 1) or ('hello' = 'world');
+--------------------------------+
| (1 = 1) or ('hello' = 'world') |
+--------------------------------+
| true |
+--------------------------------+
[localhost:21000] > select (2 + 2 != 4) or (-1 > 0);
+--------------------------+
| (2 + 2 != 4) or (-1 > 0) |
+--------------------------+
| false |
+--------------------------+
These examples demonstrate the NOT operator:
[localhost:21000] > select not true;
+----------+
| not true |
+----------+
| false |
+----------+
[localhost:21000] > select not false;
+-----------+
| not false |
+-----------+
| true |
+-----------+
[localhost:21000] > select not null;
+----------+
| not null |
+----------+
| NULL |
+----------+
[localhost:21000] > select not (1=1);
+-------------+
| not (1 = 1) |
+-------------+
| false |
+-------------+
Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where ^ and
$ match the beginning and end of the string, . represents any single character, *
represents a sequence of zero or more items, + represents a sequence of one or more items, ?
produces a non-greedy match, and so on.
Syntax:
string_expression REGEXP regular_expression
Usage notes:
The RLIKE operator is a synonym for REGEXP.
The | symbol is the alternation operator, typically used within () to match different sequences.
The () groups do not allow backreferences. To retrieve the part of a value matched within a ()
section, use the regexp_extract()
built-in function.
In Impala 1.3.1 and higher, the REGEXP and RLIKE
operators now match a regular expression string that occurs anywhere inside the target
string, the same as if the regular expression was enclosed on each side by
.*. See REGEXP Operator for
examples. Previously, these operators only succeeded when the regular expression matched
the entire target string. This change improves compatibility with the regular expression
support for popular database systems. There is no change to the behavior of the
regexp_extract() and regexp_replace() built-in
functions.
In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX
Extended Regular Expression syntax used by the Google RE2 library. For details, see
the RE2
documentation. It has most idioms familiar from regular expressions in Perl,
Python, and so on, including .*? for non-greedy matches.
In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the way regular expressions are interpreted by this function. Test any queries that use regular expressions and adjust the expression patterns if necessary. See Incompatible Changes Introduced in Impala 2.0.0 for details.
Complex type considerations:
You cannot refer to a column with a complex data type (ARRAY,
STRUCT, or MAP directly in an operator. You can apply
operators only to scalar values that make up a complex type (the fields of a
STRUCT, the items of an ARRAY, or the key or value
portion of a MAP) as part of a join query that refers to the scalar
value using the appropriate dot notation or ITEM, KEY,
or VALUE pseudocolumn names.
Examples:
The following examples demonstrate the identical syntax for the REGEXP and RLIKE operators.
-- Find all customers whose first name starts with 'J', followed by 0 or more of any character.
select c_first_name, c_last_name from customer where c_first_name regexp '^J.*';
select c_first_name, c_last_name from customer where c_first_name rlike '^J.*';
-- Find 'Macdonald', where the first 'a' is optional and the 'D' can be upper- or lowercase.
-- The ^...$ are required, to match the start and end of the value.
select c_first_name, c_last_name from customer where c_last_name regexp '^Ma?c[Dd]onald$';
select c_first_name, c_last_name from customer where c_last_name rlike '^Ma?c[Dd]onald$';
-- Match multiple character sequences, either 'Mac' or 'Mc'.
select c_first_name, c_last_name from customer where c_last_name regexp '^(Mac|Mc)donald$';
select c_first_name, c_last_name from customer where c_last_name rlike '^(Mac|Mc)donald$';
-- Find names starting with 'S', then one or more vowels, then 'r', then any other characters.
-- Matches 'Searcy', 'Sorenson', 'Sauer'.
select c_first_name, c_last_name from customer where c_last_name regexp '^S[aeiou]+r.*$';
select c_first_name, c_last_name from customer where c_last_name rlike '^S[aeiou]+r.*$';
-- Find names that end with 2 or more vowels: letters from the set a,e,i,o,u.
select c_first_name, c_last_name from customer where c_last_name regexp '.*[aeiou]{2,}$';
select c_first_name, c_last_name from customer where c_last_name rlike '.*[aeiou]{2,}$';
-- You can use letter ranges in the [] blocks, for example to find names starting with A, B, or C.
select c_first_name, c_last_name from customer where c_last_name regexp '^[A-C].*';
select c_first_name, c_last_name from customer where c_last_name rlike '^[A-C].*';
-- If you are not sure about case, leading/trailing spaces, and so on, you can process the
-- column using string functions first.
select c_first_name, c_last_name from customer where lower(trim(c_last_name)) regexp '^de.*';
select c_first_name, c_last_name from customer where lower(trim(c_last_name)) rlike '^de.*';
Related information:
For regular expression matching with case-insensitive comparisons, see IREGEXP Operator.
Synonym for the REGEXP operator. See REGEXP Operator for details.
Examples:
The following examples demonstrate the identical syntax for the REGEXP and RLIKE operators.
-- Find all customers whose first name starts with 'J', followed by 0 or more of any character.
select c_first_name, c_last_name from customer where c_first_name regexp '^J.*';
select c_first_name, c_last_name from customer where c_first_name rlike '^J.*';
-- Find 'Macdonald', where the first 'a' is optional and the 'D' can be upper- or lowercase.
-- The ^...$ are required, to match the start and end of the value.
select c_first_name, c_last_name from customer where c_last_name regexp '^Ma?c[Dd]onald$';
select c_first_name, c_last_name from customer where c_last_name rlike '^Ma?c[Dd]onald$';
-- Match multiple character sequences, either 'Mac' or 'Mc'.
select c_first_name, c_last_name from customer where c_last_name regexp '^(Mac|Mc)donald$';
select c_first_name, c_last_name from customer where c_last_name rlike '^(Mac|Mc)donald$';
-- Find names starting with 'S', then one or more vowels, then 'r', then any other characters.
-- Matches 'Searcy', 'Sorenson', 'Sauer'.
select c_first_name, c_last_name from customer where c_last_name regexp '^S[aeiou]+r.*$';
select c_first_name, c_last_name from customer where c_last_name rlike '^S[aeiou]+r.*$';
-- Find names that end with 2 or more vowels: letters from the set a,e,i,o,u.
select c_first_name, c_last_name from customer where c_last_name regexp '.*[aeiou]{2,}$';
select c_first_name, c_last_name from customer where c_last_name rlike '.*[aeiou]{2,}$';
-- You can use letter ranges in the [] blocks, for example to find names starting with A, B, or C.
select c_first_name, c_last_name from customer where c_last_name regexp '^[A-C].*';
select c_first_name, c_last_name from customer where c_last_name rlike '^[A-C].*';
-- If you are not sure about case, leading/trailing spaces, and so on, you can process the
-- column using string functions first.
select c_first_name, c_last_name from customer where lower(trim(c_last_name)) regexp '^de.*';
select c_first_name, c_last_name from customer where lower(trim(c_last_name)) rlike '^de.*';