In Impala, the TIMESTAMP data type holds a value of date and time. It can be decomposed into year, month, day, hour, minute, and second fields, but with no time zone information available, it does not correspond to any specific point in time. Internally, the resolution of the time portion of a TIMESTAMP value is in nanoseconds.
Syntax:
In the column definition of a CREATE TABLE
statement:
column_name TIMESTAMP
timestamp [+ | -] INTERVAL interval
DATE_ADD (timestamp, INTERVAL interval time_unit)
Range: 1400-01-01 to 9999-12-31
Out-of-range TIMESTAMP values are converted to NULL.
The range of the Impala TIMESTAMP type is different from the Hive TIMESTAMP type. Refer to the Hive documentation for details.
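For example, a quick check of the range behavior described above; the commented results follow from the stated 1400-9999 range rather than from a captured session:
select cast('1399-12-31 23:59:59' as timestamp);  -- Below the 1400-01-01 lower bound; returns NULL.
select cast('9999-12-31 23:59:59' as timestamp);  -- Within range; returns a valid TIMESTAMP.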
INTERVAL expressions:
You can perform date arithmetic by adding or subtracting a specified number of time units, using the INTERVAL keyword with the + or - operator, or the date_add() or date_sub() function.
The time_unit in the INTERVAL clause can be one of the following:
YEAR[S]
MONTH[S]
WEEK[S]
DAY[S]
HOUR[S]
MINUTE[S]
SECOND[S]
MILLISECOND[S]
MICROSECOND[S]
NANOSECOND[S]
You can only specify one time unit in each interval expression, for example INTERVAL 3 DAYS or INTERVAL 25 HOURS, but you can produce any granularity by adding together successive INTERVAL values, such as timestamp_value + INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS. An example follows.
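The following sketch combines the operators and functions above; the commented results are simple date arithmetic, not captured output:
select cast('2016-01-01' as timestamp) + interval 3 weeks - interval 1 day;  -- 2016-01-21 00:00:00
select date_add(cast('2016-01-01' as timestamp), interval 2 months);         -- 2016-03-01 00:00:00
select date_sub(cast('2016-01-01' as timestamp), interval 10 hours);         -- 2015-12-31 14:00:00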
Internal details: Represented in memory as a 16-byte value.
Time zones:
By default, Impala stores and interprets TIMESTAMP values in the UTC time zone when writing to data files, reading from data files, or converting to and from system time values through functions.
If you set the --use_local_tz_for_unix_timestamp_conversions startup flag to TRUE, Impala treats TIMESTAMP values as being in the local time zone. The local time zone is determined in the following order, with the TIMEZONE query option taking the highest precedence:
TIMEZONE query option
$TZ environment variable
The --use_local_tz_for_unix_timestamp_conversions setting can be used to fix discrepancies in INTERVAL operations. For example, a TIMESTAMP + INTERVAL n-hours expression can be affected by Daylight Saving Time, which Impala does not consider by default because these operations are applied as if the timestamp were in UTC. Setting --use_local_tz_for_unix_timestamp_conversions to TRUE fixes the issue.
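As an illustration of the precedence order above, the TIMEZONE query option can be set per session; the zone name here is just an example:
set timezone='America/Los_Angeles';  -- Takes precedence over the $TZ environment variable.
select now();                        -- Current date and time evaluated in the session time zone.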
See Customizing Time Zones for information about configuring a custom time zone database and aliases.
See Impala Date and Time
Functions for the list of functions affected by the
--use_local_tz_for_unix_timestamp_conversions
setting.
Time zone handling between Impala and Hive:
Interoperability between Hive and Impala is different depending on the file format.
For text tables, TIMESTAMP values can be written and read interchangeably by Impala and Hive, because Hive reads and writes TIMESTAMP values without any time zone conversion.
For Parquet tables, Impala writes TIMESTAMP values in the INT96 TIMESTAMP format. See Data Type Considerations for Parquet Tables for information about Parquet data types.
When Hive writes to Parquet data files, the TIMESTAMP
values are
normalized to UTC from the local time zone of the host where the data was written. On the other hand, Impala does not make any time zone adjustment when it writes INT96 TIMESTAMP values to, or reads them from, Parquet files. This difference in time zone
handling can cause potentially inconsistent results when Impala processes
TIMESTAMP
values in the Parquet files written by Hive.
To have Impala apply the same time zone handling as Hive for such files, use the following startup flag settings:
--use_local_tz_for_unix_timestamp_conversions=true
--convert_legacy_hive_parquet_utc_timestamps=true
When the --convert_legacy_hive_parquet_utc_timestamps
setting is enabled, Impala recognizes the Parquet data files written by Hive, and
applies the same UTC-to-local-timezone conversion logic during the query as Hive does.
In Impala 3.0 and lower, the --convert_legacy_hive_parquet_utc_timestamps setting had a severe impact on multi-threaded performance. The new time zone implementation in Impala 3.1 eliminated most of the performance overhead and made Impala scale well to multiple threads. The --convert_legacy_hive_parquet_utc_timestamps setting is turned off by default for performance reasons. To avoid unexpected incompatibility problems, you should turn on the option when processing TIMESTAMP columns in Parquet files written by Hive.
Hive currently cannot write INT64
TIMESTAMP
values.
In Impala 3.2 and higher, INT64 TIMESTAMP values annotated with the TIMESTAMP_MILLIS or TIMESTAMP_MICROS OriginalType are assumed to be always UTC normalized, so the UTC-to-local conversion is always done. For INT64 TIMESTAMP values annotated with the TIMESTAMP LogicalType, the Parquet metadata specifies whether the UTC-to-local conversion is necessary.
Conversions:
Impala automatically converts STRING literals of the correct format into TIMESTAMP values. Timestamp values are accepted in the format 'yyyy-MM-dd HH:mm:ss.SSSSSS', and can consist of just the date, or just the time, with or without the fractional second portion. For example, you can specify TIMESTAMP values such as '1966-07-30', '08:30:00', or '1985-09-25 17:45:30.005'.
Leading zeroes are not required in the numbers representing the date component, such as month and day, or the time component, such as hour, minute, and second. For example, Impala accepts both '2018-1-1 01:02:03' and '2018-01-01 1:2:3' as valid.
In STRING to TIMESTAMP conversions, leading and trailing white space characters, such as a space, a tab, a newline, or a carriage return, are ignored. For example, Impala treats the following as equivalent:
'1999-12-01 01:02:03 ', ' 1999-12-01 01:02:03',
'1999-12-01 01:02:03\r\n\t'.
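For example, per the trimming rule above, these casts should all produce the same value:
select cast('1999-12-01 01:02:03   ' as timestamp);  -- Trailing spaces ignored.
select cast('   1999-12-01 01:02:03' as timestamp);  -- Leading spaces ignored.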
When casting a STRING literal to TIMESTAMP, you can use the following separators between the date part and the time part:
One or more space characters
Example: CAST('2001-01-09 01:05:01' AS TIMESTAMP)
The character “T”
Example: CAST('2001-01-09T01:05:01' AS TIMESTAMP)
Casting an integer or floating-point value
N
to TIMESTAMP
produces a value that is
N
seconds past the start of the epoch date (January 1, 1970). By
default, the result value represents a date and time in the UTC time zone. If the
setting --use_local_tz_for_unix_timestamp_conversions=true
is in effect, the resulting TIMESTAMP
represents a date and time in the
local time zone.
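For example, casting small integers makes the epoch-based interpretation visible; the commented values assume the default UTC interpretation:
select cast(0 as timestamp);       -- 1970-01-01 00:00:00 (the start of the epoch)
select cast(100000 as timestamp);  -- 1970-01-02 03:46:40 (100,000 seconds past the epoch)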
In Impala 1.3 and higher, the FROM_UNIXTIME()
and
UNIX_TIMESTAMP()
functions allow a wider range of format strings, with
more flexibility in element order, repetition of letter placeholders, and separator
characters. In Impala 2.3 and higher, the
UNIX_TIMESTAMP()
function also allows a numeric timezone offset to be
specified as part of the input string. See
Impala Date and Time Functions for details.
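A minimal sketch of the flexible format strings described above; the placeholders follow the case-sensitive convention noted under Related information below:
select from_unixtime(0, 'yyyy-MM-dd');              -- '1970-01-01'
select from_unixtime(0, 'HH:mm:ss');                -- '00:00:00'
select unix_timestamp('2015-05-15', 'yyyy-MM-dd');  -- Seconds corresponding to 2015-05-15 00:00:00.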
In Impala 2.2.0 and higher, built-in functions that accept or return integers
representing TIMESTAMP
values use the BIGINT
type for
parameters and return values, rather than INT
. This change lets the
date and time functions avoid an overflow error that would otherwise occur on January
19th, 2038 (known as the
"Year
2038 problem" or "Y2K38 problem"). This change affects the
FROM_UNIXTIME()
and UNIX_TIMESTAMP()
functions. You
might need to change application code that interacts with these functions, change the
types of columns that store the return values, or add CAST()
calls to
SQL statements that call these functions.
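For example, the BIGINT signature avoids the 32-bit overflow; the commented result is epoch arithmetic rather than captured output:
select from_unixtime(2147483648);              -- One second past the 32-bit limit: 2038-01-19 03:14:08 UTC.
select unix_timestamp('2040-01-01 00:00:00');  -- Returns a BIGINT value too large for a 32-bit INT.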
Partitioning:
Although you cannot use a TIMESTAMP
column as a partition key, you can
extract the individual years, months, days, hours, and so on and partition based on those
columns. Because the partition key column values are represented in HDFS directory names,
rather than as fields in the data files themselves, you can also keep the original
TIMESTAMP
values if desired, without duplicating data or wasting storage
space. See Partition Key Columns for more
details on partitioning with date and time values.
[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp);
ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
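A minimal sketch of the workaround, with hypothetical table names (events and its staging source events_staging), partitioning on integer fields extracted from the TIMESTAMP while keeping the original value in the data files:
create table events (event string, happened timestamp) partitioned by (year int, month int);
-- Derive the partition key columns from the TIMESTAMP value itself.
insert into events partition (year, month)
  select event, happened, year(happened), month(happened) from events_staging;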
NULL considerations: Casting any unrecognized STRING
value to
this type produces a NULL
value.
HBase considerations: This data type is fully compatible with HBase tables.
Parquet considerations: This type is fully compatible with Parquet tables. INT96 encoded Parquet timestamps are supported in Impala. INT64 timestamps are supported in Impala 3.2 and higher.
Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.
Column statistics considerations: Because this type has a fixed size, the maximum
and average size fields are always filled in for column statistics, even before you run
the COMPUTE STATS
statement.
Kudu considerations:
You can include TIMESTAMP columns in Kudu tables, instead of representing the date and time as a BIGINT value. The behavior of TIMESTAMP for Kudu tables has some special considerations:
Any nanoseconds in the original 96-bit value produced by Impala are not stored,
because Kudu represents date/time columns using 64-bit values. The nanosecond
portion of the value is rounded, not truncated. Therefore, a
TIMESTAMP
value that you store in a Kudu table might not be
bit-for-bit identical to the value returned by a query.
The conversion between the Impala 96-bit representation and the Kudu 64-bit
representation introduces some performance overhead when reading or writing
TIMESTAMP
columns. You can minimize the overhead during writes by
performing inserts through the Kudu API. Because the overhead during reads applies
to each query, you might continue to use a BIGINT
column to
represent date/time values in performance-critical applications.
The Impala TIMESTAMP
type has a narrower range for years than the
underlying Kudu data type. Impala can represent years 1400-9999. If year values
outside this range are written to a Kudu table by a non-Impala client, Impala
returns NULL
by default when reading those
TIMESTAMP
values during a query. Or, if the
ABORT_ON_ERROR
query option is enabled, the query fails when it
encounters a value with an out-of-range year.
Restrictions:
If you cast a STRING
with an unrecognized format to a
TIMESTAMP
, the result is NULL
rather than an error. Make
sure to test your data pipeline to be sure any textual date and time values are in a
format that Impala TIMESTAMP
can recognize.
Currently, Avro tables cannot contain TIMESTAMP
columns. If you need to
store date and time values in Avro tables, as a workaround you can use a
STRING
representation of the values, convert the values to
BIGINT
with the UNIX_TIMESTAMP()
function, or create
separate numeric columns for individual date and time fields using the
EXTRACT()
function.
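A sketch of the workarounds listed above, using a literal in place of an Avro column value:
-- Store the value in a BIGINT column as seconds past the epoch.
select unix_timestamp('2017-05-01 12:30:45');
-- Or break out individual fields into separate numeric columns.
select extract(year from cast('2017-05-01 12:30:45' as timestamp)) as yr,
       extract(month from cast('2017-05-01 12:30:45' as timestamp)) as mon;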
Examples:
The following examples demonstrate using TIMESTAMP
values with built-in
functions:
select cast('1966-07-30' as timestamp);
select cast('1985-09-25 17:45:30.005' as timestamp);
select cast('08:30:00' as timestamp);
select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15.
select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required.
select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range.
select dayofweek('2004-06-13'); -- Returns 1, representing Sunday.
select dayname('2004-06-13'); -- Returns 'Sunday'.
select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
select day('2004-06-13'); -- Returns 13.
select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
select now(); -- Returns current date and time in local timezone.
The following examples demonstrate using TIMESTAMP
values with
HDFS-backed tables:
create table dates_and_times (t timestamp);
insert into dates_and_times values
('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
The following examples demonstrate using TIMESTAMP
values with Kudu
tables:
create table timestamp_t (x int primary key, s string, t timestamp, b bigint)
partition by hash (x) partitions 16
stored as kudu;
-- The default value of now() has microsecond precision, so the final 3 digits
-- representing nanoseconds are all zero.
insert into timestamp_t values (1, cast(now() as string), now(), unix_timestamp(now()));
-- Values with 1-499 nanoseconds are rounded down in the Kudu TIMESTAMP column.
insert into timestamp_t values (2, cast(now() + interval 100 nanoseconds as string), now() + interval 100 nanoseconds, unix_timestamp(now() + interval 100 nanoseconds));
insert into timestamp_t values (3, cast(now() + interval 499 nanoseconds as string), now() + interval 499 nanoseconds, unix_timestamp(now() + interval 499 nanoseconds));
-- Values with 500-999 nanoseconds are rounded up in the Kudu TIMESTAMP column.
insert into timestamp_t values (4, cast(now() + interval 500 nanoseconds as string), now() + interval 500 nanoseconds, unix_timestamp(now() + interval 500 nanoseconds));
insert into timestamp_t values (5, cast(now() + interval 501 nanoseconds as string), now() + interval 501 nanoseconds, unix_timestamp(now() + interval 501 nanoseconds));
-- The string representation shows how underlying Impala TIMESTAMP can have nanosecond precision.
-- The TIMESTAMP column shows how timestamps in a Kudu table are rounded to microsecond precision.
-- The BIGINT column represents seconds past the epoch and so is not affected much by nanoseconds.
select s, t, b from timestamp_t order by t;
+-------------------------------+-------------------------------+------------+
| s | t | b |
+-------------------------------+-------------------------------+------------+
| 2017-05-31 15:30:05.107157000 | 2017-05-31 15:30:05.107157000 | 1496244605 |
| 2017-05-31 15:30:28.868151100 | 2017-05-31 15:30:28.868151000 | 1496244628 |
| 2017-05-31 15:34:33.674692499 | 2017-05-31 15:34:33.674692000 | 1496244873 |
| 2017-05-31 15:35:04.769166500 | 2017-05-31 15:35:04.769167000 | 1496244904 |
| 2017-05-31 15:35:33.033082501 | 2017-05-31 15:35:33.033083000 | 1496244933 |
+-------------------------------+-------------------------------+------------+
Added in: Available in all versions of Impala.
Related information:
To convert to or from different date formats, or perform date arithmetic, use the date and time functions described in Impala Date and Time Functions. In particular, the from_unixtime() function requires a case-sensitive format string such as "yyyy-MM-dd HH:mm:ss.SSSS", matching one of the allowed variations of a TIMESTAMP value (date plus time, only date, only time, optional fractional seconds).
See SQL Differences Between Impala and Hive for details about TIMESTAMP handling between Impala and Hive.