TIMESTAMP Data Type
      A data type used in CREATE TABLE and ALTER TABLE
      statements, representing a point in time.
    
Syntax:
      In the column definition of a CREATE TABLE statement:
    
column_name TIMESTAMP
      Range: Allowed date values range from 1400-01-01 to 9999-12-31; this range is
      different from the Hive TIMESTAMP type. Internally, the resolution of the
      time portion of a TIMESTAMP value is in nanoseconds.
    
INTERVAL expressions:
      You can perform date arithmetic by adding or subtracting a specified number of time units,
      using the INTERVAL keyword and the + and
      - operators or date_add() and
      date_sub() functions. You can specify units as YEAR[S],
      MONTH[S], WEEK[S], DAY[S],
      HOUR[S], MINUTE[S], SECOND[S],
      MILLISECOND[S], MICROSECOND[S], and
      NANOSECOND[S]. You can only specify one time unit in each interval
      expression, for example INTERVAL 3 DAYS or INTERVAL 25
      HOURS, but you can produce any granularity by adding together successive
      INTERVAL values, such as timestamp_value +
      INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS.
    
For example:
select now() + interval 1 day;
select date_sub(now(), interval 5 minutes);
insert into auction_details
  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
  from new_auctions;

Time zones:
      By default, Impala does not store timestamps using the local timezone, to avoid undesired
      results from unexpected time zone issues. Timestamps are stored and interpreted relative
      to UTC, both when written to or read from data files and when converted to or from Unix
      time values through functions such as from_unixtime() or
      unix_timestamp(). To convert such a TIMESTAMP value to
      one that represents the date and time in a specific time zone, convert the original value
      with the from_utc_timestamp() function.
    
      Because Impala does not assume that TIMESTAMP values are in any
      particular time zone, you must be conscious of the time zone aspects of data that you
      query, insert, or convert.
    
      For consistency with Unix system calls, the TIMESTAMP returned by the
      now() function represents the local time in the system time zone, rather
      than in UTC. To store values relative to the current time in a portable way, convert any
      now() return values using the to_utc_timestamp()
      function first. The following example shows that the current time in
      California (where this Impala cluster is located) is shortly after 2 PM. If that value was
      written to a data file, and shipped off to a distant server to be analyzed alongside other
      data from far-flung locations, the dates and times would not match up precisely because of
      time zone differences. Therefore, the to_utc_timestamp() function
      converts it using a common reference point, the UTC time zone (descended from the old
      Greenwich Mean Time standard). The 'PDT' argument indicates that the
      original value is from the Pacific time zone with Daylight Saving Time in effect. When
      servers in all geographic locations run the same transformation on any local date and time
      values (with the appropriate time zone argument), the stored data uses a consistent
      representation. Impala queries can use functions such as EXTRACT(),
      MIN(), AVG(), and so on to do time-series analysis on
      those timestamps.
    
[localhost:21000] > select now();
+-------------------------------+
| now()                         |
+-------------------------------+
| 2015-04-09 14:07:46.580465000 |
+-------------------------------+
[localhost:21000] > select to_utc_timestamp(now(), 'PDT');
+--------------------------------+
| to_utc_timestamp(now(), 'pdt') |
+--------------------------------+
| 2015-04-09 21:08:07.664547000  |
+--------------------------------+
      The converse function, from_utc_timestamp(), lets you take stored
      TIMESTAMP data or calculated results and convert back to local date and
      time for processing on the application side. The following example shows how you might
      represent some future date (such as the ending date and time of an auction) in UTC, and
      then convert back to local time when convenient for reporting or other processing. The
      final query in the example tests whether this arbitrary UTC date and time has passed yet,
      by converting it back to the local time zone and comparing it against the current date and
      time.
    
[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
+---------------------------------------------------+
| to_utc_timestamp(now() + interval 2 weeks, 'pdt') |
+---------------------------------------------------+
| 2015-04-23 21:08:34.152923000                     |
+---------------------------------------------------+
[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT');
+------------------------------------------------------------+
| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') |
+------------------------------------------------------------+
| 2015-04-23 14:08:34.152923000                              |
+------------------------------------------------------------+
[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') < now();
+--------------------------------------------------------------------+
| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') < now() |
+--------------------------------------------------------------------+
| false                                                              |
+--------------------------------------------------------------------+
      If you have data files written by Hive, those TIMESTAMP values represent
      the local timezone of the host where the data was written, potentially leading to
      inconsistent results when processed by Impala. To avoid compatibility problems or having
      to code workarounds, you can specify one or both of these impalad
      startup flags: --use_local_tz_for_unix_timestamp_conversions=true and
      -convert_legacy_hive_parquet_utc_timestamps=true. Although
      -convert_legacy_hive_parquet_utc_timestamps is turned off by default to
      avoid performance overhead, where practical turn it on when processing
      TIMESTAMP columns in Parquet files written by Hive, to avoid unexpected
      behavior.
    
      The --use_local_tz_for_unix_timestamp_conversions setting affects
      conversions from TIMESTAMP to BIGINT, or from
      BIGINT to TIMESTAMP. By default, Impala treats all
      TIMESTAMP values as UTC, to simplify analysis of time-series data from
      different geographic regions. When you enable the
      --use_local_tz_for_unix_timestamp_conversions setting, these operations
      treat the input values as if they are in the local time zone of the host doing the
      processing. See Impala Date and Time Functions
      for the list of functions affected by the
      --use_local_tz_for_unix_timestamp_conversions setting.
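
      The following sketch illustrates the difference; the exact offset depends on the local
      time zone of the server:

select unix_timestamp('1970-01-01 00:00:00');
-- Default behavior: the input is treated as UTC, so the result is 0.
-- With --use_local_tz_for_unix_timestamp_conversions=true, the input is interpreted in the
-- local time zone, so the result is shifted by the local UTC offset
-- (for example, 28800 for a server in the US Pacific time zone during standard time).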
    
      The following sequence of examples shows how the interpretation of
      TIMESTAMP values in Parquet tables is affected by the setting of the
      -convert_legacy_hive_parquet_utc_timestamps setting.
    
      Regardless of the -convert_legacy_hive_parquet_utc_timestamps setting,
      TIMESTAMP columns in text tables can be written and read interchangeably
      by Impala and Hive:
    
Impala DDL and queries for text table:
[localhost:21000] > create table t1 (x timestamp);
[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day);
[localhost:21000] > select x from t1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1;
+-------------------------------+
| to_utc_timestamp(x, 'pdt')    |
+-------------------------------+
| 2015-04-07 22:43:02.892403000 |
| 2015-04-08 22:43:02.892403000 |
+-------------------------------+
Hive query for text table:
hive> select * from t1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 1.245 seconds, Fetched: 2 row(s)
      When the table uses Parquet format, Impala expects any time zone adjustment to be applied
      prior to writing, while TIMESTAMP values written by Hive are adjusted to
      be in the UTC time zone. When Hive queries Parquet data files that it wrote, it adjusts
      the TIMESTAMP values back to the local time zone, while Impala does no
      conversion. Hive does no time zone conversion when it queries Impala-written Parquet
      files.
    
Impala DDL and queries for Parquet table:
[localhost:21000] > create table p1 stored as parquet as select x from t1;
+-------------------+
| summary           |
+-------------------+
| Inserted 2 row(s) |
+-------------------+
[localhost:21000] > select x from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Hive DDL and queries for Parquet table:
hive> create table h1 (x timestamp) stored as parquet;
OK
hive> insert into h1 select * from p1;
...
OK
Time taken: 35.573 seconds
hive> select x from p1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 0.324 seconds, Fetched: 2 row(s)
hive> select x from h1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 0.197 seconds, Fetched: 2 row(s)
      The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying
      values in the TIMESTAMP column are different from the ones written by
      Impala, even though they were copied from one table to another by an INSERT ...
      SELECT statement in Hive. Hive did an implicit conversion from the local time
      zone to UTC as it wrote the values to Parquet.
    
Impala query for TIMESTAMP values from Impala-written and Hive-written data:
[localhost:21000] > select * from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.29s
[localhost:21000] > select * from h1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 22:43:02.892403000 |
| 2015-04-08 22:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.41s
Underlying integer values for Impala-written and Hive-written data:
[localhost:21000] > select cast(x as bigint) from p1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428421382        |
| 1428507782        |
+-------------------+
Fetched 2 row(s) in 0.38s
[localhost:21000] > select cast(x as bigint) from h1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.20s
      When the -convert_legacy_hive_parquet_utc_timestamps setting is enabled,
      Impala recognizes the Parquet data files written by Hive, and applies the same
      UTC-to-local-timezone conversion logic during the query as Hive uses, making the contents
      of the Impala-written P1 table and the Hive-written H1
      table appear identical, whether represented as TIMESTAMP values or the
      underlying BIGINT integers:
    
[localhost:21000] > select x from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.37s
[localhost:21000] > select x from h1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.19s
[localhost:21000] > select cast(x as bigint) from p1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.29s
[localhost:21000] > select cast(x as bigint) from h1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.22s
Conversions:
        Impala automatically converts STRING literals of the
        correct format into TIMESTAMP values. Timestamp values
        are accepted in the format "yyyy-MM-dd HH:mm:ss.SSSSSS",
        and can consist of just the date, or just the time, with or without the
        fractional second portion. For example, you can specify TIMESTAMP
        values such as '1966-07-30', '08:30:00',
        or '1985-09-25 17:45:30.005'.
      
Leading zeroes are not required in the numbers representing the date
        component, such as month and day, or the time component, such as
        hour, minute, and second. For example, Impala accepts both
        "2018-1-1 01:02:03" and "2018-01-01 1:2:3" as valid.
        When casting a STRING literal to TIMESTAMP, you can use the following separators
        between the date part and the time part:
        - One or more space characters. Example: CAST ('2001-01-09 01:05:01' AS TIMESTAMP)
        - The character 'T'. Example: CAST ('2001-01-09T01:05:01' AS TIMESTAMP)
      
          Casting an integer or floating-point value N to
        TIMESTAMP produces a value that is N seconds past the start of the epoch
        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
        If the setting --use_local_tz_for_unix_timestamp_conversions=true is in effect,
        the resulting TIMESTAMP represents a date and time in the local time zone.
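
        For example, with the default UTC interpretation:

select cast(0 as timestamp);        -- Returns 1970-01-01 00:00:00, the start of the epoch.
select cast(1000.5 as timestamp);   -- 1000.5 seconds past the epoch; the fractional part of N
                                    -- becomes the fractional second portion of the result.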
        
    
      In Impala 1.3 and higher, the FROM_UNIXTIME() and
      UNIX_TIMESTAMP() functions allow a wider range of format strings, with
      more flexibility in element order, repetition of letter placeholders, and separator
      characters. In Impala 2.3 and higher, the
      UNIX_TIMESTAMP() function also allows a numeric timezone offset to be
      specified as part of the input string. See
      Impala Date and Time Functions for details.
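
      For example (illustrative calls; the full set of supported patterns is listed in
      Impala Date and Time Functions):

select from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');      -- Returns '1970-01-01 00:00:00' with the default UTC interpretation.
select unix_timestamp('1970/01/01', 'yyyy/MM/dd');   -- Alternative separator characters in the format string; returns 0.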
    
        In Impala 2.2.0 and higher, built-in functions that accept or return integers representing TIMESTAMP values
        use the BIGINT type for parameters and return values, rather than INT.
        This change lets the date and time functions avoid an overflow error that would otherwise occur
        on January 19th, 2038 (known as the
        "Year 2038 problem" or "Y2K38 problem").
        This change affects the from_unixtime() and unix_timestamp() functions.
        You might need to change application code that interacts with these functions, change the types of
        columns that store the return values, or add CAST() calls to SQL statements that
        call these functions.
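
        For example, with the default UTC interpretation, a date past January 2038 converts to
        a value that only fits in a BIGINT:

select unix_timestamp('2040-01-01');   -- Returns 2208988800, which is larger than the maximum INT value.
select from_unixtime(2208988800);      -- Returns '2040-01-01 00:00:00'.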
      
Partitioning:
      Although you cannot use a TIMESTAMP column as a partition key, you can
      extract the individual years, months, days, hours, and so on and partition based on those
      columns. Because the partition key column values are represented in HDFS directory names,
      rather than as fields in the data files themselves, you can also keep the original
      TIMESTAMP values if desired, without duplicating data or wasting storage
      space. See Partition Key Columns for more
      details on partitioning with date and time values.
    
[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp);
ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
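      The following sketch shows the workaround described above, using hypothetical table names:
      the year and month are extracted into separate partition key columns while the original
      TIMESTAMP value is kept in the data files:

create table events_by_month (event string, happened timestamp)
  partitioned by (year int, month int);
insert into events_by_month partition (year, month)
  select event, happened, year(happened), month(happened)
  from raw_events;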
        NULL considerations: Casting any unrecognized STRING value to this type produces a
        NULL value.
      
        Partitioning: Because this type potentially has so many distinct values, it is often not a sensible
        choice for a partition key column. For example, events 1 millisecond apart would be stored in different
        partitions. Consider using the TRUNC() function to condense the number of distinct values,
        and partition on a new column with the truncated values.
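
        Because TIMESTAMP itself cannot be a partition key column, one way to apply this advice
        (shown here as a sketch with hypothetical table and column names) is to truncate each
        value to, for example, the containing hour and store the result in a STRING partition
        column:

create table events_by_hour (event string, happened timestamp)
  partitioned by (event_hour string);
insert into events_by_hour partition (event_hour)
  select event, happened, cast(trunc(happened, 'HH') as string)
  from raw_events;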
      
HBase considerations: This data type is fully compatible with HBase tables.
Parquet considerations: This type is fully compatible with Parquet tables.
Text table considerations: Values of this type are potentially larger in text tables than in tables using Parquet or other binary formats.
Internal details: Represented in memory as a 16-byte value.
Added in: Available in all versions of Impala.
        Column statistics considerations: Because this type has a fixed size, the maximum and average size
        fields are always filled in for column statistics, even before you run the COMPUTE STATS
        statement.
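
        For example, the fixed 16-byte size shows up in SHOW COLUMN STATS immediately after
        creating a table, while the distinct-value and null counts remain unknown (-1) until
        COMPUTE STATS runs:

create table stats_demo (t timestamp);
show column stats stats_demo;
-- The Max Size and Avg Size columns report 16 for column t even before COMPUTE STATS.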
      
Sqoop considerations:
 If you use Sqoop to
        convert RDBMS data to Parquet, be careful when interpreting any
        resulting values from DATE, DATETIME,
        or TIMESTAMP columns. The underlying values are
        represented as the Parquet INT64 type, which is
        represented as BIGINT in the Impala table. The Parquet
        values represent the time in milliseconds, while Impala interprets
          BIGINT as the time in seconds. Therefore, if you have
        a BIGINT column in a Parquet table that was imported
        this way from Sqoop, divide the values by 1000 when interpreting as the
          TIMESTAMP type.
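
        A sketch of the adjustment, assuming a hypothetical Sqoop-imported table sqoop_events
        with a BIGINT column event_millis holding milliseconds past the epoch:

-- Divide the millisecond values by 1000 so that the cast interprets them as seconds.
select cast(event_millis / 1000 as timestamp) from sqoop_events;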
Restrictions:
      If you cast a STRING with an unrecognized format to a
      TIMESTAMP, the result is NULL rather than an error. Make
      sure to test your data pipeline to be sure any textual date and time values are in a
      format that Impala TIMESTAMP can recognize.
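
      For example:

select cast('07/30/1966' as timestamp);   -- Returns NULL; this format is not recognized.
select cast('1966-07-30' as timestamp);   -- Returns 1966-07-30 00:00:00.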
    
        Currently, Avro tables cannot contain TIMESTAMP columns. If you need to store date and
        time values in Avro tables, as a workaround you can use a STRING representation of the
        values, convert the values to BIGINT with the UNIX_TIMESTAMP() function,
        or create separate numeric columns for individual date and time fields using the EXTRACT()
        function.
      
Kudu considerations:
        You can use TIMESTAMP columns in Kudu tables, instead of representing the date and time
        as a BIGINT value. The behavior of TIMESTAMP for Kudu tables has some special considerations:
        - Any nanoseconds in the original 96-bit value produced by Impala are not stored, because
          Kudu represents date/time columns using 64-bit values. The nanosecond portion of the
          value is rounded, not truncated. Therefore, a TIMESTAMP value that you store in a Kudu
          table might not be bit-for-bit identical to the value returned by a query.
        - The conversion between the Impala 96-bit representation and the Kudu 64-bit representation
          introduces some performance overhead when reading or writing TIMESTAMP columns. You can
          minimize the overhead during writes by performing inserts through the Kudu API. Because
          the overhead during reads applies to each query, you might continue to use a BIGINT column
          to represent date/time values in performance-critical applications.
        - The Impala TIMESTAMP type has a narrower range for years than the underlying Kudu data
          type. Impala can represent years 1400-9999. If year values outside this range are written
          to a Kudu table by a non-Impala client, Impala returns NULL by default when reading those
          TIMESTAMP values during a query. Or, if the ABORT_ON_ERROR query option is enabled, the
          query fails when it encounters a value with an out-of-range year.
Examples:
      The following examples demonstrate using TIMESTAMP values with built-in
      functions:
    
select cast('1966-07-30' as timestamp);
select cast('1985-09-25 17:45:30.005' as timestamp);
select cast('08:30:00' as timestamp);
select hour('1970-01-01 15:30:00');         -- Succeeds, returns 15.
select hour('1970-01-01 15:30');            -- Returns NULL because seconds field required.
select hour('1970-01-01 27:30:00');         -- Returns NULL because hour value out of range.
select dayofweek('2004-06-13');             -- Returns 1, representing Sunday.
select dayname('2004-06-13');               -- Returns 'Sunday'.
select date_add('2004-06-13', 365);         -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
select day('2004-06-13');                   -- Returns 13.
select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
select now();                               -- Returns current date and time in local timezone.
      The following examples demonstrate using TIMESTAMP values with
      HDFS-backed tables:
    
create table dates_and_times (t timestamp);
insert into dates_and_times values
  ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
      The following examples demonstrate using TIMESTAMP values with Kudu
      tables:
    
create table timestamp_t (x int primary key, s string, t timestamp, b bigint)
  partition by hash (x) partitions 16
  stored as kudu;
-- The default value of now() has microsecond precision, so the final 3 digits
-- representing nanoseconds are all zero.
insert into timestamp_t values (1, cast(now() as string), now(), unix_timestamp(now()));
-- Values with 1-499 nanoseconds are rounded down in the Kudu TIMESTAMP column.
insert into timestamp_t values (2, cast(now() + interval 100 nanoseconds as string), now() + interval 100 nanoseconds, unix_timestamp(now() + interval 100 nanoseconds));
insert into timestamp_t values (3, cast(now() + interval 499 nanoseconds as string), now() + interval 499 nanoseconds, unix_timestamp(now() + interval 499 nanoseconds));
-- Values with 500-999 nanoseconds are rounded up in the Kudu TIMESTAMP column.
insert into timestamp_t values (4, cast(now() + interval 500 nanoseconds as string), now() + interval 500 nanoseconds, unix_timestamp(now() + interval 500 nanoseconds));
insert into timestamp_t values (5, cast(now() + interval 501 nanoseconds as string), now() + interval 501 nanoseconds, unix_timestamp(now() + interval 501 nanoseconds));
-- The string representation shows how the underlying Impala TIMESTAMP can have nanosecond precision.
-- The TIMESTAMP column shows how timestamps in a Kudu table are rounded to microsecond precision.
-- The BIGINT column represents seconds past the epoch and so is not affected much by the nanosecond values.
select s, t, b from timestamp_t order by t;
+-------------------------------+-------------------------------+------------+
| s                             | t                             | b          |
+-------------------------------+-------------------------------+------------+
| 2017-05-31 15:30:05.107157000 | 2017-05-31 15:30:05.107157000 | 1496244605 |
| 2017-05-31 15:30:28.868151100 | 2017-05-31 15:30:28.868151000 | 1496244628 |
| 2017-05-31 15:34:33.674692499 | 2017-05-31 15:34:33.674692000 | 1496244873 |
| 2017-05-31 15:35:04.769166500 | 2017-05-31 15:35:04.769167000 | 1496244904 |
| 2017-05-31 15:35:33.033082501 | 2017-05-31 15:35:33.033083000 | 1496244933 |
+-------------------------------+-------------------------------+------------+
Related information:
- Timestamp Literals.
- 
        To convert to or from different date formats, or perform date arithmetic, use the date
        and time functions described in
        Impala Date and Time Functions. In
        particular, the from_unixtime() function requires a case-sensitive format string such as "yyyy-MM-dd HH:mm:ss.SSSS", matching one of the allowed variations of a TIMESTAMP value (date plus time, only date, only time, optional fractional seconds).
- 
        See SQL Differences Between Impala and Hive for
        details about differences in TIMESTAMP handling between Impala and Hive.