S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)
Speeds up INSERT operations on tables or partitions residing on the Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind if an error occurs partway through the operation.
By default, Impala write operations to S3 tables and partitions involve a two-stage process. Impala writes intermediate files to S3 and then, because S3 does not provide a "rename" operation, copies those intermediate files to their final location. This makes the process more expensive than on a filesystem that supports renaming or moving files. This query option makes Impala skip the intermediate files and instead write the new data directly to the final destination.
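As a minimal sketch of how the option might be used in an impala-shell session, the statements below turn it on for a single INSERT and then turn it off again. The table names sales_s3 and sales_staging are hypothetical and stand in for an S3-backed destination table and any source table.

```sql
-- Hypothetical table names: sales_s3 is assumed to be an S3-backed table,
-- sales_staging is assumed to be any readable source table.

-- Write new data files directly to the final S3 location.
SET S3_SKIP_INSERT_STAGING=true;
INSERT INTO sales_s3 SELECT * FROM sales_staging;

-- Return to the two-stage write for later statements in this session.
SET S3_SKIP_INSERT_STAGING=false;
```

The setting applies to the current session only, so other sessions are unaffected.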
Usage notes:
If a host that is participating in the INSERT operation fails partway through the query, you might be left with a table or partition that contains some but not all of the expected data files. Therefore, this option is most appropriate for a development or test environment where you have the ability to reconstruct the table if a problem during INSERT leaves the data in an inconsistent state.
The timing of file deletion during an INSERT OVERWRITE operation makes it impractical to write new files to S3 and delete the old files in a single operation. Therefore, this query option only affects regular INSERT statements that add to the existing data in a table, not INSERT OVERWRITE statements.
Use TRUNCATE TABLE if you need to remove all contents from an S3 table before performing a fast INSERT with this option enabled.
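Because INSERT OVERWRITE is not affected by this option, replacing the full contents of a table could be sketched as a TRUNCATE TABLE followed by a regular INSERT. The table names are again hypothetical. The two statements are not atomic, so the table is empty or partially filled while the sequence runs.

```sql
-- Hypothetical table names, as in any example here; adjust to your schema.
SET S3_SKIP_INSERT_STAGING=true;

-- Remove the existing data files, then append with a regular INSERT,
-- which skips the staging step when the option is enabled.
TRUNCATE TABLE sales_s3;
INSERT INTO sales_s3 SELECT * FROM sales_staging;
```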
Performance improvements with this option enabled can be substantial. The speed increase might be more noticeable for non-partitioned tables than for partitioned tables.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted as false
Default: true (shown as 1 in output of SET statement)
Added in: Impala 2.6.0
Related information: