When Impala writes Parquet data files using the INSERT statement, the underlying compression is controlled by the COMPRESSION_CODEC query option.
Prior to Impala 2.0, this option was named PARQUET_COMPRESSION_CODEC. In Impala 2.0 and later, the PARQUET_COMPRESSION_CODEC name is not recognized. Use the more general name COMPRESSION_CODEC for new code.
Syntax:
SET COMPRESSION_CODEC=codec_name; -- Supported for all codecs.
SET COMPRESSION_CODEC=codec_name:compression_level; -- Only supported for ZSTD.
The allowed values for this query option are SNAPPY (the default), GZIP, ZSTD, LZ4, and NONE.
ZSTD also supports setting a compression level. Lower levels compress faster at the cost of compression ratio. Compression levels from 1 to 22 are supported for ZSTD. If no level is passed in the COMPRESSION_CODEC query option, the default compression level of 3 is used.
A Parquet file written with COMPRESSION_CODEC=NONE is still typically smaller than the original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied separately from compression.
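As a rough illustration of why encoding alone can shrink a column, the following Python sketch applies simplified run-length and dictionary encoding to a small list of values. It is not Parquet's actual on-disk format (Parquet uses more elaborate schemes); the function names `run_length_encode` and `dictionary_encode` and the sample column are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def dictionary_encode(values):
    """Replace each value with an integer index into a table of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in `dictionary`
    indexes = []      # one small integer per input value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(index_of[value])
    return dictionary, indexes


# Hypothetical sample column with many repeated values.
column = ["US", "US", "US", "CA", "CA", "US"]
runs = run_length_encode(column)
dictionary, indexes = dictionary_encode(column)
```

Both representations store repeated values once rather than once per row, which is why repetitive columns stay compact even when no compression codec is applied.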
The option value is not case-sensitive.
If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option setting, not just queries involving Parquet tables. (The value BZIP2 is also recognized, but is not compatible with Parquet tables.)
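Because an invalid setting causes every subsequent query to fail, a client application might check the value before sending it. The sketch below is a hypothetical client-side helper, not part of Impala. It encodes only the rules stated above for Parquet writes: the five allowed codecs, case-insensitivity, and a level from 1 to 22 that is accepted only for ZSTD. The name `build_set_statement` is an assumption made for illustration.

```python
# Codecs listed above as allowed when writing Parquet files.
PARQUET_CODECS = {"SNAPPY", "GZIP", "ZSTD", "LZ4", "NONE"}
ZSTD_MIN_LEVEL, ZSTD_MAX_LEVEL = 1, 22


def build_set_statement(value):
    """Validate 'codec' or 'codec:level' and return the matching SET statement.

    Hypothetical helper: it mirrors the documented rules and raises
    ValueError for anything outside them (including BZIP2, which is
    recognized by the option but not compatible with Parquet tables).
    """
    codec, separator, level = value.partition(":")
    codec = codec.strip().upper()  # the option value is not case-sensitive
    if codec not in PARQUET_CODECS:
        raise ValueError(f"codec not usable for Parquet writes: {codec!r}")
    if not separator:
        return f"SET COMPRESSION_CODEC={codec};"
    if codec != "ZSTD":
        raise ValueError("a compression level is only supported for ZSTD")
    try:
        level_number = int(level)
    except ValueError:
        raise ValueError(f"compression level must be an integer: {level!r}") from None
    if not ZSTD_MIN_LEVEL <= level_number <= ZSTD_MAX_LEVEL:
        raise ValueError(f"ZSTD level must be between 1 and 22: {level_number}")
    return f"SET COMPRESSION_CODEC={codec}:{level_number};"
```

For example, `build_set_statement("zstd:12")` returns a statement that can be issued before an INSERT, while `build_set_statement("gzip:5")` raises `ValueError` on the client instead of leaving the session with a setting the server rejects.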
Type: STRING
Default: SNAPPY
Examples:
set compression_codec=lz4;
insert into parquet_table_lz4_compressed select * from t1;
set compression_codec=zstd; -- Default compression level 3.
insert into parquet_table_zstd_default_compressed select * from t1;
set compression_codec=zstd:12; -- Compression level 12.
insert into parquet_table_zstd_highly_compressed select * from t1;
set compression_codec=gzip;
insert into parquet_table_highly_compressed select * from t1;
set compression_codec=snappy;
insert into parquet_table_compression_plus_fast_queries select * from t1;
set compression_codec=none;
insert into parquet_table_no_compression select * from t1;
set compression_codec=foo;
select * from t1 limit 5;
ERROR: Invalid compression codec: foo
Related information:
For information about how compressing Parquet data files affects query performance, see Compressions for Parquet Data Files.