Authorization determines which users are allowed to access which resources, and what
operations they are allowed to perform. You use Apache Ranger for authorization.
By default, when authorization is not enabled, Impala does all read and
write operations with the privileges of the impala
user, which is
suitable for a development/test environment but not for a secure production environment.
When authorization is enabled, Impala uses the OS user ID of the user who runs
impala-shell or other client programs, and associates various
privileges with each user.
See the following sections for details about using the Impala authorization features.
Privileges can be granted on different objects in the schema. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a parent object in the hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other database systems.
The objects in the Impala schema hierarchy are:
Server
URI
Database
Table
Column
The table-level privileges apply to views as well. Anywhere you specify a table name, you can specify a view name instead.
In Impala 2.3 and higher, you can specify privileges for individual columns.
SELECT
, INSERT
,
CREATE
, ALTER
, DROP
,
or REFRESH
privilege.
SELECT
, INSERT
,
CREATE
, ALTER
, DROP
,
and REFRESH
privileges.
TABLE
scope, the
same privilege granted on DATABASE
and SERVER
will
allow the user to execute the specified SQL statement.
SQL Statement | Privileges |
Object Type / Resource Type |
SELECT | SELECT | TABLE |
WITH SELECT | SELECT | TABLE |
EXPLAIN SELECT | SELECT | TABLE |
INSERT | INSERT | TABLE |
EXPLAIN INSERT | INSERT | TABLE |
TRUNCATE | INSERT | TABLE |
LOAD | INSERT | TABLE |
ALL | URI | |
CREATE DATABASE | CREATE | SERVER |
CREATE DATABASE LOCATION | CREATE | SERVER |
ALL | URI | |
CREATE TABLE | CREATE | DATABASE |
CREATE TABLE LIKE | CREATE | DATABASE |
SELECT, INSERT, or REFRESH | TABLE | |
CREATE TABLE AS SELECT | CREATE | DATABASE |
INSERT | DATABASE | |
SELECT | TABLE | |
EXPLAIN CREATE TABLE AS SELECT | CREATE | DATABASE |
INSERT | DATABASE | |
SELECT | TABLE | |
CREATE TABLE LOCATION | CREATE | TABLE |
ALL | URI | |
CREATE VIEW | CREATE | DATABASE |
SELECT | TABLE | |
ALTER DATABASE SET OWNER | ALL WITH GRANT | DATABASE |
ALTER TABLE | ALTER | TABLE |
ALTER TABLE SET LOCATION | ALTER | TABLE |
ALL | URI | |
ALTER TABLE RENAME | CREATE | DATABASE |
ALL | TABLE | |
ALTER TABLE SET OWNER | ALL WITH GRANT | TABLE |
ALTER VIEW | ALTER | TABLE |
SELECT | TABLE | |
ALTER VIEW RENAME | CREATE | DATABASE |
ALL | TABLE | |
ALTER VIEW SET OWNER | ALL WITH GRANT | VIEW |
DROP DATABASE | DROP | DATABASE |
DROP TABLE | DROP | TABLE |
DROP VIEW | DROP | TABLE |
CREATE FUNCTION | CREATE | DATABASE |
ALL | URI | |
DROP FUNCTION | DROP | DATABASE |
COMPUTE STATS | ALTER and SELECT | TABLE |
DROP STATS | ALTER | TABLE |
INVALIDATE METADATA | REFRESH | SERVER |
INVALIDATE METADATA <table> | REFRESH | TABLE |
REFRESH <table> | REFRESH | TABLE |
REFRESH AUTHORIZATION | REFRESH | SERVER |
REFRESH FUNCTIONS | REFRESH | DATABASE |
COMMENT ON DATABASE | ALTER | DATABASE |
COMMENT ON TABLE | ALTER | TABLE |
COMMENT ON VIEW | ALTER | TABLE |
COMMENT ON COLUMN | ALTER | TABLE |
DESCRIBE DATABASE | SELECT, INSERT, or REFRESH | DATABASE |
DESCRIBE <table/view> | SELECT, INSERT, or REFRESH | TABLE |
If the user has the SELECT privilege at the COLUMN level, only the columns the user has access will show. | SELECT | COLUMN |
USE | ANY | TABLE |
SHOW DATABASES | ANY | TABLE |
SHOW TABLES | ANY | TABLE |
SHOW FUNCTIONS | SELECT, INSERT, or REFRESH | DATABASE |
SHOW PARTITIONS | SELECT, INSERT, or REFRESH | TABLE |
SHOW TABLE STATS | SELECT, INSERT, or REFRESH | TABLE |
SHOW COLUMN STATS | SELECT, INSERT, or REFRESH | TABLE |
SHOW FILES | SELECT, INSERT, or REFRESH | TABLE |
SHOW CREATE TABLE | SELECT, INSERT, or REFRESH | TABLE |
SHOW CREATE VIEW | SELECT, INSERT, or REFRESH | TABLE |
SHOW CREATE FUNCTION | SELECT, INSERT, or REFRESH | DATABASE |
SHOW RANGE PARTITIONS (Kudu only) | SELECT, INSERT, or REFRESH | TABLE |
UPDATE (Kudu only) | ALL | TABLE |
EXPLAIN UPDATE (Kudu only) | ALL | TABLE |
UPSERT (Kudu only) | ALL | TABLE |
WITH UPSERT (Kudu only) | ALL | TABLE |
EXPLAIN UPSERT (Kudu only) | ALL | TABLE |
DELETE (Kudu only) | ALL | TABLE |
EXPLAIN DELETE (Kudu only) | ALL | TABLE |
Privileges are managed via the GRANT
and REVOKE
SQL
statements that require the Ranger service enabled.
If you change privileges outside of Impala, e.g. adding a user, removing a user,
modifying privileges, you must clear the Impala Catalog server cache by running the
REFRESH AUTHORIZATION
statement. REFRESH AUTHORIZATION
is not required if you make the changes to privileges within Impala.
Object ownership for tables, views and databases is enabled by default in Impala.
To define owner specific privileges, go to ranger UI and define appropriate policies on
the {OWNER}
user.
The CREATE
statements implicitly make the user running the statement
the owner of the object. For example, if User A creates a database,
foo, via the CREATE DATABASE
statement,
User A now owns the foo database and is authorized
to perform any operation on the foo database.
An ownership can be transferred to another user or role via the ALTER
DATABASE
, ALTER TABLE
, or ALTER VIEW
with the
SET OWNER
clause.
SHOW
TABLES
statement The owner can still query the table.
To enable authorization in an Impala cluster using Ranger:
IMPALA_SERVER_ARGS
and the
IMPALA_CATALOG_ARGS
settings in the
/etc/default/impala configuration file:
-server_name
: Specify the same name for all
impalad nodes and the catalogd
in the cluster. -ranger_service_type=hive
-ranger_app_id
: Set it to the Ranger application id. -authorization_provider=ranger
catalogd
and all impalad daemons.
You set up privileges through the GRANT
and REVOKE
statements in either Impala or Hive.
For information about using the Impala GRANT
and
REVOKE
statements, see GRANT Statement (Impala 2.0 or higher only)
and REVOKE Statement (Impala 2.0 or higher only).
If you make a change to privileges in Ranger from outside of Impala, e.g. adding a user, removing a user, modifying privileges, there are two options to propagate the change:
ranger.plugin.hive.policy.pollIntervalMs
property to
specify how often to do a Ranger refresh. The property is specified in
ranger-hive-security.xml
in the conf
directory
under your Impala home directory.
INVALIDATE METADATA
or REFRESH
AUTHORIZATION
statement to force a refresh.
If you make a change to privileges within Impala, INVALIDATE METADATA
is not required.
INVALIDATE METADATA
is an expensive operation, you should use it
judiciously.
URIs represent the file paths you specify as part of statements such as CREATE
EXTERNAL TABLE
and LOAD DATA
. Typically, you specify what look
like UNIX paths, but these locations can also be prefixed with hdfs://
to
make clear that they are really URIs. To set privileges for a URI, specify the name of a
directory, and the privilege applies to all the files in that directory.
hdfs://
, s3a://
,
adl://
, or file://
. If a URI starts with an absolute
path, the path will be appended to the default filesystem prefix. For example, if you
specify:
GRANT ALL ON URI '/tmp';
The above statement effectively becomes the following where the default filesystem is
HDFS.
GRANT ALL ON URI 'hdfs://localhost:20500/tmp';
GRANT ALL ON URI file:///path/to/dir TO <role>
GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role>
GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO <role>
The following examples show how to set up authorization to grant privileges on objects to groups of users via roles.
If a user has no privileges at all, that user cannot access any schema objects in the system. The error messages do not disclose the names or existence of objects that the user is not authorized to read.
This is the experience you want a user to have if they somehow log into a system where they are not an authorized Impala user. Or in a real deployment, a user might have no privileges because they are not a member of any of the authorized groups.
In this example, the SQL statements grant the entire_server
role all
privileges on both the databases and URIs within the server.
CREATE ROLE entire_server;
GRANT ROLE entire_server TO GROUP admin_group;
GRANT ALL ON SERVER server1 TO ROLE entire_server;
If a user has privileges for specific tables in specific databases, the user can
access those things but nothing else. They can see the tables and their parent databases
in the output of SHOW TABLES
and SHOW DATABASES
,
USE
the appropriate databases, and perform the relevant actions
(SELECT
and/or INSERT
) based on the table
privileges. To actually create a table requires the ALL
privilege at
the database level, so you might define separate roles for the user that sets up a
schema and other users or applications that perform day-to-day operations on the tables.
CREATE ROLE one_database;
GRANT ROLE one_database TO GROUP admin_group;
GRANT ALL ON DATABASE db1 TO ROLE one_database;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP trainers;
GRANT ALL ON TABLE db1.lesson TO ROLE instructor;
# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
CREATE ROLE student;
GRANT ROLE student TO GROUP visitors;
GRANT SELECT ON TABLE db1.training TO ROLE student;
When data is being inserted through the LOAD DATA
statement or is
referenced from an HDFS location outside the normal Impala database directories, the
user also needs appropriate permissions on the URIs corresponding to those HDFS
locations.
In this example:
external_table
role can insert into and query the Impala
table, external_table.sample
. staging_dir
role can specify the HDFS path
/user/impala-user/external_data with the LOAD
DATA
statement. When Impala queries or loads data files, it operates on all
the files in that directory, not just a single file, so any Impala
LOCATION
parameters refer to a directory rather than an individual
file. CREATE ROLE external_table;
GRANT ROLE external_table TO GROUP impala_users;
GRANT ALL ON TABLE external_table.sample TO ROLE external_table;
CREATE ROLE staging_dir;
GRANT ROLE staging TO GROUP impala_users;
GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/impala-user/external_data' TO ROLE staging_dir;
To create a database, you need the full privilege on that database while day-to-day operations on tables within that database can be performed with lower levels of privilege on a specific table. Thus, you might set up separate roles for each database or application: an administrative one that could create or drop the database, and a user-level one that can access only the relevant tables.
In this example, the responsibilities are divided between users in 3 different groups:
supergroup
group have the
training_sysadmin
role and so can set up a database named
training
. impala_users
group have the
instructor
role and so can create, insert into, and query any
tables in the training
database, but cannot create or drop the
database itself. visitor
group have the student
role and so can query those tables in the training
database. CREATE ROLE training_sysadmin;
GRANT ROLE training_sysadmin TO GROUP supergroup;
GRANT ALL ON DATABASE training TO ROLE training_sysadmin;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP impala_users;
GRANT ALL ON TABLE training.course1 TO ROLE instructor;
CREATE ROLE student;
GRANT ROLE student TO GROUP visitor;
GRANT SELECT ON TABLE training.course1 TO ROLE student;
In your role definitions, you must specify privileges at the level of individual databases and tables, or all databases or all tables within a database. To simplify the structure of these rules, plan ahead of time how to name your schema objects so that data with different authorization requirements are divided into separate databases.
If you are adding security on top of an existing Impala deployment, you can rename
tables or even move them between databases using the ALTER TABLE
statement.
Because of the extra emphasis on granular access controls in a secure deployment, you
should move any important or sensitive information out of the DEFAULT
database into a named database. Sometimes you might need to give privileges on the
DEFAULT
database for administrative reasons, for example, as a place
you can reliably specify with a USE
statement when preparing to drop a
database.
Ranger column masking hides sensitive columnar data in Impala query output. For example, you can define a policy that reveals only the first or last four characters of column data. Column masking is enabled by default. The Impala behavior mimics Hive behavior with respect to column masking. For more information, see the Apache Ranger documentation.
Type | Name | Description | Transformer |
---|---|---|---|
MASK | Redact | Replace lowercase with 'x', uppercase with 'X', digits with '0' | mask({col}) |
MASK_SHOW_LAST_4 | Partial mask: show last 4 | Show last 4 characters; replace rest with 'x' | mask_show_last_n({col}, 4, 'x', 'x', 'x', -1, '1') |
MASK_SHOW_FIRST_4 | Partial mask: show first 4 | Show first 4 characters; replace rest with 'x' | mask_show_first_n({col}, 4, 'x', 'x', 'x', -1, '1') |
MASK_HASH | Hash | Hash the value | mask_hash({col}) |
MASK_NULL | Nullify | Replace with NULL | N/A |
MASK_NONE | Unmasked (retain original value) | No masking | N/A |
MASK_DATE_SHOW_YEAR | Date: show only year | Date: show only year | mask({col}, 'x', 'x', 'x', -1, '1', 1, 0, -1) |
CUSTOM | Custom | Custom | N/A |
The mask functions in Hive are implemented through GenericUDFs. Even though Impala users can call Hive UDFs, Impala does not yet support Hive GenericUDFs, so you cannot use Hive's mask functions in Impala. However, Impala has builtin mask functions that are implemented through overloads. In Impala, when using mask functions, not all parameter combinations are supported. These mask functions are introduced in Impala 3.4
The following list includes all the implemented overloads.
int
type for full functionality. Char
argument needs to be converted to their ASCII value.To list the available overloads, use the following query:
show functions in _impala_builtins like "mask*";