Impala Authorization

The Privilege Model

Privileges can be granted on different objects in the schema. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a parent object in the hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other database systems.

The objects in the Impala schema hierarchy are:

Server
    URI
    Database
        Table
            Column

The table-level privileges apply to views as well. Anywhere you specify a table name, you can specify a view name instead.

In Impala 2.3 and higher, you can specify privileges for individual columns.

The table below lists the minimum level of privileges and the scope required to execute SQL statements in Impala 3.0 and higher. The following notations are used:

The SERVER resource type in Ranger implies all databases, all tables, all columns, all UDFs, and all URIs.
ANY denotes the SELECT, INSERT, CREATE, ALTER, DROP, or REFRESH privilege.
ALL privilege denotes the SELECT, INSERT, CREATE, ALTER, DROP, and REFRESH privileges.
The owner of an object effectively has the ALL privilege on the object.
The parent levels of the specified scope are implicitly supported where a scope refers to the specific level in the object hierarchy that the privilege is granted. For example, if a privilege is listed with the TABLE scope, the same privilege granted on DATABASE and SERVER will allow the user to execute the specified SQL statement.

SQL Statement	Privileges	Object Type / Resource Type
SELECT	SELECT	TABLE
WITH SELECT	SELECT	TABLE
EXPLAIN SELECT	SELECT	TABLE
INSERT	INSERT	TABLE
EXPLAIN INSERT	INSERT	TABLE
TRUNCATE	INSERT	TABLE
LOAD	INSERT	TABLE
	ALL	URI
CREATE DATABASE	CREATE	SERVER
CREATE DATABASE LOCATION	CREATE	SERVER
	ALL	URI
CREATE TABLE	CREATE	DATABASE
CREATE TABLE LIKE	CREATE	DATABASE
	SELECT, INSERT, or REFRESH	TABLE
CREATE TABLE AS SELECT	CREATE	DATABASE
	INSERT	DATABASE
	SELECT	TABLE
EXPLAIN CREATE TABLE AS SELECT	CREATE	DATABASE
	INSERT	DATABASE
	SELECT	TABLE
CREATE TABLE LOCATION	CREATE	TABLE
	ALL	URI
CREATE VIEW	CREATE	DATABASE
	SELECT	TABLE
ALTER DATABASE SET OWNER	ALL WITH GRANT	DATABASE
ALTER TABLE	ALTER	TABLE
ALTER TABLE SET LOCATION	ALTER	TABLE
	ALL	URI
ALTER TABLE RENAME	CREATE	DATABASE
	ALL	TABLE
ALTER TABLE SET OWNER	ALL WITH GRANT	TABLE
ALTER VIEW	ALTER	TABLE
	SELECT	TABLE
ALTER VIEW RENAME	CREATE	DATABASE
	ALL	TABLE
ALTER VIEW SET OWNER	ALL WITH GRANT	VIEW
DROP DATABASE	DROP	DATABASE
DROP TABLE	DROP	TABLE
DROP VIEW	DROP	TABLE
CREATE FUNCTION	CREATE	DATABASE
	ALL	URI
DROP FUNCTION	DROP	DATABASE
COMPUTE STATS	ALTER and SELECT	TABLE
DROP STATS	ALTER	TABLE
INVALIDATE METADATA	REFRESH	SERVER
INVALIDATE METADATA <table>	REFRESH	TABLE
REFRESH <table>	REFRESH	TABLE
REFRESH AUTHORIZATION	REFRESH	SERVER
REFRESH FUNCTIONS	REFRESH	DATABASE
COMMENT ON DATABASE	ALTER	DATABASE
COMMENT ON TABLE	ALTER	TABLE
COMMENT ON VIEW	ALTER	TABLE
COMMENT ON COLUMN	ALTER	TABLE
DESCRIBE DATABASE	SELECT, INSERT, or REFRESH	DATABASE
DESCRIBE <table/view>	SELECT, INSERT, or REFRESH	TABLE
If the user has the SELECT privilege at the COLUMN level, only the columns the user has access will show.	SELECT	COLUMN
USE	ANY	TABLE
SHOW DATABASES	ANY	TABLE
SHOW TABLES	ANY	TABLE
SHOW FUNCTIONS	SELECT, INSERT, or REFRESH	DATABASE
SHOW PARTITIONS	SELECT, INSERT, or REFRESH	TABLE
SHOW TABLE STATS	SELECT, INSERT, or REFRESH	TABLE
SHOW COLUMN STATS	SELECT, INSERT, or REFRESH	TABLE
SHOW FILES	SELECT, INSERT, or REFRESH	TABLE
SHOW CREATE TABLE	SELECT, INSERT, or REFRESH	TABLE
SHOW CREATE VIEW	SELECT, INSERT, or REFRESH	TABLE
SHOW CREATE FUNCTION	SELECT, INSERT, or REFRESH	DATABASE
SHOW RANGE PARTITIONS (Kudu only)	SELECT, INSERT, or REFRESH	TABLE
UPDATE (Kudu only)	ALL	TABLE
EXPLAIN UPDATE (Kudu only)	ALL	TABLE
UPSERT (Kudu only)	ALL	TABLE
WITH UPSERT (Kudu only)	ALL	TABLE
EXPLAIN UPSERT (Kudu only)	ALL	TABLE
DELETE (Kudu only)	ALL	TABLE
EXPLAIN DELETE (Kudu only)	ALL	TABLE

Privileges are managed via the GRANT and REVOKE SQL statements that require the Ranger service enabled.

If you change privileges outside of Impala, e.g. adding a user, removing a user, modifying privileges, you must clear the Impala Catalog server cache by running the REFRESH AUTHORIZATION statement. REFRESH AUTHORIZATION is not required if you make the changes to privileges within Impala.

Object Ownership in Ranger

Object ownership for tables, views and databases is enabled by default in Impala.

To define owner specific privileges, go to ranger UI and define appropriate policies on the {OWNER} user.

The CREATE statements implicitly make the user running the statement the owner of the object. For example, if User A creates a database, foo, via the CREATE DATABASE statement, User A now owns the foo database and is authorized to perform any operation on the foo database.

An ownership can be transferred to another user or role via the ALTER DATABASE, ALTER TABLE, or ALTER VIEW with the SET OWNER clause.

Note: Currently, due to a known issue (IMPALA-8937), until the ownership information is fully loaded in the coordinator catalog cache, the owner of a table might not be able to see the table when executing the

SHOW
        TABLES

statement The owner can still query the table.

Starting Impala with Ranger Authorization Enabled

To enable authorization in an Impala cluster using Ranger:

Add the following options to the IMPALA_SERVER_ARGS and the IMPALA_CATALOG_ARGS settings in the /etc/default/impala configuration file:
- -server_name: Specify the same name for all impalad nodes and the catalogd in the cluster.
- -ranger_service_type=hive
- -ranger_app_id: Set it to the Ranger application id.
- -authorization_provider=ranger
Restart the catalogd and all impalad daemons.

Managing Privileges

You set up privileges through the GRANT and REVOKE statements in either Impala or Hive.

For information about using the Impala GRANT and REVOKE statements, see GRANT Statement (Impala 2.0 or higher only) and REVOKE Statement (Impala 2.0 or higher only).

Changing Privileges from Outside of Impala

If you make a change to privileges in Ranger from outside of Impala, e.g. adding a user, removing a user, modifying privileges, there are two options to propagate the change:

Use the ranger.plugin.hive.policy.pollIntervalMs property to specify how often to do a Ranger refresh. The property is specified in ranger-hive-security.xml in the conf directory under your Impala home directory.
Run the INVALIDATE METADATA or REFRESH AUTHORIZATION statement to force a refresh.

If you make a change to privileges within Impala, INVALIDATE METADATA is not required.

Warning: As INVALIDATE METADATA is an expensive operation, you should use it judiciously.

Granting Privileges on URI

URIs represent the file paths you specify as part of statements such as CREATE EXTERNAL TABLE and LOAD DATA. Typically, you specify what look like UNIX paths, but these locations can also be prefixed with hdfs:// to make clear that they are really URIs. To set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in that directory.

URIs must start with hdfs://, s3a://, adl://, or file://. If a URI starts with an absolute path, the path will be appended to the default filesystem prefix. For example, if you specify:


GRANT ALL ON URI '/tmp';

The above statement effectively becomes the following where the default filesystem is HDFS.


GRANT ALL ON URI 'hdfs://localhost:20500/tmp';

When defining URIs for HDFS, you must also specify the NameNode. For example:

GRANT ALL ON URI file:///path/to/dir TO <role>
GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role>

Warning: Because the NameNode host and port must be specified, it is strongly recommended that you use High Availability (HA). This ensures that the URI will remain constant even if the NameNode changes. For example:

GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO <role>

Examples of Setting up Authorization for Security Scenarios

The following examples show how to set up authorization to grant privileges on objects to groups of users via roles.

A User with No Privileges

If a user has no privileges at all, that user cannot access any schema objects in the system. The error messages do not disclose the names or existence of objects that the user is not authorized to read.

This is the experience you want a user to have if they somehow log into a system where they are not an authorized Impala user. Or in a real deployment, a user might have no privileges because they are not a member of any of the authorized groups.

Examples of Privileges for Administrative Users

In this example, the SQL statements grant the entire_server role all privileges on both the databases and URIs within the server.

CREATE ROLE entire_server;
GRANT ROLE entire_server TO GROUP admin_group;
GRANT ALL ON SERVER server1 TO ROLE entire_server;

A User with Privileges for Specific Databases and Tables

If a user has privileges for specific tables in specific databases, the user can access those things but nothing else. They can see the tables and their parent databases in the output of SHOW TABLES and SHOW DATABASES, USE the appropriate databases, and perform the relevant actions (SELECT and/or INSERT) based on the table privileges. To actually create a table requires the ALL privilege at the database level, so you might define separate roles for the user that sets up a schema and other users or applications that perform day-to-day operations on the tables.


CREATE ROLE one_database;
GRANT ROLE one_database TO GROUP admin_group;
GRANT ALL ON DATABASE db1 TO ROLE one_database;

CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP trainers;
GRANT ALL ON TABLE db1.lesson TO ROLE instructor;

# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
CREATE ROLE student;
GRANT ROLE student TO GROUP visitors;
GRANT SELECT ON TABLE db1.training TO ROLE student;

Privileges for Working with External Data Files

When data is being inserted through the LOAD DATA statement or is referenced from an HDFS location outside the normal Impala database directories, the user also needs appropriate permissions on the URIs corresponding to those HDFS locations.

In this example:

The external_table role can insert into and query the Impala table, external_table.sample.
The staging_dir role can specify the HDFS path /user/impala-user/external_data with the LOAD DATA statement. When Impala queries or loads data files, it operates on all the files in that directory, not just a single file, so any Impala LOCATION parameters refer to a directory rather than an individual file.

CREATE ROLE external_table;
GRANT ROLE external_table TO GROUP impala_users;
GRANT ALL ON TABLE external_table.sample TO ROLE external_table;

CREATE ROLE staging_dir;
GRANT ROLE staging TO GROUP impala_users;
GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/impala-user/external_data' TO ROLE staging_dir;

Separating Administrator Responsibility from Read and Write Privileges

To create a database, you need the full privilege on that database while day-to-day operations on tables within that database can be performed with lower levels of privilege on a specific table. Thus, you might set up separate roles for each database or application: an administrative one that could create or drop the database, and a user-level one that can access only the relevant tables.

In this example, the responsibilities are divided between users in 3 different groups:

Members of the supergroup group have the training_sysadmin role and so can set up a database named training.
Members of the impala_users group have the instructor role and so can create, insert into, and query any tables in the training database, but cannot create or drop the database itself.
Members of the visitor group have the student role and so can query those tables in the training database.

CREATE ROLE training_sysadmin;
GRANT ROLE training_sysadmin TO GROUP supergroup;
GRANT ALL ON DATABASE training TO ROLE training_sysadmin;

CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP impala_users;
GRANT ALL ON TABLE training.course1 TO ROLE instructor;

CREATE ROLE student;
GRANT ROLE student TO GROUP visitor;
GRANT SELECT ON TABLE training.course1 TO ROLE student;

Setting Up Schema Objects for a Secure Impala Deployment

In your role definitions, you must specify privileges at the level of individual databases and tables, or all databases or all tables within a database. To simplify the structure of these rules, plan ahead of time how to name your schema objects so that data with different authorization requirements are divided into separate databases.

If you are adding security on top of an existing Impala deployment, you can rename tables or even move them between databases using the ALTER TABLE statement.

The DEFAULT Database in a Secure Deployment

Because of the extra emphasis on granular access controls in a secure deployment, you should move any important or sensitive information out of the DEFAULT database into a named database. Sometimes you might need to give privileges on the DEFAULT database for administrative reasons, for example, as a place you can reliably specify with a USE statement when preparing to drop a database.

Ranger Column Masking

Ranger column masking hides sensitive columnar data in Impala query output. For example, you can define a policy that reveals only the first or last four characters of column data. Column masking is enabled by default. The Impala behavior mimics Hive behavior with respect to column masking. For more information, see the Apache Ranger documentation.

The following table lists all supported, built-in mask types for defining column masking in a policy using the Ranger REST API.

Type	Name	Description	Transformer
MASK	Redact	Replace lowercase with 'x', uppercase with 'X', digits with '0'	mask({col})
MASK_SHOW_LAST_4	Partial mask: show last 4	Show last 4 characters; replace rest with 'x'	mask_show_last_n({col}, 4, 'x', 'x', 'x', -1, '1')
MASK_SHOW_FIRST_4	Partial mask: show first 4	Show first 4 characters; replace rest with 'x'	mask_show_first_n({col}, 4, 'x', 'x', 'x', -1, '1')
MASK_HASH	Hash	Hash the value	mask_hash({col})
MASK_NULL	Nullify	Replace with NULL	N/A
MASK_NONE	Unmasked (retain original value)	No masking	N/A
MASK_DATE_SHOW_YEAR	Date: show only year	Date: show only year	mask({col}, 'x', 'x', 'x', -1, '1', 1, 0, -1)
CUSTOM	Custom	Custom	N/A

Limitations on Mask Functions

The mask functions in Hive are implemented through GenericUDFs. Even though Impala users can call Hive UDFs, Impala does not yet support Hive GenericUDFs, so you cannot use Hive's mask functions in Impala. However, Impala has builtin mask functions that are implemented through overloads. In Impala, when using mask functions, not all parameter combinations are supported. These mask functions are introduced in Impala 3.4

The following list includes all the implemented overloads.

Overloads used by Ranger default masking policies,
Overloads with simple arguments,
Overload with all arguments in int type for full functionality. Char argument needs to be converted to their ASCII value.

To list the available overloads, use the following query:

show functions in _impala_builtins like "mask*";

Note:

An error message that states "No matching function with signature: mask..." implies that Impala does not contain the corresponding overload.