Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit the MIT Kerberos website.
The rest of this topic assumes you have a working Kerberos Key Distribution Center (KDC) set up. To enable Kerberos, you first create a Kerberos principal for each host running impalad or statestored.
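Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files owned by the same user (typically impala). To implement user-level access to different databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in Impala Authorization.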
An alternative form of authentication you can use is LDAP, described in Enabling LDAP Authentication for Impala.
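On older Red Hat Enterprise Linux releases and comparable distributions, impala-shell may need additional Python packages before it can connect to a Kerberos-enabled Impala cluster:
sudo yum install python-devel openssl-devel python-pip
sudo pip-python install ssl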
If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed, and you must configure krb5.conf to request renewable tickets. Typically, you can do this by adding the max_renewable_life setting to your realm in kdc.conf, and by adding the renew_lifetime parameter to the libdefaults section of krb5.conf. For more information about renewable tickets, see the Kerberos documentation.
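A quick way to confirm both settings is to check that each key appears in the right section of its file. The following bash sketch assumes the standard MIT file layouts (renew_lifetime under [libdefaults] in krb5.conf, max_renewable_life inside the [realms] section of kdc.conf). The helper name has_setting is hypothetical, and a value such as 7d (for example, renew_lifetime = 7d) is only an illustrative duration.

```shell
# Hypothetical helper: succeed if KEY is set inside [SECTION] of an
# MIT-style Kerberos config file (krb5.conf or kdc.conf).
# Usage: has_setting FILE SECTION KEY
has_setting() {
  awk -v sect="[$2]" -v key="$3" '
    # A line starting with "[" opens a new section.
    /^[ \t]*\[/ {
      gsub(/[ \t]/, "")
      in_sect = ($0 == sect)
      next
    }
    # Inside the wanted section, compare the text before "=" with KEY.
    in_sect {
      split($0, kv, "=")
      name = kv[1]
      gsub(/[ \t]/, "", name)
      if (name == key) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, has_setting /etc/krb5.conf libdefaults renew_lifetime returns a non-zero status if krb5.conf does not yet request renewable tickets. The kdc.conf location varies by distribution (on Red Hat-based systems it is typically /var/kerberos/krb5kdc/kdc.conf).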
Start all impalad and statestored daemons with the --principal and --keytab_file flags set to the principal and to the full path name of the keytab file containing the credentials for the principal.
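Because the keytab flag needs a full path name, a pre-flight check can catch a relative or unreadable keytab before a daemon starts. This bash sketch uses a hypothetical kerberos_flags helper that prints the two flags only when the path is absolute and readable by the user running the check (run it as the impala user for the readability test to be meaningful).

```shell
# Hypothetical helper: print the Kerberos startup flags for an Impala
# daemon, refusing a keytab path that is relative or unreadable.
# Usage: kerberos_flags PRINCIPAL KEYTAB
kerberos_flags() {
  local principal=$1 keytab=$2
  case "$keytab" in
    /*) ;;  # absolute path: accepted
    *) echo "keytab must be a full path: $keytab" >&2; return 1 ;;
  esac
  if [ ! -r "$keytab" ]; then
    echo "keytab is not readable: $keytab" >&2
    return 1
  fi
  printf -- '--principal=%s --keytab_file=%s\n' "$principal" "$keytab"
}
```

For example, kerberos_flags impala/impala_host.example.com@TEST.EXAMPLE.COM /etc/impala/conf/impala-http.keytab prints both flags on one line, ready to append to the daemon's startup arguments.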
To enable Kerberos in the Impala shell, start the impala-shell command using the -k flag.
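With -k, impala-shell authenticates with your existing Kerberos credentials, so you normally obtain a ticket with kinit first. This bash sketch wraps that check; kerberized_shell is a hypothetical name, and the KLIST and IMPALA_SHELL variables exist only so the underlying commands can be overridden.

```shell
# Hypothetical wrapper: require a valid Kerberos ticket, then start
# impala-shell with -k. Override KLIST / IMPALA_SHELL if needed.
KLIST=${KLIST:-klist}
IMPALA_SHELL=${IMPALA_SHELL:-impala-shell}

kerberized_shell() {
  # klist -s prints nothing; it exits non-zero if there is no valid,
  # unexpired credentials cache.
  if ! "$KLIST" -s; then
    echo "No valid Kerberos ticket; run kinit first." >&2
    return 1
  fi
  # -k enables Kerberos authentication; remaining arguments
  # (for example, -i HOST) are passed through unchanged.
  "$IMPALA_SHELL" -k "$@"
}
```

Typical use: run kinit alice@TEST.EXAMPLE.COM (a hypothetical user), then kerberized_shell -i impala_host.example.com.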
To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the installation and configuration steps in Authentication in Hadoop. Note that when Kerberos security is enabled in Impala, a web browser that supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet Explorer, or Chrome).
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers, HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO authentication, and two or more of these services are running on the same host, then all of the running services must use the same HTTP principal and keytab file used for their HTTP endpoints.
Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
Create service principals for Impala and for the HTTP service. Principal names take the form serviceName/fully.qualified.domain.name@KERBEROS.REALM.
Create, merge, and distribute keytab files for these principals.
Edit /etc/default/impala to accommodate Kerberos authentication.
In Impala 2.0 and later, user() returns the full Kerberos principal string, such as user@example.com, in a Kerberized environment.
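Every principal in the steps that follow uses the same three-part pattern, so a tiny helper can keep names consistent across hosts. This is a bash sketch: make_principal is a hypothetical name, and hostname -f is assumed to return the host's fully qualified domain name.

```shell
# Hypothetical helper: build a principal of the form
# serviceName/fully.qualified.domain.name@KERBEROS.REALM
# Usage: make_principal SERVICE FQDN REALM
make_principal() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: make_principal SERVICE FQDN REALM" >&2
    return 1
  fi
  printf '%s/%s@%s\n' "$1" "$2" "$3"
}
```

For example, make_principal impala "$(hostname -f)" TEST.EXAMPLE.COM and make_principal HTTP "$(hostname -f)" TEST.EXAMPLE.COM produce the two principals for the local host. The helper does not change case, so pass HTTP in uppercase.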
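For each host, create an Impala service principal using the name of the OS user that the Impala daemons run under (impala by default), the host's fully qualified domain name, and the realm name. Then create an HTTP service principal for the same host. For example:
$ kadmin
kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM
kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM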
The HTTP component of the service principal must be uppercase, as shown in the preceding example.
Create keytab files for both principals. For example:
kadmin: xst -k impala.keytab impala/impala_host.example.com
kadmin: xst -k http.keytab HTTP/impala_host.example.com
kadmin: quit
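The kadmin session above covers a single host, but every host running impalad or statestored needs its own principals and keytab. This bash sketch only prints the same kadmin commands for a list of hosts so you can review them and paste them into a kadmin session; it sends nothing to the KDC. The function name and the per-host keytab file names (impala-HOST.keytab, http-HOST.keytab) are assumptions.

```shell
# Hypothetical helper: print kadmin commands that create and export the
# impala and HTTP principals for each host. Output only; no KDC access.
# Usage: gen_kadmin_cmds REALM HOST...
gen_kadmin_cmds() {
  if [ "$#" -lt 2 ]; then
    echo "usage: gen_kadmin_cmds REALM HOST..." >&2
    return 2
  fi
  local realm=$1 host
  shift
  for host in "$@"; do
    echo "addprinc -requires_preauth -randkey impala/${host}@${realm}"
    echo "addprinc -randkey HTTP/${host}@${realm}"
    # One keytab pair per host, so each host receives only its own keys.
    echo "xst -k impala-${host}.keytab impala/${host}@${realm}"
    echo "xst -k http-${host}.keytab HTTP/${host}@${realm}"
  done
}
```

For example, gen_kadmin_cmds TEST.EXAMPLE.COM host1.example.com host2.example.com prints eight commands. Keep in mind that exporting a principal with xst generates new random keys for it, so any keytab exported earlier for the same principal stops working.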
Use ktutil to read the contents of the two keytab files and then write those contents to a new file. For example:
$ ktutil
ktutil: rkt impala.keytab
ktutil: rkt http.keytab
ktutil: wkt impala-http.keytab
ktutil: quit
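Assuming your ktutil reads commands from standard input when input is piped (MIT ktutil does), the merge can be scripted. The helper name merge_keytabs and the KTUTIL override variable are hypothetical.

```shell
# Hypothetical helper: merge keytabs by piping rkt/wkt commands to ktutil.
# Usage: merge_keytabs OUTPUT INPUT...
KTUTIL=${KTUTIL:-ktutil}

merge_keytabs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_keytabs OUTPUT INPUT..." >&2
    return 2
  fi
  local out=$1 kt
  shift
  {
    # Read every input keytab into ktutil's in-memory key list...
    for kt in "$@"; do
      printf 'rkt %s\n' "$kt"
    done
    # ...then write the combined list to the output file.
    printf 'wkt %s\n' "$out"
    printf 'quit\n'
  } | "$KTUTIL"
}
```

For example, merge_keytabs impala-http.keytab impala.keytab http.keytab issues the same commands as the interactive session above. If you exported per-host keytabs with other file names, pass those names instead.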
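Optionally, list the entries in the merged keytab to confirm that it contains keys for both principals. For example:
$ klist -e -k -t impala-http.keytab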
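The klist check can also be scripted. This bash sketch reads klist -k style output on standard input and succeeds only if both the impala and the HTTP principal for a given host appear. The function name is hypothetical, and it matches principal names as text only; it does not prove that the keys themselves are valid.

```shell
# Hypothetical check: does a keytab listing contain both principals
# for HOST? Reads `klist -k` style output on stdin.
# Usage: klist -e -k -t FILE | keytab_has_principals HOST
keytab_has_principals() {
  local host=$1 listing
  listing=$(cat)
  # Fixed-string, case-sensitive matches, so "http/" does not count
  # as the required uppercase "HTTP/".
  printf '%s\n' "$listing" | grep -qF "impala/${host}@" &&
    printf '%s\n' "$listing" | grep -qF "HTTP/${host}@"
}
```

For example: klist -e -k -t impala-http.keytab | keytab_has_principals impala_host.example.com && echo "both principals present".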
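Copy the impala-http.keytab file to the Impala configuration directory. Make the file readable only by its owner, and change its owner to the impala user. By default, the Impala user and group are both named impala. For example: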
$ cp impala-http.keytab /etc/impala/conf
$ cd /etc/impala/conf
$ chmod 400 impala-http.keytab
$ chown impala:impala impala-http.keytab
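To verify the result later (for example, after copying the keytab to another host), a small check can compare the file's mode and owner with what the commands above set. This bash sketch assumes GNU coreutils stat (the -c option); BSD and macOS stat use different options. The function name is hypothetical.

```shell
# Hypothetical check that a keytab has mode 400 and the expected owner.
# Requires GNU coreutils stat (-c).
# Usage: check_keytab_perms FILE [OWNER]   (OWNER defaults to impala)
check_keytab_perms() {
  local file=$1 owner=${2:-impala} mode user
  mode=$(stat -c '%a' "$file") || return 1
  user=$(stat -c '%U' "$file") || return 1
  if [ "$mode" != "400" ]; then
    echo "$file: mode is $mode, expected 400" >&2
    return 1
  fi
  if [ "$user" != "$owner" ]; then
    echo "$file: owner is $user, expected $owner" >&2
    return 1
  fi
}
```

For example: check_keytab_perms /etc/impala/conf/impala-http.keytab && echo "keytab OK".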
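Add the Kerberos options to the Impala defaults file, /etc/default/impala, for both the impalad and statestored daemons, using the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS variables. For example, you might add: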
-kerberos_reinit_interval=60
-principal=impala/impala_host.example.com@TEST.EXAMPLE.COM
-keytab_file=/etc/impala/conf/impala-http.keytab
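Because /etc/default/impala is a shell-style file, one way to apply the same flags to both daemons is to define them once and append them to each variable. In this sketch, KERBEROS_ARGS is a hypothetical helper variable, the lines are assumed to go after the existing IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS definitions, and the principal and keytab path are the example values (impala/impala_host.example.com and /etc/impala/conf/impala-http.keytab) from the earlier steps.

```shell
# Sketch for /etc/default/impala, placed after the existing
# IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS definitions.
# KERBEROS_ARGS is a hypothetical helper variable; adjust the
# principal, realm, and keytab path for each host.
KERBEROS_ARGS="-kerberos_reinit_interval=60 \
-principal=impala/impala_host.example.com@TEST.EXAMPLE.COM \
-keytab_file=/etc/impala/conf/impala-http.keytab"

# Append the same Kerberos flags to both daemons' argument lists.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} ${KERBEROS_ARGS}"
IMPALA_STATE_STORE_ARGS="${IMPALA_STATE_STORE_ARGS} ${KERBEROS_ARGS}"
```

Defining the flags once keeps impalad and statestored from drifting apart when the principal or keytab path changes.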
For more information on changing the Impala defaults specified in /etc/default/impala, see Modifying Impala Startup Options.
A common configuration for Impala with High Availability is to use a proxy server to submit requests to the actual impalad daemons on different hosts in the cluster. This configuration avoids connection problems in case of machine failure, because the proxy server can route new requests through one of the remaining hosts in the cluster. This configuration also helps with load balancing, because the additional overhead of being the "coordinator node" for each query is spread across multiple hosts.
Although you can set up a proxy server with or without Kerberos authentication, typically users set up a secure Kerberized configuration. For information about setting up a proxy server for Impala, including Kerberos-specific steps, see Using Impala through a Proxy for High Availability.
Your web browser must support Kerberos HTTP SPNEGO (for example, Chrome, Firefox, or Internet Explorer).
To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:
Open the about:config page.
Search for the network.negotiate-auth.trusted-uris preference.
Edit the network.negotiate-auth.trusted-uris preference and enter the hostname or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma.
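If you prefer to set this preference from the command line, Firefox also reads a user.js file from the profile directory at startup. This bash sketch appends the preference there; the function name is hypothetical, and the profile directory is an assumption you must supply (the about:profiles page shows its location).

```shell
# Hypothetical helper: append the trusted-URIs preference to a Firefox
# profile's user.js. Restart Firefox for the change to take effect.
# Usage: set_trusted_uris PROFILE_DIR "host1.example.com,example.com"
set_trusted_uris() {
  local profile_dir=$1 uris=$2
  if [ ! -d "$profile_dir" ]; then
    echo "not a directory: $profile_dir" >&2
    return 1
  fi
  printf 'user_pref("network.negotiate-auth.trusted-uris", "%s");\n' "$uris" \
    >> "$profile_dir/user.js"
}
```

Because Firefox reapplies user.js at every startup, a value set this way overrides later edits made on the about:config page.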
See Configuring Impala Delegation for Clients for details about the delegation feature that lets certain users submit queries using the credentials of other users.
You can use Kerberos authentication, TLS/SSL encryption, or both to secure connections from JDBC and ODBC applications to Impala. See Configuring Impala to Work with JDBC and Configuring Impala to Work with ODBC for details.
Prior to Impala 2.5, the Hive JDBC driver did not support connections that use both Kerberos authentication and SSL encryption. If your cluster is running an older release that has this restriction, use an alternative JDBC driver that supports both of these security features.
For applications that need direct access to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can specify a list of Kerberos users who are allowed to call those APIs. By default, the impala and hdfs users are the only ones authorized for this kind of access. Any users not explicitly authorized through the internal_principals_whitelist configuration setting are blocked from accessing the APIs. This setting applies to all the Impala-related daemons, although currently it is primarily used for HDFS to control the behavior of the catalog server.
Impala can also recognize the auth_to_local setting, specified through the HDFS configuration setting hadoop.security.auth_to_local, which maps Kerberos principals to short user names. This feature is disabled by default, to avoid an unexpected change in security-related behavior. To enable it, specify --load_auth_to_local_rules=true in the impalad and catalogd configuration settings.
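If you manage the daemons through /etc/default/impala, enabling the flag might look like the following sketch. It assumes the file uses IMPALA_SERVER_ARGS for impalad and IMPALA_CATALOG_ARGS for catalogd; the catalogd variable name is an assumption about your packaging, so check the names in your copy of the file. The auth_to_local rules themselves still come from the HDFS configuration and are not shown here.

```shell
# Sketch for /etc/default/impala (IMPALA_CATALOG_ARGS is an assumed
# variable name for catalogd). Turns on hadoop.security.auth_to_local
# mapping for both impalad and catalogd.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} --load_auth_to_local_rules=true"
IMPALA_CATALOG_ARGS="${IMPALA_CATALOG_ARGS} --load_auth_to_local_rules=true"
```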