You can use Impala's ai_generate_text function to access Large Language Models (LLMs) in SQL queries. This function enables you to input a prompt, retrieve the LLM response, and include it in results. You can create custom UDFs for complex tasks like sentiment analysis and translation.
ai_generate_text
function>Impala introduces a built-in AI function called ai_generate_text
that
enables direct access to and utilization of Large Language Models (LLMs) in SQL queries.
With this function, you can input a prompt, which may include data. The function
communicates with a supported LLM endpoint, sends the prompt, retrieves the response, and
includes it in the query result.
Alternatively, seamlessly integrate LLM intelligence into your Impala workflow by creating
custom User Defined Functions (UDFs) on top of ai_generate_text
. This
allows you to use concise SQL statements for sending prompts to an LLM and receiving
responses. You can define UDFs for complex tasks like sentiment analysis, language
translation, and generative contextual analysis.
The following example of a built-in AI function demonstrates the use of the OpenAI API as a large language model. Currently, OpenAI's public endpoint and Azure OpenAI endpoints are supported.
ai_generate_text_default(prompt)
ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params)
The ai_generate_text
function uses the values you provide as an argument
in the function for ai_endpoint
, ai_model
, and
ai_api_key_jceks_secret
. If any of the arguments are left empty or set to
NULL, the function uses the default values defined at the instance level. These default
values correspond to the flag settings configured in the Impala instance. For example, if
the ai_endpoint
argument is NULL or empty, the function will use the value
specified by the ai_endpoint
flag as the default.
When using the ai_generate_text_default
function, make sure to
set all parameters (ai_endpoint
, ai_model
, and
ai_api_key_jceks_secret
) in the coordinator/executor flagfiles with
appropriate values.
ai_api_key_jceks_secret
. To do this, set the
hadoop.security.credential.provider.path
property in the
core-site
configuration for both the executor and coordinator.ai_generate_text_default
.> select ai_generate_text_default('hello');
Response:
Hello! How can I assist you today?
> select customer_id, star_rating, ai_generate_text_default(CONCAT('Classify the following review as positive, neutral, or negative', and only include the uncapitalized category in the response: ', review_body)) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id LIMIT 1;
Response:
+--+------------+------------+----------------+------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+------------------+
|1 |4343565 | 5 |positive |What is this book |
| | | | |all about ………… |
+--+------------+------------+----------------+------------------+
Instead of writing the prompts in a SQL query, you can build a UDF with your intended
prompt. Once you build your custom UDF, pass your desired prompt within your custom UDF into
the ai_generate_text_default
built-in Impala function.
Example: Classify input customer reviews
The following UDF uses the Amazon book reviews database as the input and requests the LLM to classify the sentiment.
Classify input customer reviews:
IMPALA_UDF_EXPORT
StringVal ClassifyReviews(FunctionContext* context, const StringVal& input) {
std::string request =
std::string("Classify the following review as positive, neutral, or negative")
+ std::string(" and only include the uncapitalized category in the response: ")
+ std::string(reinterpret_cast<const char*>(input.ptr), input.len);
StringVal prompt(request.c_str());
const StringVal endpoint("https://api.openai.com/v1/chat/completions");
const StringVal model("gpt-3.5-turbo");
const StringVal api_key_jceks_secret("open-ai-key");
const StringVal params("{\"temperature\": 0.9, \"model\": \"gpt-4\"}");
return context->Functions()->ai_generate_text(
context, endpoint, prompt, model, api_key_jceks_secret, params);
}
Now you can define these prompt building UDFs and build them in Impala. Once you have them running, you can query your datasets using them.
Creating analyze_reviews
function:
> CREATE FUNCTION analyze_reviews(STRING)
RETURNS STRING
LOCATION ‘s3a://dw-...............’
SYMBOL=’ClassifyReviews’
Using SELECT query for Sentiment analysis to classify Amazon book reviews
> SELECT customer_id, star_rating, analyze_reviews(review_body) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id;
Response:
+--+------------+------------+----------------+----------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+----------------------+
|1 |44254093 | 5 |positive |What is this book all |
| | | | |about? It is all about|
| | | | |a mind-blowing |
| | | | |universal law of |
| | | | |nature. Mind-blow… |
+--+------------+------------+----------------+----------------------+
|2 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|3 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|4 |52932308 | 1 |negative |This book is seriously|
| | | | |flawed. I could not |
| | | | |work out if the author|
| | | | |was a mathemetician |
| | | | |dabbi… |
+--+------------+------------+----------------+----------------------+
|5 |52971961 | 1 |negative |Abdoullaev's |
| | | | |exploration of |
| | | | |Al issues appears to |
| | | | |be very technological |
| | | | |and straightforward… |
+--+------------+------------+----------------+----------------------+
|6 |53008416 | 4 |positive |As Co Founder of |
| | | | |ArtilectWorld:ultra |
| | | | |intelligent machine, |
| | | | |I recommend reading |
| | | | |this book! |
+--+------------+------------+----------------+----------------------+