VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions
An aggregate function that returns the variance of a set of numbers. This is a mathematical property that signifies how far the values spread apart from the mean. The return value can be zero (if the input is a single value, or a set of identical values), or a positive number otherwise.
Syntax:
{ VARIANCE | VAR[IANCE]_SAMP | VAR[IANCE]_POP } ([DISTINCT | ALL] expression)This function works with any numeric data type.
        Return type: DOUBLE in Impala 2.0 and higher; STRING in earlier
        releases
      
This function is typically used in mathematical formulas related to probability distributions.
      The VARIANCE_SAMP() and VARIANCE_POP() functions compute the sample
      variance and population variance, respectively, of the input values. (VARIANCE() is an alias
      for VARIANCE_SAMP().) Both functions evaluate all input rows matched by the query. The
      difference is that STDDEV_SAMP() is scaled by 1/(N-1) while
      STDDEV_POP() is scaled by 1/N.
    
      The functions VAR_SAMP() and VAR_POP() are the same as
      VARIANCE_SAMP() and VARIANCE_POP(), respectively. These aliases are
      available in Impala 2.0 and later.
    
      If no input rows match the query, the result of any of these functions is NULL. If a single
      input row matches the query, the result of any of these functions is "0.0".
    
Examples:
      This example demonstrates how VARIANCE() and VARIANCE_SAMP() return the
      same result, while VARIANCE_POP() uses a slightly different calculation to reflect that the
      input data is considered part of a larger "population".
    
[localhost:21000] > select variance(score) from test_scores;
+-----------------+
| variance(score) |
+-----------------+
| 812.25          |
+-----------------+
[localhost:21000] > select variance_samp(score) from test_scores;
+----------------------+
| variance_samp(score) |
+----------------------+
| 812.25               |
+----------------------+
[localhost:21000] > select variance_pop(score) from test_scores;
+---------------------+
| variance_pop(score) |
+---------------------+
| 811.438             |
+---------------------+
      This example demonstrates that, because the return value of these aggregate functions is a
      STRING, you convert the result with CAST if you need to do further
      calculations as a numeric value.
    
[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
+-------------------+
| summary           |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc score_stats;
+--------------------+--------------+---------+
| name               | type         | comment |
+--------------------+--------------+---------+
| standard_deviation | decimal(7,4) |         |
| variance           | decimal(7,4) |         |
+--------------------+--------------+---------+
Restrictions:
        This function cannot be used in an analytic context. That is, the OVER() clause is not allowed at all with this function.
      
Related information:
      The STDDEV(), STDDEV_POP(), and STDDEV_SAMP() functions
      compute the standard deviation (square root of the variance) based on the results of
      VARIANCE(), VARIANCE_POP(), and VARIANCE_SAMP()
      respectively. See STDDEV, STDDEV_SAMP, STDDEV_POP Functions for details about the standard deviation property.