approx_distinct function using the HyperLogLog data structure.
HyperLogLog implicitly casts to P4HyperLogLog, while one can explicitly cast HyperLogLog to P4HyperLogLog.
Data sketches can be serialized to and deserialized from varbinary.
This allows them to be stored for later use. Combined with the ability
to merge multiple sketches, this allows one to calculate approx_distinct
of the elements of a partition of a query, then for the entirety of a
query, with very little cost.
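A minimal sketch of storing per-partition sketches, assuming a hypothetical source table user_visits(visit_date, user_id) and a hypothetical summary table visit_summaries; neither name comes from the text above, and the connector in use is assumed to support CREATE TABLE and INSERT:

```sql
-- Hypothetical summary table: one serialized sketch per day.
CREATE TABLE visit_summaries (
  visit_date date,
  hll varbinary
);

-- Build one HyperLogLog per day and serialize it to varbinary for storage.
-- user_visits is an assumed source table with visit_date and user_id columns.
INSERT INTO visit_summaries
SELECT visit_date, cast(approx_set(user_id) AS varbinary)
FROM user_visits
GROUP BY visit_date;
```

A stored value can be turned back into a sketch with cast(hll AS HyperLogLog) before it is merged or counted.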
For example, calculating the HyperLogLog for daily unique users will
allow weekly or monthly unique users to be calculated incrementally by
combining the dailies. This is similar to computing weekly revenue by
summing daily revenue. Uses of approx_distinct with GROUPING SETS can be
converted to use HyperLogLog.
approx_set(x) → HyperLogLog
Returns the HyperLogLog sketch of the input data set of x. This data
sketch underlies approx_distinct and can be stored and used later by
calling cardinality().
cardinality(hll) → bigint
Returns the approximate number of distinct values in the data summarized
by the hll HyperLogLog data sketch, in the same way as approx_distinct.
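As an illustration, the query below counts distinct values both through an explicit sketch and through approx_distinct directly. The inline VALUES data and the column aliases are made up for the example. The two results are expected to be close, though this sketch does not claim they are always identical:

```sql
SELECT
  -- Build a sketch, then ask it for its estimated distinct count.
  cardinality(approx_set(cast(x AS bigint))) AS from_sketch,
  -- Estimate the distinct count in a single step.
  approx_distinct(cast(x AS bigint)) AS direct
FROM (VALUES (1), (2), (2), (3), (3), (3)) AS t (x);
```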
empty_hll() → HyperLogLog
Returns an empty HyperLogLog.
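One possible use, shown here as a sketch rather than a documented pattern, is as a fallback when an aggregation sees no rows. The example assumes that merge over an empty input yields NULL, as SQL aggregates generally do; the inline data and aliases are illustrative only:

```sql
-- WHERE false removes every sketch, so merge() has nothing to combine.
-- coalesce() then substitutes an empty sketch, so that cardinality()
-- receives a sketch instead of NULL.
SELECT cardinality(coalesce(merge(hll), empty_hll())) AS unique_users
FROM (
  SELECT approx_set(user_id) AS hll
  FROM (VALUES (BIGINT '1'), (BIGINT '2')) AS visits (user_id)
) AS sketches
WHERE false;
```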
merge(hyperloglog) → HyperLogLog
Returns the HyperLogLog of the aggregate union of the individual hll
HyperLogLog structures.
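The incremental rollup of daily sketches into a longer period can be sketched as follows. The inline data, dates, and aliases are invented for the example; in practice the daily sketches would more likely be read from stored varbinary values and cast back to HyperLogLog:

```sql
WITH daily AS (
  -- One sketch per day; user 2 visits on both days.
  SELECT visit_date, approx_set(user_id) AS hll
  FROM (
    VALUES
      (DATE '2024-01-01', BIGINT '1'),
      (DATE '2024-01-01', BIGINT '2'),
      (DATE '2024-01-02', BIGINT '2'),
      (DATE '2024-01-02', BIGINT '3')
  ) AS visits (visit_date, user_id)
  GROUP BY visit_date
)
-- Union the daily sketches, then estimate distinct users across all days.
SELECT cardinality(merge(hll)) AS unique_users
FROM daily;
```

Because merge takes the union of the sketches, a user seen on several days is intended to be counted once, which is where the analogy with summing daily revenue stops: summing daily distinct counts would count that user repeatedly.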