More on ThetaSketches
A Theta sketch object can be thought of as a Set data structure. At query time, sketches are read and aggregated (set unioned) together. By default, you receive the estimate of the number of unique entries in the sketch object.
You can also use post-aggregators to compute the union, intersection, or difference of sketch columns in the same row. This lets you build distinct sets of users and compare them against each other, which is what retention and funnel queries rely on.
{
"fieldName": "clientUser",
"name": "count",
"type": "thetaSketch"
}
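As a rough sketch of the set operations described above, the post-aggregator below estimates how many users appear in two sketches at once. It assumes the query already defines two thetaSketch aggregators; the names usersDay1, usersDay2, and retainedUsers are hypothetical and not part of the original example. Replacing INTERSECT with UNION or NOT would give the union or the difference instead.
{
"type": "thetaSketchEstimate",
"name": "retainedUsers",
"field": {
"type": "thetaSketchSetOp",
"name": "retainedUsersSketch",
"func": "INTERSECT",
"fields": [
{ "type": "fieldAccess", "fieldName": "usersDay1" },
{ "type": "fieldAccess", "fieldName": "usersDay2" }
]
}
}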
cardinality
computes the estimated cardinality (number of distinct values) of one or more dimensions, using HyperLogLog.
{
"byRow": false,
"fields": ["clientUser"],
"name": "a0",
"round": true,
"type": "cardinality"
}
doubleFirst
computes the metric value with the earliest timestamp.
{ "type" : "doubleFirst", "name" : <output_name>, "fieldName" : <metric_name> }
doubleLast
computes the metric value with the latest timestamp.
{ "type" : "doubleLast", "name" : <output_name>, "fieldName" : <metric_name> }
stringFirst
computes the string value with the earliest timestamp.
{ "type" : "stringFirst", "name" : <output_name>, "fieldName" : <metric_name> }
stringLast
computes the string value with the latest timestamp.
{ "type" : "stringLast", "name" : <output_name>, "fieldName" : <metric_name> }
Returns any value, including null. This aggregator can simplify a query and improve its performance by returning the first value encountered (including null).
doubleAny returns any double metric value.
{ "type" : "doubleAny", "name" : <output_name>, "fieldName" : <metric_name> }
stringAny returns any string metric value.
{ "type" : "stringAny", "name" : <output_name>, "fieldName" : <metric_name> }
A filtered aggregator wraps any given aggregator, but only aggregates the values for which the given dimension filter matches.
This makes it possible to compute the results of a filtered and an unfiltered aggregation in a single query and to use both results in post-aggregations.
Note: If only the filtered results are required, consider putting the filter on the query itself, which will be much faster since it doesn’t require scanning all the data.
{
"type": "filtered",
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "type",
"value": "InsightShown"
}
]
},
"aggregator": {
"type": "thetaSketch",
"name": "InsightShown",
"fieldName": "clientUser"
}
}
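To illustrate using a filtered and an unfiltered result together, the query fragment below is a minimal sketch that divides the filtered estimate by the estimate over all users. The names allUsers, insightShownShare, shownEstimate, and allEstimate are hypothetical, and the fragment shows only the aggregations and postAggregations parts of a query, not a complete query.
{
"aggregations": [
{ "type": "thetaSketch", "name": "allUsers", "fieldName": "clientUser" },
{
"type": "filtered",
"filter": { "type": "selector", "dimension": "type", "value": "InsightShown" },
"aggregator": { "type": "thetaSketch", "name": "InsightShown", "fieldName": "clientUser" }
}
],
"postAggregations": [
{
"type": "arithmetic",
"name": "insightShownShare",
"fn": "/",
"fields": [
{ "type": "thetaSketchEstimate", "name": "shownEstimate", "field": { "type": "fieldAccess", "fieldName": "InsightShown" } },
{ "type": "thetaSketchEstimate", "name": "allEstimate", "field": { "type": "fieldAccess", "fieldName": "allUsers" } }
]
}
]
}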