Calculate Aggregates and Group Results
- Capella Operational
- how-to
How to calculate aggregates and group the results.
Introduction
You can use aggregate functions to perform calculations over multiple values. Grouping enables you to display the results in groups.
If you want to try out the examples in this section, follow the instructions given in Create an Account and Deploy Your Free Tier Operational Cluster to create a free account, deploy a cluster, and load a sample dataset. Read the following for further information about the tools available for editing and executing queries:
Aggregate Functions
To take multiple values from documents, perform calculations, and return a single value as the result, use an aggregate function, such as AVG(), COUNT(), MIN(), MAX(), or SUM().
For example, the following query finds the average altitude of airports in the airport keyspace.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT AVG(geo.alt) AS AverageAltitude FROM airport;
[
{
"AverageAltitude": 870.1651422764228
}
]
For more information and examples, refer to Aggregate Functions.
Aggregating Distinct Values
To aggregate all values, omit the aggregate quantifier, or optionally include the ALL
keyword before the function arguments.
For example, the following query finds the average number of stops per route.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT AVG(ALL stops) AS AvgAllStops FROM route;
[
{
"AvgAllStops": 0.0002
}
]
Results in 0.0002 since nearly all routes have 0 stops.
To aggregate distinct values only, include the DISTINCT
keyword before the function arguments.
For example, the following query finds the average of the distinct numbers of stops.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT AVG(DISTINCT stops) AS AvgDistinctStops FROM route;
[
{
"AvgDistinctStops": 0.5
}
]
Results in 0.5 since the routes contain only 1 or 0 stops.
For more information, refer to Aggregate Quantifier.
Filtering the Aggregates
To filter the values used by an aggregate function, use the FILTER clause after the function.
For example, the following query finds the minimum value of a string field, only including strings that start with "A"
or greater.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT MIN(name) FILTER (WHERE SUBSTR(name,0)>="A") AS MinName
FROM hotel;
[
{
"MinName": "AIRE NATURELLE LE GROZEAU Aire naturelle"
}
]
For more information, refer to FILTER Clause.
Grouping the Results
By default, an aggregate function returns a single result for all the documents that the query selects. It is often more useful to group the documents (by a different field) and return the aggregate result for each group.
To group the results of an aggregate query, use the GROUP BY clause.
For example, the following query groups unique landmarks by city.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
ORDER BY LandmarkCount DESC
LIMIT 4;
[
{
"City": "San Francisco",
"LandmarkCount": 797
},
{
"City": "London",
"LandmarkCount": 443
},
{
"City": "Los Angeles",
"LandmarkCount": 284
},
{
"City": "San Diego",
"LandmarkCount": 197
}
]
For more information and examples, refer to GROUP BY Clause.
Filtering the Groups
To filter the groups by an aggregate function, use the HAVING clause within the GROUP BY clause.
For example, the following query groups unique landmarks by city, and specifies cities that have more than 180 landmarks.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
HAVING COUNT(DISTINCT name) > 180;
[
{
"City": "London",
"LandmarkCount": 443
},
{
"City": "Los Angeles",
"LandmarkCount": 284
},
{
"City": "San Francisco",
"LandmarkCount": 797
},
{
"City": "San Diego",
"LandmarkCount": 197
}
]
For more information and examples, refer to HAVING Clause.
Defining an Expression within the GROUP BY Clause
To define an expression for use within the GROUP BY clause, use the LETTING clause before the HAVING clause.
For example, the following clause uses an expression to define the minimum number of landmarks for each city.
Set the query context to the inventory
scope in the travel sample dataset.
For more information, see Setting the Query Context.
SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
LETTING MinimumThingsToSee = 400
HAVING COUNT(DISTINCT name) > MinimumThingsToSee;
[
{
"City": "London",
"LandmarkCount": 443
},
{
"City": "San Francisco",
"LandmarkCount": 797
}
]
For more information and examples, refer to LETTING Clause.