Pushdowns
Index Pushdowns are performance optimizations where the Query engine pushes more of the work down to the Indexer.
The GSI Indexes are a vital part of query performance and are tightly coupled with the Query engine. The Query engine implements many query processing optimizations to achieve the best possible performance for N1QL queries.
Query Indexer not only indexes data, it also supports various operations such as point scans, range scans, array indexing, sort order, and pagination. The Query engine tries to leverage the indexer functionality as much as possible by pushing down operations to the indexer as part of the index scan. This helps performance, predominantly, in two ways:
-
Minimize the amount of data transferred from Indexer nodes to Query nodes.
-
Minimize the amount of processing done at Query nodes.
Index Projection
When processing a SELECT or DML with a WHERE clause, the Query engine picks one or more qualified indexes to be used for the query.
Note that each index will have document field names explicitly specified as index-keys, and some metadata fields, such as meta().id
, implicitly stored in the index.
In earlier releases, the Indexer used to return all index-keys available in the index for the matching documents.
From Couchbase Server 5.0, the Query engine requests the exact list of fields needed for the query.
For a covered query, it is the fields referred in projection, predicate, GROUP BY clauses, ORDER BY clauses, HAVING clauses, ON key, and subqueries of the query.
For non-covered queries, it is just META().id
.
For example, consider the following index and query:
This example uses the def_inventory_route_route_src_dst_day
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_route_route_src_dst_day
ON `travel-sample`.inventory.route(sourceairport, destinationairport,
(DISTINCT (ARRAY (v.day) FOR v IN schedule END)));
EXPLAIN SELECT sourceairport FROM `travel-sample`.inventory.route
USE INDEX (def_inventory_route_route_src_dst_day)
WHERE sourceairport = "SFO"
LIMIT 1;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "DistinctScan",
"limit": "1",
"scan": {
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`route`.`sourceairport`))",
"cover ((`route`.`destinationairport`))",
"cover ((distinct (array (`v`.`day`) for `v` in (`route`.`schedule`) end)))",
"cover ((meta(`route`).`id`))"
],
"filter": "(cover ((`route`.`sourceairport`)) = \"SFO\")",
"index": "def_inventory_route_route_src_dst_day",
"index_id": "e7eb4b4555f90179",
"index_projection": {
"entry_keys": [ (1)
0
],
"primary_key": true (2)
},
// ...
The query refers to fields sourceairport
and type
.
The index is wider in scope, that is, it has sourceairport
, destinationairport
, schedule.day
, and type
fields.
So, for each matching document, the query requires only a subset of the data stored in the index. With index-projection support, the Query engine indicates the exact data requested as part of the index-scan. In this example,
1 | The entry_keys in the EXPLAIN output indicate the exact index-key fields that should be returned in the index-scan result.
This has only one entry (0 ) indicating the first index-key sourceairport . |
2 | Also, the primary_key indicates whether the index should return the primary key meta().id of the matching document. |
Note that in some cases (such as when distinctScan
or intersectScan
are used, as in this example), the primary key meta().id
may be retrieved even though the query doesn’t explicitly specify it in the query.
Without this optimization, index-scan would return all the index-keys defined in the index.
If the index_projection
field is missing in the EXPLAIN output, then Indexer would return all index-keys.
Predicate Pushdown
The Query engine and GSI indexes support many optimizations for efficiently processing predicate push-downs. In general, this performance optimization is leveraged when the Query engine decides to use an Index-scan for processing a query, and whole or partial predicates can be pushed to the indexer to filter documents of interest to the query.
For example, in the above query Example 1 with a simple WHERE clause, the predicate (sourceairport = "SFO"
) is pushed to the index def_inventory_route_route_src_dst_day
with the following single span
and range
.
These attributes exactly define different characteristics of the index scan:
The span
and range
from the EXPLAIN output of Example 1.
"spans": [
{
"exact": true,
"range": [
{
"high": "\"SFO\"",
"inclusion": 3,
"low": "\"SFO\""
}
]
}
],
// ...
-
Each Span defines details about one index-key summarizing corresponding predicate conditions into a range-scan lookup for the index. In this example, the predicate condition (
sourceairport = "SFO"
) translates to one span with one range that specifies bothlow
andhigh
values of "SFO" (to imply equals condition). -
Refer to section Scans for more information.
Composite Predicate Pushdown
Compound or composite predicates are those with multiple conditions on different fields of the document.
When the predicate is conjunctive with multiple AND conditions, then a single span
with multiple ranges
are specified in the index-scan request.
When the predicate is disjunctive, then multiple spans
are specified.
See Scans
for more details and examples on how predicate pushdown works for various types of index-scans as well as the conjunctive predicate AND and the disjunctive predicate OR.
Index Key Order and Structure
Composite indexes have more than one index key, and the order of the index keys is important for any lookup or scan of the index, because the indexes structure all the indexed entries in linearized default collation sorted order of all the index-keys. For example, consider the following hypothetical index:
CREATE INDEX `idx_age_name` ON users(age, name);
Various age and name values are stored in the index in a tree like structure (represented by the triangle in the above picture) with all the index key values linearly sorted as ordered pairs. For instance,
-
The above picture shows index-entries with all names in sorted order with an age of 20 followed by the entries for age 21 and related names.
-
The arrowed paths logically depicts how an index lookup or scan would find entries in the index.
-
A point lookup query for
age=20 AND name="joe"
may follow arrows labelled p1. -
Similarly, a range scan for
(age BETWEEN 20 and 21) AND (name="joe")
may find entries of interest between the paths labelled p1 and p2 (highlighted in green).This range may include some unwanted entries (such as "mark", "abby", "anne") which will be filtered subsequently. Queries with predicates such as (age = 20) AND (name BETWEEN "joe" and "mark")
will need all the entries found using range scans.
In general, when the predicate has a range condition on prefixing index-keys (such as age
) may produce unwanted results from the range-scan index-lookups.
In Couchbase Server 5.0 and later, the Query service and Indexer are enhanced with complete and accurate predicate pushdown to filter such unnecessary results in the Indexer itself.
This improves query performance as it saves the additional overhead in transferring the unwanted data/results to query nodes and subsequently filtering the results.
This is explained with an example in the following section: Composite Predicate with Range Scan on Prefix Index-Keys.
In Couchbase Server 4.x, the indexer would return such extraneous unwanted results and the Query service would filter them, thus guaranteeing accurate final query results. |
Composite Predicate with Range Scan on Prefix Index-Keys
The Query engine supports efficient predicate pushdown to indexes in the cases when the WHERE clause has a range predicate on any of the prefixing index-keys.
Consider the following query, which finds all destination airports within 2000 miles of LAX.
CREATE INDEX idx_route_src_dst_dist
ON `travel-sample`.inventory.route(distance, sourceairport, destinationairport);
EXPLAIN SELECT destinationairport
FROM `travel-sample`.inventory.route
USE INDEX (idx_route_src_dst_dist)
WHERE distance < 2000 AND sourceairport = "LAX"; (1)
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`route`.`distance`))",
"cover ((`route`.`sourceairport`))",
"cover ((`route`.`destinationairport`))",
"cover ((meta(`route`).`id`))"
],
"filter": "((cover ((`route`.`distance`)) < 2000) and (cover ((`route`.`sourceairport`)) = \"LAX\"))",
"index": "idx_route_src_dst_dist",
"index_id": "6a502445eefe20b5",
"index_projection": {
"entry_keys": [
0,
1,
2
]
},
"keyspace": "route",
"namespace": "default",
"scope": "inventory",
"spans": [ (2)
{
"exact": true,
"range": [ (3)
{
"high": "2000", (4)
"inclusion": 0,
"low": "null"
},
{
"high": "\"LAX\"", (5)
"inclusion": 3,
"low": "\"LAX\""
}
]
}
],
"using": "gsi"
},
// ...
In this query:
1 | The predicate has the range condition on the first index-key distance and an equality predicate on the 2nd index-key sourceairport . |
2 | The predicate is accurately represented and pushed-down to the indexer, as shown in the spans attribute of the EXPLAIN query plan output. |
3 | The range[] attribute is an array of the predicate bounds for individual index-keys involved in the compound predicate. |
4 | The first element of range[] corresponds to the index-key distance with (low , high ) values (null , 2000 ) respectively. |
5 | The second element of range[] corresponds to the index-key sourceairport with (low , high ) values ("LAX" , "LAX" ) representing equals condition. |
The Indexer processes the lookup request and exactly returns only the documents matching the predicate conditions.
For example, when you enable monitoring with the configuration parameter profile = "timings"
for this query, you can see that the indexer returns 165 documents, which is the same as the final result set of the query.
// ...
"~children": [
{
"#operator": "IndexScan3",
"#stats": {
"#heartbeatYields": 54,
"#itemsOut": 165,
"#phaseSwitches": 663,
"execTime": "462.968µs",
"kernTime": "638.333µs",
"servTime": "38.110289ms"
},
// ...
{
"#operator": "Stream",
"#stats": {
"#itemsIn": 165,
"#itemsOut": 165,
"#phaseSwitches": 166,
"execTime": "354.652µs"
}
}
// ...
Composite Predicate with Skip-Key Range Scan
In Example 3 above, the query has a composite predicate on continuous leading index-keys, namely the index-keys distance
and sourceairport
, which occur together at the start of the index definition.
In Couchbase Server 6.5 and later, it is possible to push down a composite predicate on non-continuous leading index-keys.
There must be a predicate on the first key in the index definition, and then on any of the subsequent index-keys.
Consider the following query, which is a variant of the query in Example 3 above, and uses the same index.
EXPLAIN SELECT sourceairport
FROM `travel-sample`.inventory.route
USE INDEX (idx_route_src_dst_dist)
WHERE distance < 2000 AND destinationairport = "LAX"; (1)
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`route`.`distance`))",
"cover ((`route`.`sourceairport`))",
"cover ((`route`.`destinationairport`))",
"cover ((meta(`route`).`id`))"
],
"filter": "((cover ((`route`.`distance`)) < 2000) and (cover ((`route`.`destinationairport`)) = \"LAX\"))",
"index": "idx_route_src_dst_dist",
"index_id": "6a502445eefe20b5",
"index_projection": {
"entry_keys": [
0,
1,
2
]
},
"keyspace": "route",
"namespace": "default",
"scope": "inventory",
"spans": [ (2)
{
"exact": true,
"range": [
{
"high": "2000", (3)
"inclusion": 0,
"low": "null"
},
{
"inclusion": 0 (4)
},
{
"high": "\"LAX\"", (5)
"inclusion": 3,
"low": "\"LAX\""
}
]
}
],
"using": "gsi"
},
// ...
In this query:
1 | The query has a range predicate on the first index-key distance and an equality predicate on the 3rd index-key destinationairport .
There is no predicate on the 2nd index-key sourceairport . |
2 | The predicate is pushed-down to the indexer, as shown in the EXPLAIN query plan output. |
3 | The first element of range[] corresponds to the index-key distance with (low , high ) values (null , 2000 ) respectively. |
4 | The second element of range[] corresponds to the index-key sourceairport with no low or high bounds, i.e. a whole range scan. |
5 | The third element of range[] corresponds to the index-key destinationairport with (low , high ) values ("LAX" , "LAX" ) representing equals condition. |
By skipping the index-key that does not have a predicate, and including all the non-continuous index-keys that do have predicates, the query uses the index more effectively and reduces the number of documents that need to be fetched. This increases query preparation time a little, but vastly speeds up the execution of the query.
Pagination Pushdown
Pagination in N1QL queries is achieved by using the LIMIT and OFFSET clauses, and both of the operators can be pushed to indexer whenever possible. These operators may not always be pushed to Indexer, depending on the following factors:
-
Whether or not the whole predicate in the WHERE clause can be completely and accurately pushed to a single index.
-
When using IntersectScan, the Query engine uses multiple indexes to process the query. As such, LIMIT/OFFSET will need to be processed in the Query engine at a later stage of query processing, and hence cannot be pushed to the Indexer.
-
Whether or not the SELECT query has other clauses that may impact pagination, such as ORDER BY or JOIN. For example,
-
When ORDER BY key in the query is different from that of the index order, then the query layer will need to process the sort; and hence, in those cases, the pagination cannot be pushed to the indexer as shown in Example 7 below.
-
For JOIN queries, index scans can be used only for the left side keyspace. Subsequent JOIN phrases may filter some documents, after which only LIMIT/OFFSET can be applied. Hence, the pagination operators cannot be pushed when a query has JOIN clauses.
-
When using a secondary index, both LIMIT and OFFSET operators are pushed to the index.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT * FROM `travel-sample`.inventory.landmark
WHERE city = "San Francisco"
OFFSET 4000 LIMIT 10000;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"index": "def_inventory_landmark_city",
"index_id": "39eb8e83720948f",
"index_projection": {
"primary_key": true
},
"keyspace": "landmark",
"limit": "10000", (1)
"namespace": "default",
"offset": "4000", (2)
"scope": "inventory",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"San Francisco\"",
"inclusion": 3,
"low": "\"San Francisco\""
}
]
}
],
"using": "gsi"
},
// ...
1 | The IndexScan3 operator handles limit . |
2 | The IndexScan3 operator handles offset . |
When using a primary index, both LIMIT and OFFSET operators are pushed to the Indexer.
This example uses the def_inventory_landmark_primary
index that comes pre-installed and can be made by the statement:
CREATE PRIMARY INDEX `def_inventory_landmark_primary`
ON `travel-sample`.inventory.landmark;
EXPLAIN SELECT * FROM `travel-sample`.inventory.landmark
OFFSET 4000 LIMIT 10000;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "PrimaryScan3",
"bucket": "travel-sample",
"index": "def_inventory_landmark_primary",
"index_projection": {
"primary_key": true
},
"keyspace": "landmark",
"limit": "10000", (1)
"namespace": "default",
"offset": "4000", (2)
"scope": "inventory",
"using": "gsi"
},
// ...
1 | The PrimaryScan3 operator handles limit . |
2 | The PrimaryScan3 operator handles offset . |
LIMIT and OFFSET operators are not pushed to the Indexer when the index order is different from that specified in the ORDER BY.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT * FROM `travel-sample`.inventory.landmark
USE INDEX(def_inventory_landmark_city)
WHERE city = "San Francisco"
ORDER BY name
OFFSET 4000 LIMIT 10000;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3", (1)
"bucket": "travel-sample",
"index": "def_inventory_landmark_city",
"index_id": "39eb8e83720948f",
"index_projection": {
"primary_key": true
},
"keyspace": "landmark",
"namespace": "default",
"scope": "inventory",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"San Francisco\"",
"inclusion": 3,
"low": "\"San Francisco\""
}
]
}
],
"using": "gsi"
},
// ...
{
"#operator": "Order",
"limit": "10000",
"offset": "4000",
"sort_terms": [
{
"expr": "(`landmark`.`name`)"
}
]
},
{
"#operator": "Offset", (2)
"expr": "4000"
},
{
"#operator": "Limit", (3)
"expr": "10000"
}
// ...
1 | The IndexScan3 operator does not handle offset or limit . |
2 | The Offset operator is handled by the Query engine. |
3 | The Limit operator is handled by the Query engine. |
Using Index Order
The Query engine may avoid ORDER BY processing in cases where the ordering of entries in the index can be leveraged for the query. The Query engine carefully evaluates each query to decide whether ORDER BY keys are aligned with the index key order. For example, ORDER BY may not be pushed down when the ORDER BY fields are not aligned with the index-key order defining the index.
Find all cities that start with "San", and sort the results by the city name in ascending order.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT city FROM `travel-sample`.inventory.landmark
WHERE city LIKE "San%"
ORDER BY city;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))"
],
"filter": "(cover ((`landmark`.`city`)) like \"San%\")",
"index": "def_inventory_landmark_city",
// ...
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "cover ((`landmark`.`city`))"
}
]
} (1)
// ...
1 | In this example, you can see that the query plan does not have an ORDER operator before the final projection. That means order pushdown is being leveraged, and the query is relying on the index order. |
Find all cities that start with "San", and sort the results by the document primary key in ascending order.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT city FROM `travel-sample`.inventory.landmark
WHERE city LIKE "San%"
ORDER BY meta().id;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))"
],
"filter": "(cover ((`landmark`.`city`)) like \"San%\")",
"index": "def_inventory_landmark_city",
// ...
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "cover ((`landmark`.`city`))"
}
]
}
]
}
}
]
},
{
"#operator": "Order", (1)
"sort_terms": [
{
"expr": "cover ((meta(`landmark`).`id`))"
}
]
}
// ...
1 | In this example, you can see an additional ORDER operator before the final projection, because the ORDER BY field meta().id is different from the index order key city . |
Limitation
Currently the Query engine supports order pushdown only when the ORDER BY keys are aligned with the Index order. But reverse-scan of the index is not supported.
For example, in Example 8 above, you can see that the query plan does not have an ORDER operator before the final projection, because the index order is the same as the ascending order specified in the query.
Similarly, in the following example Example 10, the ORDER BY clause has DESC, and that matches with the index order defined by the index idx_landmark_city_desc
.
However, the ASC order in Example 8 will not be able to leverage the index order in the index idx_landmark_city_desc
, nor will the DESC order in Example 10 be able to leverage the index order in the index def_inventory_landmark_city
.
Descending variation of Example 8.
CREATE INDEX idx_landmark_city_desc ON `travel-sample`.inventory.landmark(city DESC);
EXPLAIN SELECT city FROM `travel-sample`.inventory.landmark
WHERE city LIKE "San%"
ORDER BY city DESC;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))"
],
"filter": "(cover ((`landmark`.`city`)) like \"San%\")",
"index": "idx_landmark_city_desc",
"index_id": "efc36547ec1c0f00",
"index_order": [
{
"desc": true,
"keypos": 0
}
// ...
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "cover ((`landmark`.`city`))"
}
]
}
// ...
Operator Pushdowns
The Query engine tries to avoid unnecessary processing operators such as MIN(), MAX(), and COUNT(), which can be processed by the Indexer much more efficiently. In such cases, the Query engine pushes down necessary hints or options to the Indexer to process the following operators.
MAX() Pushdown
This function returns the highest value of the input field based on the default collation rules. (For details, see Data types and Comparison Operators.)
Prior to Couchbase Server 5.5, MAX() could be pushed to the Indexer only when the index was created with matching index keys in descending order. This limitation has been removed in Couchbase Server 5.5 and later.
Find the alphabetically highest city name in the landmark
collection.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT MAX(city)
FROM `travel-sample`.inventory.landmark
USE INDEX (def_inventory_landmark_city)
WHERE city IS NOT NULL;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))",
"cover (max(cover ((`landmark`.`city`))))"
],
"index": "def_inventory_landmark_city",
"index_group_aggs": {
"aggregates": [
{
"aggregate": "MAX",
"depends": [
0
],
"expr": "cover ((`landmark`.`city`))",
"id": 2,
"keypos": 0
}
// ...
MIN() Pushdown
This function returns the lowest value of the input field based on the default collation rules. (For details, see Data types and Comparison Operators.)
Prior to Couchbase Server 5.5, MIN() could be pushed to the Indexer only when the index was created with matching index keys in ascending order. This limitation has been removed in Couchbase Server 5.5 and later.
Find the alphabetically lowest city name in the landmark
collection.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
EXPLAIN SELECT MIN(city)
FROM `travel-sample`.inventory.landmark
USE INDEX (def_inventory_landmark_city)
WHERE city IS NOT NULL;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))",
"cover (min(cover ((`landmark`.`city`))))"
],
"index": "def_inventory_landmark_city",
"index_group_aggs": {
"aggregates": [
{
"aggregate": "MIN",
"depends": [
0
],
"expr": "cover ((`landmark`.`city`))",
"id": 2,
"keypos": 0
}
// ...
COUNT() Pushdown
This function returns the total number of non-Null values of an input field from the matching documents of an index scan.
Find the number of cities entered in the landmark
collection.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
SELECT COUNT(city) AS NumberOfCities
FROM `travel-sample`.inventory.landmark
USE INDEX (def_inventory_landmark_city)
WHERE city IS NOT NULL;
[
{
"NumberOfCities": 4479
}
]
The details behind Example 13.
EXPLAIN SELECT COUNT(city) AS NumberOfCities
FROM `travel-sample`.inventory.landmark
USE INDEX (def_inventory_landmark_city)
WHERE city IS NOT NULL;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3", (1)
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))",
"cover (count(cover ((`landmark`.`city`))))"
],
"index": "def_inventory_landmark_city",
"index_group_aggs": {
"aggregates": [
{
"aggregate": "COUNT",
"depends": [
0
],
"expr": "cover ((`landmark`.`city`))",
"id": 2,
"keypos": 0
}
// ...
1 | In Couchbase Server 5.5 and later, the index operator IndexCountScan3 counts values so the Query Service does not need to do additional processing. |
COUNT(DISTINCT) Pushdown
This function returns the total number of unique non-Null values of an input field from the matching documents of an index scan.
Find the number of unique city names in the landmark
collection.
This example uses the def_inventory_landmark_city
index that comes pre-installed and can be made by the statement:
CREATE INDEX def_inventory_landmark_city ON `travel-sample`.inventory.landmark(city);
SELECT COUNT (DISTINCT city) AS NumberOfDistinctCities
FROM `travel-sample`.inventory.landmark
USE index (def_inventory_landmark_city)
WHERE city IS NOT NULL;
[
{
"NumberOfDistinctCities": 625
}
]
The details behind Example 15.
EXPLAIN SELECT COUNT (DISTINCT city) AS NumberOfDistinctCities
FROM `travel-sample`.inventory.landmark
USE index (def_inventory_landmark_city)
WHERE city IS NOT NULL;
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3", (1)
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((meta(`landmark`).`id`))",
"cover (count(DISTINCT cover ((`landmark`.`city`))))"
],
"index": "def_inventory_landmark_city",
"index_group_aggs": {
"aggregates": [
{
"aggregate": "COUNT",
"depends": [
0
],
"distinct": true,
"expr": "cover ((`landmark`.`city`))",
"id": 2,
"keypos": 0
}
// ...
1 | In Couchbase Server 5.5 and later, the index operator IndexCountScan3 counts distinct values so the Query Service does not need to do additional processing. |