Querying with SQL++

      +
      You can query for documents in Couchbase using the SQL++ query language, a language based on SQL, but designed for structured and flexible JSON documents.

      On this page we dive straight into using the Query Service API from the Python Columnar SDK. For a deeper look at the concepts, to help you better understand the Query Service, and the SQL++ language, see the links in the Further Information section at the end of this page.

      Here we show queries against the Travel Sample collection, at cluster and scope level, and give links to information on adding other collections to your data.

      Before You Start

      This page assumes that you have installed the Python Columnar SDK, added your IP address to the allowlist, and created a Columnar cluster.

      Create a collection to work upon by importing the travel-sample dataset into your cluster.

      Querying Your Dataset

      Most queries return more than one result, and you want to iterate over the results:

      Scope Level Queries

      • Sync API

      • Async API

      scope = cluster.database('travel-sample').scope('inventory')
      
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = scope.execute_query(query)
      
      print('Rows:')
      for row in res.rows():
          print(row)
      
      print(f'\nMetadata: {res.metadata()}')
      scope = cluster.database('travel-sample').scope('inventory')
      
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = await scope.execute_query(query)
      
      print('Rows:')
      async for row in res.rows():
          print(row)
      
      print(f'\nMetadata: {res.metadata()}')

      Cluster Level Queries

      • Sync API

      • Async API

      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM `travel-sample`.inventory.route
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = cluster.execute_query(query)
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM `travel-sample`.inventory.route
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = await cluster.execute_query(query)

      Positional and Named Parameters

      Supplying parameters as individual arguments to the query allows the query engine to optimize the parsing and planning of the query. You can either supply these parameters by name or by position.

      Positional Parameters

      Execute a query with positional arguments:

      • Sync API

      • Async API

      from couchbase_columnar.options import QueryOptions
      
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              WHERE sourceairport=$1 AND distance>=$2
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))
      from acouchbase_columnar.options import QueryOptions
      
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              WHERE sourceairport=$1 AND distance>=$2
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = await scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))

      Named Parameters

      Execute a query with named arguments:

      • Sync API

      • Async API

      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              WHERE sourceairport=$source_airport AND distance>=$min_distance
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))
      query = """
              SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
              FROM route
              WHERE sourceairport=$source_airport AND distance>=$min_distance
              GROUP BY airline
              ORDER BY route_count DESC
              """
      
      res = await scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))

      Using the Query Result

      Results from the Couchbase Columnar SDK can easily be used with several common Data Analytics Python libraries, including Pandas and PyArrow.

      Importing the result to a pandas DataFrame.
      import pandas as pd
      
      res = scope.execute_query(query)
      df = pd.DataFrame.from_records(res.rows(), index='airline')
      
      print(df.head())
      #          route_count  avg_route_distance
      # airline
      # AA              2354         2314.884359
      # UA              2180         2350.365407
      # DL              1981         2350.494112
      # US              1960         2101.417609
      # WN              1146         1397.736500
      Importing the query result to a PyArrow table.
      import pyarrow as pa
      
      res = scope.execute_query(query)
      table = pa.Table.from_pylist(res.get_all_rows())
      
      print(table.to_string())
      # pyarrow.Table
      # route_count: int64
      # avg_route_distance: double
      # airline: string

      Further Information

      The SQL++ for Analytics Reference offers a complete guide to the SQL++ language for both of our analytics services, including all of the latest additions.