Access and Organize Data in Capella Columnar Services
This topic introduces the database objects that you use to access and organize data in Capella Columnar.
Sources of Data
All of the data that you query with Capella Columnar originates from another source. Data sources include:
-
Remote operational databases, which are typically subject to rapid, ongoing modification. Capella Columnar connects to Couchbase Capella operational and Couchbase Server databases.
-
The Kafka distributed streaming platform, which streams data from other data sources such as databases.
-
External data stores, logs, and archives to support analyses of historical data. Capella Columnar can query data stored in Amazon S3 and S3-compatible object stores.
-
Other resources, such as JSON, CSV, TSV, Parquet, or Avro files, and query results on other sources. You load these directly into Capella Columnar.
-
Data lakes, such as Delta Lake, that reside in S3 buckets and that Capella Columnar can query.
Database Objects
To support your analytical queries and manage access to data sources, you add the objects described in the following sections: clusters, databases, scopes, and collections.
You also create links, which store the credentials for accessing data sources outside of Capella Columnar.
Clusters
A Capella Columnar cluster is an analytical database that runs in a cloud environment. It can ingest large volumes of data from a Couchbase database or other data sources and run complex queries on that data.
Within a Capella organization, you set up projects and add one or more Columnar clusters. You can add clusters to a project as needed using the UI.
When you create a cluster, you can choose your compute and node configuration options.
Databases
In a Capella Columnar cluster, a database is the top-level container for organizing related information.
When you create a cluster, Capella Columnar automatically creates a database named Default.
You can add more databases as needed using the UI or a SQL++ CREATE DATABASE statement.
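As a minimal sketch of the SQL++ route, the statement below adds a second database alongside Default. The name salesAnalytics is hypothetical, and the IF NOT EXISTS clause is included on the assumption that you want the statement to be safe to rerun. Check the SQL++ reference for the exact syntax that your cluster supports.

```sql
-- salesAnalytics is a hypothetical database name used only for illustration.
-- IF NOT EXISTS is assumed here so that rerunning the statement does not fail
-- when the database already exists.
CREATE DATABASE salesAnalytics IF NOT EXISTS;
```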
Scopes
Scopes are intermediary containers that exist within a database to group related objects like collections, indexes, links, and functions.
When you create a cluster, Capella Columnar automatically creates a scope named Default in the database named Default.
You can add more scopes as needed using the UI or a SQL++ CREATE SCOPE statement.
You must make scope names unique within a database, but you can use the same scope name across different databases.
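The naming rule can be illustrated with a short SQL++ sketch. The database names (salesAnalytics, marketingAnalytics) and the scope name (retail) are hypothetical, and the statements assume the two-part database.scope naming form.

```sql
-- Hypothetical databases, created here so the sketch is self-contained.
CREATE DATABASE salesAnalytics IF NOT EXISTS;
CREATE DATABASE marketingAnalytics IF NOT EXISTS;

-- The same scope name can appear in different databases.
CREATE SCOPE salesAnalytics.retail IF NOT EXISTS;
CREATE SCOPE marketingAnalytics.retail IF NOT EXISTS;

-- A second scope in the same database must use a different name.
CREATE SCOPE salesAnalytics.wholesale IF NOT EXISTS;
```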
Collections
Collections are containers for metadata and for data that you can query and manipulate.
You must make collection names unique within a scope. Collections with the same name can exist in different scopes, either in the same database or across different databases.
Capella Columnar has three types of collections:
-
Remote collections contain a shadow or mirror copy of data streamed from a remote data source. The remote data source can be a Kafka pipeline or a Couchbase database. A remote collection is associated with a link that provides authentication and connection information for the remote data source. When the link is connected to the remote source, Capella Columnar streams data from the remote source into the collection. This streaming means that the remote collection has a local replica of the data in the data source. When the link is disconnected, the collection retains the data as it was when the link disconnected. Queries on remote collections are efficient because of the local shadow copy of the streamed data.
The remote collection also contains metadata about the data format of the remote source as well as optional data filters.
You can use the Capella Columnar UI or the SQL++ CREATE COLLECTION statement to add a remote collection.
-
External collections let you query data stored in an S3 bucket. Like remote collections, they are associated with a link. Unlike remote collections, Capella Columnar does not copy data from the external data source into the external collection. Instead, every query reads data from the external storage location. The external collection contains just the metadata necessary to read data from the S3 bucket. As a result, Capella Columnar cannot index external collections.
You can use the Capella Columnar UI or a CREATE EXTERNAL COLLECTION SQL++ statement to add an external collection.
-
Standalone collections allow you to assemble and manipulate groups of documents on an as-needed basis. These are stored, manipulated, and managed locally. Standalone collections do not use links.
You populate these collections with data by importing data files or by using SQL++ statements to INSERT, COPY INTO, and otherwise add and update documents in a purpose-built collection.
You can use the Capella Columnar UI or a CREATE COLLECTION SQL++ statement to add a standalone collection.
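The three collection types might be created along the following lines. This is a sketch rather than a definitive reference. Every database, scope, collection, bucket, and link name is hypothetical, the links (capellaLink, s3Link) are assumed to exist already, and the clause spellings (ON, AT, PATH, WITH, PRIMARY KEY) reflect one reading of the Columnar SQL++ syntax that you should confirm against the SQL++ reference.

```sql
-- All names below are hypothetical. The database salesAnalytics, the scope
-- retail, and the links capellaLink and s3Link are assumed to exist already.

-- Remote collection: shadows a collection in a Couchbase database through
-- a remote link. Data streams in while the link is connected.
CREATE COLLECTION salesAnalytics.retail.customers
  ON salesBucket.store.customers
  AT capellaLink;

-- External collection: holds only metadata. Every query reads the files
-- from the S3 bucket through an external link.
CREATE EXTERNAL COLLECTION salesAnalytics.retail.reviews
  ON `review-archive`
  AT s3Link
  PATH "reviews/"
  WITH { "format": "json" };

-- Standalone collection: stored and managed locally, with no link.
CREATE COLLECTION salesAnalytics.retail.orders
  PRIMARY KEY (orderId: string);

-- Populate the standalone collection with an INSERT statement.
INSERT INTO salesAnalytics.retail.orders ([
  { "orderId": "1001", "custid": "C41", "total": 59.90 }
]);
```

Note that only the remote and external statements name a link. The standalone collection instead declares its own primary key.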
Links
A link is a metadata store for the authorization and authentication credentials that Capella Columnar uses when connecting to a remote or external data source. Links exist outside of the database > scope > collection hierarchy in a Capella Columnar cluster. You can associate multiple collections in different scopes with a single link.
There are two types of links:
-
Remote links have connected and disconnected states. When connected, the link provides continuous, real-time updates to the data shadowed in its associated Capella Columnar remote collections.
You incur charges when you connect a remote link.
-
External links contain the credentials Capella Columnar needs to access an external storage location. These links do not have connected or disconnected states. Instead, each time you query an associated external collection, Capella Columnar connects to the external data storage to read its data.
You use the Capella Columnar UI to add links. See Stream Data from Remote Sources or Set Up an External Data Source.
Other Objects
At the same hierarchical level as collections, within a database and scope, you create views and tabular views, synonyms, and user-defined indexes and functions.
-
To create views and tabular views, you can use the Capella Columnar UI or a CREATE VIEW SQL++ statement.
-
You use SQL++ statements to create synonyms and user-defined functions.
-
You also create indexes on individual remote and standalone collections with SQL++ statements. See Indexes.
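As a rough sketch, the statements below create one of each object in a single scope. All names are hypothetical. The sketch assumes that a standalone collection named salesAnalytics.retail.orders already exists with the fields orderId, custid, and total, and the tax multiplier in the function is an arbitrary illustrative value. Confirm the exact syntax against the SQL++ reference.

```sql
-- Assumes a hypothetical standalone collection salesAnalytics.retail.orders
-- with the fields orderId, custid, and total.

-- View: a stored query that you can reference like a collection.
CREATE VIEW salesAnalytics.retail.bigOrders AS
  SELECT o.orderId, o.total
  FROM salesAnalytics.retail.orders AS o
  WHERE o.total > 100;

-- Synonym: an alternative name for an existing collection.
CREATE SYNONYM salesAnalytics.retail.purchases
  FOR salesAnalytics.retail.orders;

-- User-defined function: the multiplier 1.2 is an arbitrary example value.
CREATE FUNCTION salesAnalytics.retail.withTax(amount) {
  amount * 1.2
};

-- Index on a standalone collection, declaring the type of the indexed field.
CREATE INDEX ordersByCustomer
  ON salesAnalytics.retail.orders (custid: string);
```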