Cosmos DB indexing fundamentals – SQL on the edge episode 15
As Microsoft's premier NoSQL cloud offering, Cosmos DB offers some interesting design decisions and trade-offs that are necessary to understand in order to maximize the benefits of the product. In this blog post, we are going to cover indexing as it is one of the features that really illustrates this point.
Index Management
In a relational database, you create a schema and then define your indexes on top of it. The exact definition, quantity and storage of those indexes can often have a significant impact on the overall performance of the database. For the Cosmos DB team, it was important that their service "just worked" right off the bat. This means that you are able to load data and start querying, skipping both the declarative schema creation and the index management piece.

To achieve this, Cosmos DB indexes all fields by default and starts using those indexes as soon as data is loaded into the database. Indexes also work transparently and do not have to be named explicitly in a query in order to be used. This is not the case in competitor products such as Amazon's DynamoDB, where the index to be used has to be declared with the request. This gives Cosmos DB more flexibility since the querying code is completely independent of the underlying implementation, as it should be.

If an administrator does not like this "index all by default" policy, they are free to change it. This again is the beauty of this design: you don't have to micromanage indexing, but if you want to, you can. This is done declaratively through a JSON policy that specifies which document paths have to be indexed and how. Indexing can even be managed at the level of the individual document. For example, you can change indexing from "automatic" to "manual", and then only the documents that you request to be indexed will be added to the index. This is likely overkill for many people, but if you need this level of granular control, you have access to it.
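To make the JSON policy more concrete, here is a minimal sketch of what one might look like, built as a Python dictionary and serialized to JSON. The field names (indexingMode, automatic, includedPaths, excludedPaths) follow the indexing policy format as I understand it at the time of writing, so check them against the current documentation. The /rawPayload/* path and the with_manual_indexing helper are hypothetical and exist only for illustration.

```python
import json

# Sketch of an indexing policy: index everything except one subtree.
# "/rawPayload/*" is a hypothetical path chosen for this example.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,  # every document is indexed as it is written
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/rawPayload/*"}],
}


def with_manual_indexing(policy):
    """Return a copy of the policy with automatic indexing switched off.

    With automatic set to False, only documents explicitly requested
    to be indexed end up in the index.
    """
    manual = dict(policy)
    manual["automatic"] = False
    return manual


if __name__ == "__main__":
    print(json.dumps(with_manual_indexing(indexing_policy), indent=2))
```

The policy is plain data, so it can be kept in source control and reviewed like any other configuration.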
Indexing Consistency
Another interesting option provided by Cosmos DB is the ability to choose between two modes for how the indexes are updated:
- Consistent: changes to the index happen immediately. Query consistency is respected and Request Unit consumption is higher.
- Lazy: changes to the index happen asynchronously through a background process. Query consistency is eventual and Request Unit consumption is lower.
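The difference between the two modes is easier to see in a toy model. The sketch below is purely conceptual and is not how Cosmos DB is implemented. ToyCollection, its single index on a city field, and the run_background_indexer method are all made up for illustration. It only shows why a lazily indexed collection can briefly return stale query results.

```python
from collections import defaultdict, deque


class ToyCollection:
    """Toy in-memory store with one index on 'city'. Not Cosmos DB itself."""

    def __init__(self, indexing_mode="consistent"):
        if indexing_mode not in ("consistent", "lazy"):
            raise ValueError("indexing_mode must be 'consistent' or 'lazy'")
        self.indexing_mode = indexing_mode
        self.documents = {}            # id -> document
        self.index = defaultdict(set)  # city -> set of document ids
        self.pending = deque()         # ids written but not yet indexed

    def insert(self, doc):
        self.documents[doc["id"]] = doc
        if self.indexing_mode == "consistent":
            self._index(doc)                # index updated as part of the write
        else:
            self.pending.append(doc["id"])  # deferred to the background step

    def _index(self, doc):
        self.index[doc["city"]].add(doc["id"])

    def run_background_indexer(self):
        """Drain the queue of documents waiting to be indexed."""
        while self.pending:
            self._index(self.documents[self.pending.popleft()])

    def query_by_city(self, city):
        """Answer the query from the index only, as an index-backed query would."""
        return sorted(self.index.get(city, set()))


if __name__ == "__main__":
    lazy = ToyCollection("lazy")
    lazy.insert({"id": "1", "city": "Ottawa"})
    print(lazy.query_by_city("Ottawa"))  # the document is not visible yet
    lazy.run_background_indexer()
    print(lazy.query_by_city("Ottawa"))  # visible once the indexer catches up
```

In the consistent case the indexing work is paid for on every write, which matches the higher Request Unit cost described above.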
Index Types
Currently the service supports three different index types:
- Hash: useful for equality and inequality predicates.
- Range: useful for ordering and range searches.
- Spatial: useful for spatial queries (within, distance, etc.)
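As a rough sketch, the index type is chosen per path inside the indexing policy. The example below assigns one type to each of three hypothetical paths (/category/?, /price/?, /location/?). The kind and dataType field names follow the policy format as I understand it at the time of writing and should be verified against the current documentation. The kinds_for helper is made up for this example.

```python
import json

# Sketch: one index kind per path. All three paths are hypothetical.
index_types_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        # Hash: equality lookups such as WHERE c.category = "books"
        {"path": "/category/?",
         "indexes": [{"kind": "Hash", "dataType": "String"}]},
        # Range: comparisons and ORDER BY such as WHERE c.price > 10
        {"path": "/price/?",
         "indexes": [{"kind": "Range", "dataType": "Number"}]},
        # Spatial: distance and within queries on point data
        {"path": "/location/?",
         "indexes": [{"kind": "Spatial", "dataType": "Point"}]},
    ],
}


def kinds_for(policy, path):
    """Return the index kinds declared for an exact path, or an empty list."""
    for included in policy["includedPaths"]:
        if included["path"] == path:
            return [index["kind"] for index in included.get("indexes", [])]
    return []


if __name__ == "__main__":
    print(json.dumps(index_types_policy, indent=2))
    print(kinds_for(index_types_policy, "/price/?"))
```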
Indexing Changes
The last thing to understand about Cosmos DB indexing is what happens when the indexing policy changes. There are three main characteristics:
- Online: there is no blocking or query throttling while the index is being changed.
- No performance impact: this is a big one. Changing the indexing policy does not consume Request Units.
- Consistency: while the index transformation is happening, queries will be eventually consistent.