Get Started with Azure Cosmos db - SQL API

Azure Cosmos DB

Introduction

An Azure Cosmos DB is a document-oriented and a non-relational database. It does not model data in tabular format instead the data resides in a document form which is a JSON format. So unlike the SQL server, the data in cosmos DB does not reside in the table-column structure.
It extends Azure DocumentDB which was released in 2014. it is schema-less and generally classified as a NoSQL database. Azure Cosmos DB is Microsoft’s proprietary globally-distributed, multi-model database service "for managing data at the planet-scale" launched in May 2017. It uses query language similar to SQL and supports ACID property of transactions.

Benefits of Azure cosmos DB
There are several benefits of Azure Cosmos DB, most of them are written below -
  • Turnkey global distribution-replication across any number of Azure regions
  • High Availability and responsiveness and mission-critical application
  • Multiple data models and multi API support including graph and table API
  • On-demand storage( In any region worldwide) as per the size requirement
  • Elastically and independently scale throughput (easily scale DB throughput at a per-second granularity)
  • Index management
  • No worry about the database schema
  • Latency guarantees (90% of read in less than 10 milliseconds)
  • Tuneable consistency with the ACID-compliant transaction
  • Backup every four hour
  • Azure active directory support
  • High scalability0 Horizontal partitioning
  • .Net SDK supported libraries
  • Local cosmos DB emulator 

STEP 1: CREATE A COSMOS DB ACCOUNT ON AZURE PORTAL

Go to your Azure portal, and Under All services, search Cosmos DB:


As soon as you type cosmos, you will see “Azure Cosmos DB”.
Click on it. In the next screen click on +Add button:
Add button opens up a new create azure account blade as below.

Enter the subscription, resource group, a unique account name, API, Location etc. details.
Cosmos DB has compatibility with several different APIs like SQL, Mongo DB, Cassandra etc. Here SQL API is the native cosmos DB language.

I am selecting SQL as an API for my cosmos DB account. I am not enabling Geo-Redundancy because this is something we consider for a real production environment.

And click on Create-


Note: Step 1 will not ask for a pricing plan, because setting an account is just like setting a server. We don’t pay for the server, but we pay for the database inside this server.

So, here in the below screen, you can see, I have created a new Cosmos DB Account “anucosmosdb” in West Europe location, which is already highlighted in blue in selected region map view in the Overview Blade.


STEP 2: Create COLLECTIONS INSIDE THIS COSMOS DB ACCOUNT


Now we have to create a collection, and for this setup, we have to pay to Microsoft.
A cosmos DB may have several databases, and each database may have one or more collections, and collection holds by a number of documents.

Add Collection


Click on Add Collection button under the Overview blade of your cosmos DB.


This will open a new Blade in the right aide area:


I named my database id as cosmosDB1 and to the collection as cosmosCollection1. You can give it any name.

Storage Capacity

  •      Fixed capacity – Choose this option if you do not expect to grow your data in the future or you are planning to do partitioning your own. It has one partition and it can’t grow dynamically.
  •      Unlimited Capacity – Cosmos DB automatically handles the partitioning server-side(on Azure),  and it is fully simulated on the local emulator.

Throughput

This is a request Unit (RU of throughput. This is a performance level which you want this collection to be delivered. The minimum throughput value is 400. Lower value delivers lower cost and lower performance and higher value deliver higher cost and high performance.

Partition Key

This is used to group multiple documents together within the physical partition.

Here I have created a collection cosmosCollection1.
Now click on that collection and then expand this collection. Let’s create a new document by clicking on the “New Document” button as below screen :


New Document will show an area, where I can type in my first document data in JSON. Here each document needs a unique Id, let’s start with ID and other data. If we leave this part, cosmos DB generates a GUID key for this document.
I am using sample data of the library management system and click on save.


As we can see a  unique GUID is generated by cosmos DB.
Cosmos DB will automatically assign an ID property to a GUID if you don't supply one yourself but in addition, Cosmos DB includes a few more auto-generated properties like _rid, _self, _etag, _ts etc.
  •  _rid : This is a unique ID that you cannot set and it never changes. So, every resource really has two IDs, the regular ID which you can set and even modify, and the resource ID which is always set automatically by Cosmos DB and is immutable. It won't ever change.
  •  _ts : It holds the date and time that the resource was last updated.
  •  _etag : This is used internally by Cosmos DB for optimistic concurrency checks. The etag is a binary value that always changes whenever the resource changes and so Cosmos DB can tell when one user is going to overwrite changes by another user by detecting a change in the etag. When this happens, the second user's update gets rejected which protects against overwriting the first user's changes.
  •   _self : The self-property represents the full path of the resource, expresses a hierarchy of nested resource IDs.
  •   _attachments : This property which when appended to the self-link, points to the attachments for the resource.
Now I have created one more document with the same data, but I added one property Author in the last. And We can see, cosmos DB created a new document with a newly added guid.


So now we can see there are two documents added with two different unique guid IDs.(See on the left side area)

Now we can query using SQL, here c is an alias which belongs to our collection. And in Result, we get one matching document. 
below is our Select statement to see how it works with where clause:


In the above example, we fetched the document using where clause by ISBN number. And in the result, we got a matching document.
This was the basic idea of how collection and SQL query works. But in general, we do not write our documents like this. This editor sometimes can be useful for updating a little piece of code.

API Used: Here in the above example we used SQL API, which treats entities as JSON documents and these are hierarchical text documents formatted in JSON.
There are other APIs too, we will learn them in detail.

Performance measuring features in CosmosDB


LATENCY

Latency means wait time or the time of getting a response to our request.
Response for a given request is faster in cosmos DB because data is distributed globally. Cosmos DB supports a range of consistency levels which provides control over latency. Consistency means the data you are reading any point of time is the latest or not.
These consistency levels are :

HOW TO SET CONSISTENCY LEVEL IN AZURE COSMOS DB

By default, “Session” consistency has been set on Azure cosmos DB, but as per your database need you can overwrite it by clicking on “Default Consistency” link. You can do this change from .net SDK also by choosing the overloaded document class constructor which accepts consistency level enum. From the Azure portal end, you will see the following screen where you can overwrite it -


THROUGHPUT

Throughput means how many requests can be served within a specific period of time. It ensures how cosmos DB handles concurrent database requests. Its unit is RU. RU depends upon how many numbers of request units per second that you asked for and pay for when you create a data container in cosmos DB.

REQUEST UNITS

Request Unit (RU) can be said as a throughput currency because it’s collectively measures of computational cost for example CPU, memory, disk, I/O. network I/O etc.
Note: Request-Units are not requested but rather all requests get charged in the form of request units. All requests are not equal, because all requests have different computational cost. 
For example: Write is always cost more than Read, running a query on indexed data is always cost less than query over non-indexed data.

But they are deterministic if you run the same request again and again because they require the same number of RUs every time.
Request charges are very transparent. For example in our previous query we can see RUs by clicking on Query stats tab :



So we can see here, the RU cost is 2.96RUs.


Multi-model, multi-API support



Azure Cosmos DB natively supports multiple data models including Document(JSON/BSON), graph, table, column-family, but the core content-model of database engine based on is ARS which is atom record sequence.


Difference between a Relational database and Document DB



Relational database
Document database
Tabular format. Rows
Documents in JSON format
Column
properties
Strongly typed schema
Schema-free. No pre-defined schema
Highly normalized. No column duplication. Data retrieval from multiple tables by using joins.
Typically denormalized. No need for ORM nor multiple joins. JSON document uses an array of embedded objects.

Cosmos DB resource Model

Documents in the cosmos DB are managed by a well-defined hierarchical structure of resources. These resources are highly available while allowing us to address each resource by a unique URI. We can understand this resource hierarchy by the following diagram :

At the top level, there is Account. Account’s name is used to compose the URI to access all child resources. After that, there is a Database, which is nothing but a logical container for collections. One or more containers can reside inside a single database. After that, there is a collection, which holds a number of related documents. It’s a level where you select a pricing tier. After this, there is a Document in the hierarchy which is a single entity represented by JSON.

RESOURCE GRANULARITY

Databases can contain Users who access to resources in the database. Each user can also contain Permissions that can restrict access to specific resources or operations. For example, you can restrict a user for read-only access to a collection.
A collection stores various programmatic logic in the form of stored procedure, functions, triggers more like a SQL which you can reuse throughout various queries. But they are scoped only within the partition key.


Different APIs


Azure Cosmos DB can be accessed using four different APIs.
  • Azure Cosmos DB (SQL) API
  • MongoDB API
  • Graph (Gremlin) API
  • Tables (Key/Value) API

Introduction to SQL API


The SQL API in Azure Cosmos DB is a JavaScript and JSON native API based on the DocumentDB database engine. The SQL API also provides a query capability which is rooted in the structured query language. 

But SQL is designed for relational databases, and Cosmos DB is a NoSQL database. So, what is the difference?
Well, they adapted the SQL syntax for cosmos DB but rooting this SQL into JavaScript and JSON. The SQL API supports the execution of JavaScript logic within the database in the form of stored procedures, triggers, and user-defined functions. JavaScript logic can be executed in a transactional manner directly within the database engine.

The language still looks like a familiar SQL, but the semantics are all based on hierarchical JSON documents rather than relational tables, and you work with the JavaScript data types rather than SQL datatypes. Expressions are evaluated as JavaScript expression rather than T-SQL expression. If you're using the .NET SDK, then there's also a LINQ provider that can translate language-integrated queries that you write in C# into Cosmos DB SQL.
At the end, you will see several keywords from SQL, but JavaScript keywords are very limited.

HOW SQL WORKS IN DENORMALIZED DATABASE WORLD OF COSMOS DB?

In cosmos DB, we don’t deal with row-column based structure of data, but we deal with semi-structured hierarchical documents with nested arrays and child objects. Here we use dotted notation in our queries to navigate a path to any property no matter how deeply nested it may be within the document.

REST API

Azure Cosmos DB exposes resources through the REST APIs that can be called by any language capable of making HTTP/HTTPS requests. The basic HTTP methods used in REST can also be used to access resources in Azure Cosmos DB.
The SQL API provides a rich SQL query interface for retrieving JSON documents and server-side support for transactional execution of JavaScript application logic. The query and script execution interfaces are available through all platform libraries as well as the REST APIs.
Microsoft offers programming libraries for several popular languages for the SQL API. Currently available libraries are :
  • .Net SDK
  • Java SDK
  • JavaScript SDK
  • Python SDK
Accessing Resources from REST-based URIs Concept

As described previously (See figure-2 ), each resource in the hierarchy can be accessed using a URI, its concept is based on REST API design. We can understand this URI concept by the following diagram :

Above examples shows how these URIs work:


Note: Here we took sample IDs, but the ID of a resource is not based on the name you give in the portal or using a script. It is uniquely generated by the Azure Cosmos DB system.

Here in the above example, you may have noticed that the IDs used by Azure Cosmos DB “build” upon each other. For example, the Collection id begins with the Database id as a prefix. 


Have a great learning with Azure Cosmos DB basics..!!

Comments