Sunday, March 6, 2016

What is non clustered index in SQL Server with examples

Consider on this query:

SELECT * FROM Student WHERE Age = 18

Our data has sorted on column RollNo not on Age :( How to search the students in the table tblSudent (clustered index) for a give age efficiently? We got same problem as we had got for RollNo in the Heap. If problem is same then solution should be same that is store the data of the table tblSudent in the sorted order of column Age. But this time problem is little complex. If we will try to store data in the sorted order of column Age then sorting on column RollNo will disturb.

We need data to be stored on the basis of column RollNo for query:

SELECT * FROM Student WHERE RollNo = 111

We also need data to be stored on the basis of column Age for query:

SELECT * FROM Student WHERE Age = 18

We want both!! How is it possible? Any object can have only one physical organization.  Then what is solution? I'm telling what sql server does. It creates copy of table and stored data in a sorted order of column Age in the copied table and it doesn't change anything in tblSudent(clustered index). This new copied table is called non-clustered index.  Let's assume  name of copied table is tblStdentNCI_1 then copied or subset table will look like something:

tblStudent (Clustered index): Sorted on RollNo

tblStdentNCI_1 (Non-clustered index): Sorted on Age

FAQ about no-clustered index:

Q. What is non-clustered index?

Answer: It is a copy of clustered index table sorted on some columns. It is subset of clustered since we can add some or all columns of clustered index table in the non-clustered index tables.

Q. A table can have how many non-clustered indexes?

Answer: We can create any number of copies of a table. So a table (more accurately clustered index) can have any number of non-clustered indexes. But there is some max limitation. For example in sql server 2012, 999 non-clustered indexes can be created per table.

Q. Does all non-clustered indexes occupy disk spaces?

Answer: Yes

Q. What will happen if data is modified in base table (clustered index)?

Answer:If data is inserted, deleted or updated in clustered index same data is also inserted, deleted or updated in all non-clustered indexes.  There is no need to worry about data synchronization between clustered index and non-clustered indexes since sql server take this responsibility.

Q. Does non-clustered indexes are good or bad?

Answer: It may good for select statements or it may bad for insert, delete and update statement

Non-clustered index in depth:

It is logical organization of data in a table. A table can have 1023 non-clustered index. A non-clustered index can be of two types as:

1. Heap

2. Based on clustered index.

If table has clustered index then leaf node of non-clustered index keeps the key columns of clustered index. If the table has no clustered index then leaf node of non-clustered index keeps RID which is unique to each row of table. Such non-clustered index is called heap.

Non-clustered index based on clustered index:

Description of above image:

It is the B-tree representation of non-clustered index NCI_Student_Age where lower section of dotted line is also B-tree of clustered index CL_Student_Roll. Top node of non-clustered index is called root node and bottom nodes are called leaf node. All the nodes in between root node and leaf nodes are called intermediate nodes. A B-tree can have many intermediate nodes in case table has too many records. In this example there is no intermediate node and we have assumed size of each node is four i.e. each node can keep maximum five Age.

Important points about non-clustered index based on clustered index:

1. All the nodes keep key columns of non-clustered index and key column of unique clustered index.

2. If clustered index is not unique then instead of key column of clustered index it keeps a unique row identifier of the actual data rows of table.

3. Apart from points above, a leaf node also keeps the included columns of non-clustered index. (We will discuss about included column later)

Now if we will execute following sql query:

SELECT * FROM Student WHERE RollNo = 111

Since Student table has clustered index, now it will go for index scan instead of table scan to search the RollNo = 111. Scan will start from root node in the clustered index CL_Student_Roll.

SELECT Country

FROM Student WITH (INDEX (NCI_Student_Age))

WHERE Age = 31

Note: Here we have added WITH clause only because sql server uses the non-clustered index NCI_Student_Age,  otherwise it will not use non-clustered index or will use if table has too many records. We will discuss why is it so later.

Execution plan of above query will look like this:

Meaning of above execution plan:

Sql server will first use non-clustered index NCI_Student_Age and find out the corresponding value of key column i.e. RollNumber of clustered index (It is called key look up) using nested loop and find out the value of column Country.

Scan 1: 22 < 31 so it will go to next element of root node.

Scan 2: 24 < 31 so it will go to the next element of root node. Since there is not any other element in root node so, it will follow right most link or path.

Scan 3: In leaf node, 24 < 31 o it will go to next element of leaf node.

Scan 4: In leaf node, 24 < 31 o it will go to next element of leaf node.

Scan 5: In leaf node, 27 < 31 o it will go to next element of leaf node.

Scan 6: In leaf node, 27 < 31 o it will go to next element of leaf node.

Scan 7: In leaf node, 31 = 31. It will get value key column of non clustered index i.e. RollNo which it equal to 109.

Scan 9: From here it will use clustered index CL_Student_Roll to get the values of column Country name of RollNo = 31. In root node of clustered index 107 < 109 so, it will go to next element of root node.

Scan 10: 110 > 109 so it will go through this link.

Scan 11: In leaf node, 108 < 109 so, it will go to next element of leaf node.

Scan 12: In leaf node, 107 < 109 so, it will go to next element of leaf node.

Scan 13: In leaf node, 109 = 109 so it will follow this path and get the actual data that is value of column Country which is USA.

Simple syntax of creating a non-clustered index

CREATE [UNIQUE] NONCLUSTERED INDEX <NCI_IndexName>

    ON <ObjectName>(<ColumnName_Key>[ASC|DESC] [ ,...n ])

    [INCLUDE (<ColumnName_Include> [,...n])]

<IndexName>: It is any valid name of index.

<ObjectName>: It is name of the table or view on which we want to create the non-clustered index.

<ColumnName_Key>: It is name of the key columns for clustered index separated by comma. Values of these columns are present in each node of a clustered index. It has following restrictions:

1. We can specify maximum 16 column names.

2. Sum of size of the columns cannot be more than 900 bytes.

3. All columns must belong to same table.

4. Data type of columns cannot be ntext, text, varchar (max), nvarchar (max), varbinary (max), xml, or image

5. It cannot be non-deterministic computed column.

<ColumnName_Include>: Due to above restrictions on <ColumnName_Key> columns, we can include more columns of table or view in the INCLUDE section. Date of include columns are only present in the leaf node of clustered index. It has also some restrictions:

1. Data type of columns cannot be text, ntext, and image.

2.  A column cannot be present in key columns as well as can not include column at the same time. Also repetition of columns are not allowed.

Good practice: If possible we should try to add all columns of WHERE clause in key columns list and all columns of SELECT clause in the INCLUDE columns of a non-clustered index.

Covering of queries:

If we include all the the columns of a SELECT statement either as key columns or INCLUDE columns in non-clustered index such queries are called covered query. For example consider on query:

  1. SELECT Name,Country
  2. FROM Student
  3. WHERE RollNo = 108 AND Age = 27

Non-clustered index 1:

CREATE NONCLUSTERED INDEX NCI_1

ON Student (RollNo,Age)

This index doesn't cover the above SELECT statement.

Non-clustered index 2:

CREATE NONCLUSTERED INDEX NCI_2

ON Student (RollNo,Age,Name)

This index doesn't cover the above SELECT statement.

Non-clustered index 3:

CREATE NONCLUSTERED INDEX NCI_3

ON Student (RollNo,Age,Name,Country)

This index COVERS the above SELECT statement.

Non-clustered index 4:

CREATE NONCLUSTERED INDEX NCI_4

ON Student (RollNo,Age)

INCLUDE (Name,Country)

This index covers the above SELECT statement.

Now consider on last index we have created that is

CREATE NONCLUSTERED INDEX NCI_Student_Age

ON Student (Age)

This index doesn't cover the SQL query:

SELECT Country

FROM Student

WHERE Age = 31

Due to which SQL server doesn't use the index NCI_Student_Age. Execution plan of this query will look like:

 

 

SQL server follows Index scan which is same as table scan and it is a slow process.

If we will use forcefully non-clustered index NCI_Student_Age as

SELECT Country

FROM Student WITH (INDEX (NCI_Student_Age))

WHERE Age = 31

 

Its execution plan will look like:

 

Here problem is it is partiality index seek process that is, to get value of column Country it has to go through (key look up) the clustered index CL_Student_Roll since non-clustered index doesn't keep the value of column Country. We can optimize this query by creating a new index which covers the SELECT statement.

CREATE NONCLUSTERED INDEX NCI_Student_Country

ON Student (Age)

INCLUDE (Country)

 

After creating this index, execution plan will look like:

 

Now there is no key look up and it is fully index seek. Structures of index NCI_Student_Country will look like this:

We can observe key columns of clustered index keep all the nodes while INCLUDE columns are only kept by leaf node. To perform this query:

SELECT Country

FROM Student

WHERE Age = 31

SQL server will not use clustered index since data of Country columns are present in non-clustered index NCI_Student_Country. So, there is no need of key look up.

Non-clustered index based on heap:

A table without any clustered index is called heap. It is possible to create a non-clustered index on such tables  which do not have any clustered index.  Main difference between non-clustered index on clustered index and non-clustered index on heap is, it keeps the RID (i.e. Row ID) in the leaf node of non-clustered index instead of key column of clustered index which is pointer to the actual data rows. Suppose we have a Student table which do not have any clustered index. Now we want to create a non-clustered index:

CREATE INDEX NCI_Heap

ON Student (Age)

If we will check the execution plan of this query:

SELECT Country

FROM Student WITH (INDEX (NCI_Heap))

WHERE Age = 31

 

It will look something like this:

 

From execution plan, we can observe that it is using RID look up instead of key look up.

Note: RID of each row of a table is generated by SQL server using file identifier, page number and number of the row on the page etc.

Organizations of non-clustered index on heap will look something like this:

We can remove the RID look up in the execution plan by covering the query in non-clustered index on heap that is:

CREATE INDEX NCI_Heap_Country

ON Student (Age)

INCLUDE (Country)

 

Now execution plan of query:

SELECT Country FROM Student WHERE Age = 31

Will look like:

 

 

 

URL: URL: http://www.exacthelp.com/2012/01/what-is-non-clustered-index-in-sql.html

 

No comments:

Post a Comment