Unique identifer or INT as clustered index

Today I was asked about what is best to use for a clustered index and a primary key in a table created on e.g. SQL Server 2008 R2. Is it a Unique Identifier or an Int.

As usual my answer was… It Depends, but if we only look at index maintenance and other performanceissues, then I would always prefer an Int instead of an Unique Identifier.

So what else can we to than try to test different scenarios.

First we create a database.

 

use master
go

-- First let's see if the database
-- already exists and if we have to delete it.
if DATABASEPROPERTYEX('keyTest','Version') is not null
begin
    drop database KeyTest
end

-- Lets create the database
Create database keyTest
go

use keyTest
go

.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, “Courier New”, courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }

First step is to test if the database KeyTest already exists. If it does then we drop it and create it again.  Then we switch context to the keyTest database.

Now, lets create two tables that have the same data, but a clustered index that are build on different datatypes. We implement the clustered index by using the primary key keyword (there are a big difference between a clustered index and a primary key, but if we create a table with the primary key keyword, then a clustered index is also created.

-- Lets create the two tables
Create table IdentifierTBL        
        (    ID        uniqueidentifier        primary key
default newID() , Postalcode char(4) , Name char(50) ) Create table IntTBL ( ID int identity primary key , Postalcode char(4) , Name char(50) )

The two tables has the same columns, but in the first table the ID is a uniqueidentifier, that are 16 bytes long and in the first table it is an int, that are 4 bytes long.

Now let try to insert 10.000 rows in each of the tables.

 

-- Let's insert some data
insert into IdentifierTBL values (newid(), '1234', 
Replicate('A', 50)) go 10000 insert into IntTBL values ('1234', Replicate('A', 50)) go 10000

Thats it. We have two tables with 1000 rows. The only test we want to carry out now is to see the fragmentation and how many pages they take up.

select    t1.object_id
        , t2.name
        , t1.avg_fragmentation_in_percent
        , index_level, page_count
        , avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(DB_ID(), 
null, null, null, 'Detailed') t1 inner join sys.objects t2 on t1.object_id=t2.object_id where t2.name in ('IdentifierTBL', 'IntTBL')

.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, “Courier New”, courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }

The result looks like this:

image

We can see that both tables consists of two levels, a leaf level and a root level. We can see that if we use an Unique identifier, then fragmentation is approx. 99%, and when we use an integer, then it’s approx. 2%.

We can also see that the table with an Int as the clustered key, takes up 92 pages and the average record size is 65 bytes. The same figures for the table with the unique identifier is 154 pages and 77 bytes in average recordsize.

But it does not stop here. What if we create a nonclustered index on the postalcode. What will happen to our physical structures?

USE [keyTest]
GO
CREATE NONCLUSTERED INDEX 
        [Nonclustered] ON [dbo].[IntTBL] 
(
    [Postalcode] ASC
)

CREATE NONCLUSTERED INDEX 
        [Nonclustered] ON [dbo].[IdentifierTBL] 
(
    [Postalcode] ASC
)

 

Lets see the characteristica

 

select    t1.object_id
        , t2.name
        , t3.name
        , t1.avg_fragmentation_in_percent
        , index_level, page_count
        , avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(DB_ID(), null, 
null, null, 'Detailed') t1 inner join sys.objects t2 on t1.object_id=t2.object_id inner join sys.indexes t3 on t1.object_id=t3.object_id
and t1.index_id=t3.index_id where t2.name in ('IdentifierTBL', 'IntTBL') and t3.name='NonClustered'

.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, “Courier New”, courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }

 

The result is

image

What I am interested in showing you is the page count and the record size. As you can see, then the index is much bigger with a unique identifier (it uses 36 pages compared to the 20 pages). That is because the recordsize is  24 bytes compared to the 12 bytes that are used in the IntTBL. The difference is 12 bytes, and it is the difference between the 16 bytes of a unique identifier and the 4 bytes of an int.

What we can conclude is that the clustered index key is added to the nonclustered index and if it is 16 bytes, then 16 bytes is added; if it is 4 bytes, then it is 4 bytes that are added.

Why should we ever then use Unique identifier, because I have now showed you how much better an integer performs as a clustered key.

Well……. If I e.g. should have a database with disconnected users, e.g. a CRM system, then it could be a good idea to use a unique identifier. But it is not within the scope of this blog to write about that.

Søren

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s