Primary key recommandations


I’m doing a project for a customer at the moment where database cleanup is a large part of the project. There are many issues in the project database, and primary key constructions are just one of them.

Definition:

A primary key is used to uniquely identify each row in a table. A primary key can consist of one or more fields on a table. When multiple fields are used as a primary key, they are called a composite key.

My guess is the 80-90% of all SQL tables world-wide use a single column primary key with some kind of key generation. The primary key should in most cases have auto-increamental vales when a row is created. This means that we don’t have to supply the value for the primary key column during inserting rows. At SQL Server the auto-increment function is given by the IDENTITY(n,m) keyword set in the CREATE TABLE statement.

CREATE TABLE Customer
(
    Id integer IDENTITY(1,1) PRIMARY KEY,
    Name varchar(50) NOT NULL
);

The datatype of the Id column is set to Integer. Unless you have more than 2 billion customers, the integer datatype is suffient.  But if your table most likeshall have over 2 billion row, the datatype bigint should be considered for the PK column. The DDL for this is shown below.

CREATE TABLE Person
(
     Id bigint IDENTITY(1,1) PRIMARY KEY,
     FirstName varchar(50) NOT NULL,
     LastName varchar(50) NOT NULL
);

 The datatype uniqueidentifier can also be used for a PK column. This datatype is commonly used when syncronization between databases, servers and clusters are important in the daily dba operations. This column can be populated equally the IDENTITY(n,m). To populate an uniqueidentifer column upon INSERT, set the default value to newid().

CREATE TABLE CurrencyRate
(
      Id uniqueidentifier NOT NULL PRIMARY KEY
                    DEFAULT newid(),
      RateDate datetime NOT NULL,
      FromCode varchar(10) NOT NULL,
      ToCode varchar(10) NOT NULL,
      RateValue [float] NOT NULL
);

For tables with large amount of data, the datatype uniqueidentifier should be reconsidered, and maybe using integer or bigint as datatype.The reason for this is that integer datatype on PK columns is better due to increased performance.

Hope this helps someone.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s