Monday, February 7, 2011

SQL Server Fundamentals of Data Storage

How SQL Server stores your data is what I am trying to explain in this post.

First lets start from the beginning , say first lets create a new database called test and specify the initial size to be 4MB for .mdf file which is called the primary file for the db and you can have .ndf files called secondary files for data but lets keep it to .mdf files and log file of say 1MB.



You can see my primary file is of 4 MB now as seen below.






Now we will go ahead and create a simple table called employee and see how the data for it is stored.

CREATE TABLE DBO.EMPLOYEE
(
EMPLOYEEID INT IDENTITY(1,1),
FIRSTNAME VARCHAR(50) NOT NULL,
LASTNAME VARCHAR(50) NOT NULL,
DATE_HIRED DATETIME NOT NULL,
IS_ACTIVE BIT NOT NULL DEFAULT 1,
CONSTRAINT PK_EMPLOYEE PRIMARY KEY (EMPLOYEEID),
CONSTRAINT UQ_EMPLOYEE_LASTNAME UNIQUE (LASTNAME, FIRSTNAME)
)

So from the above it is clear that we have a primary key and a unique keys created, now lets add some data to it.
GO
INSERT INTO DBO.EMPLOYEE (FIRSTNAME,LASTNAME,DATE_HIRED)
SELECT 'George', 'Washington', '1999-03-15'
GO
INSERT INTO DBO.EMPLOYEE (FIRSTNAME,LASTNAME,DATE_HIRED)
SELECT 'Benjamin', 'Franklin', '2001-07-05'
GO
INSERT INTO DBO.EMPLOYEE (FIRSTNAME,LASTNAME,DATE_HIRED)
SELECT 'Thomas', 'Jefferson', '2002-11-10'
GO

Now lets get into the details and find how the data is stored.

DBCC IND ('test',employee, -1); gives output like below








Col1 - PageFID - is 1 as we only had one mdf file and no .ndf file so all the pages are sitting on the .mdf file.
Col2 - PagePID - so our 4MB initial file size is divided into 512 pages (512 * 8kb = 4096 kb = 4MB), So our page numbers 153, 154,155,156 are being used for the Employee table data.
Co10l - PageType - 1,2,10 these are the 3 different page types shown.

In SQL pages are either heap or B-tree, if heap then the pages just sit one after the other so to find data the database engine will have to walk through each page as they are
not stored as per index or any other criterion. Where as pages that are stored as B-tree are the ones which have clustered index or non-clustered index defined on them.

Now if there was any heap page used in this table to store the data the Index ID column would have 0 which is the heap index ID, 1 for clustered and 2 for non-clustered index pages.

Now pageType of 1 along with indexID 1 tells us that this is a data page with data stored using clustered index, whereas pageType 2 with indexId 2 tells us that the data in this page is stored
As a B-tree only but non-clustered index is used..for storing data, even though we did not specify any non clustered index the unique key constraint creates the non-clustered index..


And Page Type 10 are IAM page (out of scope for now but they maintain as to whether an extent belongs to the allocation unit that the IAM page is sitting on…this is useful to
Quickly tracking pages when needed for storing data in a allocation unit….)

You can further check the data in any page for example you wanna see how data is stored in the page 153 the data page…

DBCC PAGE ('test', 1,153,3) WITH TABLERESULTS ;

You need to scroll down to see the records that you entered in the table…as most of the top part of the page shows you the header which contains page no and details
Of the GAM,SGAM PFS and other pages in the .mdf file and other imp information like

AllocUnitId 72057594039697408
PartitionId 72057594038779904
IndexId 1
ObjectId 2105058535







Hope it helped I wrote this pretty quickly a bit busy with something, If u u understood something and want things in more detail let me know….i will write some more structured articles on this a bit later

Doubts are welcome….


Cheers,
Bala






No comments:

Post a Comment