Learning Corner #2 - Metadata storage

Hello,

In this post, I want to mention a few things about the data storage metadata in SQL Server, or metadata storage, as the title mentiones. What’s that, you say? 🙂

Well, to be more exact, my goal is to cover the following DMVs, with some meaningful examples and explanations. And maybe at the end, I will draw a pretty picture, to keep this functionality in mind:

indexes
partitions
allocation_units (+ sys.system_internals_allocation_units)

In order to check what can be found in these views, I will follow the next steps:

Create an empty database (I am using SSMS for this, because it’s a lot faster than doing it from a query, but remember: the most secure way to work in SQL Server is by using code snippets, which you can save and reuse at a later point in time. I will stress that as many times as I have the opportunity). If you want to create it from a script, just paste the code below:

CREATE DATABASE TestDB2

Check the values from the three views, when there isn’t anything in the database yet:

SELECT * FROM sys.indexes
SELECT * FROM sys.partition
SELECT * FROM sys.allocation_units

What we see, is that there are some rows in these views, even though the database has just been created. The indexes view has 146 rows on my new database, sys.partitions has 145 and the allocation units has 170.

The next step is to create a table and see what happens in this views afterwards. I am using the following query for table creation:

CREATE TABLE dbo.Employee
(
	LastName varchar(25) NOT NULL,
	FirstName varchar(15) NOT NULL,
	Address varchar(100) NOT NULL,
	PhoneNr char(12) NOT NULL,
	JobLevel smallint NOT NULL
);

And then I query again the view, to see what changed. The first change that we see is that the number of rows increased by 1 in the views. But let’s see with what.

For sys.indexes, I will use a WHERE clause, to search for changes related to my Employee table.

SELECT * FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID(N'dbo.Employee')

The following row was inserted in the indexes view:

The things that catch my eye in this row are the following:

The value from the name column is NULL
Index_id = 0
Type = 0
Type_desc = HEAP

After searching a bit on the sys.indexes table (https://msdn.microsoft.com/en-us/library/ms173760.aspx), I learn the following:

Column name

Data type

Description

object_id

int

ID of the object to which this index belongs.

name

sysname

Name of the index. name is unique only within the object.

(NULL = Heap)

index_id

int

ID of the index. index_id is unique only within the object.

0 = Heap

type

tinyint

Type of index:

0	Heap
1	Clustered
2	Nonclustered
3	XML
4	Spatial
5	Clustered columnstore
6	Nonclustered columnstore
7	Nonclustered hash

type_desc

nvarchar(60)

Description of the index

So, to sum up:

every time we create a new table in SQL Server, a new row is created in sys.indexes, even though the table does not (yet) have an index.
a table without a clustered index is called a HEAP.
The index_id is unique in the context of the table where it’s created. This means that if we see two rows with index_id = 13 for example, we MUST also look at the object_id, to match the table to which they belong.
Index_id = 0 means that that’s actually the row for the table (for the heap) and we can also take a look at the name column, to make sure that it’s NULL. For the other “real” indexes, the name column contains the name of the index.

Sys.partitions

 SELECT * FROM sys.partitions 
WHERE OBJECT_ID = OBJECT_ID(N'dbo.Employee')

These are the results:

This view contains a row for each partition of all the tables and most types of indexes in the database. Special index types such as Full-Text, Spatial, and XML are not included in this view. All tables and indexes in SQL Server contain at least one partition, whether or not they are explicitly partitioned.

And finally, let’s query the allocation units. Unfortunately, this view must be linked to the sys.partitions one, because otherwise it is not readable with ease. So I will make the following query:

SELECT
  OBJECT_NAME(object_id) AS TableName,
  partition_id AS PartitionId,
  partition_number AS PartitionNumber,
  rows AS NumberOfRows,
  allocation_unit_id AS AllocationUnitID,
  type_desc AS AllocationUnit_Description,
  total_pages AS TotalPages
FROM
  sys.partitions PART INNER JOIN
  sys.allocation_units ALLOC ON 
      PART.partition_id = ALLOC.container_id
WHERE object_id = object_id(N'dbo.Employee');

This simple table shows only one row because it has only one partition, no nonclustered indexes, and only one type of data (IN_ROW_DATA). Here is the result:

Every partition in a SQL Server table can contain 3 types of data, each stored on its own set of pages. And each of these types of pages is called an Allocation Unit. Below are the 3 types of Allocation Units.

IN_ROW_DATA
ROW_OVERFLOW_DATA
LOB_DATA

So, an allocation unit is basically just a set of particular type of pages.

Now, to test that values are added in these DMVs, let’s create add some more interesting data: larger columns, more indexes and data in the table.

Let’s add some new columns, that can’t be fit inside a single page (a page has a fixed length of 8KB and can contain rows of a maximum of 8060 bytes. Also, if a row’s length is more than 8060, it cannot be stored on a normal page – IN-ROW-DATA, but it will be stored on a row-overflow page).

ALTER TABLE dbo.Employee ADD LongAddress varchar(8000); 
ALTER TABLE dbo.Employee ADD FirstImpressions text;

The allocation units has new rows now:

Let’s add some indexes now:

 CREATE NONCLUSTERED INDEX [IDX_JobLevel_NC] 
ON [dbo].[Employee]( [Address] ASC, [JobLevel] ASC )

CREATE CLUSTERED INDEX [IDX_Name_Clustered] 
ON [dbo].[Employee]( [LastName] ASC, [FirstName] ASC )

The sys.indexes view has changed: it does not contain a single row for the heap anymore, but it contains two rows, each for the index that has been created. The name column is not NULL anymore, in neither of the two cases.

The partitions table also contains more rows now: the initial row is now used for the clustered index, while a new row has been added for the nonclustered index. Also, the index_id column has changed.

A query joining all these DMVs, which tells us some useful information is the following:

SELECT
OBJECT_NAME(IND.object_id) AS TableName,
IND.name AS IndexName,
IND.index_id IndexID,
IND.type_desc as IndexType,
PART.partition_id PartitionID,
PART.partition_number AS PartitionNumber,
PART.rows NumberOfRows,
ALLOC.allocation_unit_id AS AllocationUnitID,
ALLOC.type_desc as PageTypeDescription,
ALLOC.total_pages AS NumberOfPages
FROM
sys.indexes IND INNER JOIN
sys.partitions PART ON IND.object_id = PART.object_id 
   AND IND.index_id = PART.index_id INNER JOIN
sys.allocation_units ALLOC 
  ON PART.partition_id = ALLOC.container_id
WHERE IND.object_id = object_id(N'dbo.Employee');

The results of this query are:

What if we add some rows?

I have inserted 3 rows in this table, using the following query:

INSERT INTO [dbo].[Employee] (LastName, FirstName, Address, PhoneNr, JobLevel, LongAddress, FirstImpressions)
VALUES
  ('Mouse', 'Mickey', '7484 Roundtree Drive', '849-555-0139', 1, '7484 Roundtree Drive but with a much longer text here', 'I am writing the first impressions of a new employee')
 ,('Mouse', 'Minnie', '1399 Firestone Drive', '181-555-0156', 2, '1399 Firestone Drive but with a much longer text here', NULL)
 ,('Duck', 'Donald', '5415 San Gabriel Dr.', '330-555-2568', 3, NULL, NULL)

And now, we have the following information:

We can see that the actual number of rows is presented correctly in this view and number of pages that were used to store this information.