
Partitioned Table Spinlocks

I was recently working on the development of a fact table with 24 partitions that needed to hold up to 5 billion rows of data.

Data arrives in the warehouse as files, each of which is loaded into a staging heap table.  One heap table is created per file, and there are several thousand files per day.
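
The post doesn't show the staging objects themselves, so here is a minimal sketch of what one per-file staging heap and its load could look like. The table name, columns, file path and the use of BULK INSERT are all assumptions for illustration; the original post does not state how the files are loaded.

-- Hypothetical per-file staging heap (no clustered index); all names are placeholders
CREATE TABLE dbo.stage_file_00001
(
    surrogate_key BIGINT NOT NULL,
    partition_key INT NOT NULL,
    measure_value NUMERIC(18,4) NULL
);

-- Assumed load mechanism, for illustration only
BULK INSERT dbo.stage_file_00001
FROM 'C:\loads\file_00001.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);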

In the process of testing the load, I came across what I think was an interesting find.

For this part we are going to load one slice into the main fact table at a time. This gives us a very controllable way of consuming server resources, and lets us make the process as parallel as we desire. For the initial cut, I will build an SSIS package with 24 Execute SQL Tasks, each of which first selects the insert statement into a variable (this makes testing much simpler); each of these then flows into one of a further 24 Execute SQL Tasks, which run the insert.

[Image: SSIS control flow, example of 4 threads shown]

If this method proves successful, we can easily change the control flow to operate off a stack of slices to load into the fact table.
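
The post doesn't show the insert statement itself, so the following is a minimal sketch of the kind of per-slice statement each Execute SQL Task might run. The staging source, the column list and the partition function name (pf_hash_pos_24) are assumptions for illustration only.

-- Hypothetical per-slice insert; dbo.Staging_all stands in for the day's staging data
-- and pf_hash_pos_24 for the partition function behind ps_hash_pos_24
INSERT INTO dbo.Target_table_partitioned (surrogate_key, partition_key)
SELECT surrogate_key, partition_key
FROM dbo.Staging_all
WHERE $PARTITION.pf_hash_pos_24(partition_key) = 1; -- slice 1 of 24, parameterised per task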

On running the first test, while I expected some drop-off when loading concurrently over 24 threads (one per partition), I noticed straight away a much larger fall in loading speed than anticipated.

The load time for the run of 3.2 million rows was around 17 seconds across a number of runs; given that a single thread running on its own took 1 second, we clearly had a problem. I then decided to try running the package on the server itself. Given that we are only doing two Execute SQL statements per thread, I didn't expect this to yield much improvement, and it actually resulted in a slower execution time of 26.8 seconds.


Interestingly, CPU usage was quite a lot higher (around 10% higher) when executing on the server compared with my 2-core desktop.

I then started looking into wait stats, latches and spins, which showed very little until it came to the spinlocks. Below is a screenshot of the latch, wait and spinlock statistics.

[Image: latch, wait and spinlock statistics]

It is clear from the SOS_OBJECT_STORE spinlock that something is happening that perhaps shouldn't be: we have a spin count of over 2 billion. Other runs of the test showed an even higher figure of over 4 billion.
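
For reference, these figures can be pulled straight from the spinlock statistics rather than a screenshot. The sketch below assumes sys.dm_os_spinlock_stats is available (it exists in SQL Server 2008 but is undocumented; DBCC SQLPERF('spinlockstats') exposes the same counters).

-- Snapshot of spinlock activity, filtered to the spinlock in question
SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name = 'SOS_OBJECT_STORE';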

At this point I decided to do some research on this particular type of spinlock and found very little information. However, a blog post by Thomas Kejser of the SQLCAT team (http://blog.kejser.org/2011/05/30/diagnosing-and-fixing-sos_object_store-spins-for-singleton-inserts/) proved useful; although it relates to singleton inserts, following the same diagnostic path is helpful.

I thought from here that xperf would be a good tool to give me a hint of which direction to go in, so after installing the Windows Performance Toolkit on the server I started logging with xperf using the command

xperf -on latency -stackwalk profile

I then reran the workload and stopped xPerf with

xperf -d c:\temp\my.etl

As this was the first time xperf and xperfview had been used on the server, I set the symbol and symbol cache paths and then started xperfview.

After loading the trace file into xperfview and loading symbols, I added the CPU by Process graph to the display, chose a region where I saw high and erratic CPU, and brought up the summary table. After drilling into the SQL Server process I finally saw what the source of the spinlocks could be: very high weightings for AcquireLocksForInsert, and also for InsertRecordOnPage (expected, since we are actually inserting data) and LocatePageForInsert (there could be contention here due to the locking).

The screenshot below highlights the problem functions.

[Image: xperf summary table highlighting the problem functions]

In order to resolve this, I decided to review what was actually happening on the inserts, i.e. what kind of locks were being taken.

The lock DMVs showed that millions of KEY locks were being taken during the load, which certainly explains the amount of time spent in AcquireLocksForInsert.
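
The post doesn't include the DMV query itself, but a minimal sketch of the kind of check that surfaces this, run while the load is in flight, could look like the following (filtering to the current database is an assumption).

-- Count locks by resource type and mode while the inserts are running
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode
ORDER BY lock_count DESC;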

To resolve this, two actions were taken:

  • Change the CLUSTERED INDEX to not allow row/page locks.

CREATE CLUSTERED INDEX ix_cu_Target_partitioned
ON Target_table_partitioned (surrogate_key)
WITH (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF)
ON [ps_hash_pos_24](partition_key)

  • Alter the destination table to set the LOCK_ESCALATION=AUTO, which according to BOL will allow lock escalation to the partition level.

ALTER TABLE [dbo].[Target_table_partitioned] SET (LOCK_ESCALATION = AUTO)
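
As a quick sanity check (not part of the original post), the new escalation setting can be confirmed from the catalog views; a minimal sketch:

-- Confirm the destination table now escalates to the HoBT (partition) level
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'Target_table_partitioned';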

Running the test again we get much faster execution:

[Image: execution results after the change]

There is also much less time spent in AcquireLocksForInsert:

[Image: xperf summary table after the change]

This is also backed up by the locking during the inserts: there are now only 5 PAGE locks, and the locks escalated to HoBT (partition) locks, one for each partition.

[Image: lock DMV output showing HoBT (partition) level locks]

This gives us an insert speed into the fact table of around 1.2 million rows/second.

 

Some other useful posts

Allen Mcquire SQL Server spinlock reference

http://blog.kejser.org/2011/05/30/diagnosing-and-fixing-sos_object_store-spins-for-singleton-inserts/

http://sqlblog.com/blogs/adam_machanic/archive/2009/10/30/sql-server-2008-lock-escalation-inserts-and-a-potential-bug.aspx

http://msdn.microsoft.com/en-us/library/ms184286.aspx

http://msdn.microsoft.com/en-us/library/ms190273.aspx

http://jmkehayias.blogspot.com/2008/12/troubleshooting-locking-in-database.html

http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

  1. Pavel Chocholous
    October 11, 2011 at 12:54 pm

    Great article.

    We were having similar trouble, but in our case it came down to the LOCK_ESCALATION configuration. Can you share a little about the hardware and row size of the test? I am just curious whether we are squeezing comparable results from comparable hardware in our slightly different scenario.

    • October 11, 2011 at 10:45 pm

      Hi Pavel,

      Thanks for the comment.

      The hardware for this is a 16 core HP DL580-G5 with a shared tier 3 SAN. The server was also shared for other applications, so this really is a baseline speed, but should scale on better hardware and storage.

      The fact table has roughly 8 integers, 1 char(8), 2 numerics and 2 smallints, so it is reasonably wide.
