SQL Server and I/O

I've been doing a lot of reading up on optimal I/O configuration, for SQL Server in general.  Over the years, I've setup systems on many types of storage: dedicated SAN, shared SAN, DAS, and local RAID configurations.

A bit of background on SQL I/O:

SQL Server generally (except for non-mixed extents) allocates space an extent at a time. An extent is eight 8K pages, or 64K. SQL Server often performs reads in extent-sized (64K or 128K) boundaries and uses scatter-gather I/O operations.  SQL Server also takes advantage of read-ahead operations.  More information on Read-Ahead can be found here: http://msdn.microsoft.com/en-us/library/ms191475.aspx.

This is the best article I've read regarding SQL I/O: http://technet.microsoft.com/en-us/library/cc966412.aspx.

Below is a quick list of how to look up some of the parameters mentioned in the SQL Server Best Practices Article:

  1. Stripe unit size
    This must be obtained from SAN manager or possibly in the storage management software for local RAIDs.
  2. Partition Offset
    The following can be run from a command line to identify the Partition Offset:
    wmic partition get BlockSize, StartingOffset, Name, Index
  3. File Allocation Unit Size (bytes per cluster)
    The following can be run from a command line to identify the Allocation Unit Size:
    fsutil fsinfo ntfsinfo [DriveLetter]:\
  4. Partition Alignment
    Gathering the above values is useful for verifying partition alignment.  This used to be a bigger issue than it is now because since Windows 2008, Windows performs auto partition alignment.

    In order to verify Partition Alignment, the results of the following calculations must result in an integer value:
    Partition_Offset ÷ Stripe_Unit_Size
    Stripe_Unit_Size ÷ File_Allocation_Unit_Size
Below are the general guidelines I follow:

  1. I use a 64K allocation unit size for data, logs, and tempdb, unless there is planned storage of small files on the same volume.  A 4K file on a volume with a 64K allocation size will use 64K, so you’ll end up wasting a lot of space in that scenario.
  2. Verify a 64K stripe unit size.
  3. I also usually perform a simple I/O benchmark prior to installing SQL server, just to validate that there aren't any glaring issues with the initial storage configuration.  When I have time or a need to get very detailed information I run SQLIO.  In most situations however, I've recently found myself just running a quick test with CrystalDiskMark.  For comparison of my CrystalDiskMark results I check the charts on Toms Hardware:
If storage is obviously slow, here are some good places to start troubleshooting:
  1. Adding disks or re-configuring the RAID for local storage or a DAS. 
  2. Tiered SAN storage.  SANs usually are tiered for various levels of performance.  Fixing a slow storage issue can sometimes be as simple as asking the SAN administrators to move your storage to a  faster tier.
  3. HBA Queue depth tuning - testing has shown substantial gains in I/O performance when increasing this to 64 or even higher.
  4. HBA
    In some cases I have found it useful to probe a SAN administrator with these queries:
    How many HBAs are configured?
    What is the bandwidth of the HBAs?
    How many fiber channel switch ports and front-end fiber channel ports on the array?
    What is the bandwidth of the ports?
    What is the currently configured queue depth and cache ratio?
  5. LUN
    Asking the SAN manager questions regarding these configurations can get them taking a closer look and performing more in-depth tuning of the SAN.
    How many physical disks and at what rotational speed are backing each LUN?
    Is the access to the LUN load balanced across all service processors or are these statically assigned to a single service processor in the array?
    How is the cache configured at the LUN level? (SQL Server does not benefit as much from the read cache due to read ahead)
    Is the LUN sharing physical spindles with other applications that use the storage array (Shared SAN)
    Is multipath software used by the host and, if so, which mode is the software running (load balanced or failover only)?

No comments:

Post a Comment