Storport in Windows Server 2003
Abstract
The Storport driver, new to Microsoft Windows Server 2003, delivers greater performance in hardware RAID and SAN environments than the preexisting SCSIport driver was capable of delivering. This white paper begins by explaining the limitations of the original SCSIport driver architecture when used with interconnects for which it was not designed. The paper then details the architectural improvements of the new Storport driver, which has been developed to deliver high throughput and CPU-efficient I/O in high performance environments. This paper will be of interest to OEM hardware and software developers, as well as to customers who are interested in encouraging their storage vendors to support high performance solutions in a Windows environment.
Contents
Introduction
Windows Storage Drivers
    SCSIport Driver
        Adapter I/O Limit
        Sequential I/O Functioning
        Increased Miniport Load at Elevated IRQLs
        Data Buffer Processing Overhead
        I/O Queue Limitations
        Impact on SCSI Performance
Storport
    Synchronous I/O Functioning
    Effective Miniport Functioning
        Offloaded Build at Low IRQL
        Single Pass Scatter/Gather List Creation
    Flexible Map Buffer Settings
    Queue Management
    Improved Error and Reset Handling
        Hierarchical Resets
        Improved Clustering by Using Hierarchical Resets
        Fibre Channel Link Handling
    Ability to Run Deferred Procedure Calls (DPCs)
    Registry Access
    Fibre Channel Management
    Easy Migration to Storport
Performance Comparisons
    Measuring Storport Performance
        Host-Based RAID Adapter
Summary
Related Resources
Introduction
Storage adapters and storage subsystems are not all created equal. In environments where information transfers between the computer and storage must be maximized for speed, efficiency, and reliability, such as in banking or trading businesses, high performance interconnects that maximize I/O throughput are critical. Such organizations use storage area networks (SANs), usually with Fibre Channel cabling and interconnects (links) in redundant configurations, and storage arrays with hardware-based RAID (redundant array of independent disks) for high availability and high performance. Having the highest performing equipment helps to ensure that high performance needs can be met, but it doesn't guarantee them. The functioning of the storage network also depends on the capabilities of the host operating system, specifically the host operating system drivers that interface with the storage hardware to pass I/O requests to and from the storage devices. This is especially important in Fibre Channel SANs, where a complex system of switches and links between servers and storage requires an effective means of detecting link problems and eliciting the appropriate response from the operating system.

In the Microsoft Windows operating system, the SCSIport driver, in conjunction with vendor-written, adapter-specific miniport drivers, was for many years the only driver delivering SCSI commands to the storage targets. The SCSIport driver, however, was designed to work optimally with the parallel SCSI interconnects used with direct attached storage. It was neither designed to meet the high performance standards of Fibre Channel SAN configurations, nor to work well with hardware RAID. As a consequence, organizations running mission-critical Windows applications on their servers do not realize the maximum performance benefits or manageability of their Fibre Channel SANs or hardware RAID adapters (on both the host and storage arrays) when I/O passes between the host and storage target.
These limitations have been overcome with the development of Storport, the new device driver designed to supplement SCSIport on Windows Server 2003 and beyond. Storport is a new port driver that delivers higher I/O throughput performance, enhanced manageability, and an improved miniport interface. Together, these changes help hardware vendors realize their high performance interconnect goals.
Figure 1. The I/O Request Path Through the Storage Stack

The I/O manager processes and/or passes the I/O request packet (IRP) on to each lower driver layer, which successively routes the I/O to the:
- correct device class driver
- correct I/O port driver (such as SCSI or USB)
- adapter-specific miniport driver
Class drivers manage a specific class of devices, such as disk or tape, ensuring that I/O requests are sent to the appropriate device type in the correct format. I/O requests are then passed on to a protocol-specific port driver. Windows provides port drivers for a number of transport types, including SCSI, IDE, and 1394. The port driver can do one of the following: 1) complete the request without passing it on to lower layers (if no data transfer is necessary); 2) queue the request on behalf of the storage device controller if the hardware is busy; or 3) pass the request on to a hardware-specific miniport driver (written by the adapter vendor), which directly controls access to the hardware. The miniport driver, the lowest layer in the storage stack, is the actual device driver that translates the I/O request into a physical location on the vendor's hardware. Once the I/O request has been carried out in the hardware, several I/O completion routines complete the I/O path.
SCSIport Driver
SCSIport is the Microsoft Windows system-supplied storage port driver, designed to manage SCSI transport on parallel SCSI interconnects. During the StartIo routine, the SCSIport driver translates the I/O request packet into a SCSI request block (SRB) and queues it to the miniport driver, which decodes the SRB and translates it into the specific format required by the storage controller. The start I/O phase of the request, which includes build and start (see Figure 2), takes microseconds; hundreds of SRBs can be queued, ready to be processed, before even a single I/O request is completed. In fact, the longest phase of the I/O request is the data seek time (latency); if the data is not in cache, finding the correct blocks on the physical disk can take several milliseconds. (Note that the diagram shows relative time, not actual time units.) Once the hardware processes the I/O request (that is, performs the data transfer), the controller generates a hardware interrupt indicating that the I/O has been completed. The interrupt is, in turn, processed by the HwInterrupt miniport routine (indicated as ISR, or Interrupt Service Routine, in the diagrams), which receives the completed requests and begins the whole process again. Data transfers are performed by the hardware itself (using Direct Memory Access, or DMA) without operating system intervention.
Figure 2. Phases of an I/O Request (not to scale, relative durations shown)

While the SCSIport driver is an effective solution for storage via parallel SCSI, it was not designed for either Fibre Channel or hardware RAID, and when used with these adapters, the full capabilities of the high performance interconnect cannot be realized. The nature of the performance limitations and their causes are detailed in the following sections.
Figure 3. SCSIport: Sequential I/O Functioning

In single-processor systems, the SCSIport requirement that the start I/O routine be synchronized with the interrupt service routine (so that only one of these routines can execute at any one time) has negligible impact. In multiprocessor systems, however, the impact is considerable. Although up to 64 processors may be available for I/O processing, SCSIport cannot exploit their parallel processing capabilities. The net result is considerably more I/O processing time than would be required if start I/Os and interrupts could be executed simultaneously rather than sequentially.
The HwStartIo routine always executes with the interrupt request level (IRQL) of the processor at the same priority level as the interrupt request of the device. Because all interrupts with the same or lower priority are masked to enable a higher priority process to complete without interruption, the elevated IRQL means that hardware interrupts accumulate rather than being processed. With parallel SCSI adapters, this has minimal impact, since there is very little additional work for the miniport driver to do. However, when using Fibre Channel or hardware RAID adapters, the workload on the miniport driver is much heavier; as a consequence, considerably more time is spent at an elevated IRQL. The net result of high numbers of accumulated interrupts is degraded system performance.
Figure 4. Successively More Restrictive Queuing in the SCSIport Driver Model

The SCSIport queuing process has several drawbacks. First, each I/O request must queue for access to a spinlock not just once, but twice. Second, the adapter queue restricts I/O throughput to a maximum of 254 requests per adapter, on a first-in, first-out (FIFO) basis. For high performance adapters, which can process thousands of requests at a time, this can be a serious performance limitation. And third, SCSIport does not provide any means of managing device queues to improve performance under conditions of high load, or of temporarily suspending I/O processing without accumulating errors. As a consequence, a busy device can monopolize the adapter queue while other devices would be able to respond without delay.
Figure 5. SCSIport I/O Performance

This baseline provides the point of comparison with Storport functioning and performance, as discussed in the Storport section.
Storport
Given the inherent limitations of using SCSIport with high performance adapters for which it was not designed, Microsoft has developed a new port driver, Storport. Storport has been architected as a replacement for SCSIport, and designed to enable realization of the high performance capabilities of hardware RAID and Fibre Channel adapters. It is possible for hardware vendors to write their own class, filter, or even new port drivers, to bypass SCSIport. But, unlike Storport, these drivers may perform unreliably with the Windows platform because they are designed without in-depth knowledge of the operating system. While many of the routines in Storport are similar to SCSIport (which helps in a smooth transition from SCSIport to Storport), there are a number of critical differences. These differences, discussed in the remainder of this section, provide the advanced functionality of Storport that enables vendor miniport drivers and adapter hardware to function more effectively.
Figure 6. Storport I/O Processing with Full Duplex Mode

In this example, Storport is able to take advantage of a multiprocessor system by executing the start I/O initiation phase concurrently with the completion phase, cutting I/O processing time and enabling more requests to be processed in the same amount of time (compare with SCSIport performance in Figure 5). As more processors are added, more I/O requests can be handled in parallel, thereby improving performance. This higher performance capability, however, requires that vendors code their miniports (and perhaps even modify their firmware and hardware) to take advantage of the synchronous I/O functioning, and not all miniport drivers can effectively decouple this processing. (Given this limitation, miniports can still do more work without synchronization by using full-duplex mode and calling StorPortSynchronizeAccess only as needed.)
Figure 7. Storport I/O Processing with the New HwBuildIo Routine

The effect of the HwBuildIo routine in combination with Storport full-duplex processing is shown in Figure 8. Note that the start I/O and build phases, running on different processors, can overlap in time and can also overlap ISRs. Compared with the original SCSIport design, I/O processing is considerably more effective.
Queue Management
Unlike SCSIport, which can queue a maximum of 254 outstanding I/O requests to an adapter supporting multiple storage devices, Storport does not limit the number of outstanding I/O requests that can be sent to an adapter. Instead, each logical storage unit (such as a virtual disk on a RAID array or a physical disk drive) can accept up to 254 outstanding requests, so the number of requests an adapter can handle is the number of logical units × 254. Since large storage arrays with hundreds of disks can have thousands of logical units, removing the queuing limit from the adapter results in an enormous improvement in I/O throughput. This is especially important for organizations with high transaction processing needs and large numbers of physical and virtual disks.

Storport also enables the miniport driver to implement basic queue management functions. These include pause/resume device, pause/resume adapter, busy/ready device, busy/ready adapter, and the ability to control queue depth (the number of outstanding commands) on a per-LUN basis, all of which can help ensure balanced throughput rather than overloaded I/O. Key scenarios that can take advantage of these capabilities include limited adapter resources, limited per-LUN resources, or a busy LUN that prevents non-busy LUNs from receiving commands. Certain intermittent storage conditions, such as link disruptions or storage device upgrades, can be handled much more effectively when these controls are properly used in Storport miniports. (By contrast, a SCSI miniport cannot effectively control this queuing at all. A device may indicate a busy status, in which case commands are automatically retried for a fixed amount of time; no controls are available on a per-adapter basis whatsoever.)
Hierarchical Resets
When SCSIport detects certain interconnect or device errors or conditions, it responds with a SCSI bus reset. On parallel SCSI, there is an actual reset line; on serial interconnects and RAID adapters, however, there is no bus reset, so it must be emulated as well as possible. However the bus reset is done, the code path always disrupts I/O to all devices and LUNs connected to the adapter, even if the problem is related to only a single device. Such disruption requires reissuing in-progress commands for all LUNs. In contrast, Storport has the ability to instruct the HBA to reset only the affected LUN; no other device on that bus is impacted. If the LUN reset does not accomplish the recovery action, Storport attempts to reset the target device; if that does not work, it emulates a bus reset. (In practice, a bus reset should not be seen except when Storport is used with parallel SCSI devices.) This advanced reset capability enables configurations that were not possible, or were unreliable, with SCSIport in the past.
Registry Access
An important part of the Windows operating system design is the use of the registry, which serves as a configuration database. SCSIport does not allow free access to the registry from a miniport driver. A single string can be passed to the miniport driver, which must then parse that string to extract adapter-specific parameters. Furthermore, SCSIport cannot guarantee that multiple adapters using the same miniport will be able to use different sets of parameters. The total length of the parameter string passed is limited to 255 characters. The Storport model allows registry access from the miniport in a much less restricted fashion. One routine can be used to query for specific parameters in any location in the system hive of the registry; writing back to the registry is also supported. This solves the problem of storing adapter-specific parameters, such as persistent binding information or queue depth limits.
Performance Comparisons
The performance of the port driver varies not only with the capabilities of the miniport driver, but also with the system and RAID configuration, the storage adapter cache settings, the I/O queue depth, and the type of I/O operation. The rest of this section provides a brief review of how these various factors affect performance.

System configuration. Adding more physical RAM helps ensure that the server accesses data in RAM cache rather than from disk. With host-based data caching, I/O requests to the file are intercepted by the caching system. If the request is an unbuffered write, data is sent directly to the disk device (without any increase in speed); if the request is a read and the data is in memory cache, the response is very fast (no disk I/O is necessary).

RAID configuration. I/O processing performance depends both on the type of redundancy that is used and on the number of physical disks across which the I/O load is spread. (The greater the number of disks, the better the performance, since multiple disks can be accessed simultaneously.) Note that RAID-10 gives the fastest I/O performance while still supporting redundancy.

Controller cache settings. I/O performance is strongly affected by whether or not the storage device can cache data, since caching gives better performance. Adding faster or more I/O controllers also improves performance. In the case of HBA RAID adapters, caching on the adapter improves performance as well.
I/O queue depth. Past a certain threshold, the more I/O requests there are in the queue for each device, the slower the performance. Below that threshold, performance may actually increase, as the storage device can effectively reorder operations for greatest efficiency. A subjective measure of device stress is I/O load. According to StorageReview, a light load is 4-16 outstanding I/Os, moderate is 16-64, and high is 64-256. Consult product documentation for the optimal queue depth for specific storage devices.

File type and use. Files vary in their size and the extent to which they are used (as much as 95% of all I/O activity occurs within fewer than 5% of all files), both of which affect performance.

Type of I/O operation. There are four types of I/O operation: random writes (RW), sequential writes (SW), random reads (RR), and sequential reads (SR). I/O read requests can be processed very rapidly with host-based data caching. Read and write performance can be improved by caching on the storage controller. (Write requests can be written to fast cache memory before being permanently written to disk.) In many cases, caches can be tuned to perform better for the workload (read vs. write, random vs. sequential).

Disk fragmentation. Just as with a single disk, files stored on RAID arrays can become fragmented, resulting in longer seek times during I/O operations.
Figure 9 summarizes overall system efficiency as measured by I/Os per second per percent of CPU utilization.
[Chart: System Efficiency, plotting I/Os per second against percent of CPU utilization for random writes (RW), sequential writes (SW), random reads (RR), and sequential reads (SR) at 512 B, 4 KB, 32 KB, and 256 KB transfer sizes; vertical axis 0 to 80,000.]
Figure 9. Storport I/O Throughput Efficiency

Storport (triangles) is about 30-50% more efficient than SCSIport (diamonds), passing through more I/O per second than SCSIport and using less CPU to do so.
Summary
Storport is the new Microsoft port driver recommended for use with hardware RAID storage arrays and high performance Fibre Channel interconnects. Storport overcomes the limitations of the legacy SCSIport design while preserving enough of the SCSIport framework that porting a miniport driver to Storport is straightforward for most developers. Storport enables bidirectional (full duplex) transport of I/O requests, more effective interactions with vendor miniport drivers, and improved management capabilities. Storport should be the port driver of choice when deploying SAN or hardware RAID storage arrays in a Windows Server 2003 environment.
Related Resources
For more information on Windows Drivers see the Microsoft Developer Network (MSDN) website at http://msdn.microsoft.com/. For more information regarding the Driver Development Kit, see Microsoft Windows Driver Development Kits on the Microsoft Hardware and Driver Central website (http://go.microsoft.com/fwlink/?LinkId=19866). To locate appropriate support contacts, see WHQL Support Contacts on the Microsoft Hardware and Driver Central website (http://go.microsoft.com/fwlink/?LinkId=22256).
Windows Server System is the comprehensive, integrated server software that simplifies the development, deployment, and operation of agile business solutions. www.microsoft.com/windowsserversystem
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2003 Microsoft Corporation. All rights reserved. Microsoft and Windows Server 2003 are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.