
Let's Bid Adieu to Block Devices and SCSI

By Henry Newman
January 5, 2005


File System and Access Patterns

From what I have seen in both database and HPC applications, files are often read with skip increments (a strided pattern: read a chunk, skip ahead, read the next chunk). Most file systems, when reading data with buffered I/O, read ahead only when they detect reads of sequential file addresses. Remember that addresses that are sequential in the file system are not necessarily sequential on the devices. Given how far CPU and storage-device latencies have diverged over the last 25 years, you need large I/O requests that allow the RAID device to read ahead sequential blocks (see Storage I/O and the Laws of Physics for more information on this issue).
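To see why a strided pattern gets no help from readahead, consider a deliberately simplified detector that fires only when a request begins exactly where the previous one ended. The function `readahead_triggers`, the 64 KiB request size, and the 4x stride are all assumptions for illustration; real kernels use more elaborate heuristics, but the basic blind spot is the same.

```python
# Hypothetical, simplified model of a file system's sequential-readahead
# heuristic: a request counts as "sequential" (and would trigger readahead)
# only if it starts exactly where the previous request ended.

def readahead_triggers(offsets, request_size):
    """Count requests that would trigger readahead under the simple
    'next request starts where the last one ended' rule."""
    triggers = 0
    expected_next = None
    for off in offsets:
        if off == expected_next:
            triggers += 1
        expected_next = off + request_size
    return triggers


REQUEST = 64 * 1024          # 64 KiB per read (assumed)
STRIDE = 4 * REQUEST         # read 64 KiB, skip 192 KiB, repeat (assumed)

sequential = [i * REQUEST for i in range(8)]   # back-to-back reads
strided = [i * STRIDE for i in range(8)]       # "skip increment" reads

if __name__ == "__main__":
    print("sequential:", readahead_triggers(sequential, REQUEST))  # 7
    print("strided:   ", readahead_triggers(strided, REQUEST))     # 0
```

Every strided read is a "surprise" to this detector, so the RAID never sees the large sequential requests it needs.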

The same issue applies if you read with direct I/O (opening the file with O_DIRECT). File systems simply do not issue readahead except for sequential block addresses. The application could issue its own asynchronous readahead, and some databases do, but file systems do not have the intelligence built in to recognize strided patterns and read ahead for them.
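Application-issued readahead can be sketched as keeping a few reads of the known pattern in flight ahead of the code that consumes them. The helper names `strided_offsets` and `read_strided_with_prefetch`, the thread-pool approach, and the prefetch depth of 4 are assumptions, not how any particular database does it. For runnability the sketch uses ordinary buffered `os.pread` calls (Unix only); a true O_DIRECT version would also need aligned buffers, offsets, and lengths.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def strided_offsets(start, stride, count):
    """Offsets of a strided ('skip increment') access pattern."""
    return [start + i * stride for i in range(count)]


def read_strided_with_prefetch(fd, offsets, length, depth=4):
    """Read `length` bytes at each offset while keeping up to `depth` reads
    in flight ahead of the consumer. The application, not the file system,
    knows the pattern, so it issues the readahead itself."""
    results = []
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = []
        next_idx = 0
        # Prime the pipeline with the first `depth` reads.
        while next_idx < len(offsets) and len(pending) < depth:
            pending.append(pool.submit(os.pread, fd, length, offsets[next_idx]))
            next_idx += 1
        while pending:
            # Consume the oldest read, then issue the next one in the pattern.
            results.append(pending.pop(0).result())
            if next_idx < len(offsets):
                pending.append(pool.submit(os.pread, fd, length, offsets[next_idx]))
                next_idx += 1
    return results


if __name__ == "__main__":
    CHUNK = 4096
    with tempfile.NamedTemporaryFile() as f:
        for i in range(16):
            f.write(bytes([i]) * CHUNK)      # chunk i is filled with byte i
        f.flush()
        data = read_strided_with_prefetch(
            f.fileno(), strided_offsets(0, 2 * CHUNK, 8), CHUNK)
        print([d[0] for d in data])          # [0, 2, 4, 6, 8, 10, 12, 14]
```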

Striping

Many file systems use a volume manager to stripe data across all of the devices in the file system. This defeats any chance of allocating data sequentially on the individual disks in a LUN, and therefore defeats the RAID's readahead cache. A number of file systems have added round-robin allocation (see the Physics article cited above for more details) as an additional allocation method. Most Linux volume managers, and the file systems combined with them, do not support round-robin allocation, which means that most Linux I/O will not use the RAID cache efficiently.
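The contrast between the two allocation methods can be modeled with a small sketch. The function names are hypothetical, and the one-block stripe unit is a deliberate simplification (real volume managers use larger stripe units). The model counts, in file order, how many blocks in a row land contiguously on the same device; longer runs give the RAID controller more sequential data to read ahead on.

```python
def striped_layout(num_blocks, num_devices):
    """Volume-manager striping with an assumed one-block stripe unit:
    file block i lands on device i % num_devices at block i // num_devices."""
    return [(i % num_devices, i // num_devices) for i in range(num_blocks)]


def round_robin_layout(num_blocks, device, start_block=0):
    """Round-robin allocation: the whole file is placed contiguously on one
    device (the next file would go to the next device in turn)."""
    return [(device, start_block + i) for i in range(num_blocks)]


def sequential_runs(layout):
    """Group a file's (device, block) layout, in file order, into runs that
    are contiguous on a single device: (device, first_block, length)."""
    runs = []
    for dev, blk in layout:
        if runs and runs[-1][0] == dev and runs[-1][1] + runs[-1][2] == blk:
            d, first, length = runs[-1]
            runs[-1] = (d, first, length + 1)
        else:
            runs.append((dev, blk, 1))
    return runs


if __name__ == "__main__":
    # An 8-block file on 4 devices.
    print(sequential_runs(striped_layout(8, 4)))             # eight 1-block runs
    print(sequential_runs(round_robin_layout(8, device=0)))  # [(0, 0, 8)]
```

Under striping every device switch breaks the run, while round-robin allocation hands one device a single long sequential extent.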

RAID Rebuild

If a disk within a RAID LUN fails, the RAID set must be rebuilt. With RAID-5, this means reading the data from the remaining good disks and writing it back out to the same disks plus a hot spare. Take the example of a 2 Gb Fibre Channel RAID-5 8+1 set of 146 GB drives, with two 2 Gb channels connecting the disks. To rebuild, most RAIDs read in one stripe and then write that same stripe out, one stripe at a time. Therefore you will have:

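  • 400 MB/sec of combined read and write bandwidth at 100% efficiency (two 2 Gb channels at roughly 200 MB/sec each)
  • 1.168 TB to read (eight surviving disks at 146 GB each)
  • 1.314 TB to write out (nine disks at 146 GB each, because every stripe, including its parity, is written back to the eight surviving disks plus the hot spare)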
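The stripe-at-a-time rebuild described above can be sketched in a few lines. RAID-5 parity is the byte-wise XOR of the data blocks in a stripe, so the missing block is the XOR of all surviving blocks. The function names are hypothetical, the array is tiny and in memory, and parity rotation across disks is ignored because it does not change the XOR arithmetic. The byte counters show why the write volume is 9/8 of the read volume in an 8+1 set.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID-5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripewise(disks, failed):
    """Rebuild disk `failed` one stripe at a time.

    `disks` is a list of per-disk block lists (one block per stripe). For each
    stripe: read the surviving blocks, recompute the missing block by XOR, and
    write the whole stripe back out, with the hot spare taking the failed
    disk's slot. Returns (rebuilt_disks, bytes_read, bytes_written)."""
    rebuilt = [[] for _ in disks]
    bytes_read = bytes_written = 0
    for s in range(len(disks[0])):
        survivors = [disks[i][s] for i in range(len(disks)) if i != failed]
        bytes_read += sum(len(b) for b in survivors)
        missing = xor_blocks(survivors)
        for i in range(len(disks)):
            block = missing if i == failed else disks[i][s]
            rebuilt[i].append(block)
            bytes_written += len(block)
    return rebuilt, bytes_read, bytes_written


if __name__ == "__main__":
    BLOCK, STRIPES = 4, 2
    # Eight data disks with distinct byte patterns, plus one parity disk.
    data = [[bytes([d * 16 + s] * BLOCK) for s in range(STRIPES)] for d in range(8)]
    parity = [xor_blocks([data[d][s] for d in range(8)]) for s in range(STRIPES)]
    array = data + [parity]
    # Simulate losing disk 3; the rebuild never reads its contents.
    rebuilt, nread, nwritten = rebuild_stripewise(array, failed=3)
    print(rebuilt == array, nread, nwritten)  # True 64 72
```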
You cannot always read and write at 100% efficiency. Efficiency depends on factors such as the segment (stripe element) size per disk (bigger is better), other I/O being done on the RAID, the RAID's rebuild tunables, and other factors that vary by vendor, so the table below estimates the rebuild time at several efficiency levels.

Efficiency | Est. Read Time (s) | Est. Write Time (s) | Est. Total Time (s)
100%       | 3062               | 3445                | 6506
90%        | 3402               | 3827                | 7229
75%        | 4082               | 4593                | 8675
50%        | 6124               | 6889                | 13013
25%        | 12247              | 13778               | 26026
10%        | 30618              | 34446               | 65064
Table 1: Estimated rebuild time versus efficiency (146 GB drives, RAID-5 8+1, 400 MB/sec).
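The table values follow from simple arithmetic: data volume divided by bandwidth, divided again by the efficiency fraction. The sketch below (the function name `rebuild_times` is hypothetical) reproduces Table 1. One caveat on units: Table 1's figures are matched by a conversion of 1,048.576 MB per GB (1024 × 1024 / 1000), while Table 2's are matched by 1,024, so the sketch takes the conversion factor as a parameter rather than guessing which was intended.

```python
def rebuild_times(drive_gb, surviving_disks, bandwidth_mb_s, efficiencies,
                  mb_per_gb=1024.0):
    """Estimate RAID-5 rebuild times in seconds.

    Reads `surviving_disks` drives and writes `surviving_disks + 1` drives
    (survivors plus hot spare) through a shared `bandwidth_mb_s` link,
    slowed by each efficiency percentage. Returns rows of
    (efficiency %, read s, write s, total s), rounded to whole seconds;
    the total is rounded from the unrounded sum."""
    read_mb = surviving_disks * drive_gb * mb_per_gb
    write_mb = (surviving_disks + 1) * drive_gb * mb_per_gb
    rows = []
    for pct in efficiencies:
        read_s = read_mb / bandwidth_mb_s * 100 / pct
        write_s = write_mb / bandwidth_mb_s * 100 / pct
        rows.append((pct, round(read_s), round(write_s), round(read_s + write_s)))
    return rows


if __name__ == "__main__":
    # Table 1: 146 GB drives, RAID-5 8+1 (eight survivors), 400 MB/sec.
    for row in rebuild_times(146, 8, 400, [100, 90, 75, 50, 25, 10],
                             mb_per_gb=1048.576):
        print(row)
```

Calling `rebuild_times(400, 15, 400, [100, 90, 75, 50, 25, 10])` with the default factor of 1,024 reproduces the 15+1 figures for 400 GB drives in Table 2.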

Running at 50 percent efficiency is certainly not unheard of, and a rebuild of more than three hours is a long time to suffer degraded application performance and exposure to another disk failure. Now consider 400 GB SATA drives in a 15+1 configuration instead of an 8+1, which is commonly used on some of the multimedia systems that I have worked on (again assuming 400 MB/sec of bandwidth):

Efficiency | Est. Read Time (s) | Est. Write Time (s) | Est. Total Time (s)
100%       | 15360              | 16384               | 31744
90%        | 17067              | 18204               | 35271
75%        | 20480              | 21845               | 42325
50%        | 30720              | 32768               | 63488
25%        | 61440              | 65536               | 126976
10%        | 153600             | 163840              | 317440
Table 2: Estimated rebuild time versus efficiency (400 GB SATA drives, RAID-5 15+1, 400 MB/sec).

A rebuild time of more than 17 hours at 50 percent efficiency is unacceptable. This problem is going to get worse, not better, over time, because drive capacities are growing faster than the bandwidth available to rebuild them.
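The growth is easy to see numerically. The sketch below (the function name `rebuild_hours` is hypothetical) folds the tables' assumptions into one formula: read the data disks, write the data disks plus the spare, over a fixed 400 MB/sec. The 800 GB and 1,600 GB drive sizes are illustrative values not discussed above; they show that if bandwidth stays fixed, rebuild time grows linearly with drive capacity.

```python
def rebuild_hours(drive_gb, data_disks, bandwidth_mb_s, efficiency_pct,
                  mb_per_gb=1024.0):
    """Hours to rebuild a RAID-5 (data_disks + 1) set: read data_disks
    drives, then write data_disks + 1 drives, over a shared link."""
    total_mb = (2 * data_disks + 1) * drive_gb * mb_per_gb
    seconds = total_mb / bandwidth_mb_s * 100 / efficiency_pct
    return seconds / 3600


if __name__ == "__main__":
    # The two configurations from the tables, at 50 percent efficiency
    # (Table 1's unit conversion is matched by 1,048.576 MB per GB).
    print(f"8+1,  146 GB: {rebuild_hours(146, 8, 400, 50, mb_per_gb=1048.576):.1f} h")  # 3.6 h
    print(f"15+1, 400 GB: {rebuild_hours(400, 15, 400, 50):.1f} h")  # 17.6 h
    # Hypothetical larger drives with the same 400 MB/sec of bandwidth.
    for size_gb in (800, 1600):
        print(f"15+1, {size_gb} GB: {rebuild_hours(size_gb, 15, 400, 50):.1f} h")
```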

Conclusions

I hope I have stated my case well enough that I won't get too much hate mail, but in my heart of hearts I believe the time has come for SCSI and block devices to be replaced. In the next article, we will cover the technology that I think will replace both of them and, if adopted, could also replace file systems as we know them. All of this would be a good thing, in my opinion. That technology is called Object Storage Device, or OSD.

Feature courtesy of Enterprise Storage Forum.
