So we shipped a whole bunch of products out into the field that use LSI RAID cards. The controller cards have cache on them, as do the drives. The drive cache was left on, and it ended up causing a bunch of headaches for us in the field. Leaving the drive's write cache on behind a caching controller is risky: if the drive loses power before it writes its cache to disk, the controller will think everything is OK when, in fact, the cached data was never written out. It has caused drives to fail in the field, and sometimes arrays do not rebuild when a drive is replaced.
So, since I wrote the hardware abstraction layer, which interacts with the LSI cards through their “megalib” library, I was tasked with writing an application that would disable the drive cache on all of the hard drives. The drives in question are SATA drives, but the LSI card makes them appear as SCSI devices to Linux.
In order to disable the cache, I have to send a command directly to the drives using a pass-through function in the controller cards. After finding some good resources on SCSI commands, I found it wasn’t actually a difficult task. But the LSI library’s pass-through is somewhat confusing to me and not very well documented. You pass in a CDB (command descriptor block) structure that tells the drive what you want it to do (in this case, modify the drive cache settings).
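For reference, here is a rough sketch of what the two 6-byte CDBs for this job look like. It follows the standard SCSI MODE SENSE(6) and MODE SELECT(6) layouts, not anything specific to the LSI library, and the function names are made up for illustration. The pass-through call that would actually carry these bytes to the drive is not shown.

```python
# Standard SCSI opcodes and the caching mode page code.
MODE_SENSE_6 = 0x1A
MODE_SELECT_6 = 0x15
CACHING_PAGE = 0x08


def build_mode_sense6(page_code, alloc_len, dbd=True):
    """Build a MODE SENSE(6) CDB asking for the current values of one page.

    Hypothetical helper: only assembles the bytes, sends nothing.
    """
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page code must fit in 6 bits")
    if not 0 <= alloc_len <= 0xFF:
        raise ValueError("allocation length must fit in one byte")
    byte1 = 0x08 if dbd else 0x00   # DBD bit: ask the drive to omit block descriptors
    byte2 = page_code & 0x3F        # PC field (bits 7-6) left at 00 = current values
    return bytes([MODE_SENSE_6, byte1, byte2, 0x00, alloc_len, 0x00])


def build_mode_select6(param_len, save=False):
    """Build a MODE SELECT(6) CDB for a parameter list of param_len bytes."""
    if not 0 <= param_len <= 0xFF:
        raise ValueError("parameter list length must fit in one byte")
    byte1 = 0x10 | (0x01 if save else 0x00)  # PF bit always set, SP bit optional
    return bytes([MODE_SELECT_6, byte1, 0x00, 0x00, param_len, 0x00])
```

The buffer that the CDB points at is separate from these six bytes: for MODE SENSE it receives the page data, and for MODE SELECT it holds the data to write.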
The problem is that I need to modify just one bit of a byte, so I have to know the value of the rest of that byte. So I sent in a command to read the drive information first. Using the serial output on the LSI card, I can see that it is reading the HDD configuration, but the output looks the same whether write cache is enabled or disabled.
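The one-bit change itself is a plain read-modify-write. A small sketch, assuming the byte in question is byte 2 of the standard SCSI caching mode page, where WCE is bit 2 (mask 0x04); the helper names are hypothetical:

```python
WCE_MASK = 0x04  # write cache enable: bit 2 of byte 2 in the caching mode page


def wce_enabled(flags):
    """Return True if the WCE bit is set in the caching page flags byte."""
    return bool(flags & WCE_MASK)


def clear_wce(flags):
    """Clear only the WCE bit and leave the other seven bits untouched."""
    return flags & ~WCE_MASK & 0xFF
```

So a flags byte read back as 0x14 would be written out again as 0x10, and a byte that already has WCE clear comes out unchanged.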
The pass-through function provided by LSI requires that you pass in a CDB, which also carries a pointer to a memory buffer used for input or output (depending on the command you send). When I try to just set the first three bytes (which are all I should need to modify the drive cache), it fails.
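One possible explanation, and this is a guess based on the SCSI spec and not something confirmed against the LSI library, is that MODE SELECT(6) expects a whole parameter list: a 4-byte mode parameter header, any block descriptors, and then the complete caching page, not just its first three bytes. Here is a sketch of turning a MODE SENSE(6) response into such a list. The function name is hypothetical, and it assumes the response contains only the caching page.

```python
CACHING_PAGE = 0x08
WCE_MASK = 0x04
HEADER_LEN = 4  # mode parameter header size for the 6-byte commands


def build_wce_off_params(sense_data):
    """Turn a MODE SENSE(6) response into a MODE SELECT(6) parameter list
    with the write cache enable bit cleared. Sends nothing to any device."""
    data = bytearray(sense_data)
    if len(data) < HEADER_LEN:
        raise ValueError("response shorter than the mode parameter header")
    page = HEADER_LEN + data[3]          # byte 3 = block descriptor length
    if len(data) < page + 3:
        raise ValueError("response too short to contain the page")
    if data[page] & 0x3F != CACHING_PAGE:
        raise ValueError("first page in the response is not the caching page")
    end = page + 2 + data[page + 1]      # byte 1 of the page = bytes that follow it
    if len(data) < end:
        raise ValueError("caching page is truncated")
    data[0] = 0                          # mode data length is reserved in MODE SELECT
    data[2] = 0                          # device-specific byte: sense-only bits for disks
    data[page] &= 0x7F                   # PS bit is reported by MODE SENSE only
    data[page + 2] &= 0xFB               # clear WCE, keep the other flag bits
    return bytes(data[:end])
```

The length of the returned list is what would go into the parameter list length byte of the MODE SELECT(6) CDB.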
So, not only does it not seem to be reading the data properly, but it doesn’t seem to write it either. Most likely, I’m doing something wrong. But the fact that there is no satisfactory documentation, and that I have to wait to hear from one of their developers, kind of irritates me.
But I can't complain too much, because the code we put out at work almost completely lacks documentation. I’ve written general classes to do some basic things, only to be told by someone later that we already had a class written to perform that task. Of course, I couldn’t have possibly known that unless I delved through our source, and we have millions of lines of code!
Edit: I forgot, I was going to put some links to some great resources that I found.
This one from answers.com talks about all the different commands you might want to issue to a SCSI device.
Here is one from Apple that shows where the WCE (write cache enable) bit is changed.