# RAID 5 array problem



## RamRam (Apr 26, 2005)

Hi, fellow hunters,

Here is a worthy problem, which I hope somebody will take pity on and help me with before I go completely gaga trying to sort it out.

I recently purchased a Highpoint 1820A card to set up a RAID 5 array and used it without problems for about 2 months. Unfortunately, I seem to have run into a problem with the RAID setup, and I hope you can advise me on how to resolve it or troubleshoot it further.

Summed up: 
After an error on one drive, all the other drives in the array fail. After this, whenever the files that were being worked on at that time are read, the same errors occur and the drives fail, regardless of whether the array has been rebuilt.

More detail: 
I have a hardware RAID 5 array set up on 8 x 160 GB SATA drives, all on the same controller. I set up the RAID 5 in the RAID card BIOS and then created an ext3 filesystem on it.
It had been working fine until now and was about half full when I started getting the audio alert. At the time only Azureus (a BitTorrent client) was running. The computer did not respond to keyboard or mouse for about 10 seconds, then recovered, but I was unable to access the array.

The RAID GUI tool's error log showed that the hard drive on channel 5 had failed and that the other drives were getting errors because of it. Syslog showed the SCSI read errors listed below. I was unable to access any of the array until I restarted. After backing up everything I could, I found that as soon as I tried to read the latest files on the array (files that were being worked on, and so must have been being written to at the time), the same problem occurred: the audio alert sounded again and all the drives failed the same way.

In the RAID card BIOS, the drive on channel 5 was shown as separate from the rest of the array. I tried rebuilding in the BIOS by re-adding the drive on channel 5 and then choosing rebuild. At the rate it was going, it would have taken approximately 3 days, so instead I booted up and rebuilt with the GUI RAID tool, which took about 3 hours. Unfortunately, the array still failed in exactly the same way as soon as any attempt was made to read the problem files.

If the array is rebuilt, it always seems to fail on channel 5 (channel 4 in the syslog) whenever data is read from the problem files. Once that drive fails, the other drives all start getting errors and fail as well. If the array is not rebuilt, it fails on channel 6, 7 or 8 instead and then fails the rest the same way.

I have tried updating the RAID card BIOS and drivers to the latest versions, as well as the Linux kernel (both 2.4 and 2.6), with no luck. I have also run a non-destructive read-write test with badblocks (e2fsck -c -c /dev/sda1), which found nothing. However, the scan reported its progress out of 273508625 blocks, whereas the SCSI errors are at sector 1879053343. Is it possible I am not scanning all the drives?
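The two figures above may simply be in different units. Here is a rough check in Python, assuming (none of this is confirmed in the post) that the kernel reports 512-byte sectors, that the ext3 filesystem uses 4 KiB blocks, and that "160gb" means 160 x 10^9 bytes per drive:

```python
# Rough unit check for the badblocks total vs. the failing sector number.
SECTOR_BYTES = 512        # assumption: kernel I/O errors count 512-byte sectors
FS_BLOCK_BYTES = 4096     # assumption: ext3 was created with 4 KiB blocks

badblocks_total = 273508625    # "out of" figure shown during the scan
failing_sector = 1879053343    # from the end_request line in syslog

# Size covered by the scan if each block is 4 KiB.
scanned_bytes = badblocks_total * FS_BLOCK_BYTES

# RAID 5 over 8 drives leaves 7 drives' worth of usable space.
expected_bytes = 7 * 160 * 10**9

# The failing sector expressed in 4 KiB blocks (relative to sda, not sda1).
failing_block = failing_sector // (FS_BLOCK_BYTES // SECTOR_BYTES)

print(f"scanned:  {scanned_bytes / 1e9:.2f} GB")
print(f"expected: {expected_bytes / 1e9:.2f} GB")
print(f"failing sector is roughly block {failing_block}")
print("inside scanned range:", failing_block < badblocks_total)
```

Under those assumptions the scan covers about 1.12 TB, which matches seven 160 GB drives, and the failing sector falls inside the scanned range. The sector number is relative to the whole device while badblocks counts from the start of the partition, so the block number is only approximate.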

I have a second 1820A card which I have tried with the same results.


Hopefully you could please answer the following questions for me:


Why would this array fail only when accessing certain files on it even after being rebuilt? Why would a read/write test not seem to trigger the errors?

Are there any further tools or procedures you can recommend to help pinpoint the problem?

Is there a recommended way for testing the reliability of the array once it is built?

Is there a reason for the raid bios rebuild time taking much longer than the gui tools rebuild?

Any suggestions or advice at all would be happily received, as I am becoming quite frustrated with this problem. I would especially welcome recommendations on how to prevent this from happening again, or ways of working out what caused it.


Thank you for your time.



Further info & logs:


Spec:
Highpoint 1820A
8 x Seagate 160 GB SATA drives
Tagan 440 W power supply
NCCH-DL dual-processor board with 2 x 2.8 GHz Xeons
1 GB RAM
20 GB Maxtor PATA drive for the operating system (Debian Linux)


My message log:	

Apr 10 22:26:31 chikyuu kernel: IAL: COMPLETION ERROR, adapter 0, channel 4, flags=104
Apr 10 22:26:31 chikyuu kernel: ATA regs: error 10, sector count 1, LBA low ff, LBA mid ff, LBA high ff, device 4f, status 51
Apr 10 22:26:31 chikyuu kernel: Retry on channel(4)
Apr 10 22:26:31 chikyuu kernel: SCSI error : return code = 0x25050000
Apr 10 22:26:31 chikyuu kernel: end_request: I/O error, dev sda, sector 1879053343
Apr 10 22:26:32 chikyuu kernel: IAL: COMPLETION ERROR, adapter 0, channel 4, flags=104
Apr 10 22:26:32 chikyuu kernel: ATA regs: error 10, sector count 1, LBA low ff, LBA mid ff, LBA high ff, device 4f, status 51
Apr 10 22:26:32 chikyuu kernel: Retry on channel(4)
Apr 10 22:26:32 chikyuu kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 2 bytes away.
Apr 10 22:26:32 chikyuu kernel: SCSI error : return code = 0x25050000
Apr 10 22:26:32 chikyuu kernel: end_request: I/O error, dev sda, sector 1879053351
Apr 10 22:26:33 chikyuu kernel: IAL: COMPLETION ERROR, adapter 0, channel 4, flags=104
Apr 10 22:26:33 chikyuu kernel: ATA regs: error 10, sector count 1, LBA low ff, LBA mid ff, LBA high ff, device 4f, status 51
Apr 10 22:26:33 chikyuu kernel: Retry on channel(4)


Once it has retried all channels repeatedly and given up, it cycles through SCSI error and end_request messages for several hundred entries. The psmouse entry appears when the computer freezes temporarily as the problem files are accessed.
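With several hundred such entries, it may be easier to summarise the syslog than to read it. The following is a minimal Python sketch that counts errors per channel and collects the failing sector numbers. The regular expressions are written against the lines quoted above, and the "+1" mapping from driver channel to GUI channel is an assumption based on channel 5 in the GUI showing as channel 4 in the syslog.

```python
import re
from collections import Counter

# A few lines copied from the message log above; in practice, read the
# real syslog file instead.
SAMPLE = """\
Apr 10 22:26:31 chikyuu kernel: IAL: COMPLETION ERROR, adapter 0, channel 4, flags=104
Apr 10 22:26:31 chikyuu kernel: end_request: I/O error, dev sda, sector 1879053343
Apr 10 22:26:32 chikyuu kernel: IAL: COMPLETION ERROR, adapter 0, channel 4, flags=104
Apr 10 22:26:32 chikyuu kernel: end_request: I/O error, dev sda, sector 1879053351
"""

CHANNEL_RE = re.compile(r"COMPLETION ERROR, adapter \d+, channel (\d+)")
SECTOR_RE = re.compile(r"I/O error, dev \w+, sector (\d+)")


def summarize(text):
    """Return (error count per GUI channel, list of failing sectors)."""
    gui_channels = Counter()
    sectors = []
    for line in text.splitlines():
        match = CHANNEL_RE.search(line)
        if match:
            # Assumption: the driver numbers channels from 0, the GUI from 1.
            gui_channels[int(match.group(1)) + 1] += 1
            continue
        match = SECTOR_RE.search(line)
        if match:
            sectors.append(int(match.group(1)))
    return gui_channels, sectors


channels, sectors = summarize(SAMPLE)
print("errors per GUI channel:", dict(channels))
print("failing sectors:", sectors)
```

If the sectors turn out to cluster in a narrow range, that would be consistent with the failures being tied to particular files rather than spread across the array.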


Excerpt from RAIDtools log:

RAID	I	04/10/2005 14:17:24	Array 'RAID_5_0' rebuilding started.
RAID	I	04/10/2005 16:26:52	Array 'RAID_5_0' rebuilding completed.
RAID	I	04/10/2005 21:59:32	User RAID(from 127.0.0.1) exited from system.
RAID	I	04/10/2005 22:25:46	User RAID(from 127.0.0.1) logged on system.
RAID	E	04/10/2005 22:26:32	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:32	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:35	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:35	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:35	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/10/2005 22:26:37	Disk at Controller1-Channel5-Device1 failed.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel4-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel3-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel6-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel7-Device1.
RAID	E	04/10/2005 22:26:37	An error occured on the disk at Controller1-Channel8-Device1.
RAID	E	04/10/2005 22:26:39	Disk at Controller1-Channel8-Device1 failed.
RAID	I	04/10/2005 22:28:00	User RAID(from 127.0.0.1) exited from system.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel3-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel4-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel6-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel7-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel8-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/10/2005 22:43:37	An error occured on the disk at Controller1-Channel2-Device1.
RAID	I	04/10/2005 22:43:42	User RAID(from 127.0.0.1) logged on system.
RAID	I	04/10/2005 22:44:09	User RAID(from 127.0.0.1) exited from system.
RAID	I	04/10/2005 22:48:35	User RAID(from 127.0.0.1) logged on system.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel4-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel3-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel6-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel7-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel8-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/10/2005 22:49:39	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/10/2005 22:49:41	Disk at Controller1-Channel8-Device1 failed.
RAID	I	04/10/2005 22:56:18	User RAID(from 127.0.0.1) logged on system.
RAID	I	04/10/2005 22:56:24	Array 'RAID_5_0' rebuilding started.
RAID	I	04/11/2005 00:50:38	Array 'RAID_5_0' rebuilding completed.
RAID	E	04/11/2005 22:04:01	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:03	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:03	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:03	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:05	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:05	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/11/2005 22:04:05	Disk at Controller1-Channel5-Device1 failed.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel7-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel3-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel4-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel6-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel8-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:04:16	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/11/2005 22:04:18	Disk at Controller1-Channel8-Device1 failed.
RAID	I	04/11/2005 22:15:45	User RAID(from 127.0.0.1) logged on system.
RAID	I	04/11/2005 22:16:21	Deleting RAID 5 Array 'RAID_5_0' succeeded. 
RAID	I	04/11/2005 22:19:41	User RAID(from 127.0.0.1) logged on system.
RAID	I	04/11/2005 22:19:48	Array 'RAID_5_0' rebuilding started.
RAID	E	04/11/2005 22:51:52	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:51:56	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:52:01	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:52:05	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:52:09	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:52:14	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/11/2005 22:52:18	Disk at Controller1-Channel1-Device1 failed.
RAID	W	04/11/2005 22:52:18	Array 'RAID_5_0' rebuilding failed.
RAID	I	04/11/2005 22:52:43	User RAID(from 127.0.0.1) exited from system.
RAID	I	04/16/2005 14:39:35	User RAID(from 127.0.0.1) logged on system.
RAID	E	04/16/2005 14:41:02	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:05	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:05	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:05	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:07	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:07	An error occured on the disk at Controller1-Channel5-Device1.
RAID	E	04/16/2005 14:41:07	Disk at Controller1-Channel5-Device1 failed.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel8-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel3-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel4-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel6-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel7-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel1-Device1.
RAID	E	04/16/2005 14:41:17	An error occured on the disk at Controller1-Channel2-Device1.
RAID	E	04/16/2005 14:41:19	Disk at Controller1-Channel7-Device1 failed.
RAID	I	04/16/2005 14:47:27	Array 'RAID_5_0' rebuilding started.
RAID	I	04/16/2005 14:47:38	User RAID(from 127.0.0.1) logged on system.


The delay of 5 days at the end of the log is due to the much longer time the BIOS takes to rebuild compared with the GUI tools.
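To see which disk drops out first in each incident, the RAIDtools log can be grouped by time. The following is a small Python sketch, assuming the log is tab-separated as pasted above and treating any "failed" entry more than five minutes after the previous one as the start of a new incident (the five-minute gap is an arbitrary choice for illustration).

```python
import re
from datetime import datetime, timedelta

# "Disk ... failed" lines copied from the RAIDtools excerpt above.
SAMPLE = "\n".join([
    "RAID\tE\t04/10/2005 22:26:37\tDisk at Controller1-Channel5-Device1 failed.",
    "RAID\tE\t04/10/2005 22:26:39\tDisk at Controller1-Channel8-Device1 failed.",
    "RAID\tE\t04/11/2005 22:04:05\tDisk at Controller1-Channel5-Device1 failed.",
    "RAID\tE\t04/11/2005 22:04:18\tDisk at Controller1-Channel8-Device1 failed.",
    "RAID\tE\t04/16/2005 14:41:07\tDisk at Controller1-Channel5-Device1 failed.",
    "RAID\tE\t04/16/2005 14:41:19\tDisk at Controller1-Channel7-Device1 failed.",
])

FAILED_RE = re.compile(r"Disk at Controller\d+-Channel(\d+)-Device\d+ failed")


def first_failures(text, gap=timedelta(minutes=5)):
    """Return (timestamp, channel) for the first disk to fail in each incident."""
    incidents = []
    last_time = None
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a log record
        match = FAILED_RE.search(fields[3])
        if not match:
            continue  # only "failed" events matter here
        when = datetime.strptime(fields[2], "%m/%d/%Y %H:%M:%S")
        if last_time is None or when - last_time > gap:
            incidents.append((when, int(match.group(1))))
        last_time = when
    return incidents


for when, channel in first_failures(SAMPLE):
    print(when, "first failure on channel", channel)
```

On the lines sampled here, channel 5 is the first to fail in every incident, with a second disk following within seconds.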


Thanks for any help, this is driving me nuts trying to sort it out.


----------



## bremaria (Jun 1, 2005)

I'm having a similar problem. I have a RAID 5 array on a RocketRAID 1820A card. I got the failed hard drive error, bought a new drive and put it in. I thought this was going to be easy for once: I rebuilt the array and thought I was done. Now I get the same errors you are getting, "An error occured on the disk at Controller1..." and "Disk at Controller1-Channel3-Device1 failed". I don't think it's OS specific, as I'm running Windows 2003 Server. The only thing I have in common with you is the Highpoint 1820A card. I do have Seagate Barracudas, but they are 120 GB.

There wasn't any problem for months after I built the system. Now I get those same errors over and over again on the RAID 5 array, while the RAID 0 OS volume does fine.

I thought I might have loose SATA connections and tried using zip ties to secure the SATA to the power connector to keep them in better, but that didn't work. I tried different cables with the problem drive with no luck.

I didn't notice any connection with particular files, or the BIOS rebuild taking longer, but that doesn't mean it isn't happening. I have noticed that it happens when SQL Server 2000 runs its database backup using SQL Server Agent, so it could have something to do with a particular file being accessed.

Here are my error logs:

5/14/2005 12:00:55 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/14/2005 12:00:55 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/14/2005 12:00:55 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/15/2005 12:00:54 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/15/2005 12:00:54 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/15/2005 12:00:54 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/17/2005 12:01:01 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/17/2005 12:01:01 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/17/2005 12:01:01 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/18/2005 12:00:58 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/18/2005 12:00:58 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/18/2005 12:00:58 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/19/2005 12:01:59 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/19/2005 12:01:59 AM Disk at Controller1-Channel3-Device1 failed. 
5/19/2005 12:01:59 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/20/2005 12:00:50 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/21/2005 12:00:50 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/22/2005 12:00:59 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/22/2005 12:00:59 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/22/2005 12:00:59 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/23/2005 10:25:08 AM Disk at Controller1-Channel3-Device1 failed. 
5/23/2005 10:25:26 AM Plugging device detected.(Controller1-Channel3-Device1) 
5/24/2005 12:00:59 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/24/2005 12:00:59 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/24/2005 12:00:59 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/25/2005 12:00:53 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/25/2005 12:00:53 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/25/2005 12:00:53 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/26/2005 12:00:58 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/26/2005 12:00:58 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/27/2005 12:01:35 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/27/2005 12:01:35 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/27/2005 12:01:35 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/28/2005 12:00:56 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/29/2005 12:00:59 AM An error occured on the disk at Controller1-Channel3-Device1. 
5/29/2005 12:00:59 AM An error occured on the disk at Controller1-Channel4-Device1. 
5/29/2005 12:00:59 AM An error occured on the disk at Controller1-Channel5-Device1. 
5/31/2005 12:00:57 AM An error occured on the disk at Controller1-Channel4-Device1. 
6/1/2005 12:01:00 AM An error occured on the disk at Controller1-Channel3-Device1. 
6/1/2005 12:01:00 AM An error occured on the disk at Controller1-Channel4-Device1. 
6/1/2005 12:01:00 AM An error occured on the disk at Controller1-Channel5-Device1. 
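One way to check the link with the nightly backup is to count the log entries by hour of day. The following is a short Python sketch using a few lines from the log above; the timestamp format string is inferred from those lines, and AM/PM parsing assumes an English locale.

```python
from collections import Counter
from datetime import datetime

# A handful of entries copied from the error log above.
SAMPLE = """\
5/14/2005 12:00:55 AM An error occured on the disk at Controller1-Channel3-Device1.
5/19/2005 12:01:59 AM Disk at Controller1-Channel3-Device1 failed.
5/23/2005 10:25:08 AM Disk at Controller1-Channel3-Device1 failed.
5/23/2005 10:25:26 AM Plugging device detected.(Controller1-Channel3-Device1)
6/1/2005 12:01:00 AM An error occured on the disk at Controller1-Channel5-Device1.
"""


def events_by_hour(text):
    """Count log entries per hour of day (0-23)."""
    hours = Counter()
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        stamp = " ".join(parts[:3])  # date, time and AM/PM marker
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
        hours[when.hour] += 1
    return hours


for hour, count in sorted(events_by_hour(SAMPLE).items()):
    print(f"{hour:02d}:00  {count} entries")
```

In this sample the entries just after midnight dominate, and the 10:25 AM entries coincide with the drive being re-plugged. Running the same count over the full log and comparing it with the SQL Server Agent schedule would show whether the errors really track the backup job.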

I would greatly appreciate any help on this, and will pass on anything I learn.

Thanks,
Brett


----------



## robertorri (Dec 9, 2005)

Hi guys.

I am having the same problem. Any luck finding out what is causing this?

Robert Orri
Iceland


----------

