Monitoring Adaptec RAID arrays with arcconf

arcconf is a command-line utility for configuring and managing Adaptec RAID arrays. It can also query the status of an array. The script below parses the output of such a query and looks for problems; if it finds any, it sends an email containing the relevant lines followed by the full query output.

It takes the file to parse as its argument; if no argument is given, it looks for /tmp/arcconfig.

It can be run from crontab, or it could be modified for Nagios or Zabbix monitoring. The arcconf utility needs root permissions to run; the script doesn’t, since all it does is parse the results.
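
For the Nagios route, a minimal sketch of a plugin-style wrapper might look like the following. It assumes the usual Nagios plugin convention of exit codes 0 (OK), 2 (CRITICAL) and 3 (UNKNOWN). The function name check_arcconf_nagios and the reduced set of checks are my own illustration, not part of the script in this post, and the output labels it greps for are the ones seen on my controller.

```shell
#!/bin/sh
# Hypothetical Nagios-style check (illustrative name and reduced checks).
# Exit codes follow the Nagios plugin convention:
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN
check_arcconf_nagios() {
    file=${1:-/tmp/arcconfig}    # same default as the mail script

    if [ ! -f "$file" ]; then
        echo "RAID UNKNOWN - $file not present"
        return 3
    fi

    problems=""

    # Controller status must be "Optimal"
    ctrl=$(grep "Controller Status" "$file" | sed 's/^.*: //')
    [ "$ctrl" = "Optimal" ] || problems="$problems controller=$ctrl;"

    # Any defunct drive is a problem
    dead=$(grep "Defunct disk drive count" "$file" | sed 's/^.*: //')
    [ "$dead" = "0" ] || problems="$problems defunct_drives=$dead;"

    # Every logical device line must mention "Optimal"
    if grep "Status of logical device" "$file" | grep -q -v "Optimal"; then
        problems="$problems logical_device_not_optimal;"
    fi

    if [ -n "$problems" ]; then
        echo "RAID CRITICAL -$problems"
        return 2
    fi

    echo "RAID OK"
    return 0
}
```

A plugin file would end with something like `check_arcconf_nagios "$1"; exit $?` so that the monitoring system sees the return code. Since arcconf itself needs root, the dump would still have to be produced separately, for example by the cron entry shown in the script header.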

#!/bin/sh
#
# crontab entry looks like this:
# @hourly        root   /usr/local/sbin/arcconf getconfig 1 > /tmp/arcconf_status ; /root/check_arcconf.sh /tmp/arcconf_status ; rm /tmp/arcconf_status
#

CHECKSTATUS=0                       # will change if something is wrong
MAILTO="monitoring@example.com"     # where to send the result
MAILSUBJECT=""                      # will be set to something reflecting the hostname and status
MSG=""                              # accumulates the problem report (the mail body)

ARCFILE=$1
if [ -z "$ARCFILE" ]; then
    ARCFILE="/tmp/arcconfig"    # 'arcconf getconfig 1' result
fi

if [ ! -f "$ARCFILE" ]; then
    echo "$ARCFILE not present"
    exit 1
fi

# Controller status
CTRLSTATUS=`grep "Controller Status" "$ARCFILE" | sed 's/^.*: //'`
if [ "$CTRLSTATUS" != "Optimal" ]; then
    MSG="$MSG\nController Status:   $CTRLSTATUS"
    CHECKSTATUS=1
fi

# Check temperature
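# The status word sits in parentheses at the end of the line, e.g. "(Normal)"
TEMPSTATUS=`grep "Temperature  " "$ARCFILE" | sed -e 's/^.*(//' -e 's/)//'`
if [ "$TEMPSTATUS" != "Normal" ]; then
    # Report the whole value after the colon, not just the status word
    TEMPSTATUS=`grep "Temperature  " "$ARCFILE" | sed 's/^.*: //'`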
    MSG="$MSG\nTemperature:         $TEMPSTATUS"
    CHECKSTATUS=1
fi

# Check number of dead drives, >0 is a problem
DEADDRIVES=`grep "Defunct disk drive count" "$ARCFILE" | sed 's/^.*: //'`
if [ "$DEADDRIVES" != "0" ]; then
    MSG="$MSG\nDead disk drives:    $DEADDRIVES"
    CHECKSTATUS=1
fi

# Check battery. Not sure about this as mine doesn't have a battery
BATTSTATUS=`grep -A2 "Controller Battery Information" "$ARCFILE" | grep Status | sed 's/^.*: //'`
if [ "$BATTSTATUS" != "Not Installed" ] && [ "$BATTSTATUS" != "Optimal" ]; then
    MSG="$MSG\nBattery Status:      $BATTSTATUS"
    CHECKSTATUS=1
fi

# Logical device status
LOGDEV=0
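# Read one status line per logical device; the here-document keeps the
# loop in the current shell, so MSG and CHECKSTATUS survive it.
while IFS= read -r DEVSTAT
do
    if [ `echo "$DEVSTAT" | grep -c -v "Optimal"` -gt 0 ]; then
        LOGDEVSTATUS=`echo "$DEVSTAT" | sed 's/^.*://'`
        MSG="$MSG\nLogical device $LOGDEV:    $LOGDEVSTATUS"
        CHECKSTATUS=1
    fi
    LOGDEV=$((LOGDEV+1))
done <<EOF
`grep "Status of logical device" "$ARCFILE"`
EOF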

# Logical device segment information
# might wanna rule out "Rebuilding" here, too
SEGSTATUS=`grep "Segment" "$ARCFILE" | grep -v "Present"`
if [ "$SEGSTATUS" != "" ]; then
    MSG="$MSG\nNot all logical device segments are Present:\n$SEGSTATUS"
    CHECKSTATUS=1
fi

# If anything is wrong, send an email
if [ $CHECKSTATUS != 0 ]; then
    HOST=`hostname`
    MAILSUBJECT="RAID WARNING on $HOST"
    ARCSTRING=`cat "$ARCFILE"`
    MSG="$MSG\n\n\nFULL CONFIG DUMP:\n$ARCSTRING"
    # printf '%b' expands the \n sequences; 'echo -e' is not portable under /bin/sh
    printf '%b\n' "$MSG" | mail -s "$MAILSUBJECT" "$MAILTO"
fi

This works for the Adaptec 5405; other models might return slightly different information.
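
To see whether another model uses the same labels, one option is to list the lines of a dump that the checks depend on and compare them by eye. This is only a sketch: list_status_lines is a hypothetical helper name, and the pattern list simply mirrors the strings the script greps for.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line of a dump that
# matches one of the labels the checks above rely on.
list_status_lines() {
    grep -n -E 'Controller Status|Temperature|Defunct disk drive count|Controller Battery Information|Status of logical device|Segment' "${1:-/tmp/arcconfig}"
}
```

Running `list_status_lines /tmp/arcconfig` on a saved dump should show at a glance which labels are missing or worded differently, and therefore which grep patterns need adjusting.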