Friday, August 31, 2012

Udev>ASMlib

I'm fighting the urge to rant about this.  Although I respect Oracle's right to make their products work better and have additional functionality with their other products, I really dislike the position Oracle has taken on ASMLib, db_flash_cache and HCC on Sun-only storage.  Db_flash_cache and HCC are *wonderful* enhancements to database functionality...and they were released with the caveat that they only work with their respective Oracle co-products...db_flash_cache needs OEL and HCC (non-Exadata) needs Sun storage.  You've probably read prior posts on the Sun 7420 used for Exadata backups...it's wonderful...people should buy it on its own merits.  OEL offers a solid product at a great price point relative to other distros...it can stand on its own too.  Still...I get it...Oracle wants to sell more Oracle.

What bothers me more is the decision to no longer release ASMLib for Redhat 6+.  It's a difficult switch for customers who have an installed base of Oracle databases on Redhat 5 and procedures built around ASMLib.  For a small shop, maybe it's not that big of a deal...for Oracle's big customers, it involves documentation, meetings, coordination...and a lot of ill will about being forced to change procedures and retrain people who were trained to use ASMLib.

The written procedures to use udev with Redhat are a bit of a pain.  You identify the uuid of each disk device and create an entry in a udev rules file for it.  Again...for a small shop with a few databases...not a big deal.  I'm working on a project now to move over a hundred databases from AIX to Redhat 6 on vSphere 5 (VMware), in RAC.  This is one of the most aggressive use cases for Oracle on VMware I've ever heard of.  To have big databases in VMware is easy...to have busy databases in VMware is a challenge.  Each node of each database has many disk devices.  Some of the databases are SAP, which requires multiple failgroups...which means multiple controllers with separated storage.  It's easy to make a mistake and add storage to the wrong ASM diskgroup...destroying your failgroup separation.
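
For reference, the manual procedure boils down to producing one line like the following per disk. This is only a sketch: `make_asm_rule` is a hypothetical helper name, the id and alias in the example call are made up, and the rule layout simply copies the RH6 form that the script further down emits.

```shell
#!/bin/bash
# Hypothetical helper: print one RH6-style udev rule for a disk's first partition.
#   $1 = SCSI id reported for the disk (e.g. from: /sbin/scsi_id -g -u -d /dev/sdb)
#   $2 = alias to create under /dev
make_asm_rule() {
    local scsi_id="$1" alias_name="$2"
    # Single quotes keep $parent literal; udev expands it, not the shell.
    printf 'KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="%s", NAME="%s", OWNER="oracle", GROUP="dba", MODE="0660"\n' \
        "${scsi_id}" "${alias_name}"
}

# Example with a made-up id. On a real host you would append the output to
# /etc/udev/rules.d/99-oracle-asmdevices.rules, one line per disk.
make_asm_rule "36000c29a1b2c3d4e5f60718293a4b5c6" "asm-data-disk1"
```

Doing that by hand for a few disks is fine...doing it for every disk on every node of a hundred databases is where the mistakes creep in.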

Even with ASMLib it's difficult to make sure your ASM disk...ie: VOL45...maps to the correct SCSI controller, which maps to the proper storage, using separate paths all the way to separate storage.  Without ASMLib, there's much more room for human error.  Obviously...I had to automate.

So...Oracle hands you lemons...make exa-lemonade.  I created a udev rule creation script for Redhat that does a similar task to what ASMLib has always done.
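
The core idea is small enough to show on its own: take the SCSI host number a disk hangs off and turn it into an alias prefix. A minimal sketch follows. The function names `alias_prefix_for_controller` and `controller_of` are hypothetical, and the 3/4/5 to data/redo/arch mapping is just the one hard-coded in the script below, so adjust it to your own controller layout.

```shell
#!/bin/bash
# Hypothetical helper: map a SCSI controller (host) number to an ASM alias prefix.
# The 3/4/5 assignment mirrors the script in this post; other layouts will differ.
alias_prefix_for_controller() {
    case "$1" in
        3) echo "asm-data-disk" ;;
        4) echo "asm-redo-disk" ;;
        5) echo "asm-arch-disk" ;;
        *) return 1 ;;   # not a controller reserved for ASM
    esac
}

# Hypothetical helper: pull the host number out of an H:C:T:L address such as "3:0:1:0".
controller_of() {
    echo "${1%%:*}"
}

hctl="4:0:2:0"   # made-up address for illustration
if prefix="$(alias_prefix_for_controller "$(controller_of "${hctl}")")"; then
    echo "${hctl} -> ${prefix}N"
else
    echo "${hctl} is not on an ASM controller"
fi
```

Because the failgroup is baked into the device name, a redo disk can't quietly end up in a data diskgroup without somebody noticing the name is wrong.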

Pros to Script

  • Easy to maintain (it's just bash)
  • Not kernel (or RH version) dependent, or OEL dependent
  • Syncs rules across nodes (disk 14 on node 1 is the same as disk 14 on node 4 by UUID)
  • The alias name is based on the SCSI controller...so the DBA won't accidentally add a disk from one failgroup to a diskgroup of a different failgroup (which would eliminate the data protection of using multiple failgroups, and fail your SAP ASM platform certification)

Pros to ASMLib

  • Oracle maintains it (...until you update your kernel...which you do regularly for security fixes, right?)
  • Stamps disks (sector 2?) so VOL14 on node 1 is the same as VOL14 on node 4
  • Some people think ASMLib is more performant than udev...but I don't think those claims come from Oracle...and I haven't been able to quantify a difference.  If a performance advantage exists, it must be slight.

To distribute the file to all the other nodes, the script has 2 dependencies on Oracle's OneCommand configuration (used in Exadata, OVM, ODM, etc): the params.ini file and the doall.sh script.  In params.ini, I added a parameter called SHARED_DIR, which is a directory mounted by all nodes.  If you want...you can just ftp the rules file to the other nodes and comment out the lines that depend on them.
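
If you don't have OneCommand's doall.sh, a plain loop over the node names does the same job. The sketch below is a stand-in, not what the script uses: `push_rules` and the node names are hypothetical, it assumes passwordless root scp between the nodes, and it defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
RULES_FILE="/etc/udev/rules.d/99-oracle-asmdevices.rules"

# Hypothetical helper: copy the rules file to each node given as an argument.
# DRY_RUN=1 (the default here) only prints what would be run.
push_rules() {
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp ${RULES_FILE} root@${node}:${RULES_FILE}"
        else
            scp "${RULES_FILE}" "root@${node}:${RULES_FILE}" || return 1
        fi
    done
}

# Node names are placeholders; set DRY_RUN=0 to actually copy.
push_rules racnode2 racnode3 racnode4
```

You still have to reload the udev rules on each node afterwards, which is the other thing doall.sh is used for in the script.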

This is a work in progress that I expect the end user to modify and improve, and as always, use at your own risk...but I think it will likely save you some work creating your udev rules.  There is some detection of whether partitions are formatted...and it works for me, but you should verify that the devices it recognizes as unformatted really are unformatted.  Use this in a non-prod, unimportant crash and burn system first.  Hmmm...I can't think of any other warning to give.  Don't run this on any computer, ever.  To be extra, extra safe, you could comment out the last few lines that deal with moving the file around and reloading udev rules...that way you can look at the new rules file before you actually use it.
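
One way to do that review is to list only the rule lines a candidate file would add on top of the live one. This is a rough sketch and not part of the script: `new_rules` is a hypothetical helper, and the demo uses throwaway temp files rather than anything under /etc/udev.

```shell
#!/bin/bash
# Hypothetical helper: print the lines in a candidate rules file that are not
# already present in the live one. Prints nothing if there is nothing new.
#   $1 = live rules file (may not exist yet)
#   $2 = candidate rules file to review
new_rules() {
    local live="$1" candidate="$2"
    if [ -f "${live}" ]; then
        # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
        grep -Fxv -f "${live}" "${candidate}"
    else
        cat "${candidate}"
    fi
}

# Demo with throwaway files standing in for the real rules files.
live_demo="$(mktemp)"
candidate_demo="$(mktemp)"
printf 'rule-a\n' > "${live_demo}"
printf 'rule-a\nrule-b\n' > "${candidate_demo}"
new_rules "${live_demo}" "${candidate_demo}"    # prints: rule-b
rm -f "${live_demo}" "${candidate_demo}"
```

If the only new lines are the disks you meant to add, go ahead and put the file in place.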

Ok...that being said, I hope this enables you to get past the lack of ASMlib on Redhat 6+, as it has definitely helped me.

#!/bin/bash
# (bash is required: the script uses [[ ]], brace expansion, let and &>)
###################################
# Name: udev_rules.sh
# Date: 5/9/2012
# Purpose: This script will create all the udev rules necessary to support
# Oracle ASM for RH 5 or RH6. It will name the aliased devices
# appropriately for the different failgroups, based on the controller
# they're assigned to.
# Revisions:
# 5/8/2012 - Created
# 5/10/2012 - Will now modify the existing rules to allow the addition of a
# single new disk. It will also sync the udev rules on node 1 with all other nodes.
###################################
source /u01/racovm/params.ini
# Clear the previous shared copy; -f keeps this quiet if it does not exist yet
rm -f "${SHARED_DIR}/udev/99-oracle-asmdevices.rules"
data_disk=0
redo_disk=0
arch_disk=0
release_test=`lsb_release -r | awk 'BEGIN {FS=" "}{print $2}' | awk 'BEGIN {FS="."}{print $1}'`
echo "Detected RH release ${release_test}"

if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
echo -e "Detected a pre-existing asm rules file. Analyzing...\c"
for y in {1..50}
do
found_data_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-data-disk${y}"`
found_redo_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-redo-disk${y}"`
found_arch_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-arch-disk${y}"`
if [ -n "${found_data_disk}" ]; then
let "data_disk++"
fi
if [ -n "${found_redo_disk}" ]; then
let "redo_disk++"
fi
if [ -n "${found_arch_disk}" ]; then
let "arch_disk++"
fi
echo -e ".\c"
done
echo "complete."
echo "Existing rules file contains:"
echo " ASM Data Disks: ${data_disk}"
echo " ASM Redo Disks: ${redo_disk}"
echo " ASM Arch Disks: ${arch_disk}"
new_file="false"
else
echo "Detected no pre-existing asm udev rules file. Building..."
new_file="true"
fi

echo "Creating new partitions if needed."
sh install.sh &> install.log

for x in {a..z}
do
# Only look at sdX when its first partition exists as a block device
if [ -b "/dev/sd${x}1" ] ; then
asm_test1=`file -s /dev/sd${x}1 |grep "/dev/sd${x}1: data" `
asm_test2=`file -s /dev/sd${x}1 |grep "Oracle ASM" `
if [[ -n "${asm_test1}" || -n "${asm_test2}" ]] ; then
controller=`ls /sys/block/sd${x}/device/scsi_device | awk 'BEGIN {FS=":"}{print $1}'`
# ie: scsi_device:1:0:1:0
if [ "${release_test}" = "5" ]; then
# On RH5, scsi_id -s takes a sysfs path relative to /sys, not a /dev node
result=`/sbin/scsi_id -g -u -s /block/sd${x}`
else
result=`/sbin/scsi_id -g -u -d /dev/sd${x}`
fi
if [ "${result}" = "" ]; then
echo "No scsi id found for /dev/sd${x}. If you're running on VMware, verify disk.EnableUUID=true has been added under option->Advanced->General->Configuration Parameters."
exit 1
fi
if [ "${controller}" = "3" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
#if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
if [ -z "${found_uuid}" ]; then
echo "Detected a new data disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "data_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /block/\$parent\", RESULT==\"${result}\", NAME=\"asm-data-disk${data_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
else
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-data-disk${data_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
elif [ "${controller}" = "4" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
echo "Detected a new Redo disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "redo_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /block/\$parent\", RESULT==\"${result}\", NAME=\"asm-redo-disk${redo_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
elif [ "${release_test}" = "6" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-redo-disk${redo_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
elif [ "${controller}" = "5" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
echo "Detected a new Arch disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "arch_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /block/\$parent\", RESULT==\"${result}\", NAME=\"asm-arch-disk${arch_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
elif [ "${release_test}" = "6" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-arch-disk${arch_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
fi
else
echo "/dev/sd${x}1 is not an asm disk."
fi
fi
done
#cat /etc/udev/rules.d/99-oracle-asmdevices.rules
echo "Syncing rules file for all nodes of this cluster..."
cd ${SHARED_DIR}/udev
cp /etc/udev/rules.d/99-oracle-asmdevices.rules .
/u01/racovm/doall.sh -p cp ${SHARED_DIR}/udev/99-oracle-asmdevices.rules /etc/udev/rules.d/99-oracle-asmdevices.rules
echo "Reloading rules for all disks on all nodes in this cluster..."
if [ "${release_test}" = "5" ]; then
/u01/racovm/doall.sh /sbin/udevcontrol reload_rules &> /dev/null
else
/u01/racovm/doall.sh /sbin/udevadm control --reload-rules &> /dev/null
fi
# Caution: don't run start_udev on a production system (see the comment at the end of this post)
/u01/racovm/doall.sh /sbin/start_udev &> /dev/null
/u01/racovm/doall.sh /sbin/partprobe &> /dev/null
echo "Complete."
echo "To see the ASM UDEV rules: cat /etc/udev/rules.d/99-oracle-asmdevices.rules"


When complete, the rules file looks something like this (on RH5):

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3800682301d41f40ce5129d796f", NAME="asm-data-disk1", OWNER="oracle", GROUP="dba", MODE="0660"


KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38a77610df366b5ce4045e0f438", NAME="asm-data-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38d94459b437a48b0d75784d0bf", NAME="asm-data-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3803929d52a392a506b75b8fc2d", NAME="asm-data-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c383a1ab40918dbc2e7a5f8gfb9d", NAME="asm-data-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38840c4740546cb2d9874152b98", NAME="asm-redo-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38523a828e0bd08637f79862c5a", NAME="asm-redo-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38904e50311ce41bd1f8db03ca1", NAME="asm-redo-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38d4b7e30a0102afb9934bge9f2", NAME="asm-redo-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38bd8e8a59464126630ff37b5da", NAME="asm-redo-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38e50ded425980005bb5f685e14", NAME="asm-arch-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c380a88d61b860gbfab66e4ba2ec", NAME="asm-arch-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38e2dceadba28f7bb8144egd67a", NAME="asm-arch-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c383ae7f7d05b2bc2ded9724c69e", NAME="asm-arch-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3866e4127140d48467c09f1363a", NAME="asm-arch-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38bca0e4c305e9236c7d62c553e", NAME="asm-arch-disk6", OWNER="oracle", GROUP="dba", MODE="0660"
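
Before trusting a generated file, it's worth checking that no SCSI id and no alias shows up twice. The following is a sketch of such a check rather than anything the script does: `duplicate_values` and `check_rules` are hypothetical names, and the sed pattern assumes the rule layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: print any value of KEY (e.g. RESULT or NAME) that appears
# on more than one line of a rules file. No output means no duplicates.
duplicate_values() {
    local key="$1" file="$2"
    # Extract the quoted value following KEY== or KEY= on each line.
    sed -n "s/.*${key}==*\"\([^\"]*\)\".*/\1/p" "${file}" | sort | uniq -d
}

# Hypothetical wrapper: report duplicate SCSI ids or aliases, return 1 if any.
check_rules() {
    local file="$1" dups
    dups="$(duplicate_values RESULT "${file}"; duplicate_values NAME "${file}")"
    if [ -n "${dups}" ]; then
        echo "Duplicates found:"
        echo "${dups}"
        return 1
    fi
    echo "No duplicate ids or aliases."
}

rules="/etc/udev/rules.d/99-oracle-asmdevices.rules"
if [ -f "${rules}" ]; then
    check_rules "${rules}" || echo "Fix the duplicates before reloading udev."
fi
```

A duplicate id would mean the same disk got two aliases, and a duplicate alias would mean two disks are fighting over one name...either way you want to know before ASM does.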

1 comment:

  1. Quick update...see the new post on this topic at

    http://otipstricks.blogspot.com/2013/06/adding-disk-to-asm-and-using-udev.html

    Even though it's in many Oracle docs, don't run start_udev on a production system!
