Thursday, December 6, 2012

Swingbench is great! (but not descriptive when there's an error.)

Just a quick note that may help some of you using Swingbench in distributed mode.  For those of you who haven't used it yet, Swingbench is a great way to compare the performance of different Oracle databases running on different platforms or configurations.  The best part is...it's free:

http://www.dominicgiles.com/swingbench.html

There are two ways to do it...one is a simple test from your laptop...the other is for a distributed RAC database, in order to push it and find its bottlenecks (and to make sure your laptop isn't introducing a bottleneck).  You can get the details and a walkthrough from the author's site (link above), but essentially you have multiple load generators connecting directly to specific nodes of the database; their results are aggregated by the "coordinator process," and those results are displayed by the cluster overview process.  When I was doing this last night I got this error and was unable to find help on "the internets":

11:33:11 AM FINEST com.dom.benchmarking.swingbench.clusteroverview.datasource.ScalabilityDataSource () Connected
java.lang.NullPointerException
    at com.dom.benchmarking.swingbench.clusteroverview.datasource.TransactionDataSource.updateResultsArray(TransactionDataSource.java:148)
    at com.dom.benchmarking.swingbench.clusteroverview.datasource.TransactionDataSource.run(TransactionDataSource.java:177)
    at java.lang.Thread.run(Unknown Source)

I was doing a distributed Swingbench test, and the workload generators I was using (charbench) were all reading the same swingconfig.xml over a shared NFS mount, which had a typo in the connect string...so I ended up with no connections at all. I can only guess this is what the Java error was trying to say with the null pointer exception on update of the results array of the transaction data source. For my situation (and maybe yours), this amounts to an "unable to connect to database" error...if you hit this issue, check the connect string in swingconfig.xml.
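Since every generator read the same file in my case, a quick check of the connect string on each node would have caught the typo before the run. A minimal sketch of that check (the <ConnectString> element name and the sample host/service values here are assumptions; verify them against your version's swingconfig.xml):

```shell
# Pull the connect string out of a swingconfig.xml and echo it, so you can
# eyeball it on every load-generator node before starting charbench.
# The sample file below stands in for your real config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<SwingBenchConfiguration>
  <Connection>
    <ConnectString>//rac-scan.company.com:1521/orcl</ConnectString>
  </Connection>
</SwingBenchConfiguration>
EOF
# Extract the text between the ConnectString tags.
connect_string=$(sed -n 's:.*<ConnectString>\(.*\)</ConnectString>.*:\1:p' "$cfg")
echo "connect string: ${connect_string}"
rm -f "$cfg"
```

From there you can point tnsping or sqlplus at the extracted string on each node to confirm connectivity before kicking off the distributed run.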

I hope this helps!

Thursday, October 25, 2012

Update to VM script for PowerCLI is on the way....

I'm going to update the script in the previous post to not use vmdk's for the data luns of the database. 

Although this performs well and it's the SAP/VMware best practice (and this is great for smaller db's), the idea that we may have to replicate our bugs on physical hardware for Oracle support means we'd have to live with a bug long enough to set up a new physical server, install the db software, and restore the database to it.  For these multi-TB databases, that would take many hours, at least.  If we use RAW devices in the VM's (with the vBlock's Cisco UCS), all we have to do is apply a profile to a spare blade, power it on, and add that node to the RAC cluster.  Within a few minutes we'll be able to replicate the bug for MOS...then we can shut down the blade and remove the profile.

I'll post it when it's finished.

Friday, August 31, 2012

The Oracle RAC VM Build Script for PowerCLI on vSphere 5

One of the benefits of computers is that they're supposed to make repetitive tasks easier, or automated.  Still, for many tasks in our industry, we're expected to click over and over in a GUI to do the same things, without error.  Besides causing "Post Office Syndrome" (which causes one to "go postal"), this is no way for intelligent human beings to spend their lives.  Any chance I get to automate something that needs to be done over and over, I take it.  With that in mind, this script improved my life, and I hope it'll improve your life too.

I mentioned this in a previous post, so here it is: the PowerCLI vSphere 5 multiple-VM build script.  It easily creates multiple VM's with shared storage, utilizing a combination of best practices from Oracle, SAP, and VMware.  I am by no means a PowerCLI guru...if you have improvements for this, let me know.  Here are a few I would like to see in the long term:

1. I have a friend who's planning to add XLS functionality to this via PowerCLI...so that the dba team can give him an Excel spreadsheet with a list of database parameters that the script can read in and substitute in a loop...creating many VM's, one after the other, automated.

2. The number of nodes should be a parameter that feeds the logic in a loop...so whether you have a 2-node or an 8-node RAC db, the same script can be used.

3. The final section that eager-zeroes the storage works...but for large databases it takes an extremely long time.  An alternative method would be to create the vmdk's thin, and then move them to the same datastore as eager-zeroed, similar to what's discussed here.  My theory is that this might use VAAI, which could hugely improve the eager-zeroing process by offloading it to the SAN.
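The node-count parameterization in item 2 mostly comes down to turning the hard-coded hostnames and VM names into a loop driven by a single count. A sketch of the idea in shell (the real script is PowerCLI, and the name patterns here are made up to match the examples above):

```shell
# Generate per-node host/VM names from a single node count, instead of
# hard-coding $vmName1..$vmName3 and $VS5_Host1..$VS5_Host3.
node_count=4
vm_names=""
for i in $(seq 1 "$node_count"); do
  host="node${i}.company.com"
  vm="racnode${i}"
  vm_names="${vm_names} ${vm}"
  echo "would create ${vm} on ${host}"
done
```

In PowerCLI the same shape works with a `1..$nodeCount | ForEach-Object { ... }` loop, with the VM creation and disk-attachment steps moved inside it.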

Also, be aware there is a vSphere client bug that incorrectly reports the backing of the vmdk's as thick lazy-zeroed when they're actually thick eager-zeroed.  If you run the "eager zero" part of the script and it seems to complete in a few seconds, it means you're trying to eager-zero something that's already eager-zeroed (regardless of what the client reports), which amounts to almost a no-op.

Many sections of this have been pieced together from talented people in the PowerCLI community, but I can't tell who the original authors were...I think because different snippets have been added on to by different people. Still, I'd really like to give them credit and thank them for making this possible. Technical communities that work together restore my faith in the good of mankind. :)


When I run this, I connect via terminal services to vCenter on vSphere 5, then paste it into PowerCLI on that machine...but there are lots of ways to skin that cat... Hmm...I know my blog gets translated to different languages...I wonder how colloquialisms like that are interpreted in Hindi, Chinese, Peta etc? :)


$VS5_Host1 = "node1.company.com"
$VS5_Host2 = "node2.company.com"
$VS5_Host3 = "node3.company.com"
$vmName1 = "racnode1"
$vmName2 = "racnode2"
$vmName3 = "racnode3"
$rac_vm_cpu = 6
$rac_vm_ram_mb = (110GB/1MB)
$rac_vm_ram_mb_rez = (90.6GB/1MB)
$public_network_name = "10.2.14-17"
$private_network_name = "192.168.20.0"
$backup_network_name = "Backup"
$osstore = "os_datastore"
$osstore_size_MB = (100GB/1MB)
$orastore = "ora_datastore"
$orastore_size_KB = (100GB/1KB)
$datastore1 = "data1"
$datastore2 = "data2"
$datastore3 = "data3"
$datastore4 = "data4"
$datastore5 = "data5"
$datastore6 = "data6"
$datastore7 = "data7"
$datastore8 = "data8"
$datastore_size_KB = (550GB/1KB)
$recostore1 = "loga"
$recostore2 = "logb"
$recostore_size_KB = (8GB/1KB)
$archstore1 = "arch01"
$archstore2 = "arch02"
$archstore3 = "arch03"
$archstore_size_KB = (200GB/1KB)

$VM1 = new-vm `
-Host "$VS5_Host1" `
-Name $vmName1 `
-Datastore (get-datastore "$osstore") `
-Location "Oracle" `
-GuestID rhel6_64Guest `
-MemoryMB 4096 `
-DiskMB $osstore_size_MB `
-NetworkName "$public_network_name" `
-DiskStorageFormat "Thin"

$vm2 = new-vm `
-Host "$VS5_Host2" `
-Name $vmName2 `
-Datastore (get-datastore "$osstore") `
-Location "Oracle" `
-GuestID rhel6_64Guest `
-MemoryMB 4096 `
-DiskMB $osstore_size_MB `
-NetworkName "$public_network_name" `
-DiskStorageFormat "Thin"

$VM3 = new-vm `
-Host "$VS5_Host3" `
-Name $vmName3 `
-Datastore (get-datastore "$osstore") `
-Location "Oracle" `
-GuestID rhel6_64Guest `
-MemoryMB 4096 `
-DiskMB $osstore_size_MB `
-NetworkName "$public_network_name" `
-DiskStorageFormat "Thin"

Function Change-Memory {
Param (
$VM,
$MemoryMB
)
Process {
$VMs = Get-VM $VM
Foreach ($Machine in $VMs) {
$VMId = $Machine.Id

$VMSpec = New-Object VMware.Vim.VirtualMachineConfigSpec
$VMSpec.memoryMB = $MemoryMB
$RawVM = Get-View -Id $VMId
$RawVM.ReconfigVM_Task($VMSpec)
}
}
}

Change-Memory -MemoryMB $rac_vm_ram_mb -VM $VM1
Change-Memory -MemoryMB $rac_vm_ram_mb -VM $VM2
Change-Memory -MemoryMB $rac_vm_ram_mb -VM $VM3

Set-VM -vm(get-vm $VM1) -NumCpu $rac_vm_cpu -RunAsync -Version v8 -Confirm:$false
Set-VM -vm(get-vm $vm2) -NumCpu $rac_vm_cpu -RunAsync -Version v8 -Confirm:$false
Set-VM -vm(get-vm $VM3) -NumCpu $rac_vm_cpu -RunAsync -Version v8 -Confirm:$false

Get-VM $VM1 | Get-VMResourceConfiguration | Set-VMResourceConfiguration -MemReservationMB $rac_vm_ram_mb_rez
Get-VM $vm2 | Get-VMResourceConfiguration | Set-VMResourceConfiguration -MemReservationMB $rac_vm_ram_mb_rez
Get-VM $VM3 | Get-VMResourceConfiguration | Set-VMResourceConfiguration -MemReservationMB $rac_vm_ram_mb_rez

New-NetworkAdapter -VM $vm1 -NetworkName "$private_network_name" -StartConnected -Type vmxnet3 -Confirm:$false
New-NetworkAdapter -VM $vm2 -NetworkName "$private_network_name" -StartConnected -Type vmxnet3 -Confirm:$false
New-NetworkAdapter -VM $vm3 -NetworkName "$private_network_name" -StartConnected -Type vmxnet3 -Confirm:$false

New-NetworkAdapter -VM $vm1 -NetworkName "$backup_network_name" -StartConnected -Type vmxnet3 -Confirm:$false
New-NetworkAdapter -VM $vm2 -NetworkName "$backup_network_name" -StartConnected -Type vmxnet3 -Confirm:$false
New-NetworkAdapter -VM $vm3 -NetworkName "$backup_network_name" -StartConnected -Type vmxnet3 -Confirm:$false

Function Enable-MemHotAdd($vm){
$vmview = Get-vm $vm | Get-View
$vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec

$extra = New-Object VMware.Vim.optionvalue
$extra.Key="mem.hotadd"
$extra.Value="true"
$vmConfigSpec.extraconfig += $extra

$vmview.ReconfigVM($vmConfigSpec)
}

enable-memhotadd $vm1
enable-memhotadd $vm2
enable-memhotadd $vm3

Function Enable-vCpuHotAdd($vm){
$vmview = Get-vm $vm | Get-View
$vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec

$extra = New-Object VMware.Vim.optionvalue
$extra.Key="vcpu.hotadd"
$extra.Value="true"
$vmConfigSpec.extraconfig += $extra

$vmview.ReconfigVM($vmConfigSpec)
}

enable-vCpuHotAdd $vm1
enable-vCpuHotAdd $vm2
enable-vCpuHotAdd $vm3

New-HardDisk -vm($VM1) -CapacityKB $orastore_size_KB -StorageFormat Thin -datastore "$orastore"
New-HardDisk -vm($vm2) -CapacityKB $orastore_size_KB -StorageFormat Thin -datastore "$orastore"
New-HardDisk -vm($VM3) -CapacityKB $orastore_size_KB -StorageFormat Thin -datastore "$orastore"

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})
$New_SCSI_1_1 = $New_Disk1 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_2_1 = $New_Disk2 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_3_1 = $New_Disk3 | New-ScsiController -Type ParaVirtual -Confirm:$false

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore3"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore4"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore5"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore6"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore7"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $datastore_size_KB -StorageFormat Thick -datastore "$datastore8"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_1
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_1
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_1

###################################

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})
$New_SCSI_1_2 = $New_Disk1 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_2_2 = $New_Disk2 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_3_2 = $New_Disk3 | New-ScsiController -Type ParaVirtual -Confirm:$false

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_2
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_2
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_2

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_2
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_2
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_2

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_2
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_2
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_2

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_2
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_2
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_2

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $recostore_size_KB -StorageFormat Thick -datastore "$recostore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_2
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_2
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_2

#######################


$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})
$New_SCSI_1_3 = $New_Disk1 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_2_3 = $New_Disk2 | New-ScsiController -Type ParaVirtual -Confirm:$false
$New_SCSI_3_3 = $New_Disk3 | New-ScsiController -Type ParaVirtual -Confirm:$false

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_3
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_3
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_3

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore3"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_3
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_3
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_3

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore1"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_3
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_3
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_3

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore2"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_3
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_3
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_3

$New_Disk1 = New-HardDisk -vm($VM1) -CapacityKB $archstore_size_KB -StorageFormat Thick -datastore "$archstore3"
$New_Disk2 = new-harddisk -vm($vm2) -diskpath ($New_Disk1 | %{$_.Filename})
$New_Disk3 = new-harddisk -vm($vm3) -diskpath ($New_Disk1 | %{$_.Filename})

set-harddisk -Confirm:$false -harddisk $New_Disk1 -controller $New_SCSI_1_3
set-harddisk -Confirm:$false -harddisk $New_Disk2 -controller $New_SCSI_2_3
set-harddisk -Confirm:$false -harddisk $New_Disk3 -controller $New_SCSI_3_3

$ExtraOptions = @{
# per VMware, SAP and Oracle-on-VMware best practices
# (unit 7 is skipped on each controller; SCSI ID 7 is reserved for the controller itself)
"scsi1:0.sharing"="multi-writer";
"scsi1:1.sharing"="multi-writer";
"scsi1:2.sharing"="multi-writer";
"scsi1:3.sharing"="multi-writer";
"scsi1:4.sharing"="multi-writer";
"scsi1:5.sharing"="multi-writer";
"scsi1:6.sharing"="multi-writer";
"scsi1:8.sharing"="multi-writer";
"scsi1:9.sharing"="multi-writer";
"scsi1:10.sharing"="multi-writer";
"scsi1:11.sharing"="multi-writer";
"scsi1:12.sharing"="multi-writer";
"scsi1:13.sharing"="multi-writer";
"scsi1:14.sharing"="multi-writer";
"scsi1:15.sharing"="multi-writer";
"scsi2:0.sharing"="multi-writer";
"scsi2:1.sharing"="multi-writer";
"scsi2:2.sharing"="multi-writer";
"scsi2:3.sharing"="multi-writer";
"scsi2:4.sharing"="multi-writer";
"scsi2:5.sharing"="multi-writer";
"scsi2:6.sharing"="multi-writer";
"scsi2:8.sharing"="multi-writer";
"scsi2:9.sharing"="multi-writer";
"scsi2:10.sharing"="multi-writer";
"scsi2:11.sharing"="multi-writer";
"scsi2:12.sharing"="multi-writer";
"scsi2:13.sharing"="multi-writer";
"scsi2:14.sharing"="multi-writer";
"scsi2:15.sharing"="multi-writer";
"scsi3:0.sharing"="multi-writer";
"scsi3:1.sharing"="multi-writer";
"scsi3:2.sharing"="multi-writer";
"scsi3:3.sharing"="multi-writer";
"scsi3:4.sharing"="multi-writer";
"scsi3:5.sharing"="multi-writer";
"scsi3:6.sharing"="multi-writer";
"scsi3:8.sharing"="multi-writer";
"scsi3:9.sharing"="multi-writer";
"scsi3:10.sharing"="multi-writer";
"scsi3:11.sharing"="multi-writer";
"scsi3:12.sharing"="multi-writer";
"scsi3:13.sharing"="multi-writer";
"scsi3:14.sharing"="multi-writer";
"scsi3:15.sharing"="multi-writer";
"disk.EnableUUID"="true";
"ethernet0.coalescingScheme"="disabled";
"ethernet1.coalescingScheme"="disabled";
"sched.mem.pshare.enable"="false";
"numa.vcpu.preferHT"="true";

# per VMware's Hardening Guide - Enterprise Level
"isolation.tools.diskShrink.disable"="true";
"isolation.tools.diskWiper.disable"="true";
"isolation.tools.copy.disable"="true";
"isolation.tools.paste.disable"="true";
"isolation.tools.setGUIOptions.enable"="false";
"isolation.device.connectable.disable"="true";
"isolation.device.edit.disable"="true";
"vmci0.unrestricted"="false";
"log.keepOld"="10";
"log.rotateSize"="1000000";
"tools.setInfo.sizeLimit"="1048576";
"guest.command.enabled"="false";
"tools.guestlib.enableHostInfo"="false"
}
$vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec;
Foreach ($Option in $ExtraOptions.GetEnumerator()) {
$OptionValue = New-Object VMware.Vim.optionvalue
$OptionValue.Key = $Option.Key
$OptionValue.Value = $Option.Value
$vmConfigSpec.extraconfig += $OptionValue
}

$vmview=get-vm $vmName1 | get-view
$vmview.ReconfigVM_Task($vmConfigSpec)
$vmview=get-vm $vmName2 | get-view
$vmview.ReconfigVM_Task($vmConfigSpec)
$vmview=get-vm $vmName3 | get-view
$vmview.ReconfigVM_Task($vmConfigSpec)

function Set-EagerZeroThick{
param($vcName, $vmName, $hdName)
# Find ESX host for VM
# $vcHost = Connect-VIServer -Server $vcName -Credential (Get-Credential -Credential "vCenter account")
$vmImpl = Get-VM $vmName
if($vmImpl.PowerState -ne "PoweredOff"){
Write-Host "Guest must be powered off to use this script !" -ForegroundColor red
return $false
}

$vm = $vmImpl | Get-View
$esxName = (Get-View $vm.Runtime.Host).Name
# Find datastore path
$dev = $vm.Config.Hardware.Device | where {$_.DeviceInfo.Label -eq $hdName}
if($dev.Backing.thinProvisioned){
return $false
}
$hdPath = $dev.Backing.FileName

# For Virtual Disk Manager the original snippet connected to the ESX server directly:
# $esxHost = Connect-VIServer -Server $esxName -User $esxAccount -Password $esxPasswd
# Since that connect is commented out, use the current vCenter connection instead.

# Convert HD
$vDiskMgr = Get-View -Id (Get-View ServiceInstance).Content.VirtualDiskManager
$dc = Get-Datacenter | Get-View
$taskMoRef = $vDiskMgr.EagerZeroVirtualDisk_Task($hdPath, $dc.MoRef)
$task = Get-View $taskMoRef
while("running","queued" -contains $task.Info.State){
$task.UpdateViewData("Info")
}

# Disconnect-VIServer -Server $esxHost -Confirm:$false

# Connect to the vCenter
# Connect-VIServer -Server $vcName -Credential (Get-Credential -Credential "vCenter account")
if($task.Info.State -eq "success"){
return $true
}
else{
return $false
}
}

Set-EagerZeroThick $vCenter $vmName1 "Hard disk 3"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 4"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 5"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 6"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 7"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 8"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 9"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 10"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 11"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 12"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 13"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 14"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 15"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 16"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 17"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 18"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 19"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 20"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 21"
Set-EagerZeroThick $vCenter $vmName1 "Hard disk 22"

Udev > ASMLib

I'm fighting the urge to rant about this.  Although I respect Oracle's right to make their products work better and have additional functionality with their other products, I really dislike the position Oracle has taken on ASMLib, db_flash_cache, and HCC on Sun-only storage.  Db_flash_cache and HCC are *wonderful* enhancements to database functionality...and they were released with the caveats that they only work with their respective Oracle co-products: db_flash_cache needs OEL, and HCC (non-Exadata) needs Sun storage.  You've probably read prior posts here on the Sun 7420 used for Exadata backups...it's wonderful...people should buy it on its own merits.  OEL offers a solid product at a great price point relative to other distros...it can stand on its own too.  Still...I get it...Oracle wants to sell more Oracle.

What bothers me more is the decision to no longer release ASMLib for Redhat 6+.  It's a difficult thing to switch for customers who have an installed base of Oracle databases on Redhat 5 and procedures built around ASMLib.  For a small shop, maybe it's not that big of a deal...for Oracle's big customers, it involves documentation, meetings, coordination...and a lot of ill will about being forced to change procedures and retrain resources that were trained to use ASMLib.

The written procedures for using udev with Redhat are a bit of a pain.  You identify the UUID of each disk device and create an entry in a udev rules file for it.  Again...for a small shop with a few databases, not a big deal.  I'm working on a project now to move over a hundred databases from AIX to Redhat 6 on vSphere 5 (VMware), in RAC.  This is one of the most aggressive use cases for Oracle on VMware I've ever heard of.  Having big databases in VMware is easy...having busy databases in VMware is a challenge.  Each node of each database has many disk devices.  Some of the databases are SAP, which requires multiple failgroups...which means multiple controllers with separated storage.  It's easy to make a mistake and add storage to the wrong ASM diskgroup...destroying your failgroup separation.

Even with ASMLib it's difficult to make sure your ASM disk (i.e. VOL45) maps to the correct SCSI controller, which maps to the proper storage that uses separate paths all the way down to separate storage.  Without ASMLib, there's much more room for human error.  Obviously...I had to automate.

So...Oracle hands you lemons...make exa-lemonade.  I created a udev rule creation script for Redhat that does a similar job to what ASMLib has always done.

Pros to Script
  • Easy to maintain (it's just bash)
  • Not kernel- (or RH version-) dependent, or OEL-dependent
  • Syncs rules across nodes (disk 14 on node 1 is the same as disk 14 on node 4, by UUID)
  • Based on the SCSI controller, the name of the alias changes...so the dba won't accidentally add a disk from one failgroup to a diskgroup of a different failgroup (which would eliminate the data protection of using multiple failgroups, and fail your SAP ASM platform certification)
Pros to ASMLib

  • Oracle maintains it (...until you update your kernel...which you do regularly for security fixes, right?)
  • Stamps disks (sector 2?) so VOL14 on node 1 is the same as VOL14 on node 4
  • Some people think ASMLib is more performant than udev...but I don't think those claims come from Oracle, and I haven't been able to quantify a difference.  If a performance advantage exists, it must be slight.
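The controller-based naming my script uses boils down to a small mapping: the SCSI host number the device hangs off decides whether its alias is a data, redo, or arch disk. A stripped-down sketch of that logic (controllers 3 and 4 are what the script below actually checks for data and redo; the arch controller number here is an assumption, and your layout may differ):

```shell
# Map a guest SCSI controller number to the udev alias prefix used in the
# generated rules file. Adjust the controller numbers to your own layout.
alias_for_controller() {
  case "$1" in
    3) echo "asm-data-disk" ;;
    4) echo "asm-redo-disk" ;;
    5) echo "asm-arch-disk" ;;   # assumed; the full script's arch branch isn't shown here
    *) echo "unknown" ;;
  esac
}
echo "$(alias_for_controller 3)1"   # prints asm-data-disk1
```

Because the alias name encodes the failgroup, a dba looking at /dev/asm-redo-disk3 can't easily mistake it for a data disk, which is the whole point of the exercise.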
To distribute the file to all the other nodes, there are two dependencies on Oracle's OneCommand configuration (used in Exadata, OVM, ODM, etc.): the params.ini and the doall.sh script.  For params.ini, I added a parameter called SHARED_DIR, which is a directory mounted by all nodes.  If you want, you can just ftp this file to the other nodes and comment those two lines out.

This is a work in progress that's expected to be modified and improved upon by the end user, and as always, use at your own risk...but I think it will likely save you some work creating your udev rules.  There is some detection of partition formatting...it works for me, but you should verify that the devices it recognizes as unformatted really are unformatted.  Use this on a non-prod, unimportant, crash-and-burn system first.  Hmmm...I can't think of any other warning to give.  Don't run this on any computer, ever.  To be extra, extra safe, you could comment out the last few lines that deal with moving the file around and reloading the udev rules...that way you can look at the new rules file before you actually use it.

Ok...that being said, I hope this enables you to get past the lack of ASMlib on Redhat 6+, as it has definitely helped me.

#!/bin/sh
###################################
# Name: udev_rules.sh
# Date: 5/9/2012
# Purpose: This script will create all the udev rules necessary to support
# Oracle ASM for RH 5 or RH6. It will name the aliased devices
# appropriately for the different failgroups, based on the controller
# they're assigned to.
# Revisions:
# 5/8/2012 - Created
# 5/10/2012 - Will now modify the existing rules to allow the addition of a
# single new disk. It will also sync the udev rules on node 1 with all other nodes.
###################################
source /u01/racovm/params.ini
rm /mnt/shared/udev/99-oracle-asmdevices.rules
data_disk=0
redo_disk=0
arch_disk=0
release_test=`lsb_release -r | awk 'BEGIN {FS=" "}{print $2}' | awk 'BEGIN {FS="."}{print $1}'`
echo "Detected RH release ${release_test}"

if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
echo -e "Detected a pre-existing asm rules file. Analyzing...\c"
for y in {1..50}
do
found_data_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-data-disk${y}"`
found_redo_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-redo-disk${y}"`
found_arch_disk=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "asm-arch-disk${y}"`
if [ -n "${found_data_disk}" ]; then
let "data_disk++"
fi
if [ -n "${found_redo_disk}" ]; then
let "redo_disk++"
fi
if [ -n "${found_arch_disk}" ]; then
let "arch_disk++"
fi
echo -e ".\c"
done
echo "complete."
echo "Existing rules file contains:"
echo " ASM Data Disks: ${data_disk}"
echo " ASM Redo Disks: ${redo_disk}"
echo " ASM Arch Disks: ${arch_disk}"
new_file="false"
else
echo "Detected no pre-existing asm udev rules file. Building..."
new_file="true"
fi

echo "Creating new partitions if needed."
sh install.sh &> install.log

for x in {a..z}
do
if [ -n "`ls /dev/sd*1 | grep sd${x}1 `" ] ; then
asm_test1=`file -s /dev/sd${x}1 |grep "/dev/sd${x}1: data" `
asm_test2=`file -s /dev/sd${x}1 |grep "Oracle ASM" `
if [[ -n "${asm_test1}" || -n "${asm_test2}" ]] ; then
controller=`ls /sys/block/sd${x}/device/scsi_device | awk 'BEGIN {FS=":"}{print $1}'`
# ie: scsi_device:1:0:1:0
if [ "${release_test}" = "5" ]; then
result=`/sbin/scsi_id -g -u -s /dev/sd${x}`
else
result=`/sbin/scsi_id -g -u -d /dev/sd${x}`
fi
if [ "${result}" = "" ]; then
echo "No scsi id found for /dev/sd${x}. If you're running on VMWare, verify disk.EnableUUID=true has been added under option->Advanced->General->Configuration Parameters."
exit 1
fi
if [ "${controller}" = "3" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
#if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
if [ -z "${found_uuid}" ]; then
echo "Detected a new data disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "data_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-data-disk${data_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
else
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-data-disk${data_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
elif [ "${controller}" = "4" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
echo "Detected a new Redo disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "redo_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-redo-disk${redo_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
elif [ "${release_test}" = "6" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-redo-disk${redo_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
elif [ "${controller}" = "5" ]; then
if [ -f "/etc/udev/rules.d/99-oracle-asmdevices.rules" ]; then
found_uuid=`cat /etc/udev/rules.d/99-oracle-asmdevices.rules|grep "${result}"`
else
found_uuid=
fi
if [[ -z "${found_uuid}" || "${new_file}" = "true" ]]; then
echo "Detected a new Arch disk. Adding rule to /etc/udev/rules.d/99-oracle-asmdevices.rules"
let "arch_disk++"
if [ "${release_test}" = "5" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -s /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-arch-disk${arch_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
elif [ "${release_test}" = "6" ]; then
echo "KERNEL==\"sd?1\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${result}\", NAME=\"asm-arch-disk${arch_disk}\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
fi
fi
fi
else
echo "/dev/sd${x}1 is not an asm disk."
fi
fi
done
#cat /etc/udev/rules.d/99-oracle-asmdevices.rules
echo "Syncing rules file for all nodes of this cluster..."
cd ${SHARED_DIR}/udev
cp /etc/udev/rules.d/99-oracle-asmdevices.rules .
/u01/racovm/doall.sh -p cp ${SHARED_DIR}/udev/99-oracle-asmdevices.rules /etc/udev/rules.d/99-oracle-asmdevices.rules
echo "Reloading rules for all disks on all nodes in this cluster..."
if [ "${release_test}" = "5" ]; then
/u01/racovm/doall.sh /sbin/udevcontrol reload_rules &> /dev/null
else
/u01/racovm/doall.sh /sbin/udevadm control --reload-rules &> /dev/null
fi
/u01/racovm/doall.sh /sbin/start_udev &> /dev/null
/u01/racovm/doall.sh /sbin/partprobe &> /dev/null
echo "Complete."
echo "To see the ASM UDEV rules: cat /etc/udev/rules.d/99-oracle-asmdevices.rules"


When complete, the rules file looks something like this (on RH5):

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3800682301d41f40ce5129d796f", NAME="asm-data-disk1", OWNER="oracle", GROUP="dba", MODE="0660"


KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38a77610df366b5ce4045e0f438", NAME="asm-data-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38d94459b437a48b0d75784d0bf", NAME="asm-data-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3803929d52a392a506b75b8fc2d", NAME="asm-data-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c383a1ab40918dbc2e7a5f8gfb9d", NAME="asm-data-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38840c4740546cb2d9874152b98", NAME="asm-redo-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38523a828e0bd08637f79862c5a", NAME="asm-redo-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38904e50311ce41bd1f8db03ca1", NAME="asm-redo-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38d4b7e30a0102afb9934bge9f2", NAME="asm-redo-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38bd8e8a59464126630ff37b5da", NAME="asm-redo-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38e50ded425980005bb5f685e14", NAME="asm-arch-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c380a88d61b860gbfab66e4ba2ec", NAME="asm-arch-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38e2dceadba28f7bb8144egd67a", NAME="asm-arch-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c383ae7f7d05b2bc2ded9724c69e", NAME="asm-arch-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c3866e4127140d48467c09f1363a", NAME="asm-arch-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="42000c38bca0e4c305e9236c7d62c553e", NAME="asm-arch-disk6", OWNER="oracle", GROUP="dba", MODE="0660"

High-Performance Oracle RAC on VMWare

It's been a while since I gave the blog some attention. Sorry to those who have commented and I haven't replied to. I'll try to catch up ASAP. I've been extremely busy with a mass migration of Oracle databases from some IBM P5 595's to VMWare ESX 5, running on beautiful Cisco M3-series (Ivy Bridge) blades and Redhat 6. My client had already moved a few hundred smaller databases to this platform...but moving the high-IO, multi-terabyte databases to RAC on VMWare is a completely different challenge. This kind of migration to blades without VMWare isn't easy...you have to deal with different hardware, a different OS and a different endianness (the etymology of that word is hilarious, by the way). Oracle's statement of direction announcement (see 1089399.1) of not supporting ASMLib after Redhat 5 adds one more complication.

After the architecture has been defined and ESX 5 installed (see vBlock and vFlex references...both are excellent selections and I've read on The Register they may be offering FusionIO as part of their packages) the next step is to create the VM's.

This is an unpleasant point-and-click process with any new VM...and with even a 3-node RAC, it's too easy to make a mistake. In general, the process is to create the VM's with the appropriate vCPU and RAM you need (calculating that deserves a post in itself). Then create the storage on one of the VM's (call it node 1), then...on each of the other nodes, create disks that point back to node 1's vmdk's. Rinse and repeat for each node and each shared disk. If you're following the SAP best practices for ASM, you'll need to do that for a minimum of 3 ASM diskgroups, each with their own set of vmdk's and datastores. When that's complete, go back and update the other settings per VMWare's best practices for Oracle running on RAC, then make the settings needed per VMWare's hardening guide. To make these changes, there's a button (under Properties->Options->Advanced->General, called "Configuration Parameters") where you can add a line, and add these parameters and their values in a free-form table. If you make a typo, there's no checking...it just won't work. Per SAP best practices, don't forget to reserve 100% of RAM.

...all that to say, it's a time-consuming pain and it's difficult to do without making a mistake. Early on in the project, I decided I would try to find a script from somebody out there who has done this and use their script to create the VM's...but I couldn't find any. In fact, although there were lots of people using PowerCLI and good community support, I don't think I found a single reference to PowerCLI being used to create RAC nodes. So...I came up with a script to do it. There are 2 sections in it I borrowed from the PowerCLI community...the section that updates the configuration parameters and the section that zeroes out the vmdk's.

The obvious (but not performant) way to create a RAC database in VMWare (as I said above) is to create the vmdk's and then point all the other nodes' vmdk's to their counterparts on the first node. After that you have to set the SCSI controller to allow for "physical SCSI bus sharing." This works...but this is the generic method of sharing storage across VMWare nodes. VMWare implements a locking mechanism to protect your data while accessing the same storage from multiple machines. If you have an application that isn't cluster-aware, this is necessary. If you have an application that IS cluster-aware (like Oracle RAC), this is redundant. So...for better IO performance in RAC, set up each shared disk with the multi-writer parameter (see below.) For that to work, the disks must be eager zeroed. Zeroing out the vmdk's is a best practice you'll see for people who have high-IO VM's (like databases.) In vSphere 5, that's called "thick eager zeroed", and it's necessary for multi-writer locking.
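As a sketch, the multi-writer flag ends up as entries like these in each node's Configuration Parameters (or directly in the .vmx file). The SCSI IDs below are placeholders...use whatever IDs your shared ASM disks actually sit on:

```
scsi1:0.sharing = "multi-writer"
scsi1:1.sharing = "multi-writer"
scsi2:0.sharing = "multi-writer"
```

Each backing vmdk must be thick eager zeroed, per the multi-writer requirements above...if it isn't, the VM won't power on with the flag set.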

There are a couple of key things to keep in mind when working on VMWare with RAC:

1.  Eager zero data storage, as stated above.

2. Sometimes, more vCPU's is slower. How VMWare's CPU scheduling works is that (in an effort to simulate a physical environment) all virtual cores have to be free on physical cores in order to get a time slice. For example, let's say you have a dual quad-core blade with hyperthreading turned on (per vSphere 5 best practice), which gives you 16 logical CPU's. You have other VM's that are busy, and they're using up 10 cores at this moment. You created a VM that has 8 vCPU's and now you need a single time slice. Your single time slice has to wait until 8 logical CPU's are free before it can proceed. Also, even though you just need 1 time slice, it makes 8 physical cores busy. Not only does assigning too many cores slow your VM down, your VM slows down all other VM's on the host. You can have an ESX host CPU bottleneck even though the sum total of the CPU used inside the VM's is barely anything. This means the DBA needs read access to the host performance statistics, a fact that VMWare admins aren't happy about. Blue Medora is answering that call with an OEM 12c plugin for VMWare.
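The co-scheduling arithmetic from that example can be sketched in a few lines of shell (the numbers are the ones from the paragraph above, not measurements):

```shell
#!/bin/sh
# Co-scheduling: an 8-vCPU VM needs 8 free logical CPUs at the same instant.
total_lcpus=16      # dual quad-core blade with hyperthreading
busy_lcpus=10       # consumed by other VMs at this moment
vm_vcpus=8          # vCPUs assigned to our VM
free=$((total_lcpus - busy_lcpus))
if [ "$free" -ge "$vm_vcpus" ]; then
  echo "runs now"
else
  echo "waits: only $free of the $vm_vcpus needed logical CPUs are free"
fi
```

So even though the VM wants one time slice, it sits idle until 8 logical CPUs free up at once.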

3. In VMWare, RAM is shared, but only when it's possible to do so. There are many great memory efficiency features in vSphere 5...but if you follow Oracle best practices, they won't help you very much. Huge pages are used internally in vSphere 5, but if you use huge pages in the VM (for the SGA), it will initially work very well...but as soon as your caches warm up, the SGA becomes distinct from all other memory used on your host. At that point, you get no advantages...so I've found it's better to reserve at least the size of the SGA you'll be using. SAP's best practice for RAC on VMWare is to do 100% memory reservation for Prod...and there are other performance-enhancing reasons to do that. Besides removing the performance overhead of the memory-saving features, it allows some other vSphere 5 features that improve network latency (such as VMWare's DirectPath I/O, not to be confused with Oracle's definition of direct path IO). This can have a huge impact on a busy interconnect in RAC.
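If you do reserve the SGA and use huge pages inside the guest, the kernel setting is just arithmetic. A minimal sketch, assuming the usual 2MB x86_64 huge page size and a hypothetical 16GB SGA:

```shell
#!/bin/sh
# Compute vm.nr_hugepages for a given SGA size.
sga_mb=16384                  # example SGA size in MB (assumption)
hugepage_kb=2048              # typical x86_64 huge page size: 2MB
nr_hugepages=$(( (sga_mb * 1024) / hugepage_kb ))
echo "vm.nr_hugepages = $nr_hugepages"
# Put that line in /etc/sysctl.conf on the VM, then verify allocation
# after a reboot with: grep Huge /proc/meminfo
```

In practice you'd pad this a little so every SGA granule fits; check the HugePages_Free counter after instance startup.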

4. Many of the VMWare HA features are redundant when you're running RAC. In RAC, if you have a node fail, your processes should fail over to the surviving nodes, and apps keep running. If you're running Cisco UCS, your blade's profile will go to a spare blade, and soon (15 min or so) the failed blade is back in action. VMWare HA would restart that VM on a different machine that's still running, and soon (15 min or so) your failed node is back in action, assuming you left sufficient free RAM on the other blades to make that possible. Very smart people disagree about which HA method is best...and I think there are good arguments to be made on all sides. My opinion is that you should provide HA via RAC, because it's instant and it's more efficient. If you depend on VMWare HA with RAC, you have to keep X% free resources available in reserve on all blades...just in case. For example, if you have 2 nodes, you'll need to limit your total VM RAM allocation to 50% (maybe a bit less due to memory tricks VMWare employs...but it's still a large % of your total RAM.)

If you depend on RAC for HA, you can use all resources, as long as you allocate enough RAM for the additional processes you'd need in case there's a node failure (surviving nodes would have to absorb the additional connections that used to be on the failed node). This allows for much better efficiency, but it means the surviving nodes need to be capable of supporting the additional load from the failed node.
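The headroom trade-off is easy to quantify. A rough sketch of the per-host RAM ceiling under VMWare-HA-style reserves, generalizing the 2-node/50% example above (node counts are illustrative):

```shell
#!/bin/sh
# With n hosts and VMWare HA, each host must keep roughly 1/n of its RAM
# free so any one failed host's VMs can restart on the survivors.
for n in 2 3 4; do
  usable_pct=$(( (n - 1) * 100 / n ))
  echo "$n hosts: can allocate about ${usable_pct}% of RAM per host"
done
```

With RAC-style HA you can run each host much closer to full, as long as the survivors can absorb the failed node's connections.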

5. One last thing to keep in mind - if you try to start a VM that puts you over 20TB on the host, you get a non-descriptive "out of memory" error, which then references one of your vmdk's, and your VM will fail to start. When I first saw this I thought...what does "out of memory" - an error associated with RAM - have to do with a vmdk? The answer lies in the VMWare internals...for performance reasons, the storage is referenced in a reserved heap space in RAM...similar to the memory used to track AU's in Oracle's ASM. By default, there's 80MB set aside for that, which is sufficient for 20TB of vmdk storage. After that, you get the "out of memory" error and the vmdk that pushed you over the limit is referenced. That's why a RAM issue references a vmdk. The solution is to increase the heap size to its max, which is 256MB and allows up to 60TB of vmdk storage per ESX host. After that, you need to reconsider using vmdk's. In my project, we were going to pass this limit...so for that reason (and others) we implemented Oracle Advanced Compression as part of the migration. Suddenly, the databases that were over 60TB total became much smaller. We're averaging a little over 3X compression with our data.
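For reference, the heap setting itself is a one-line advanced option on each ESX host. A sketch using the classic esxcfg-advcfg tool (on ESXi you can also set it through the vSphere client under Advanced Settings, or via esxcli):

```
# Check the current VMFS heap size (default 80MB -> ~20TB of open vmdk's):
esxcfg-advcfg -g /VMFS3/MaxHeapSizeMB
# Raise it to the 256MB maximum (-> ~60TB of open vmdk's per host):
esxcfg-advcfg -s 256 /VMFS3/MaxHeapSizeMB
# The host needs a reboot before the new heap size takes effect.
```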

With all the changes I've mentioned...hardware platform, OS, endianness, ASMLib (or the lack thereof), vmdk's, advanced compression...and we're moving some non-RAC databases to RAC, some non-ASM databases into ASM and implementing index compression...how can we guarantee performance on the new platform with so many variables? We used Oracle's Real Application Testing and tested extensively. I'll talk more about that later.

In my next post, I'll show you the script I passed off to the VMWare team to create the VM's.  At first they were a bit hesitant that a database guy was giving them a script to make VM's, but after they realized how many hours of their lives they were going to get back (and that it removed human error), they were more than happy to not only use it for the migration, but to modify it and use it for all other VM builds....

Monday, March 5, 2012

IOPS requirements

So...let's say you're moving your database from the antique hardware it's running on (the old hardware is always relatively an antique, right?) to the newest, latest, greatest hardware. You're also upgrading your storage to a new SAN or NAS. The storage architect says, "Yep...according to this shiny marketing page on the storage vendor's website, it's the fastest storage in its class, so it should be good enough...and we can thin provision and deduplicate and do backups with snaps and pool all the storage into one monster ring of storage to rule them all! And, since disk drives are cheaper when you buy them in 3TB versions, we'll save lots of money by getting rid of our smaller spindles and using these big SATA spindles!"

Ok...maybe I'm pessimistic, but whenever I hear an extreme like "fastest", a red flag goes up in my head...and although the great features of modern storage might be worth it...there's usually some overhead involved. Also, the term "larger spindles" usually translates in my head to "fewer IOPS", unless you short stroke (check out my "How much faster is Exadata High Performance than High Capacity" blog for details about that)...but then, if you're short stroking, they really don't have that much more capacity, and you have to use the capacity smarter.

...and in general, marketing pages on websites are usually only correct under extreme examples that make their hardware look better.  Testing...and quantifying performance are always necessary.

After you run your ORION tests, you'll know what your new storage is capable of...say, 500,000 IOPS. Is that enough for your databases? Here's a query you can use (for non-RAC) to find out. It'll go through the AWR data in your database to find when the busiest hours have been, and the number of IOPS you used then. It's possible that this is lower than the peak amount of IOPS you used, since this is an average over an hour, so treat these results as minimums.

I looked around on the internet and I couldn't find anything that met my requirements...the solution that came closest ignored the fact that a physical IOP can be a read or a write...which is a pretty big miss.  In my situation, I actually had multiple databases using the same SAN, so I took these results and put them into a table, where I could sum up the results by hour, and see when the overlapping busiest IOP requirements were, and how high they were.

Looking back, I should have written this using SQL's lag, but...this works. Keep in mind, this is what Oracle believes to be a physical IO...if you're not in ASM (you should be), you may be using the OS filesystem cache...even if you're in ASM, you may be caching on your NAS or SAN...both situations are hidden from Oracle. When I put this together, there were a couple of issues I had to overcome.

1. The stat_id for  physical reads/writes was different from db to db...even databases with the same version.  I'm not sure why this is...but I suspect they were originally different versions and upgraded to the same versions.

2. This data is accumulated from snap to snap since the last bounce, so I had to take a snap and compare it to the previous snap to find the delta, then find the number of seconds between those snaps and divide to find the IO/sec.  If there's a database bounce, the accumulator will reset to zero, and it would report a negative value, so I filtered out the negative results.

3. Snaps aren't exactly as long as you think they are...and they're configurable...so you might have them set for 15 min or an hour...and they may actually be running for 15 minutes and 1 second.  This takes the variable length of a snap into account.
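The delta arithmetic in points 2 and 3 boils down to this toy sketch (the counter values are made up for illustration, not real AWR output):

```shell
#!/bin/sh
# Two consecutive snaps of the cumulative read/write IO counters,
# plus the actual elapsed seconds between them.
reads1=1000000;  writes1=400000    # snap N   (made-up values)
reads2=1900000;  writes2=700000    # snap N+1 (made-up values)
seconds=3601                       # "one hour" snaps are rarely exactly 3600s
d_reads=$((reads2 - reads1))
d_writes=$((writes2 - writes1))
if [ "$d_reads" -lt 0 ] || [ "$d_writes" -lt 0 ]; then
  # An instance bounce reset the accumulators; skip this interval.
  echo "skipped: counters reset by a bounce"
else
  echo "IOPS = $(( (d_reads + d_writes) / seconds ))"
fi
```

The query below does the same thing in SQL, per snap pair, grouped by hour.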

I'm sure this isn't perfect, but it's worked well for every non-RAC database I've run it on. It might work for RAC too...I just haven't had the need to gather IOPS requirements from a RAC database...usually the migration target is in RAC, not the source. I hope this helps you in your struggle against the vendor-marketing-website-page-believing storage administrator in your life.


select 'mydb01', sample_hour, (rps+wps) IOPS
from (
with snaps as (
select hiof1.snap_id, sum(hiof1.value) reads, sum(hiof2.value) writes
from sys.WRH$_SYSSTAT HIOF1, sys.WRH$_SYSSTAT HIOF2
where HIOF1.stat_id in (select stat_id from v$statname where name like '%physical read total IO%')
  and HIOF2.stat_id in (select stat_id from v$statname where name like '%physical write total IO%')
  and HIOF1.snap_id=hiof2.snap_id
group by hiof1.snap_id
),
my_snaps as
(select snap_id, instance_number, begin_interval_time, end_interval_time,
 extract(second from (end_interval_time-begin_interval_time))+
 (extract(minute from (end_interval_time-begin_interval_time))*60)+
 (extract(hour from (end_interval_time-begin_interval_time))*60*60) seconds
 from dba_hist_snapshot)
select s1.snap_id snap_1, s2.snap_id snap_2, to_date(to_char(begin_interval_time,'MM/DD/YYYY HH24'),'MM/DD/YYYY HH24') sample_hour, sum(s2.reads-s1.reads) reads, sum(s2.writes-s1.writes) writes,
  trunc(sum(s2.reads-s1.reads)/sum(seconds)) rps, trunc(sum(s2.writes-s1.writes)/sum(seconds)) wps
from snaps s1, snaps s2, my_snaps ms
where s1.snap_id=ms.snap_id
  and s1.snap_id=(s2.snap_id-1)
  and (s2.reads-s1.reads)>1
  and (s2.writes-s1.writes)>1
group by s2.snap_id, to_date(to_char(begin_interval_time,'MM/DD/YYYY HH24'),'MM/DD/YYYY HH24'), s1.snap_id
) order by 3 desc;


Keep in mind, Oracle considers usage of the AWR tables to require the Diagnostics Pack license...so if you run this query on a database that isn't licensed for it...shame on you. Jonathan Lewis has a nice post to show you how to clean the audit trail.

For example output, here's the results from a busy database I've recently worked on.  I changed the db name and times, but not the iops (yes, that's 210 million physical IOPS...from Oracle's perspective):

db_name   SAMPLE_HOUR         IOPS
mydb01    01/05/11 08:00 AM   210487926
mydb01    01/25/11 11:00 PM   33824606
mydb01    01/26/11 01:00 AM   24807386
mydb01    09/04/12 03:00 PM   16523389
mydb01    07/26/11 02:00 AM   14372534
mydb01    06/04/12 02:00 PM   9307085
mydb01    05/19/12 02:00 PM   8735280
mydb01    04/20/12 01:00 PM   2439049
mydb01    03/06/11 11:00 AM   1460471
mydb01    09/17/12 07:00 PM   665055

Monday, January 30, 2012

GoldenGate capture with compressed tables will be available soon!

  I've been told "Golden Gate will support compressed tables...soon" for literally years now by multiple people at Oracle.  My first SR on this was opened on March 29th, 2010, nearly 2 years ago!  That SR's resolution was to add me to bugs:

bugdb - 9416239 which tracks SUPPORT of OLTP TABLE COMPRESSION
bugdb - 9428399 which tracks exadata V2 HCC compression
bugdb - 9426065 which tracks SUPPORT of ORACLE COMPRESSED TABLES

  Mining compressed tables is a necessary requirement for many large databases that use CDC...especially databases in Exadata, where you're all but expected to make use of Hybrid Columnar Compression (HCC). Think of all the benefits of compression in your database:

1. Compressed data means smaller datafiles, which means faster restore times, improving your RTO.

2. Your 8k block actually stores more than 8k of data which means you need to read/write fewer blocks.  Since the blocks are stored compressed in memory, it also means you increase the amount of data in the same size of your db cache. As the blocks move from memory to disk, it improves your potential IOPS capacity.  Ok, not really increasing IOPS capacity, but increasing the amount of data you can move per IO, which has a similar effect.

3. Compression means less storage requirements which in turn mean less storage costs.
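Point 2's effect is easy to see with a little arithmetic (the block count and the 3X ratio here are illustrative, in line with the compression ratios mentioned elsewhere on this blog):

```shell
#!/bin/sh
# With ~3X compression, the same logical data occupies ~1/3 the blocks,
# so a scan that used to read 300,000 8k blocks now reads about 100,000.
blocks_uncompressed=300000
ratio=3
blocks_compressed=$(( blocks_uncompressed / ratio ))
echo "blocks read: $blocks_uncompressed -> $blocks_compressed"
```

Fewer blocks per logical read means more usable cache and more data moved per physical IO.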

In a previous post, I mentioned how, when I first started looking at Golden Gate with Exadata, I found that not only were compressed tables not captured...if GG came across a logged change to a compressed table in the archivelogs or redo, it would abend...even if it was a table that was excluded from capture. We submitted a priority 1 SR and Oracle created a patch for this, so we were able to begin using GG with Exadata...but all tables that used compression were excluded from GG capture because of this limitation. Put another way, we were forced to not compress tables that needed to be captured. Those were all the biggest, most compressible tables. This forced my client to use much more storage in their Exadata cells than they anticipated, and today they're preparing to buy additional storage cells, partially because of this limitation.

Imagine the fun they had explaining to their management they need hundreds of thousands of dollars to purchase additional storage from Oracle, ultimately due to an Oracle bug. :)  Come to think of it...no wonder it took Oracle 2 years to fix this! :)

Not to get too off track, but for a truly high-performance database, you should really test how advanced compression affects your system performance. Your mileage will vary, but based on my testing, I would expect it to improve your performance. See http://otipstricks.blogspot.com/2011/02/exadata-index-dropping-and-compression.html. There are other huge performance improvements 11.2.0.3 offers for Exadata, especially for OLTP environments.

Performance aside, IMHO you should begin to prepare for the upgrade to 11.2.0.3 PB 3 to take advantage of this new GG feature/bug fix, so you're ready to go when the new version of GG is released. 

I have no special knowledge from a friend at Oracle this time...I'm gleaning this from a statement in a PDF from Douglas Reid, Oracle GoldenGate product management. GG 11.2.1 will have tighter integration with the XStream Out API (Capture), which means GG will be using a call to a procedure already in the Oracle kernel. That internal call will be what handles OLTP and HCC compression, which to this point hasn't been possible. Soooo...based on the schedule in that PDF of the approx March/April release of GG 11.2.1, there must be, prior to that time, a database change to allow that. Since we can't do it in 11.2.0.3 PB2 and it's going to happen in the next several weeks...it must be coming in PB3. The release schedule "remains at the discretion of Oracle"...but short of mind reading, this is the best we've got.

So, 11.2.0.3 PB3 will be out within the next few weeks. Based on the previous release cycles of patch bundles, it may be even sooner than that. I would guess sometime this week Exadata 11.2.0.3 PB3 will be released (it requires Exadata Storage Server 11.2.2.4.) Check Metalink note 888828.1 for updates...by the time you read this, you'll likely see 11.2.0.3 PB3 listed in that note...if not, check back in a few days. (Add that note to your MOS favourites by clicking the little star on it. It's the best source to find what's currently GA.)

To sum it up, when you use GG 11.2.1 with Exadata 11.2.0.3 PB3, you'll be able to FINALLY mine compressed tables in GoldenGate.

I could be wrong, but my impression is that a lot of people at different companies are using GoldenGate with Exadata. "Extreme Goldengate Performance on Exadata" is one of my most popular posts. Given the cost per GB of storage in Exadata, compression could save you a *huge* amount of money. Once GG 11.2.1 is released, the only reason I can think of NOT to compress everything you can is that your access patterns don't work well with it...assuming the budget that was big enough to buy Exadata is also big enough to license compression. ;)

After ~2 years of waiting for these 3 bugs to be fixed, the wait is finally over.

Wednesday, January 4, 2012

Yet another plug for a great book

If you know anything about Oracle databases, you've likely heard of Jonathan Lewis and the great work he's done in the past on Oracle internals (why things do what they do...what's going on under the hood of the database)...especially his work on the cost-based optimizer.  I know he's inspired me to be much better than I would have been, because after reading his books, I was humbled by my relative ignorance, and became desperate to improve.  He's literally an alien of extraordinary ability....


We exist in a field filled with many geniuses. What sets him apart from all but a very few is his ability to bring people to his depth of understanding...his ability to explain complicated things is unmatched...but come to think of it...many of the "greats" have this ability. Cary Millsap, Tanel Poder, Rich Niemiec, Kerry Osborne...not to mention many others. Hmm...maybe that's the difference between obscurity and notoriety...


I bring this up because I've been reading one of his newest books, Oracle Core: Essential Internals for DBAs and Developers. His approach to explaining things in the book is interesting...he's recognized the circular manner of Oracle internals...you have to have a clue about everything before you can get a little depth in something in particular. You have to have a little depth about everything before you can get deep into anything. This is how he explains things. By the time he gets into real depth, he's already brought you to a place where it's just a small step, rather than a leap. This makes his book interesting both to people who have used Oracle for years and to newbies.


I'll add it to my book list on the right....