Step By Step: Upgrade 11gR1 RAC to
11gR2 RAC on Oracle Enterprise Linux 5 (32 bit) Platform.
By Bhavin Hingu
This document shows, step by step, how to upgrade a 3-node 11gR1 RAC to 11gR2 RAC. I chose the upgrade path below, which allows me to upgrade the 11gR1 Clusterware and ASM to 11gR2 Grid Infrastructure. I prefer to perform this activity in a scheduled outage window, even though rolling upgrades of ASM and CRS are possible from 11gR1. Upgrading the 11gR1 RAC database requires an outage; the total downtime can be further reduced or avoided by using a Standby Database in the upgrade process (not covered here).
|                                    | Existing 11gR1 RAC Setup (Before Upgrade)    | Target 11gR2 RAC Setup (After Upgrade)       |
| Clusterware                        | Oracle 11gR1 Clusterware 11.1.0.6            | Oracle 11gR2 Grid Infrastructure 11.2.0.1    |
| ASM Binaries                       | Oracle 11gR1 RAC 11.1.0.6                    | Oracle 11gR2 Grid Infrastructure 11.2.0.1    |
| Cluster Name                       | lab                                          | lab                                          |
| Cluster Nodes                      | node1, node2, node3                          | node1, node2, node3                          |
| Clusterware Home                   | /u01/app/oracle/crs (CRS_HOME)               | /u01/app/grid11201 (GRID_HOME)               |
| Clusterware Owner                  | oracle:(oinstall, dba)                       | oracle:(oinstall, dba)                       |
| VIPs                               | node1-vip, node2-vip, node3-vip              | node1-vip, node2-vip, node3-vip              |
| SCAN                               | N/A                                          | lab-scan.hingu.net                           |
| SCAN_LISTENER Host/port            | N/A                                          | SCAN VIPs Endpoint (TCP:1525)                |
| OCR and Voting Disks Storage Type  | Raw Devices                                  | Raw Devices                                  |
| OCR Disks                          | /dev/raw/raw1, /dev/raw/raw2                 | /dev/raw/raw1, /dev/raw/raw2                 |
| Voting Disks                       | /dev/raw/raw3, /dev/raw/raw4, /dev/raw/raw5  | /dev/raw/raw3, /dev/raw/raw4, /dev/raw/raw5  |
| ASM_HOME                           | /u01/app/oracle/asm11gr1                     | /u01/app/grid11201                           |
| ASM_HOME Owner                     | oracle:(oinstall, dba)                       | oracle:(oinstall, dba)                       |
| ASMLib user:group                  | oracle:oinstall                              | oracle:oinstall                              |
| ASM LISTENER                       | LISTENER (TCP:1521)                          | LISTENER (TCP:1521)                          |
| DB Binaries                        | Oracle 11gR1 RAC (11.1.0.6)                  | Oracle 11gR2 RAC (11.2.0.1)                  |
| DB_HOME                            | /u01/app/oracle/db11gr1                      | /u01/app/oracle/db11201                      |
| DB_HOME Owner                      | oracle:(oinstall, dba)                       | oracle:(oinstall, dba)                       |
| DB LISTENER                        | LAB_LISTENER                                 | LAB_LISTENER                                 |
| DB Listener Host/port              | node1-vip, node2-vip, node3-vip (port 1530)  | node1-vip, node2-vip, node3-vip (port 1530)  |
| DB Storage Type, File Management   | ASM with OMFs                                | ASM with OMFs                                |
| ASM Diskgroups for DB and FRA      | DATA, FRA                                    | DATA, FRA                                    |
| OS Platform                        | Oracle Enterprise Linux 5.5 (32 bit)         | Oracle Enterprise Linux 5.5 (32 bit)         |
NOTE: The Grid Infrastructure owner must be the same as the 11gR1 CRS owner; role separation is not possible during an upgrade.
HERE is the existing 11gR1 RAC setup in detail.
The Upgrade Process is composed of the below stages:

· Pre-Upgrade Tasks
· Upgrade 11gR1 Clusterware to the 11gR2 Grid Infrastructure (11.2.0.1)
· Upgrade 11gR1 ASM to 11gR2 Grid Infrastructure
· Upgrade the Database from 11gR1 RAC to 11gR2 RAC

Pre-Upgrade Tasks:

· Install/upgrade the RPMs required for the 11gR2 RAC installation
· Set up the Network Time Protocol
· Start the nscd on all the RAC nodes
· Back up the existing 11gR1 HOMEs and database
Minimum Required RPMs for 11gR2 RAC on OEL 5.5 (All the 3 RAC Nodes):

The below command verifies whether the specified RPMs are installed or not. Any missing RPMs can be installed from the OEL Media Pack.

For 11gR2:

rpm -q binutils compat-libstdc++-33 elfutils-libelf elfutils-libelf-devel elfutils-libelf-devel-static \
    gcc gcc-c++ glibc glibc-common glibc-devel glibc-headers kernel-headers ksh libaio libaio-devel \
    libgcc libgomp libstdc++ libstdc++-devel make numactl-devel sysstat unixODBC unixODBC-devel
Combining the requirements of both releases, I had to install the below RPM.

numactl-devel → located on the 3rd CD of the OEL 5.5 Media Pack.
[root@node1 ~]# rpm -ivh numactl-devel-0.9.8-11.el5.i386.rpm
warning: numactl-devel-0.9.8-11.el5.i386.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Preparing...                ########################################### [100%]
   1:numactl-devel          ########################################### [100%]
[root@node1 ~]#
I also had to upgrade the cvuqdisk RPM by removing it and installing the higher version. This step is also taken care of by the rootupgrade.sh script.

cvuqdisk → available on the Grid Infrastructure media (under the rpm folder).

rpm -e cvuqdisk
export CVUQDISK_GRP=oinstall
echo $CVUQDISK_GRP
rpm -ivh cvuqdisk-1.0.7-1.rpm
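A quick sanity check that the new version is in place:

rpm -q cvuqdisk      # should now report cvuqdisk-1.0.7-1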
SCAN VIPs to configure in DNS, all resolving to lab-scan.hingu.net:

192.168.2.151
192.168.2.152
192.168.2.153

HERE is the existing DNS setup. In that setup, the below two files were modified to add these SCAN VIPs into the DNS:

/var/named/chroot/var/named/hingu.net.zone
/var/named/chroot/var/named/2.168.192.in-addr.arpa.zone
/var/named/chroot/var/named/hingu.net.zone

$TTL 1d
hingu.net.   IN SOA  lab-dns.hingu.net. root.hingu.net. (
                     100   ; se = serial number
                     8h    ; ref = refresh
                     5m    ; ret = update retry
                     3w    ; ex = expiry
                     3h    ; min = minimum
                     )
             IN NS   lab-dns.hingu.net.

; DNS server
lab-dns      IN A    192.168.2.200

; RAC Nodes Public name
node1        IN A    192.168.2.1
node2        IN A    192.168.2.2
node3        IN A    192.168.2.3

; RAC Nodes Public VIPs
node1-vip    IN A    192.168.2.51
node2-vip    IN A    192.168.2.52
node3-vip    IN A    192.168.2.53

; 3 SCAN VIPs
lab-scan     IN A    192.168.2.151
lab-scan     IN A    192.168.2.152
lab-scan     IN A    192.168.2.153

; Storage Network
nas-server   IN A    192.168.1.101
node1-nas    IN A    192.168.1.1
node2-nas    IN A    192.168.1.2
node3-nas    IN A    192.168.1.3
/var/named/chroot/var/named/2.168.192.in-addr.arpa.zone

$TTL 1d
@    IN SOA  lab-dns.hingu.net. root.hingu.net. (
             100   ; se = serial number
             8h    ; ref = refresh
             5m    ; ret = update retry
             3w    ; ex = expiry
             3h    ; min = minimum
             )
     IN NS   lab-dns.hingu.net.

; DNS machine name in reverse
200  IN PTR  lab-dns.hingu.net.

; RAC Nodes Public Name in Reverse
1    IN PTR  node1.hingu.net.
2    IN PTR  node2.hingu.net.
3    IN PTR  node3.hingu.net.

; RAC Nodes Public VIPs in Reverse
51   IN PTR  node1-vip.hingu.net.
52   IN PTR  node2-vip.hingu.net.
53   IN PTR  node3-vip.hingu.net.

; RAC Nodes SCAN VIPs in Reverse
151  IN PTR  lab-scan.hingu.net.
152  IN PTR  lab-scan.hingu.net.
153  IN PTR  lab-scan.hingu.net.
Restart the DNS service (named):

service named restart

NOTE: nslookup for lab-scan should return the three addresses in a different (round-robin) order every time.
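For example (illustrative output only; the order of the three addresses should change from run to run):

nslookup lab-scan.hingu.net

Server:         192.168.2.200
Address:        192.168.2.200#53

Name:    lab-scan.hingu.net
Address: 192.168.2.152
Name:    lab-scan.hingu.net
Address: 192.168.2.153
Name:    lab-scan.hingu.net
Address: 192.168.2.151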
Network Time Protocol Setting (On all the RAC Nodes):

The Oracle Cluster Time Synchronization Service (ctssd) is chosen over the Linux-provided ntpd. So, ntpd needs to be deactivated and deconfigured to avoid any possibility of it conflicting with Oracle's Cluster Time Synchronization Service.

# /sbin/service ntpd stop
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.org

Also remove the following file:

/var/run/ntpd.pid
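A one-liner for that last step (run as root on each node; -f keeps it quiet if the file is already gone):

# rm -f /var/run/ntpd.pid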
Network Service Cache Daemon (all the RAC nodes):

The Network Service Cache Daemon (nscd) was started on all the RAC nodes.

service nscd start
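To keep nscd enabled across reboots as well (assuming the standard chkconfig tooling on OEL 5):

chkconfig nscd on
chkconfig --list nscd     # confirm run levels 3 and 5 show "on"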
Backing Up ORACLE_HOMEs/Database:

Steps I followed to take the backup of the ORACLE_HOMEs before the upgrade (this can be applied to 11gR1 and 10g HOMEs):

On node1:

mkdir backup
cd backup
dd if=/dev/raw/raw1 of=ocr_disk_11gr1.bkp
dd if=/dev/raw/raw3 of=voting_disk_11gr1.bkp
tar cvf node1_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node1_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node1_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node1_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .

On node2:

mkdir backup
cd backup
tar cvf node2_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node2_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node2_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node2_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .

On node3:

mkdir backup
cd backup
tar cvf node3_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node3_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node3_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node3_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .
An RMAN full database backup was also taken.
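A minimal RMAN sketch of such a full backup (assumptions: the FRA is the default backup destination and archivelog mode is enabled; adjust channels and format to your environment):

rman target /
RMAN> backup database plus archivelog;
RMAN> backup current controlfile;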
With this, the pre-upgrade steps are completed successfully and we are ready to upgrade to the 11gR2 Grid Infrastructure next.
Step By Step: Upgrade Clusterware, ASM and Database from
11.1.0.6 to 11.2.0.1.
Upgrade 11gR1 CRS to 11gR2 Grid Infrastructure:
The Oracle documentation recommends leaving all the RAC instances up and running during the upgrade process, as the rootupgrade.sh script brings down the CRS stack on each node when it runs. I would still prefer to at least shut down the database cleanly before starting the upgrade process.

· Stop the labdb database.
· Start the runInstaller from the 11gR2 Grid Infrastructure software stage (sketched below).
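The two steps above boil down to something like the following (the stage directory is only an assumption; the original does not say where the Grid Infrastructure software was unzipped):

/u01/app/oracle/db11gr1/bin/srvctl stop database -d labdb
cd /home/oracle/grid11201/grid        # hypothetical Grid Infrastructure software stage
./runInstaller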
Grid Infrastructure Upgrade process:

Installation Option: Upgrade Grid Infrastructure
Product Language: English
Node Selection: Select all the nodes
SCAN Information: SCAN name: lab-scan.hingu.net, SCAN port: 1525
ASM Monitor Password: Password entered
Prerequisite Checks: Verify all the minimum prerequisites are satisfied successfully
Privileged Operating System Groups:
      ASM Database Administrator (OSDBA) Group: dba
      ASM Instance Administrator Operator (OSOPER) Group: dba
      ASM Instance Administrator (OSASM) Group: oinstall
Installation Location:
      Oracle Base: /u01/app/oracle
      Software Location: /u01/app/grid11201
Summary Screen: Verified the information here and pressed “Finish” to start the installation.

At the end of the installation, the rootupgrade.sh script needs to be executed as the root user on all the nodes, one by one:

/u01/app/grid11201/rootupgrade.sh
The rootupgrade.sh script failed on the last node (node3) with the below error; it seemed that CRS died after the successful upgrade of the OCR. The alertnode3.log showed the error below: both OCR disks, which are raw devices, became inaccessible after the OCR was successfully upgraded. I tried this upgrade 2-3 times and it errored out at the exact same place every time. Because the OCR had been upgraded successfully, I decided to reboot all the nodes at this stage to see if the HA stack would come back up after the reboots. I also wanted to confirm the OCR integrity via ocrcheck, to verify there was no logical corruption at the block level.
/u01/app/grid11201/log/node3/alertnode3.log:

[ctssd(22505)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2011-10-13 15:30:58.341
[ohasd(21071)]CRS-2765:Resource 'ora.crsd' has failed on server 'node3'.
2011-10-13 15:30:58.830
[client(25091)]CRS-1006:The OCR location /dev/raw/raw2 is inaccessible. Details in /u01/app/grid11201/log/node3/client/ocrconfig_25091.log.
2011-10-13 15:30:58.845
[client(25091)]CRS-1006:The OCR location /dev/raw/raw1 is inaccessible. Details in /u01/app/grid11201/log/node3/client/ocrconfig_25091.log.
2011-10-13 15:33:44.000
[crsd(25138)]CRS-1012:The OCR service started on node node3.
2011-10-13 15:36:45.355
/u01/app/grid11201/log/node3/client/ocrconfig_25091.log:

Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2011-10-13 15:27:59.695: [  OCRCONF][3047016128]ocrconfig starts...
2011-10-13 15:27:59.722: [  OCRCONF][3047016128]Exporting OCR data to [/u01/app/grid11201/cdata/lab/ocr11.2.0.1.0_upg_node3.ocr]
2011-10-13 15:30:58.830: [   OCRRAW][3047016128]proprior: Header check from OCR device 1 offset 0 failed (26).
2011-10-13 15:30:58.845: [   OCRRAW][3047016128]proprior: Header check from OCR device 0 offset 0 failed (22).
2011-10-13 15:30:58.845: [   OCRRAW][3047016128]ibctx: Failed to read the whole bootblock. Assumes invalid format.
2011-10-13 15:30:58.845: [   OCRRAW][3047016128]rtnode:2: Problem [26] reading the tnode 553. Returning [123]
2011-10-13 15:30:58.846: [   OCRRAW][3047016128]prgval: problem reading the tnode
2011-10-13 15:30:58.846: [  OCRCONF][3047016128]Error[104]: Failed to get key value for key CRS.CUR.ora!node2!ons.USR_ORA_PRECONNECT
2011-10-13 15:30:58.847: [  OCRCONF][3047016128]Exiting [status=failed]...
I rebooted all the RAC nodes at this point. After the reboot, the HA stack came back up successfully using the new 11gR2 Grid Infrastructure, but not all the resources were configured. I had to manually configure the SCAN, SCAN_LISTENER, OC4J and ACFS resources as shown below. The database, DB services and GSD were down when the 11gR2 CRS came back up on all the nodes after the reboot. It was expected for GSD to remain down, as it is disabled by default in 11gR2. I also noticed that srvctl no longer worked to start the DB service oltp; I had to use crs_start from the Grid home instead, and that worked fine.
Manual tasks that were performed to complete the 11gR2 Grid Infrastructure configuration:

As oracle:

/u01/app/grid11201/bin/srvctl enable nodeapps -g
/u01/app/grid11201/bin/srvctl start nodeapps -n node1
/u01/app/grid11201/bin/srvctl start nodeapps -n node2
/u01/app/grid11201/bin/srvctl start nodeapps -n node3
/u01/app/oracle/db11gr1/bin/srvctl start database -d labdb
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb1.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb2.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb3.srv

As root:

/u01/app/grid11201/bin/srvctl add scan -n lab-scan.hingu.net
/u01/app/grid11201/bin/crsctl add type ora.registry.acfs.type -basetype ora.local_resource.type -file /u01/app/grid11201/crs/template/registry.acfs.type
/u01/app/grid11201/bin/crsctl add resource ora.registry.acfs -type ora.registry.acfs.type

As oracle:

/u01/app/grid11201/bin/srvctl add scan_listener -l listener -s -p TCP:1525
/u01/app/grid11201/bin/srvctl start scan
/u01/app/grid11201/bin/srvctl start scan_listener
/u01/app/grid11201/bin/srvctl add oc4j
/u01/app/grid11201/bin/srvctl start oc4j
/u01/app/grid11201/bin/crs_start ora.registry.acfs

As root (verify the OCR integrity and check for logical corruption after the upgrade):

/u01/app/grid11201/bin/ocrcheck
/u01/app/grid11201/bin/crsctl query css votedisk
After configuring the CRS resources manually, the final CRS stack looked like below.
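One way to produce such a listing is with crsctl from the new Grid home (standard 11gR2 syntax for displaying the full resource stack):

/u01/app/grid11201/bin/crsctl stat res -t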
The OCR integrity check and logical corruption check were performed; both OCR disks looked fine.
HERE are the detailed screenshots of upgrading 11gR1 CRS to 11gR2 Grid Infrastructure.
Upgrade 11gR1 ASM to 11gR2 Grid Infrastructure:
· Stopped the labdb database.
· Invoked asmca from the 11gR2 Grid Infrastructure HOME (/u01/app/grid11201).
· Moved the listener LISTENER from the 11gR1 ASM_HOME to the 11gR2 Grid Infrastructure home.
· Started the labdb database using the 11gR1 srvctl.
· Started the DB service oltp using /u01/app/grid11201/bin/crs_start.

/u01/app/oracle/db11gr1/bin/srvctl stop database -d labdb
/u01/app/grid11201/bin/asmca
Move the Listener “LISTENER” from the 11gR1 ASM Home to the 11gR2 Grid Infrastructure home:

/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE1 -n node1
/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE2 -n node2
/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE3 -n node3
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE1 -n node1
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE2 -n node2
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE3 -n node3

Add the listener “LISTENER” using netca from the 11gR2 Grid Infrastructure home (TCP:1521):

/u01/app/grid11201/bin/netca

/u01/app/oracle/db11gr1/bin/srvctl start database -d labdb
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb1.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb2.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb3.srv
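Before moving on, it is worth confirming that ASM is now running out of the new Grid home (standard 11gR2 srvctl checks):

/u01/app/grid11201/bin/srvctl status asm
/u01/app/grid11201/bin/srvctl config asm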
HERE are the detailed screenshots of upgrading 11gR1 ASM to 11gR2 Grid Infrastructure.
Upgrade 11gR1 RAC Database to 11gR2 RAC:

Start the runInstaller from the 11gR2 Real Application Clusters (RAC) software location:

/home/oracle/db11201/database/runInstaller
Real Application Clusters installation process:

Configure Security Updates: Email: bhavin@oracledba.org (ignore the “Connection Failed” alert)
Installation Option: Install database software only
Node Selection: Select all the nodes (node1, node2 and node3)
Product Language: English
Database Edition: Enterprise Edition
Installation Location:
      Oracle Base: /u01/app/oracle
      Software Location: /u01/app/oracle/db11201
Operating System Groups:
      Database Administrator (OSDBA) Group: dba
      Database Operator (OSOPER) Group: oinstall
Summary Screen: Verified the information here and pressed “Finish” to start the installation.

At the end of the installation, the below script needs to be executed on all the nodes as the root user:

/u01/app/oracle/db11201/root.sh
Upgrade the Database labdb using dbua:

· Invoked dbua from the 11gR2 RAC HOME (/u01/app/oracle/db11201).
· Fixed any critical warnings returned by the pre-upgrade utility run by DBUA.
· After the successful upgrade of the database to 11.2.0.1, moved the listener LAB_LISTENER to the 11gR2 HOME.
· Updated the REMOTE_LISTENER parameter to lab-scan.hingu.net:1525.
· Stopped the database labdb.
· Rebooted all the nodes and verified that ASM, the database, the listeners and the other resources came back up without any issue.

/u01/app/oracle/db11201/bin/dbua
The upgrade of the 11gR1 RAC database labdb finished without any errors, and here is the upgrade result.
Move the Listener “LAB_LISTENER” from the 11gR1 RAC DB Home to the 11gR2 RAC Database Home:

Move the tnsnames.ora from the old 11gR1 HOME to the 11gR2 HOME:

ssh node3 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/
ssh node2 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/
ssh node1 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/

Invoke netca from the 11gR1 HOME to remove the listener LAB_LISTENER:

/u01/app/oracle/db11gr1/bin/netca

Invoke netca from the 11gR2 HOME to add the listener LAB_LISTENER on the same port 1530 (select the same endpoint, TCP:1530):

/u01/app/oracle/db11201/bin/netca
Modified the REMOTE_LISTENER parameter:

alter system set remote_listener='lab-scan.hingu.net:1525' scope=both sid='*';
Restarted the database to verify that the database instances are appropriately registered with their respective listeners.

srvctl stop database -d labdb
srvctl start database -d labdb
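One way to confirm the registration (a sketch; LISTENER_SCAN1 is assumed to be the default name of the first SCAN listener in 11gR2, and LAB_LISTENER is the local database listener configured above):

/u01/app/grid11201/bin/lsnrctl services LISTENER_SCAN1
/u01/app/oracle/db11201/bin/lsnrctl status LAB_LISTENER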
Rebooted all 3 RAC nodes and verified that all the resources come back up without any issues or errors.

reboot
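After the reboots, the overall cluster health and the database status can be verified from any node (standard 11gR2 commands):

/u01/app/grid11201/bin/crsctl check cluster -all
/u01/app/oracle/db11201/bin/srvctl status database -d labdb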
HERE are the detailed screenshots of upgrading the database from 11gR1 RAC to 11gR2 RAC.