Messing With 11g 32bit RAC
 
Mike Ault's Blog
MikeA, Wednesday, September 05, 2007 10:25 AM

Well, the 32 bit 11g beta is officially over with the GA of the 11g 32 bit release on the Oracle download site. I replaced my beta copy with the production release and started playing about 2 weeks ago. I thought you all might like to know some of the “gotchas” I have run into so far.

Granted, some of these may be my fault, and yes, I have filed SRs on them and Oracle has been responsive in troubleshooting them. However, on the off chance they are bona fide bugs, I would like to give you all a heads up.

The install went nearly flawlessly. The new OUI interface looks classier than the old one and seems to do a good job. I installed the CRS (Cluster Ready Services) first and it went very smoothly. One little change: the OUI requires a separate CRS home, so be prepared. In fact, it requires that the CRS home not be part of the ORACLE_BASE location, so I suggest adding a /home/crs area (on UNIX/Linux) for it. There are also new users (crs, asmadmin) and new groups (crsadmin, asmadmin) suggested to allow finer control over who manages what.

After the CRS is installed, you install the database software. Since I was doing a RAC install, I installed ASM as well. I had to manually download and install the ASMLIB utility (at least I couldn’t find oracleasm when the install was complete) and, before the system would recognize my ASM disks, brand them using the oracleasm utility. I ended up doing a software-only install and then going back with the Database Configuration Assistant (dbca) to perform the actual installation of the ASM and database instances. This was due to the missing oracleasm utility and the inability of dbca to recognize that non-formatted, non-mounted disks were ASM fodder.

I installed the ASM instance first (of course, it is rather hard to install the database first) as a part of the database install, and it allowed me to create only a single diskgroup, DATA. I then backed out and did an ASM maintenance run of dbca to add a second diskgroup, RECOVERY (more about this later).

I then re-ran dbca to create the actual RAC instance. Everything seemed to go well. At the conclusion of the install I attempted to log in with SQLPLUS, and it reported the memory area was not present. Attempts to access the ASM instance also failed, even though both showed the normal background processes and srvctl reported them up and operating. The asmcmd utility also could not access the ASM instance. I shut down both the ASM and database instances and then restarted ASM manually. After a manual restart via SQLPLUS I could log in to ASM with SQLPLUS, but srvctl would not log on, reporting placement errors. In addition, asmcmd could see the ASM instance, diskgroups, disks, etc. after the manual startup. I started ASM on the second node using srvctl. It seemed to start OK, but again I couldn’t access it. I shut it down and restarted it manually, and everything seemed to be working fine.

After verifying that ASM was working (at least for SQLPLUS and ASMCMD), I manually started the database instance. It errored out, saying it couldn’t see the RECOVERY diskgroup and thus couldn’t see the control files. I checked with asmcmd and, sure enough, there was no RECOVERY diskgroup. I checked the init.ora for the ASM instances, and in the ASM_DISKGROUPS parameter only the DATA diskgroup was listed. After I added the entry for the RECOVERY diskgroup and bounced the instances, asmcmd showed both diskgroups and the database was able to start up manually. Voilà! I could see it from SQLPLUS and log in remotely.
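The ASM_DISKGROUPS check described above is easy to script. Here is a minimal sketch, assuming the parameter value has already been copied out of the ASM init.ora as a string; the function names are made up for illustration and are not part of any Oracle tool.

```python
def parse_diskgroups(param_value):
    """Split an ASM_DISKGROUPS value such as "DATA, RECOVERY" into a set of names."""
    # Strip whitespace and optional quotes, and upper-case for comparison.
    names = (part.strip().strip("'\"").upper() for part in param_value.split(","))
    return {name for name in names if name}


def missing_diskgroups(param_value, required):
    """Return the required diskgroups that the parameter value does not list."""
    wanted = {name.upper() for name in required}
    return sorted(wanted - parse_diskgroups(param_value))


# The situation from the post: only DATA was listed, but RECOVERY was needed too.
print(missing_diskgroups("DATA", ["DATA", "RECOVERY"]))  # ['RECOVERY']
```

A diskgroup that turns up in this list will not be mounted automatically at ASM startup, which matches the missing RECOVERY diskgroup above.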

I tried several times to start the ASM and database instances using srvctl. Each time I could see them only via the srvctl commands; any attempt at a remote or local SQLPLUS login met with failure.

Next, I tried to create an additional tablespace. No matter what I tried, I couldn’t create a datafile larger than 8 gigabytes (actually just less than that) for my tablespace. Using an 8K blocksize and normal (smallfile) datafiles I should have been able to create a datafile just a little smaller than 32 gigabytes in size. Finally I checked df -k for my regular filesystems. Sure enough, the ORACLE_HOME filesystem reported zero space available. A quick check of $ORACLE_HOME/dbs showed Oracle had created the datafile not in +DATA on ASM, where it should have, but in the default location used when the “db_create_file_dest” parameter is not set. I checked the database initialization parameters and, sure enough, DBCA had not set the default location for file creation. I reset the parameter for both instances and then had no more issues with file creation.
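For anyone wondering where "just a little smaller than 32 gigabytes" comes from, here is the arithmetic as a small sketch. It relies on the documented smallfile limit of 4,194,303 (2^22 - 1) blocks per datafile; the function name is only illustrative.

```python
MAX_SMALLFILE_BLOCKS = 2**22 - 1  # 4,194,303 blocks per smallfile datafile


def max_smallfile_bytes(block_size_bytes):
    """Largest possible smallfile datafile, in bytes, for a given block size."""
    return MAX_SMALLFILE_BLOCKS * block_size_bytes


size = max_smallfile_bytes(8 * 1024)  # 8K block size, as in the post
print(size)                           # 34359730176
print(2**35 - size)                   # 8192, i.e. one block short of 32 GiB
```

So a datafile that stops near 8 gigabytes is well short of the limit, which is why a full filesystem was the better suspect.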

After creating the needed data and index tablespaces I began loading a 300 gigabyte TPCH export. Man, was it going slow! During a particularly large burst of activity the database fell over and one node restarted. Whoa! This is not supposed to happen! I looked at the alert log for the ASM instance…hmm…normal startup right up to the point I saw:

“WARNING: No cluster interconnect has been specified. Depending on
           the communication driver configured Oracle cluster traffic 
           may be directed to the public interface of this machine.
           Oracle recommends that RAC clustered databases be configured
           with a private interconnect for enhanced security and
           performance.”

And a bit further down:

“Cluster communication is configured to use the following interface(s) for this instance
192.168.1.103”

Yep, routing an interconnect over a public Ethernet will cause reboots, sure as shooting. I looked at the cluster configuration via an OCRDUMP, and it said it was using the private interconnect. I double-checked the hosts file, and the private interconnect is properly configured to a 1 gigabit dedicated NIC configured as 10.1.1.1. Strange doings. I set up the interconnect to use the 10.1.1.1 NIC (node 1) and 10.1.1.2 NIC (node 2) by using the cluster_interconnects parameter and then restarted ASM. Yep, it likes that better.
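Since the OCR dump and the alert log disagreed here, it can be worth checking the alert log directly. The following is a rough sketch that pulls the addresses listed after the "Cluster communication is configured..." line and flags any that fall outside the private interconnect network. The 10.1.1.0/24 network (in particular the /24 mask) and the function names are my assumptions, and the parsing only covers the log layout quoted above.

```python
import ipaddress
import re

MARKER = "Cluster communication is configured to use the following interface(s)"
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def interconnect_addresses(alert_log_text):
    """Collect IPv4 addresses on the lines immediately following the marker line."""
    found = []
    lines = alert_log_text.splitlines()
    for i, line in enumerate(lines):
        if MARKER in line:
            for follow in lines[i + 1:]:
                matches = IPV4.findall(follow)
                if not matches:
                    break  # first line without an address ends the list
                found.extend(matches)
    return found


def off_network(addresses, private_cidr):
    """Return the addresses that are NOT inside the private interconnect network."""
    net = ipaddress.ip_network(private_cidr)
    return [a for a in addresses if ipaddress.ip_address(a) not in net]


sample = (
    "Cluster communication is configured to use the following "
    "interface(s) for this instance\n"
    "192.168.1.103\n"
)
# Assumed private network; adjust to match the real interconnect subnet.
print(off_network(interconnect_addresses(sample), "10.1.1.0/24"))  # ['192.168.1.103']
```

Note that a plain "is this a private address" test would not help, because 192.168.1.103 and 10.1.1.1 are both in private address ranges; the check has to be against the specific interconnect subnet.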

I restarted the load, assuming it had been an ASM problem. However, when I looked at the related RAC waits, sure enough, the instance interconnect latencies were sky high and climbing. I killed the load and looked at the database instances' alert logs and saw the same problem, even though, once again, the OCRDUMP showed they were using the private interconnect. I reset their cluster_interconnects parameter and restarted.
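For reference, one way to set cluster_interconnects per instance is with ALTER SYSTEM against the spfile. The sketch below just builds the statements as text so they can be reviewed before being run. The instance names rac1 and rac2 are hypothetical, and the post does not say whether the parameter was set this way or by editing init.ora.

```python
def interconnect_statements(sid_to_ip):
    """Build one ALTER SYSTEM statement per instance (SID -> interconnect IP)."""
    # cluster_interconnects is a static parameter, hence SCOPE=SPFILE and a restart.
    return [
        f"ALTER SYSTEM SET cluster_interconnects='{ip}' SCOPE=SPFILE SID='{sid}';"
        for sid, ip in sorted(sid_to_ip.items())
    ]


# Hypothetical SIDs; the IP addresses are the ones from the post.
for stmt in interconnect_statements({"rac1": "10.1.1.1", "rac2": "10.1.1.2"}):
    print(stmt)
```

The instances have to be restarted before the new setting takes effect, which fits the restart described above.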

I restarted the load, only to have it fail at 06:05 the next morning. I looked at the alert log and saw that the automatic maintenance had kicked in at 06:00 and set up a resource group. I filed an SR, but when the load once again failed at a time when nothing was happening, I rechecked and saw some IO errors. The problem was tracked to a bad fibre cable.

Finally, I am loading my 300 GB TPCH instance. It should be interesting to see what comes up next!

Copyright ©2007 Quest Software Inc.