Wednesday, October 1, 2008
RAC on ASM
Oracle RAC systems comprise multiple database instances sharing the same database files: the data files, control files, and online redo logs. Each instance reads from and writes to the same files simultaneously, so for the shared files to be available to processes on multiple machines, the disks containing them must be shared. With Oracle 10g, a clustered file system named Oracle Cluster File System (OCFS) is available for free on Linux and Windows. Deploying OCFS on a shared disk device, such as a storage area network, allows database files to be shared between multiple instances. Another disk management system new to Oracle 10g is Automatic Storage Management (ASM). Oracle can allocate a disk group on a storage device and place database files within that disk group. From the operating system, an ASM disk group cannot be managed with ordinary file commands; the files must be managed through an Oracle database instance. A RAC database can have files on both OCFS and ASM simply by specifying the file location when the files are created.
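As a rough sketch of what this looks like in practice, the following statements create an ASM disk group and then place a tablespace's datafile in it. The disk group name, failure group names, and raw device paths are hypothetical and will differ per site:

```sql
-- Run against the ASM instance; device paths below are placeholders.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/raw/raw1'
  FAILGROUP fg2 DISK '/dev/raw/raw2';

-- From a database instance, files are placed in the disk group by
-- referencing it with the '+' prefix instead of an OS path:
CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 100M;
```

Note that because the files live inside the disk group, they are visible through views such as V$ASM_FILE rather than through `ls` on the server.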
Sunday, September 21, 2008
Real Application Clusters Recovery and Cache Fusion
In Real Application Clusters recovery, the amount of recovery processing required after node failures is proportional to the number of failed nodes. In general, data blocks become available immediately after they are recovered.
When an instance fails and the failure is detected by another Oracle instance, Oracle performs the following recovery steps:
- During the first phase of recovery, which is the GES reconfiguration, Oracle first reconfigures the GES enqueues. Then Oracle reconfigures the GCS resources. During this time, all GCS resource requests and write requests are temporarily suspended. However, processes and transactions can continue to modify data blocks as long as these processes and transactions have already acquired the necessary enqueues.
- After the GES-controlled enqueues have been reconfigured, a log read and the remastering of GCS resources occur in parallel. At the end of this step, the block resources that need to be recovered have been identified.
- Buffer space for recovery is allocated and the resources that were identified in the previous reading of the log are claimed as recovery resources. Then, assuming that there are PIs of blocks to be recovered in other caches in the cluster database, resource buffers are requested from other instances. The resource buffers are the starting point of recovery for a particular block.
- All resources and enqueues required for subsequent processing have been acquired and the Global Resource Directory is now unfrozen. Any data blocks that are not in recovery can now be accessed. Note that the system is already partially available.
- The cache layer recovers and writes each block identified in step 2, releasing the recovery resources immediately after block recovery so that more blocks become available as cache recovery proceeds.
- After all blocks have been recovered and the recovery resources have been released, the system is again fully available.
In summary, the recovered database or recovered portions of the database become available earlier, and before the completion of the entire recovery sequence. This makes the system available sooner and it makes recovery more scalable.
If neither the PI buffers nor the current buffer for a data block are in any of the surviving instances' caches, then Oracle performs a log merge of the failed instances. As mentioned for recovery in general, the performance overhead of a log merge is proportional to the number of failed instances and to the size of the redo logs for each instance. You can, however, control the size of the log with Oracle's checkpoint features. With its advanced design, Real Application Clusters recovery can manage multiple simultaneous failures and sequential failures. The shared server feature is also resilient to instance failures during recovery.
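The checkpoint features mentioned above are typically controlled with fast-start checkpointing. A minimal sketch, assuming an spfile-managed 10g instance (the 60-second target is an illustrative value, not a recommendation):

```sql
-- Bound estimated crash/instance recovery time to about 60 seconds.
-- More aggressive checkpointing means less redo to read and merge
-- when a surviving instance recovers a failed one.
ALTER SYSTEM SET fast_start_mttr_target = 60 SCOPE = BOTH;

-- Monitor the estimated vs. target recovery time on each instance:
SELECT inst_id, estimated_mttr, target_mttr
FROM   gv$instance_recovery;
```

The trade-off is the usual one: a lower MTTR target shortens recovery but increases checkpoint writes during normal operation.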
Cache fusion in 10g
* Concurrent Reads on Multiple Nodes
* Concurrent Reads and Writes on Different Nodes
* Concurrent Writes on Different Nodes
Concurrent Reads on Multiple Nodes
Concurrent reads on multiple nodes occur when two instances need to read the same data block. Real Application Clusters resolves this situation without synchronization because multiple instances can share data blocks for read access without cache coherency conflicts.
Concurrent Reads and Writes on Different Nodes
A read request from an instance for a block that was modified by another instance and not yet written to disk can be a request for either the current version of the block or for a read-consistent version. In either case, the Global Cache Service Processes (LMSn) transfer the block from the holding instance's cache to the requesting instance's cache over the interconnect.
Concurrent Writes on Different Nodes
Concurrent writes on different nodes occur when the same data block is modified frequently by different instances. In such cases, the holding instance completes its work on the data block after receiving a request for the block. The GCS then converts the resources on the block to be globally managed and the LMSn processes transfer a copy of the block to the cache of the requesting instance. The main features of this processing are:
* The Global Cache Service (GCS) tracks each version of a data block, and each version is referred to as a past image (PI). In the event of a failure, Oracle can reconstruct the current version of a block by using the information in a PI.
* The cache-to-cache data transfer is done through the high speed IPC interconnect, thus eliminating disk I/O.
* Cache Fusion reduces the sequence of round-trip messages needed to transfer a block, which limits context switches and makes the cache coherency protocol more efficient. The database writer (DBWn) processes are not involved in Cache Fusion block transfers.
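Cache Fusion traffic can be observed from the instance statistics. A hedged example query, using the 10g `gc` statistic names (in 9i the equivalents were prefixed `global cache`):

```sql
-- Per-instance counts of blocks shipped over the interconnect.
-- 'received' = blocks this instance obtained from a remote cache;
-- 'served'   = blocks this instance shipped to a remote requestor.
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received',
                'gc current blocks received',
                'gc cr blocks served',
                'gc current blocks served')
ORDER  BY inst_id, name;
```

High and steadily growing values here are normal on a busy RAC system; what matters for tuning is the average latency per transfer, not the raw counts.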
What is Cache Fusion in 10g RAC
Difference between Oracle RAC and OPS
The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first before the requesting node could read it. With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.
Not all clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same set of disks for storing data. With Oracle10g RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.
Oracle RAC
At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data files, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.
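The per-instance redo threads and undo tablespaces can be seen and assigned with standard SQL. A sketch, assuming a two-node cluster whose SIDs are the hypothetical `rac1` and `rac2`:

```sql
-- Each instance writes its own redo thread; because the logs sit on
-- shared storage, every thread is visible from any node:
SELECT thread#, group#, status
FROM   v$log
ORDER  BY thread#, group#;

-- Assign a dedicated undo tablespace to each instance (spfile required;
-- tablespace and SID names below are placeholders):
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS1' SID = 'rac1';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS2' SID = 'rac2';
```

Keeping one undo tablespace per instance avoids contention, while the shared placement is what lets a surviving instance read a failed node's redo and undo during recovery.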