Is Your Java Application FailoverProof (i.e., RAC Aware)?
Why in the world should a Java developer care about failover and what does it mean to be failoverproof?
Well, failure is inevitable; however, in mission-critical (i.e., web) deployments, all applications, including the Java ones, must sustain resource manager (i.e., RDBMS) failure, connection failure, or transaction failure without disrupting the service.
How exactly?
For the sake of simplicity, let's take a JDBC program. Best practices mandate that Java/JDBC programs capture exceptions and deal with them; here is a skeleton of a failoverproof program using Oracle JDBC in a RAC environment:
...
try {
    conn = getConnection();
    // do some work
} catch (SQLException e) {
    handleSQLException(e);
}
...
handleSQLException (SQLException e)
{
    if (OracleConnectionCacheManager.isFatalConnectionError(e))
        ConnRetry = true; // Fatal Connection error detected
}
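To make the retry idea concrete, here is a minimal sketch of a retry loop built on the skeleton above. It is an illustration under stated assumptions, not the driver's own mechanism: `ConnectionRetry`, `ConnectionSource`, `Work`, and `runWithRetry` are hypothetical names, and the fatal-error check is passed in as a predicate so that the sketch compiles without the Oracle driver on the classpath. In an Oracle setup, that predicate would presumably delegate to the `isFatalConnectionError` check shown in the skeleton.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Predicate;

/** Hypothetical helper: retries a unit of work when a fatal connection error is reported. */
final class ConnectionRetry {

    /** Hypothetical stand-in for DataSource.getConnection(), so no driver is required. */
    @FunctionalInterface
    interface ConnectionSource {
        Connection getConnection() throws SQLException;
    }

    /** Hypothetical unit of work executed on one connection. */
    @FunctionalInterface
    interface Work<T> {
        T apply(Connection conn) throws SQLException;
    }

    static <T> T runWithRetry(ConnectionSource source,
                              Work<T> work,
                              Predicate<SQLException> isFatalConnectionError,
                              int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // try-with-resources returns the connection to the pool on every path
            try (Connection conn = source.getConnection()) {
                return work.apply(conn);
            } catch (SQLException e) {
                if (!isFatalConnectionError.test(e)) {
                    throw e; // not a connection failure: let the caller deal with it
                }
                last = e;    // fatal connection error: ask for a fresh connection and redo the work
            }
        }
        throw last; // every attempt hit a fatal connection error
    }
}
```

Note that the whole unit of work is executed again on retry, so it has to be safe to replay from the start.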
Capturing SQL exceptions and retrying to get a connection are simply good JDBC programming, so the burden is not really at the Java application level (which has to be somewhat portable) but rather at the driver or framework level. It is up to these (the driver, the O/R mapping framework, the servlet engine, the Java EE container) to furnish, under the covers, a failoverproof environment.
Are all drivers and Java frameworks failoverproof?
You wish! The reality is that very few JDBC drivers or Java frameworks furnish true/reliable connection or transaction failover mechanisms.
From a database access point of view, what does it take for a JDBC driver or a Java framework to be failoverproof?
First of all, a JDBC driver or a Java EE container by itself cannot furnish a complete failoverproof environment; more importantly, it requires the resource manager, in this case the RDBMS, to be failoverproof as well. In the Oracle RDBMS case, instance/node failover as well as scalability is furnished by the RAC framework.
What is RAC?
An Oracle database is managed by a database instance, which is made of a shared memory area (a.k.a. SGA) and a set of database server processes. A database is usually accessed and managed by a single instance. However, an Oracle database can also be concurrently accessed and managed by multiple instances, up to 64 nodes and beyond; this technology is known as Real Application Clusters (RAC).
How Does RAC Furnish Failover?
Starting with release 10g, RAC generates events that indicate the health or status of each RAC component, including SERVICE, SERVICE_MEMBER, DATABASE, INSTANCE, NODE, ASM, and SRV_PRECONNECT.
The possible statuses are: UP, DOWN, NOT_RESTARTING, PRECONN_UP, PRECONN_DOWN, and UNKNOWN.
Examples of events are: "Instance1 UP", "Node2 Down".
RAC furnishes failover by design, in the sense that when a service/instance/node fails, a well-written application can be redirected to the surviving node/instance, provided these furnish the same service and proceed against the same database.
How Does JDBC Leverage RAC Failover?
The Oracle JDBC 10g drivers, more specifically their connection cache (a.k.a. Implicit Connection Cache), leverage RAC by subscribing to the following events and statuses (as described in the RAC documentation and in chapter 7 of my book):
- Service Up: The connection pool starts establishing connections in small batches to the newly added service.
- Instance (of Service) Up: The connection pool gradually releases idle connections associated with existing instances and reallocates these onto the new instance.
- Instance (of Service) Down: The connections associated with the instance are aborted and cleaned up, leaving the connection pool with sound and valid connections.
- Node Down: The connections associated with the instance are aborted and cleaned up, leaving the connection pool with good connections.
But to be reliable, these events must be propagated to interested parties as fast as possible, because the timeout mechanisms (tcp_keepalive, tcp_ip_interval, and so on) are unreliable and may take a long time (tens of minutes), or even indefinitely, to kick in.
Oracle furnishes ONS (Oracle Notification Services) and Advanced Queuing as predictable publish/subscribe notification mechanisms, which detect and propagate those events quasi-instantaneously (sub-second) to the components that have subscribed to them.
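As a rough illustration of the publish/subscribe idea, the sketch below pushes RAC-style events to subscribers and maps each one to the pool reaction listed earlier. Everything in it is hypothetical: `RacEventSketch`, `Notifier`, `Event`, and `poolReaction` are illustrative names, the notifier is a plain in-process list standing in for ONS, and only a few of the component types and statuses are modeled. It does not use the ONS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of event propagation; not the ONS API. */
final class RacEventSketch {

    enum Component { SERVICE, INSTANCE, NODE }   // subset of the component types
    enum Status { UP, DOWN }                     // subset of the possible statuses

    /** Hypothetical event: which component changed, its name, and its new status. */
    static final class Event {
        final Component component;
        final String name;
        final Status status;

        Event(Component component, String name, Status status) {
            this.component = component;
            this.name = name;
            this.status = status;
        }
    }

    /** Hypothetical in-process notifier: subscribers are called as soon as an event is published. */
    static final class Notifier {
        private final List<Consumer<Event>> subscribers = new ArrayList<>();

        void subscribe(Consumer<Event> subscriber) { subscribers.add(subscriber); }

        void publish(Event event) {
            for (Consumer<Event> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }

    /** Maps an event to the connection pool reaction described in the list above. */
    static String poolReaction(Event e) {
        if (e.status == Status.DOWN) {
            return "abort and clean up connections on " + e.name;
        }
        switch (e.component) {
            case SERVICE:  return "open connections in small batches to " + e.name;
            case INSTANCE: return "move idle connections onto " + e.name;
            default:       return "no pool action listed for " + e.name;
        }
    }
}
```

The point of the push model is that the pool learns about "Node2 Down" from the event itself rather than by waiting for a TCP timeout on each dead connection.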
Setting up JDBC for Failover
- Set up a multi-instance Oracle Database 10g RAC database (see RAC documentation).
- Virtualize the database host through a service name (see JDBC URL in chapter 7 of my book).
- Configure ONS on each RAC server node (see the RAC Administrator Guide or chapter 7 in my book).
- Configure ONS on each client node (10g Release 1) or use the simpler remote subscription (10g Release 2). Ensure that the ons.jar file is in the CLASSPATH, then programmatically set the ONS configuration string for remote ONS subscription at the data source level (unfortunately, this cannot yet be set through a system property):
  ods.setONSConfiguration("nodes=node1:4200,node2:4200");
  The Java virtual machine (JVM) in which the JDBC driver is running must have oracle.ons.oraclehome set to point to ORACLE_HOME:
  -Doracle.ons.oraclehome=
- Enable the Connection Cache and Fast Connection Failover through a system property:
  -Doracle.jdbc.FastConnectionFailover=true
  Alternatively, the Connection Cache and Fast Connection Failover can be enabled programmatically using OracleDataSource properties:
  ods.setConnectionCachingEnabled(true);
  ods.setFastConnectionFailoverEnabled(true);
Oracle JDBC: Handling of DOWN Events (under the covers)
Upon notification of a Service Down event, a worker thread (one per pool instance) processes the event in two passes:
- First pass: connections are marked as down first, to efficiently disable bad connections.
- Second pass: connections that are marked as down are aborted and removed.
Note: active connections that may be in the middle of a transaction receive a SQLException instantly.
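The two-pass processing can be pictured with a toy pool like the one below. This is only a sketch of the mark-then-sweep idea, not the driver's implementation: `DownEventSketch`, `PooledConn`, and `onInstanceDown` are hypothetical names, and "abort" just sets a flag where a real pool would tear down the physical connection.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy connection pool illustrating two-pass handling of a DOWN event. */
final class DownEventSketch {

    /** Hypothetical pooled connection: remembers its RAC instance and two flags. */
    static final class PooledConn {
        final String instance;
        boolean markedDown;
        boolean aborted;

        PooledConn(String instance) { this.instance = instance; }

        void abort() { aborted = true; } // a real pool would close the socket here
    }

    private final List<PooledConn> pool = new ArrayList<>();

    synchronized void add(PooledConn conn) { pool.add(conn); }

    synchronized int size() { return pool.size(); }

    /** Handles a DOWN event for one instance; returns how many connections were removed. */
    synchronized int onInstanceDown(String instance) {
        // First pass: cheap flagging, so that bad connections are disabled right away.
        for (PooledConn conn : pool) {
            if (conn.instance.equals(instance)) {
                conn.markedDown = true;
            }
        }
        // Second pass: abort and remove everything that was marked as down.
        int removed = 0;
        for (Iterator<PooledConn> it = pool.iterator(); it.hasNext();) {
            PooledConn conn = it.next();
            if (conn.markedDown) {
                conn.abort();
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Separating the passes keeps the first one fast: flagging is a field write, while aborting may involve slower cleanup work per connection.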
Oracle JDBC: Handling of UP Events (under the covers)
A Service UP event initiates connections to be load balanced to all active RAC instances. Connection creation depends on the Listener's placement of connections. Starting with 10g Release 2, load balancing advisory events enable Runtime Connection Load Balancing (covered in chapter 7 of my book).
Object-relational mapping frameworks, as well as any Java EE containers, may either leverage Oracle JDBC (bypassing their own connection pool) or subscribe directly to RAC events using the ONS APIs and process these themselves (i.e., handle connection retry). To my knowledge, only Oracle's Java EE container (OC4J) has integrated Fast Connection Failover and ONS at the data source level.
How does Oracle JDBC Fast Connection Failover (FCF) compare with TAF?
Fast Connection Failover and TAF differ from each other in the following ways:
- Driver-type dependency: TAF is in fact an OCI failover mechanism exposed to Java through JDBC-OCI. FCF is driver-type independent (i.e., works for both JDBC-Thin and JDBC-OCI).
- Application-Level Connection Retries: FCF supports application-level connection retries (i.e., the application may retry the connection or rethrow the exception). TAF, on the other hand, retries the connection transparently at the OCI/Net level, out of the control of the application or Java framework.
- Integration with the Connection Cache: FCF is integrated with the Implicit Connection Cache and invalidates failed connections automatically in the cache. TAF on the other hand works on a per-connection basis at the network level; it does not notify the connection cache of failures.
- Load Balancing: unlike TAF, FCF and runtime connection load balancing (RCLB) support UP event load-balancing of connections and runtime distribution of work across active RAC instances.
- Transaction Management: FCF automatically rolls back in-flight transactions; TAF, on the other hand, requires the application to roll back the transaction and send an acknowledgment to TAF to proceed with the failover.
- Server-side state: TAF does not protect or fail over code that has server-side state, such as Java or PL/SQL stored procedures; however, the application can register a callback function that will be called upon failure to reestablish the session state.
// register TAF callback function "cbk"
((OracleConnection) conn).registerTAFCallback(cbk,msg);
Voilà, you now have a Java platform with connection pool failover, on top of which you can code and deploy JDBC applications or Java EE components.
For more details, see chapter 7 of my book: http://db360.blogspot.com/2006/08/oracle-database-programming-using-java_01.html







11 Comments:
Is it safe to say that, when using the BC4J/ADF framework, my application must be failoverproof/RAC aware?
Hi,
Fast Connection Failover is based on RAC events. ADFBC provides a failover mode, but it does not depend on RAC.
Kuassi
Kuassi,
The question I have is what does the application code have to do to use TAF vs FCF? We have in our requirements to our client identified to use TAF, however I am questioning whether TAF is the right way to go or FCF would be a better option. We are in a J2EE environment with an Oracle 10.2.0.2 application server cluster and 10.2.0.2 RAC database behind it. From our management perspective since we are well into development what impact would using FCF have, i.e. additional code to our application or just enabling ONS and setting some system configuration properties within the application servers and or database servers? Any assistance you could provide or direction would be greatly appreciated.
Thanks,
Brad
Hi Brad,
Simply put:
- with TAF, connection failover is transparent (in case of node failure); however, you have to programmatically roll back ongoing transactions (TAF requires an ACK from the application in order to proceed)
- with FCF, you need to retry getConnection() programmatically in case of node failure (you might not need to do anything if your JDBC code has already done the right thing, i.e., catch the SQL exception and retry); ongoing transactions are automatically rolled back.
From a JDBC perspective, we recommend FCF and its future incarnation, as these are driver independent (as you know, TAF is an OCI feature which only works for JDBC-OCI).
Hope this helps, Kuassi
Does anyone know if the Spring framework handles this? There is a spring thread at http://forum.springframework.org/showthread.php?t=16931
Cheers
Neil
Kuassi,
Our application is not using RAC but Data Guard, as per client requirements. In this case, how do we manage the failover when the standby database becomes the master? We are using Hibernate and c3p0. Please advise.
Thanks
Sri.
Kuassi,
I am an Oracle DBA implementing 10gR2 RAC. I have implemented TAF in the past but have never had a chance to implement FCF so far. Now our requirement is to implement FCF using ONS. I see that our app team will need to use Oracle JDBC drivers in their code so that we can achieve this. What does the app team need to use/do to ensure that our FCF will work? Thanks much in advance.
Don
Don,
Assuming RAC, ONS, and the services have been configured and are working (see Setting up JDBC for Failover in the blog), the application team needs to do 2 things.
1) enable FCF either programmatically or using a system property
(-Doracle.jdbc.FastConnectionFailover=true)
2) catch SQLException and, if this is indeed a fatal connection error, retry getConnection. See the pseudo-code above in the blog.
That's it, Kuassi
Hi Kuassi,
We are currently using our own connection pooling mechanism, and would like to continue using this pool. We are also moving to a RAC environment, and need to support RAC in our application.
Is it possible to use the RAC-type db connection properties with our existing connection pool (no FCF, and no Implicit Connection Cache)?
If so, what is the behavior of the JDBC driver when getConnection is called and one of the nodes in the RAC is down? Does it return a connection failure?
Hi,
Right now, the subscription to RAC events (using ONS) is handled by our connection pool, not by the JDBC driver itself. There is no public API that you can use to subscribe to RAC events and manage these in your custom connection pool, but we are looking into furnishing a JDBC API which will allow you to do that in the near future.
Kuassi