Wednesday, 11 January 2012

Fault Handling in Oracle SOA Suite : Advanced Concepts

This tutorial covers in detail the mechanisms we can adopt for fault management in a SOA Suite composite. It presents an overall strategy for handling faults and the different ways of dealing with them.
Before diving into the advanced concepts of fault handling, here is a short introduction covering the basics of a service composite.
Basic Architecture of a Service Composite in Oracle SOA Suite
image
  1. Service components – BPEL processes, business rules, human tasks and mediators. These are used to construct a SOA composite application. Each type of service component has a corresponding service engine.
  2. Binding components – Establish the connection between a SOA composite and the external world.
  3. Services provide an entry point to the SOA composite application.
  4. Bindings define the protocols used to communicate with the service, such as SOAP/HTTP or a JCA adapter.
  5. WSDL advertises the capabilities of the service.
  6. References enable a SOA composite application to send messages to external services.
  7. Wires enable connections between service components.
Coming to fault handling in a composite, there are primarily two types of faults:
  1. Business faults – Occur when the application executes a throw activity or an invoke receives a fault as its response. The fault name is specified by the BPEL process service component. The fault is caught by a fault handler using the fault name and a fault variable.
  2. Runtime faults – Thrown by the system. Most of these faults are provided out of the box. They are associated with RuntimeFaultMessage and belong to the http://schemas.oracle.com/bpel/extension namespace.
Oracle SOA Suite lets us configure faults and fault actions using policies. This means that we can create policies that respond to a specific type of exception. Policies are defined in a file that by default is called fault-policies.xml.
Policies for fault handling consist of two main elements:
  1. The fault condition that activates the policy block: we specify which type of fault(s) the policy applies to. We can then apply finer-grained conditions and actions based on error codes, error messages etc.
  2. The action(s) to perform when the condition is satisfied. An action for a fault may be to retry it a certain number of times at a specified interval, to mark it for recovery by human intervention, to run custom Java code, or simply to rethrow the fault. If the fault is rethrown, any explicit 'catch' block defined in our BPEL process will be executed.
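To make the retry action concrete, here is a minimal plain-Java sketch of "retry a fixed number of times at a fixed interval, then hand over to a fallback action". It only models the behaviour described above; it is not the fault policy framework itself, and the class name, method name and parameters are made up for illustration.

```java
import java.util.function.BooleanSupplier;

/** Illustrative model of a "retry N times, then fall back" action. Not an Oracle API. */
class RetrySketch {

    /**
     * Runs the operation once, then retries it up to retryCount more times,
     * waiting intervalMillis before each retry. Returns "SUCCESS" as soon as an
     * attempt succeeds, or the given failureAction once every attempt has failed.
     */
    static String runWithRetry(BooleanSupplier operation, int retryCount,
                               long intervalMillis, String failureAction) {
        if (operation.getAsBoolean()) {
            return "SUCCESS";
        }
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Stop retrying if the thread is interrupted and fall back.
                Thread.currentThread().interrupt();
                return failureAction;
            }
            if (operation.getAsBoolean()) {
                return "SUCCESS";
            }
        }
        return failureAction;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated endpoint that only starts answering on the third call.
        String outcome = runWithRetry(() -> ++calls[0] >= 3, 3, 10, "ora-human-intervention");
        System.out.println(outcome + " after " + calls[0] + " calls");
    }
}
```

In a real policy the interval would be much longer than the 10 ms used here, and the fallback would be another action defined in the policy file.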
Fault policies must also be explicitly associated with composites, components, or references. This is done in a fault-bindings.xml file. Fault bindings link the composite, specific components in the composite, or specific references in the components to one of the fault policies.
Have a look at the diagram below to understand a mechanism to throw a fault from a service composite, identify the fault type and then take necessary action.
image
The rest of this post covers each aspect shown in the diagram above.
Consider the following fault-policies.xml. Read the comments in the XML to understand what each condition, action and property does.
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="CompositeFaultPolicy" xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <Conditions>
      <!-- Conditions can be fine-grained to select actions based on error codes. If a remoteFault occurs, check whether it is a WSDLReadingError. If yes, rethrow it; otherwise retry it. -->
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="bpelx:remoteFault">
        <condition>
          <test>$fault.code/code="WSDLReadingError"</test>
          <action ref="ora-rethrow-fault"/>
        </condition>
        <condition>
          <action ref="ora-retry"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="bpelx:bindingFault">
        <condition>
          <action ref="java-fault-handler"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension" name="bpelx:runtimeFault">
        <condition>
          <action ref="java-fault-handler"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <!-- This action invokes a custom Java class to process faults. Depending on the returnValue, another action is invoked, as specified by the ref attribute. This demonstrates chaining of actions. -->
      <Action id="java-fault-handler">
        <javaAction className="com.beatech.faultapp.CustomFaultHandler" defaultAction="ora-human-intervention" propertySet="properties">
          <returnValue value="Manual" ref="ora-human-intervention"/>
        </javaAction>
      </Action>
      <!-- This action marks the instance as "Pending for Recovery" in the EM console -->
      <Action id="ora-human-intervention">
        <humanIntervention/>
      </Action>
      <!-- This action bubbles the fault up to the catch blocks -->
      <Action id="ora-rethrow-fault">
        <rethrowFault/>
      </Action>
      <!-- This action attempts 3 retries at intervals of 120 seconds -->
      <Action id="ora-retry">
        <retry>
          <retryCount>3</retryCount>
          <retryInterval>120</retryInterval>
          <retryFailureAction ref="java-fault-handler"/>
        </retry>
      </Action>
      <!-- This action causes the instance to terminate -->
      <Action id="ora-terminate">
        <abort/>
      </Action>
    </Actions>
    <!-- Properties are passed to the Java class as a Map that the class can read -->
    <Properties>
      <propertySet name="properties">
        <property name="myProperty1">propertyValue1</property>
        <property name="myProperty2">propertyValue2</property>
        <property name="myPropertyN">propertyValueN</property>
      </propertySet>
    </Properties>
  </faultPolicy>
</faultPolicies>
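As a rough mental model of how the two remoteFault conditions above pick an action, the plain-Java sketch below walks a list of conditions in document order and returns the action of the first one whose test matches, treating a condition without a test as a catch-all. This ordering is my reading of the policy, stated here as an assumption; the class and method names are invented for illustration and are not part of any Oracle API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative model of condition matching in a fault policy. Not an Oracle API. */
class ConditionSketch {

    /** Models one condition element: an optional test plus the referenced action id. */
    static final class Condition {
        final Predicate<String> test; // null models a condition that has no test
        final String actionRef;

        Condition(Predicate<String> test, String actionRef) {
            this.test = test;
            this.actionRef = actionRef;
        }
    }

    /** Returns the action of the first matching condition, or null if none matches. */
    static String resolveAction(List<Condition> conditions, String faultCode) {
        for (Condition condition : conditions) {
            if (condition.test == null || condition.test.test(faultCode)) {
                return condition.actionRef;
            }
        }
        return null;
    }

    /** The two conditions declared for bpelx:remoteFault in the policy file. */
    static List<Condition> remoteFaultConditions() {
        List<Condition> conditions = new ArrayList<>();
        conditions.add(new Condition("WSDLReadingError"::equals, "ora-rethrow-fault"));
        conditions.add(new Condition(null, "ora-retry"));
        return conditions;
    }

    public static void main(String[] args) {
        System.out.println(resolveAction(remoteFaultConditions(), "WSDLReadingError"));
        System.out.println(resolveAction(remoteFaultConditions(), "ConnectionRefused"));
    }
}
```

Under this model a WSDLReadingError resolves to ora-rethrow-fault and any other remote fault code resolves to ora-retry, which matches the intent stated in the policy comment.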
The javaAction element can also list several returnValue entries, so that the string returned by the custom Java fault handler selects which action is chained next.
<javaAction className="com.beatech.faultapp.CustomFaultHandler" defaultAction="ora-rethrow-fault">
  <returnValue value="Rethrow" ref="ora-rethrow-fault"/>
  <returnValue value="Abort" ref="ora-terminate"/>
  <returnValue value="Retry" ref="ora-retry"/>
  <returnValue value="Manual" ref="ora-human-intervention"/>
</javaAction>
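The chaining can be pictured as a simple lookup: the string returned by the handler is matched against the returnValue entries, and the default action applies when nothing matches. The plain-Java sketch below models only that lookup, using the action ids defined in the policy file shown earlier (ora-rethrow-fault, ora-terminate, ora-retry, ora-human-intervention); the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model of returnValue-to-action chaining. Not an Oracle API. */
class ReturnValueSketch {

    /** Plays the role of the defaultAction attribute. */
    static final String DEFAULT_ACTION = "ora-rethrow-fault";

    /** Plays the role of the returnValue entries: handler result to action id. */
    static Map<String, String> returnValues() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("Rethrow", "ora-rethrow-fault");
        mapping.put("Abort", "ora-terminate");
        mapping.put("Retry", "ora-retry");
        mapping.put("Manual", "ora-human-intervention");
        return mapping;
    }

    /** Picks the next action for the string the Java handler returned. */
    static String nextAction(String handlerResult) {
        return returnValues().getOrDefault(handlerResult, DEFAULT_ACTION);
    }

    public static void main(String[] args) {
        System.out.println(nextAction("Manual"));
        System.out.println(nextAction("SomethingUnexpected"));
    }
}
```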
The next step is to create a fault-bindings.xml file that binds the fault policy to the composite.
<faultPolicyBindings version="2.0.1" xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <composite faultPolicy="CompositeFaultPolicy"/>
</faultPolicyBindings>
Finally, we have to add two properties to composite.xml so that the composite knows about these files:
<property name="oracle.composite.faultPolicyFile">fault-policies.xml</property>
<property name="oracle.composite.faultBindingFile">fault-bindings.xml</property>
We can use different names and locations for the fault policy and fault binding files by pointing the properties oracle.composite.faultPolicyFile and oracle.composite.faultBindingFile in composite.xml at these custom files.
For example, we can even refer to files stored in the MDS:
<property name="oracle.composite.faultPolicyFile">oramds://apps/policy/fault-policies.xml</property>
<property name="oracle.composite.faultBindingFile">oramds://apps/policy/fault-bindings.xml</property>
Once our composite hits a fault that is mapped to a custom Java action, the Java class CustomFaultHandler is instantiated. Here is one example of such a class.
The custom Java class has to implement the interface IFaultRecoveryJavaClass, which defines two methods: handleRetrySuccess and handleFault. CustomFaultHandler has access to the IFaultRecoveryContext, which contains information about the composite, the fault, and the policy.
If the fault is thrown by a BPEL process, we can check whether the context is an instance of BPELFaultRecoveryContextImpl to get further fault details.
package com.beatech.faultapp;

import com.collaxa.cube.engine.fp.BPELFaultRecoveryContextImpl;
import com.oracle.bpel.client.config.faultpolicy.IBPELFaultRecoveryContext;
import java.util.Map;
import oracle.integration.platform.faultpolicy.IFaultRecoveryContext;
import oracle.integration.platform.faultpolicy.IFaultRecoveryJavaClass;

public class CustomFaultHandler implements IFaultRecoveryJavaClass {

    public CustomFaultHandler() {
        super();
    }

    // Called when a retry triggered by the policy succeeds
    public void handleRetrySuccess(IFaultRecoveryContext iFaultRecoveryContext) {
        System.out.println("Retry Success");
        handleFault(iFaultRecoveryContext);
    }

    // Called when the policy routes a fault to this class; the returned string selects the next action
    public String handleFault(IFaultRecoveryContext iFaultRecoveryContext) {
        // Print fault metadata to the console
        System.out.println("****************Fault Metadata********************************");
        System.out.println("Fault policy id: " + iFaultRecoveryContext.getPolicyId());
        System.out.println("Fault type: " + iFaultRecoveryContext.getType());
        System.out.println("Partnerlink: " + iFaultRecoveryContext.getReferenceName());
        System.out.println("Port type: " + iFaultRecoveryContext.getPortType());
        System.out.println("**************************************************************");
        // Print all properties defined in the fault policy file
        System.out.println("Properties Set for the Fault");
        Map props = iFaultRecoveryContext.getProperties();
        for (Object key : props.keySet()) {
            System.out.println("Key : " + key.toString() + " Value : " + props.get(key).toString());
        }
        // Print fault details to the console if they exist
        System.out.println("****************Fault Details********************************");
        if (iFaultRecoveryContext instanceof BPELFaultRecoveryContextImpl) {
            BPELFaultRecoveryContextImpl bpelCtx = (BPELFaultRecoveryContextImpl) iFaultRecoveryContext;
            System.out.println("Fault: " + bpelCtx.getFault());
            System.out.println("Activity: " + bpelCtx.getActivityName());
            System.out.println("Composite Instance: " + bpelCtx.getCompositeInstanceId());
            System.out.println("Composite Name: " + bpelCtx.getCompositeName());
            System.out.println("***********************************************************");
        }
        // Custom code to log the fault to a file/DB/JMS or to send emails etc.
        return "Manual";
    }
}
From here, we can do the following:
  1. Log the fault, or part of it, in a flat file, database or error queue in a specified enterprise format.
  2. Send an email to the support group for remedy and action, using custom Java email code or the UMS Java/EJB APIs.
  3. Return a flag with an appropriate post action. This flag determines which action is taken next.
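As an illustration of the third point, the decision behind the returned flag can be kept in a small pure function that is easy to test outside the server. The routing below is entirely hypothetical (which fault maps to which flag is a design choice, not something the framework prescribes), and the fault name format is assumed to be the "{namespace}localName" form. The returned strings only have an effect if they match the returnValue entries configured in the fault policy.

```java
/** Hypothetical routing of fault names to handler return values. Not an Oracle API. */
class FaultDecisionSketch {

    private static final String BPELX = "{http://schemas.oracle.com/bpel/extension}";

    /** Returns the flag the custom handler would hand back to the policy framework. */
    static String decide(String faultName) {
        if (faultName == null) {
            return "Rethrow"; // nothing to go on: let the BPEL catch blocks decide
        }
        if (faultName.equals(BPELX + "remoteFault")) {
            return "Manual"; // endpoint problem: park the instance for recovery
        }
        if (faultName.equals(BPELX + "bindingFault")) {
            return "Abort"; // assumed here to be unrecoverable
        }
        return "Rethrow";
    }

    public static void main(String[] args) {
        System.out.println(decide(BPELX + "remoteFault"));
        System.out.println(decide("{http://example.com/orders}invalidOrder"));
    }
}
```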
The Java class requires the SOA and BPEL runtime libraries on the classpath to compile and execute.
image
To make sure that the fault policy (fault-policies.xml) can instantiate this class when a composite instance faults, we have to make the class available on the server's classpath.
There are a couple of ways to do that. Here is one of them.
  1. Compile your Java project and export it as a jar file (say CustomFaultHandling.jar).
  2. Go to the <Middleware Home>\Oracle_SOA1\soa\modules\oracle.soa.bpel_11.1.1 directory of your Oracle SOA Suite installation.
  3. Copy CustomFaultHandling.jar into the above directory.
  4. Unjar oracle.soa.bpel.jar and edit its MANIFEST.MF to add an entry for the above jar to the classpath.
image
  5. Pack the jar again and restart both the Managed and Admin Servers.
Another way is to drop CustomFaultHandling.jar into <Middleware Home>\Oracle_SOA1\soa\modules\oracle.soa.ext_11.1.1 and run Ant on the build.xml file present in that directory.
Interestingly, the EM console also offers an option to retry all faults for a composite by setting some values in custom MBeans. They are available as Advanced BPEL properties in the SOA Infra engine.
The MBean that allows recovery of faulted instances is RecoveryConfig. It has two options for retrying:
  1. RecurringScheduleConfig: A recovery window may be specified (most probably during off-peak hours) to recover all faulted instances. The messages being recovered can be throttled by limiting the number picked up on each run through maxMessageRaiseSize.
  2. StartupScheduleConfig: With this setting, all faulted instances are automatically retried when the SOA server is started or restarted.
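The combination of a recovery window and a per-run limit can be pictured with the plain-Java sketch below. It is only a model of the idea described above, not how the server implements it: the class, the method names and the window parameters are invented for illustration, and only maxMessageRaiseSize is a name taken from the MBean.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative model of windowed, throttled recovery. Not an Oracle API. */
class RecoveryWindowSketch {

    /** True if now lies in [start, stop); handles windows that wrap past midnight. */
    static boolean inWindow(LocalTime now, LocalTime start, LocalTime stop) {
        if (start.isBefore(stop)) {
            return !now.isBefore(start) && now.isBefore(stop);
        }
        // e.g. 22:00 to 04:00; equal start and stop is treated as "all day"
        return !now.isBefore(start) || now.isBefore(stop);
    }

    /** Picks at most maxMessageRaiseSize instances for this run, or none outside the window. */
    static List<String> selectBatch(List<String> faultedInstanceIds, LocalTime now,
                                    LocalTime start, LocalTime stop, int maxMessageRaiseSize) {
        if (!inWindow(now, start, stop)) {
            return Collections.emptyList();
        }
        int size = Math.min(Math.max(maxMessageRaiseSize, 0), faultedInstanceIds.size());
        return new ArrayList<>(faultedInstanceIds.subList(0, size));
    }

    public static void main(String[] args) {
        List<String> faulted = List.of("1001", "1002", "1003", "1004", "1005");
        // Off-peak window from midnight to 04:00, two instances per run.
        System.out.println(selectBatch(faulted, LocalTime.of(1, 0),
                LocalTime.MIDNIGHT, LocalTime.of(4, 0), 2));
    }
}
```

Each run inside the window picks up a bounded slice of the backlog, so a large number of faulted instances is drained gradually instead of hitting the recovered endpoint all at once.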
More details on how to use the RecoveryConfig MBean can be found here:
http://download.oracle.com/docs/cd/E14571_01/relnotes.1111/e10133/soa.htm#RNLIN1052
image
There is a small gap here, though. It is not always possible to recover automatically; auto-recovery is subject to some conditions.
Consider the scenarios below.
  1. Scenario A: The BPEL code uses a fault policy and the fault is handled using the "ora-human-intervention" action. The fault is then marked as Recoverable and the instance state is set to "Running".
  2. Scenario B: The BPEL code uses a fault policy and the fault is caught and rethrown using the "ora-rethrow-fault" action. The fault is then marked as Recoverable and the instance state is set to "Faulted", provided the fault is a recoverable one (for example, a URL that was not available).
In Scenario A, the Recoverable fault CANNOT be auto-recovered using the RecoveryConfig MBean.
In Scenario B, the Recoverable fault can be auto-recovered on server startup and/or by pre-scheduled recovery.
All is not lost, however: the instances can still be recovered from the console. For most practical purposes, though, it isn't desirable to retry by hand a huge number of composite instances that are marked for recovery after a remote fault (say, an endpoint that was not available). Naturally we will want to automate this part as well.
Here is sample code that fetches all remote faults that the custom Java class has marked as recoverable and retries them.
package com.beatech.salapp;

import java.util.Hashtable;
import java.util.List;

import javax.naming.Context;

import oracle.soa.management.facade.Fault;
import oracle.soa.management.facade.FaultRecoveryActionTypeConstants;
import oracle.soa.management.facade.Locator;
import oracle.soa.management.facade.LocatorFactory;
import oracle.soa.management.facade.bpel.BPELServiceEngine;
import oracle.soa.management.util.FaultFilter;

public class FaultRecovery {

    private Locator locator = null;
    private BPELServiceEngine mBPELServiceEngine;

    public FaultRecovery() {
        locator = this.getLocator();
        try {
            mBPELServiceEngine =
                (BPELServiceEngine) locator.getServiceEngine(Locator.SE_BPEL);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Connection details of the SOA managed server
    public Hashtable getJndiProps() {
        Hashtable jndiProps = new Hashtable();
        jndiProps.put(Context.PROVIDER_URL, "t3://localhost:4003/soa-infra");
        jndiProps.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        jndiProps.put(Context.SECURITY_PRINCIPAL, "weblogic");
        jndiProps.put(Context.SECURITY_CREDENTIALS, "welcome123");
        jndiProps.put("dedicated.connection", "true");
        return jndiProps;
    }

    public Locator getLocator() {
        try {
            return LocatorFactory.createLocator(getJndiProps());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public void recoverFaults() {
        try {
            System.out.println("Get All Recoverable Faults");
            /* Set search filters like composite name, instance ids, fault names etc.
               Here the recoverable filter is set to true to retrieve all recoverable faults.
               A filter on faultName is also set, because we want to programmatically
               resubmit all remote faults. */
            FaultFilter filter = new FaultFilter();
            filter.setFaultName("{http://schemas.oracle.com/bpel/extension}remoteFault");
            filter.setRecoverable(true);

            // Get faults using the defined filter
            List<Fault> faultList = mBPELServiceEngine.getFaults(filter);
            System.out.println("=========================Recoverable Faults==================================================");
            for (Fault fault : faultList) {
                System.out.println("=============================================================================================");
                System.out.println("Composite DN: " + fault.getCompositeDN().getStringDN());
                System.out.println("Composite Instance ID: " + fault.getCompositeInstanceId());
                System.out.println("Component Name: " + fault.getComponentName());
                System.out.println("Component Instance ID: " + fault.getComponentInstanceId());
                System.out.println("Activity Name: " + fault.getLabel());
                System.out.println("Fault ID: " + fault.getId());
                System.out.println("Fault Name: " + fault.getName());
                System.out.println("Recoverable flag: " + fault.isRecoverable());
                System.out.println("Fault Message: " + fault.getMessage());
            }
            // Recovering all recoverable faults in a single call
            //mBPELServiceEngine.recoverFaults(faultList.toArray(new Fault[faultList.size()]), FaultRecoveryActionTypeConstants.ACTION_RETRY);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FaultRecovery faultRecovery = new FaultRecovery();
        faultRecovery.recoverFaults();
    }
}
Replace the values in the property map with the ones for your server. Remember to use the managed server port in the provider URL. Run the Java class and you will see the recoverable faults printed.
image
Verify the same from the console:
image
Run the Java program again, but this time uncomment the line below:
//mBPELServiceEngine.recoverFaults(faultList.toArray(new Fault[faultList.size()]), FaultRecoveryActionTypeConstants.ACTION_RETRY);
This will cause all faults marked with the Recovery icon to be retried. So if the remote endpoint is active and responding now, the processes will complete.
There are a host of other things that we can do in this Java class.
Using the BPELServiceEngine object we can write messages to the BPEL audit trail, inspect the current activity, and read and update the values of BPEL variables.
The following code snippet, if inserted in the code, replaces the value of a process variable before retrying. (It may be used in the case of binding or business faults.)
// Get faults using the defined filter
List<Fault> faultList = mBPELServiceEngine.getFaults(filter);
System.out.println("=========================Recoverable Faults ==================================================");
for (Fault fault : faultList) {
    System.out.println("=========================Read Process Variables =========================================");
    // Get the process instance variables from the fault object
    String[] variables = mBPELServiceEngine.getVariableNames(fault);
    System.out.println("Process Instance Variables:");
    for (int i = 0; i < variables.length; i++) {
        System.out.println("Variable Name: " + variables[i]);
    }
    // Get the input variable data from the activity, modify it with a new value and recover
    System.out.println("=========================Replace Variable and Recover ====================================");
    System.out.println("Activity Input Variable Data:");
    String value = mBPELServiceEngine.getVariable(fault, "activityRequestMessage");
    System.out.println("Present value: " + value);
    value = value.replace("remoteFault", "Modified Value");
    System.out.println("New value: " + value);
    mBPELServiceEngine.setVariable(fault, "activityRequestMessage", value);

    // Recover faults one by one
    mBPELServiceEngine.recoverFault(fault, FaultRecoveryActionTypeConstants.ACTION_RETRY, null);
}
The following JARs are required on the classpath for the above Java code:
image

<MWHOME>\oracle_common\modules\oracle.fabriccommon_11.1.1\fabric-common.jar
<MWHOME>\jdeveloper\soa\modules\oracle.soa.mgmt_11.1.1\soa-infra-mgmt.jar
<MWHOME>\wlserver_10.3\server\lib\weblogic.jar
<MWHOME>\jdeveloper\soa\modules\oracle.soa.fabric_11.1.1\oracle-soa-client-api.jar
<MWHOME>\oracle_common\webservices\wsclient_extended.jar
This post discussed the different approaches and strategies for handling faults in a SOA Suite composite. Let me conclude by describing a few best practices around fault handling.
Oracle SOA Suite Fault Handling Best Practices
  1. Create a fault handler (catch block) for each partner link. For each partner link, have a catch block for every anticipated error. The idea is not to let errors fall through to the catchAll block.
  2. Keep catchAll for errors that cannot be anticipated at design time.
  3. Classify errors into types: runtime, remote, binding, validation, business errors etc.
  4. Set up notifications in production so that errors are sent to the concerned teams by email. The console should not have to be visited to find out the status of execution.
  5. Use a catch block for non-partner-link errors.
  6. Every retry defined in a fault policy causes a commit of the transaction. A dehydration point will be reached and threads released.
  7. Automated recovery can be built by creating a fault table, persisting the queue and having an agent re-submit the job (for example, writing a timer agent to invoke the Java code we wrote to recover instances). This can be achieved through scripts. Use only the PUBLISHED API of ESB or QUEUE (AQ etc.) for re-submission. Another example would be to use WLST to change the RecoveryConfig MBean and configure a recovery window to retry all faulted instances.
  8. Handle the rollback fault by providing 'No Action' in the fault policy.
  9. Remember: Receive, OnMessage, OnAlarm, Wait and CheckPoint (an activity that forces the Java thread to store its current state in the dehydration store) cause the current state to be stored in the dehydration store and threads to be released.
  10. Always use MDS to store fault policies and bindings so that they can be reused across multiple composites and projects.
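The timer agent mentioned in point 7 could be as small as a scheduled executor that periodically runs a recovery job. The sketch below uses only the JDK; the class name is hypothetical, and the Runnable is a stand-in for whatever recovery code you plug in (for instance a call to the recoverFaults method of the FaultRecovery class shown earlier, which is an assumption about how you would wire it up).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative timer agent that re-runs a recovery job at a fixed delay. */
class RecoveryAgentSketch {

    /**
     * Starts a single-threaded scheduler that runs recoveryJob after initialDelay
     * and then again each time 'delay' has elapsed since the previous run finished.
     * Exceptions are caught so that one failed run does not cancel later runs.
     */
    static ScheduledExecutorService start(Runnable recoveryJob, long initialDelay,
                                          long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable guarded = () -> {
            try {
                recoveryJob.run();
            } catch (RuntimeException e) {
                System.err.println("Recovery run failed: " + e);
            }
        };
        scheduler.scheduleWithFixedDelay(guarded, initialDelay, delay, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder job; in practice this would invoke the fault recovery code.
        ScheduledExecutorService scheduler =
            start(() -> System.out.println("Recovering faulted instances..."),
                  0, 200, TimeUnit.MILLISECONDS);
        Thread.sleep(1000);
        scheduler.shutdownNow();
    }
}
```

The guard around the job matters because a scheduled task that throws is otherwise silently cancelled by the executor, which would stop the agent after its first failed run.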
All artifacts used for demonstration in this article can be found at this link.
