Thursday, 10 November 2011

Fault Management Framework by Example


The purpose of the Fault Management Framework is to provide error handling that is external to SOA and does not impact the SOA/BPEL design or runtime. The framework is implemented using policies defined in XML. These policies are reusable across composites/components and can catch both runtime and business faults. Once a fault is caught, the policy defines actions that can be used for the SOA instance such as retry, human intervention, replay scope, rethrow fault, abort, and custom Java actions. When human intervention comes into play, the Enterprise Manager provides a GUI for managing the faulted instance.
When the policies have been defined and bound to composites and/or components, the framework intercepts a fault before the standard fault handler comes into play. For example, if a BPEL process defines standard BPEL fault handling and a fault policy is also bound to that process, the framework intercepts the fault first, allowing any of the supported actions to be applied to the instance.

The fault policy files are loaded at startup, so when any changes are made to them a server restart is required.  The location for the fault policy files can be in the same directory as the composite.xml or in a location identified by a property in the composite.xml:

<property name="oracle.composite.faultPolicyFile">

<property name="oracle.composite.faultBindingFile">

When using the property settings in the composite.xml, you can then use a different name for the files instead of the default.
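
As a rough sketch, the two properties might look like this inside the composite.xml. The oramds paths and the custom file names are my assumptions for illustration; substitute whatever location and names you actually use:

```xml
<!-- Illustrative fragment of composite.xml. The paths and the file names
     (my-fault-policies.xml, my-fault-bindings.xml) are hypothetical. -->
<property name="oracle.composite.faultPolicyFile">oramds://apps/faultpolicyfiles/my-fault-policies.xml</property>
<property name="oracle.composite.faultBindingFile">oramds://apps/faultpolicyfiles/my-fault-bindings.xml</property>
```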

Fault Policies (fault-policies.xml / fault-policies.xsd)

There are two XML policy files required to set up the Fault Management Framework in SOA, the first of which is the fault-policies.xml. This file contains one or more fault policy definitions, fault definitions (which can also include conditions), and actions.

NOTE: Pay close attention to the case for elements in the policy files. If you don't have an editor that enforces the schema, it's very easy to define an element like <action> instead of <Action>.

<faultPolicy> Element

In order to more easily manage all the possible faults an enterprise deployment will contain, you have the option of logically grouping your faults using multiple fault policies. Each policy is defined using the <faultPolicy> Element. Each policy definition must contain a unique policy id:
<faultPolicy version="0.0.1" id="FusionMidFaults"
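
To show where the <faultPolicy> element sits, here is a minimal skeleton of a fault-policies.xml as I understand the layout. The comments mark where the conditions and actions described in the next sections would go; treat it as a sketch rather than a copy of the file in the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Skeleton only: the Conditions and Actions bodies are left as comments. -->
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="0.0.1" id="FusionMidFaults"
               xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <Conditions>
      <!-- one or more <faultName> elements -->
    </Conditions>
    <Actions>
      <!-- one or more <Action> elements -->
    </Actions>
  </faultPolicy>
  <!-- additional <faultPolicy> elements, each with its own unique id -->
</faultPolicies>
```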

<faultName> Element

Within the <faultPolicy> element, you define all the faults associated with the policy, wrapped in a <Conditions> element. Each <faultName> definition must identify a fault by its QName (e.g., bpelx:remoteFault) and contain an associated action “reference”. You can further refine the fault with an XPath expression to test for values (e.g., $fault.code="3220"):
         <action ref="ora-retry"/>
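
Put together, a <Conditions> block along these lines would send a bpelx:remoteFault with code 3220 to the ora-retry action. This is a sketch built from the values mentioned above; the second, untested <condition> that falls through to ora-terminate is my own addition to illustrate a catch-all:

```xml
<Conditions>
  <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
             name="bpelx:remoteFault">
    <!-- Applies only when the XPath test evaluates to true -->
    <condition>
      <test>$fault.code="3220"</test>
      <action ref="ora-retry"/>
    </condition>
    <!-- Assumed fallback for any other remoteFault (illustrative) -->
    <condition>
      <action ref="ora-terminate"/>
    </condition>
  </faultName>
</Conditions>
```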

<Action> Element

Following the </Conditions> Element, you will define individual actions associated with the policy wrapped in an <Actions> Element. Each action definition must contain a unique action id and action specification:


      <Action id="ora-retry">
            <retryFailureAction ref="ora-terminate"/>

      <Action id="ora-terminate">


One thing you will notice in the product documentation is that the action ids use a certain nomenclature where everything is prefixed with “ora-”. It is a common misunderstanding that the ids are reserved, but in reality you can use any name you wish. It's the action specification elements that define what the action definition will do, and the ids are used only as “references”. For example, in the code snippet above, ora-retry contains a retryFailureAction with a reference to ora-terminate, another action definition.
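
For reference, here is a sketch of an <Actions> block covering the action types mentioned at the top of this post. The retry count, retry interval, Java class name, and return value are placeholders I picked for illustration, not the values used in the example project:

```xml
<Actions>
  <!-- Retry; when the retries are exhausted, hand off to ora-terminate -->
  <Action id="ora-retry">
    <retry>
      <retryCount>3</retryCount>        <!-- placeholder value -->
      <retryInterval>2</retryInterval>  <!-- seconds, placeholder value -->
      <exponentialBackoff/>
      <retryFailureAction ref="ora-terminate"/>
    </retry>
  </Action>
  <!-- Park the instance for manual recovery in Enterprise Manager -->
  <Action id="ora-human-intervention">
    <humanIntervention/>
  </Action>
  <!-- Abort the instance -->
  <Action id="ora-terminate">
    <abort/>
  </Action>
  <!-- Pass the fault back to the standard BPEL fault handlers -->
  <Action id="ora-rethrow-fault">
    <rethrowFault/>
  </Action>
  <!-- Re-run the scope in which the fault occurred -->
  <Action id="ora-replay-scope">
    <replayScope/>
  </Action>
  <!-- Custom Java; com.example.MyFaultHandler is a hypothetical class -->
  <Action id="ora-java">
    <javaAction className="com.example.MyFaultHandler"
                defaultAction="ora-terminate">
      <returnValue value="RETRY" ref="ora-retry"/>
    </javaAction>
  </Action>
</Actions>
```

Because the ids are only references, renaming ora-terminate to something like abortInstance works just as well, as long as every ref that points to it is renamed too.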

Fault Bindings (fault-bindings.xml / fault-bindings.xsd)

The second policy file that is required by the Fault Management Framework is the fault-bindings.xml.  This policy file will bind (or map) policies defined in the fault-policies.xml file to levels within the composite.  These levels include:
  • Composite Application
  • Component
    • Reference
    • BPEL Process
    • Mediator

Composite Application Binding

When binding to a composite application, use the <composite> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml:
  <composite faultPolicy="FusionMidFaults"/>

Reference Binding

When binding to a reference, use the <reference> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml.  You will also need to specify a <name> or <portType> element:
  <reference faultPolicy="FusionMidFaults">
    <portType xmlns:credit="">credit:CreditRatingService</portType>

  <reference faultPolicy="FusionMidFaults">

BPEL Process Binding

When binding to a BPEL Process, use the <component> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml.  You will also need to specify a <name> element containing the name of the BPEL process:
  <component faultPolicy="FusionMidFaults">

Mediator Binding

When binding to a mediator, use the <component> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml.  You will also need to specify a <name> element containing the name of the mediator:
  <component faultPolicy="FusionMidFaults">
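
Pulling the binding levels together, a complete fault-bindings.xml might look like the sketch below. FaultGeneratorBPELProcess is the BPEL process from the example further down; the mediator and reference names are hypothetical placeholders, and in practice each level could point to a different policy id:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the component and reference names other than
     FaultGeneratorBPELProcess are illustrative. -->
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <!-- Composite-level binding -->
  <composite faultPolicy="FusionMidFaults"/>
  <!-- BPEL process binding -->
  <component faultPolicy="FusionMidFaults">
    <name>FaultGeneratorBPELProcess</name>
  </component>
  <!-- Mediator binding (hypothetical mediator name) -->
  <component faultPolicy="FusionMidFaults">
    <name>MyMediator</name>
  </component>
  <!-- Reference binding by name (hypothetical reference name) -->
  <reference faultPolicy="FusionMidFaults">
    <name>CreditRatingService</name>
  </reference>
</faultPolicyBindings>
```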

The Example (bpel-300-FaultHandlingFramework_rev1.0.jar)

I'm a big proponent of following up written text with a working example to help better understand what the text is trying to convey. The example I put together here will demonstrate all of the actions provided out of the box by the Fault Management Framework including custom Java and something I am calling throw vs. reply.
To use the example, save the archive somewhere on your file system and extract the .jar. Then, in JDeveloper, create a new SOA project and import the .jar using the “SOA Archive into SOA Project” option.

At this point, open the fault-policies.xml and fault-bindings.xml files and review the contents. You will see that there are multiple policies defined and that those policies are bound to various levels within the composite application. Once you have reviewed the policy files, deploy the composite to a running SOA server. Then open EM, select the deployed composite, and navigate to the Test page. You will see that there is only one value to provide, and it's called faultAction.

To test the various scenarios, the following are the values you can provide for the faultAction:
  • ora-retry
  • ora-human-intervention
  • ora-terminate
  • ora-rethrow-fault
  • ora-replay-scope
  • ora-java
  • mediator
  • throw-vs-reply (see more details below)
  • reply-with-fault (see more details below)
After inputting one of the faultAction values mentioned above and pushing the “Test Web Service” button, review the instance by pushing the “Launch flow Trace” button. You will be able to examine the Trace and Audit Trails to see how the Fault Management Framework is behaving. Once you have a better feeling for what's going on, try changing the policies, redeploy the composite, restart the server, and run the test(s) again to see how your updates compare to what I provided.

throw-vs-reply / reply-with-fault

The throw-vs-reply and reply-with-fault faultAction values require a bit of explanation. I ran across a situation where a “poor design decision” caused a point of confusion with regard to the Fault Management Framework. The scenario that was causing the confusion was as follows: a BPEL process invokes another synchronous BPEL process. The second BPEL process contained an asynchronous flow: it invoked a JMS adapter and then waited on a receive. Even though the BPEL process is defined as synchronous and the response back from the JMS adapter was almost immediate, the JMS adapter caused a dehydration, so a new thread picked up the JMS response to deliver the results. If the new thread were to throw a fault, the BPEL engine would handle it because the original thread was gone due to the dehydration (i.e., the correlation between the first thread and the Fault Management Framework was lost). Furthermore, the originating BPEL process “invoke” will time out because there is no response or fault flowing to it. For this scenario we do have an option for regaining the correlation: instead of “throwing” an exception, you can “reply” with the exception as the payload.
To simplify things with my example, I did not implement the JMS Adapter scenario. Instead, I used a sleep in the FaultGeneratorBPELProcess to force a dehydration which surfaces the same behavior. To see the “point of confusion”, provide throw-vs-reply for the faultAction value and see the timeout exception happen. To see the “solution/expected behavior”, provide reply-with-fault for the faultAction value.  I would also recommend looking at the logic in the FaultGeneratorBPELProcess to see what it's doing with regard to throw-vs-reply and reply-with-fault.
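
To make the difference concrete, here is a rough BPEL fragment contrasting the two approaches. Namespace declarations are omitted, and the partner link, port type, operation, fault, and variable names are placeholders of mine rather than the ones used in FaultGeneratorBPELProcess:

```xml
<!-- Fragment only; all names are illustrative. -->

<!-- "throw": after a dehydration the fault is raised on the new thread,
     so the waiting caller receives nothing and eventually times out. -->
<throw name="ThrowFault"
       faultName="client:businessFault"
       faultVariable="faultVariable"/>

<!-- "reply": the fault is returned to the caller as the response to the
     original synchronous operation. -->
<reply name="ReplyWithFault"
       partnerLink="client"
       portType="client:FaultGenerator"
       operation="process"
       variable="faultVariable"
       faultName="client:businessFault"/>
```

Note that a reply with a faultName only works if the WSDL operation declares that fault.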
Hopefully this provides some valuable insight into the Fault Management Framework and its capabilities.
