Monday, November 16, 2015

How to write BREX context rules (part 1)

This is the first of I-do-not-know-how-many posts on how to write BREX context rules. Hopefully, I can provide guidance on what some may consider the voodoo magic of writing context rules, and show that you do not have to be an XPath wizard to write effective context rules.

BREX context rules provide the ability to precisely define what structures are allowed and not allowed in your project's data. This is achieved by using XPath expressions. XPath expressions can get quite complex, but in the majority of cases, you only need to utilize a basic subset of XPath.

A key benefit to context rules is they can be validated in an automated fashion with the use of a BREX validation tool (aka BREX checker). Therefore, if you have a business rule that is applicable to the XML structure of your data, it is best to express it as a context rule to minimize the need for manual verification of the rule.

The <contextRules> element

In a BREX data module, context rules are specified under the <contextRules> element:

  <dmodule>
    <content>
      <brex>
        <contextRules>
          ...
        </contextRules>
      </brex>
    </content>
  </dmodule>

The <contextRules> element is repeatable. As the element name implies, you can specify a specific context for a set of rules. For example, if you have a set of rules that are only applicable to procedural data modules and another set of rules only applicable to fault data modules, you can have the following:

  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/proced.xsd">
  ...
  </contextRules>
  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/fault.xsd">
  ...
  </contextRules>

The rulesContext attribute value must be a schema URI. BREX checkers examine the value to determine when the set of rules should be verified against a data module.
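To make the matching concrete, here is a minimal sketch of the root of a procedural data module. It assumes the BREX checker takes the module's schema URI from the xsi:noNamespaceSchemaLocation attribute, which is where S1000D data modules normally declare their schema; how a particular checker actually obtains the URI may differ.

```xml
<!-- Sketch: a data module whose declared schema URI is identical to the
     rulesContext value of the first <contextRules> block above, so that
     block's rules would be checked against it. -->
<dmodule xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/proced.xsd">
  <!-- identification, status, and content omitted -->
</dmodule>
```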

When the rulesContext attribute is not specified, the rules apply to all S1000D CSDB object types:

  <contextRules>
    <!-- rules for all object types -->
  </contextRules>

Limitations of the rulesContext attribute

Unfortunately, the rulesContext attribute is very limited, and can be impractical to use, for the following reasons:

  • A data module's schema URI must match exactly the schema URI specified in the rulesContext attribute for the rules to be applied. If you have been a good IETP author and have been using the standard schema URIs, this is not a problem. However, I have encountered data where schema URIs refer to local pathnames (e.g. "file://C:/data/...") (apparently there are still folks that do not utilize XML Catalogs).

  • Only a single URI can be specified in the rulesContext attribute. This can be very limiting if you have rules that may be applicable to more than one schema type, or applicable to different issues of a given schema type. For example, if you have a rule that is applicable to procedural and fault data modules, you would need to list that same rule twice, once for the procedural context and once for the fault context.

    Another case is if you have a publication with mixed-Issue data modules (e.g. 4.0, 4.0.2, 4.1, 4.1.A, etc) and you have rules that are applicable to all of them.
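The duplication described above might look like the following sketch. The rule itself (warnings must carry an id) is hypothetical and chosen only for illustration; the point is that the same <structureObjectRule> has to be repeated verbatim under each context.

```xml
<!-- Hypothetical rule, authored once for procedural data modules... -->
<contextRules
    rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/proced.xsd">
  <structureObjectRuleGroup>
    <structureObjectRule>
      <objectPath allowedObjectFlag="0">//warning[not(@id)]</objectPath>
      <objectUse>Warnings must have an authored ID.</objectUse>
    </structureObjectRule>
  </structureObjectRuleGroup>
</contextRules>
<!-- ...and again, unchanged, for fault data modules. -->
<contextRules
    rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/fault.xsd">
  <structureObjectRuleGroup>
    <structureObjectRule>
      <objectPath allowedObjectFlag="0">//warning[not(@id)]</objectPath>
      <objectUse>Warnings must have an authored ID.</objectUse>
    </structureObjectRule>
  </structureObjectRuleGroup>
</contextRules>
```

Any later change to the rule then has to be made in both places, and again for every additional schema URI you need to cover.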

Note: I have considered submitting a CPF to allow for multiple schema URIs to be specified for rulesContext. Since no one else has bothered to submit one in all this time, it is likely the attribute is rarely used (or used in very limited contexts) since it is more common to define XPath expressions to include whatever context is needed (see below).

Alternative to the rulesContext attribute

IMO, the limitations of rulesContext are pretty severe. Fortunately, with the use of XPath, we can achieve context-based rules without the need of using the rulesContext attribute. I will go into more depth on how to write structure object rules in a future post, but the following should give you a basic idea of how a rule can be constructed so it only applies to a data module of a given type:

  <structureObjectRule>
    <objectPath allowedObjectFlag="0">/dmodule/content/
      illustratedPartsCatalog//partNumber[not(@id)]</objectPath>
    <objectUse>Part numbers in IPDs must have an authored ID.</objectUse>
  </structureObjectRule>

The rule in the example is somewhat bogus, but it illustrates how I was able to contextualize the rule to only IPDs by leveraging the XML schema structure of IPDs. Here is a brief explanation of the expression:

  • The start of the expression, "/dmodule/content/illustratedPartsCatalog" restricts any matches to under the <illustratedPartsCatalog> element. Since no other schema type allows such a structure, I have effectively limited the application of the rule to IPDs.

  • The next part is the expression "//partNumber", which indicates any <partNumber> elements under <illustratedPartsCatalog>, at any depth. The XPath expression "//" indicates descendants of any depth.

  • The "[not(@id)]" says to only match <partNumber> elements with no id attribute.

With the designation of allowedObjectFlag="0" for the rule, any match of the above expression is not allowed. Therefore, if the expression does match a node in a data module, that DM will not pass validation.
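To see the expression at work, consider the following deliberately simplified data module fragment. It is not schema-valid: the intermediate IPD structure is omitted, and the id and part number values are made up for illustration.

```xml
<dmodule>
  <content>
    <illustratedPartsCatalog>
      <!-- intermediate IPD structure omitted for brevity -->

      <!-- Has an id attribute: not matched by the expression. -->
      <partNumber id="pn-0001">ABC-123</partNumber>

      <!-- No id attribute: matched, so the data module fails the rule. -->
      <partNumber>XYZ-789</partNumber>
    </illustratedPartsCatalog>
  </content>
</dmodule>
```

The same two <partNumber> elements placed in a non-IPD data module would not be matched at all, since the path would never get past /dmodule/content/illustratedPartsCatalog.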

Equivalent rulesContext-based rule

An equivalent rule that relies on the rulesContext attribute of <contextRules> would be the following:

  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/ipd.xsd">
    <structureObjectRuleGroup>
      <structureObjectRule>
        <objectPath allowedObjectFlag="0">//partNumber[not(@id)]</objectPath>
        <objectUse>Part numbers in IPDs must have an authored ID.</objectUse>
      </structureObjectRule>
    </structureObjectRuleGroup>
  </contextRules>

Notice how the XPath expression is briefer since I no longer have to establish a context within the expression itself. With the rulesContext attribute setting, I know that the rule will only be evaluated on Issue 4.1 IPD data modules.

You may be satisfied with using the rulesContext version if you know all your data modules will use well-defined schema URIs and you have few to no occurrences of rules that are applicable to multiple schema types. However, IMO, it is better practice to author rules that are less susceptible to external changes that could make the rule "stale", especially when the staleness may go unnoticed. For example, if you change the schema URIs of your modules, the existing rules will no longer be applied by the BREX checker. This can go unnoticed, leading to potential false validation passes of data.
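For the multiple-schema-type case, a single XPath-contextualized rule can stand in for several rulesContext blocks. The following is only a sketch: the rule is hypothetical, and it assumes that procedural data modules place their body under <procedure> and fault data modules under <faultIsolation> or <faultReporting>, as I understand the Issue 4.1 schemas to do. Verify the element names against the schemas you actually use.

```xml
<contextRules>
  <structureObjectRuleGroup>
    <structureObjectRule>
      <!-- The predicate on the child of <content> restricts the rule to
           procedural and fault data modules, regardless of schema URI. -->
      <objectPath allowedObjectFlag="0">/dmodule/content/*[self::procedure
        or self::faultIsolation or self::faultReporting]//warning[not(@id)]</objectPath>
      <objectUse>Warnings in procedural and fault data modules must have
        an authored ID.</objectUse>
    </structureObjectRule>
  </structureObjectRuleGroup>
</contextRules>
```

Because the context lives in the expression, the rule keeps working if the schema URIs in your data modules change, and it is authored only once.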

Next

XPath Primer
