Tuesday, November 24, 2015

How to write BREX context rules (part 2): XPath Primer

In Part 1, I provided a basic introduction to writing BREX context rules and the <contextRules> element. In the second part of this series, I will provide a brief primer on XPath expressions. A complete guide to writing XPath expressions is beyond the scope of this post, but you will need a basic understanding of XPath to get started in writing structured rules for your project. After reading this post, I recommend reviewing any of the numerous XPath tutorials on the web.

What is XPath?

XPath provides a syntax for identifying parts (formally known as nodes) of an XML document. An XML document intrinsically defines a tree structure, similar to how files on a file system are organized. For example, see how the location of the folder "program" is identified at the top of Windows file explorer:

The absolute location is "C:\Program Files\LibreOffice 5\program". With XPath, we identify XML elements in a similar manner, but we use forward slashes, "/", instead of backslashes. For example, say we have the following XML document structure:

  <dmodule>
    <content>
      <brex> <!-- We want to identify this element here -->
      ...
    </content>
  </dmodule>

We can identify the <brex> element with the following XPath expression:

/dmodule/content/brex
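
As a rough illustration, the expression can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree implements a limited subset of XPath and evaluates paths relative to the root element, so the leading "/dmodule" step is implied by the root rather than written out. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document mirroring the structure above.
SAMPLE = """
<dmodule>
  <content>
    <brex>
      <contextRules/>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(SAMPLE)  # root is the <dmodule> element

# Equivalent of /dmodule/content/brex, written relative to the root.
brex = root.find("content/brex")

print(root.tag)  # dmodule
print(brex.tag)  # brex
```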

Unlike a file system, an XML document can have multiple items of the same name at the same level. For example:

  <dmodule>
    <content>
      <brex>
        <contextRules>
          <structureObjectRuleGroup>
            <structureObjectRule>... <!-- We want this one -->
            <structureObjectRule>... <!-- Not this one -->
            ...
          </structureObjectRuleGroup>
        </contextRules>
      </brex>
    </content>
  </dmodule>

If we use the XPath expression,

/dmodule/content/brex/contextRules/structureObjectRuleGroup/structureObjectRule

we are actually identifying all <structureObjectRule> elements under <structureObjectRuleGroup>. If we only want the first <structureObjectRule> element, we do the following:

/dmodule/content/brex/contextRules/structureObjectRuleGroup/
    structureObjectRule[1]

Technically, we may still identify more than one <structureObjectRule> element. If you are familiar with the BREX schema, you know that <contextRules> and <structureObjectRuleGroup> are repeatable. (This applies to any XML document; BREX is just the example here.) So if we take,

/dmodule/content/brex/contextRules/structureObjectRuleGroup/
    structureObjectRule[1]

and apply it to the following XML document:

  <dmodule>
    <content>
      <brex>
        <contextRules>
          <structureObjectRuleGroup>
            <structureObjectRule>... <!-- MATCH -->
            ...
          </structureObjectRuleGroup>
        </contextRules>
        <contextRules>
          <structureObjectRuleGroup>
            <structureObjectRule>... <!-- MATCH -->
            ...
          </structureObjectRuleGroup>
        </contextRules>
      </brex>
    </content>
  </dmodule>

we will have identified two <structureObjectRule> elements. If we want to identify only the very first <structureObjectRule> element in the document, we use the following:

/dmodule/content/brex/contextRules[1]/
    structureObjectRuleGroup[1]/structureObjectRule[1]
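
The effect of the positional predicates can be checked with a small Python sketch. It uses the standard xml.etree.ElementTree module, which supports "[1]"-style predicates but evaluates paths relative to the root element, so "/dmodule" is dropped from the expressions. The "n" attributes are hypothetical labels added only so the matches can be told apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <contextRules>, each with two rules.
SAMPLE = """
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule n="A1"/>
          <structureObjectRule n="A2"/>
        </structureObjectRuleGroup>
      </contextRules>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule n="B1"/>
          <structureObjectRule n="B2"/>
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(SAMPLE)


def labels(path):
    """Return the 'n' labels of every element matched by path."""
    return [el.get("n") for el in root.findall(path)]


# No predicates: every rule in every group.
all_rules = labels(
    "content/brex/contextRules/structureObjectRuleGroup/structureObjectRule")

# [1] on the last step only: the first rule of EACH group.
first_per_group = labels(
    "content/brex/contextRules/structureObjectRuleGroup/structureObjectRule[1]")

# [1] on every repeatable step: the very first rule in the document.
very_first = labels(
    "content/brex/contextRules[1]/structureObjectRuleGroup[1]/structureObjectRule[1]")

print(all_rules)        # ['A1', 'A2', 'B1', 'B2']
print(first_per_group)  # ['A1', 'B1']
print(very_first)       # ['A1']
```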

Identifying by ID

If your XML documents contain IDs, using them to identify elements is much easier than using full paths. Take the following for example:

  <dmodule>
    <content>
      <brex>
        <contextRules>
          <structureObjectRuleGroup>
            <structureObjectRule id="SOR-001">... <!-- We want this one -->
            <structureObjectRule>...
            ...
          </structureObjectRuleGroup>
        </contextRules>
      </brex>
    </content>
  </dmodule>

The element we want to identify can be expressed as follows:

//structureObjectRule[@id="SOR-001"]

The expression contains some components that need further explanation:

//

This is shorthand notation for any descendant node. Since it is at the start of the expression, it indicates any node within the document.

[@id="SOR-001"]

The "[]" represents a conditional expression on the node that precedes it. In this case, the node that precedes it is structureObjectRule. In order for a structureObjectRule to match the expression, the expression inside the []'s must evaluate to a true value.

In our example, the conditional expression,

@id="SOR-001"

is only true if the attribute named "id" has the value "SOR-001". In XPath, to distinguish an element name from an attribute name, attribute names are prefixed with the '@' character, hence the use of "@id". If we left out the '@', the name "id" would have been interpreted as the name of a child element.
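
The difference the '@' makes can be seen in a short Python sketch using the standard xml.etree.ElementTree module. ElementTree needs a leading "." before "//", so the expression is written as ".//structureObjectRule[...]". The sample document and its <objectUse> text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first rule carries an id attribute.
SAMPLE = """
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="SOR-001">
            <objectUse>first rule</objectUse>
          </structureObjectRule>
          <structureObjectRule>
            <objectUse>second rule</objectUse>
          </structureObjectRule>
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(SAMPLE)

# With '@': test the attribute named "id".
by_attribute = root.findall(".//structureObjectRule[@id='SOR-001']")

# Without '@': "id" is read as a CHILD ELEMENT whose text is "SOR-001".
by_child_element = root.findall(".//structureObjectRule[id='SOR-001']")

print(len(by_attribute))                     # 1
print(by_attribute[0].findtext("objectUse"))  # first rule
print(len(by_child_element))                 # 0
```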

Identifying by any attribute

You are not limited to ID attributes for identifying elements in an XML document. For example, if I wanted to identify all elements marked as deleted, I can use the following:

//*[@changeType="delete"]

The special character "*" will match any element, but the attribute test condition limits the matching to only those elements that have the changeType attribute set to "delete".
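
Here is a minimal sketch of the wildcard form, again with Python's standard xml.etree.ElementTree module (written as ".//*" because ElementTree paths are relative, which also means the root element itself is not tested). The procedural content is a made-up fragment, not a complete data module.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with change marks on different element types.
SAMPLE = """
<dmodule>
  <content>
    <procedure>
      <proceduralStep changeType="delete"><para>old step</para></proceduralStep>
      <proceduralStep changeType="modify">
        <para changeType="delete">removed text</para>
      </proceduralStep>
      <proceduralStep><para>kept step</para></proceduralStep>
    </procedure>
  </content>
</dmodule>
"""

root = ET.fromstring(SAMPLE)

# "*" matches any element name; the predicate keeps only deleted ones.
deleted = root.findall(".//*[@changeType='delete']")

print([el.tag for el in deleted])  # ['proceduralStep', 'para']
```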

More Information

More complete tutorials on XPath can be found by searching the web.

Next

Learning by Example

Monday, November 16, 2015

How to write BREX context rules (part 1)

This is the first in an I-do-not-know-how-many-part series of posts on how to write BREX context rules. Hopefully, I can provide guidance on what some may consider the voodoo magic of writing context rules, and show that you do not have to be an XPath wizard to write effective context rules.

BREX context rules provide the ability to precisely define what structures are allowed and not allowed in your project's data. This is achieved by using XPath expressions. XPath expressions can get quite complex, but in the majority of the cases, you only need to utilize a basic subset of XPath.

A key benefit to context rules is they can be validated in an automated fashion with the use of a BREX validation tool (aka BREX checker). Therefore, if you have a business rule that is applicable to the XML structure of your data, it is best to express it as a context rule to minimize the need for manual verification of the rule.

The <contextRules> element

In a BREX data module, context rules are specified under the <contextRules> element:

  <dmodule>
    <content>
      <brex>
        <contextRules>
          ...
        </contextRules>
      </brex>
    </content>
  </dmodule>

The <contextRules> element is repeatable. As the element name implies, you can specify a specific context for a set of rules. For example, if you have a set of rules that are only applicable to procedural data modules and another set of rules only applicable to fault data modules, you can have the following:

  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/proced.xsd">
  ...
  </contextRules>
  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/fault.xsd">
  ...
  </contextRules>

The rulesContext attribute value must be a schema URI. BREX checkers examine the value to determine when the set of rules should be verified against a data module.

When the rulesContext attribute is not specified, the rules apply to all S1000D CSDB object types:

  <contextRules>
    <!-- rules for all object types -->
  </contextRules>
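
To make the selection logic concrete, here is a rough Python sketch of how a checker might decide which <contextRules> sets to evaluate. It is not taken from any real BREX checker: the function name, the bare <brex> wrapper, and the exact-string comparison are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

PROCED = "http://www.s1000d.org/S1000D_4-1/xml_schema_flat/proced.xsd"
FAULT = "http://www.s1000d.org/S1000D_4-1/xml_schema_flat/fault.xsd"
IPD = "http://www.s1000d.org/S1000D_4-1/xml_schema_flat/ipd.xsd"

# Hypothetical, stripped-down BREX content: two contextual sets, one general.
BREX = f"""
<brex>
  <contextRules rulesContext="{PROCED}"/>
  <contextRules rulesContext="{FAULT}"/>
  <contextRules/>
</brex>
"""


def applicable_contexts(brex, dm_schema_uri):
    """Return the <contextRules> elements to check for one data module.

    Assumption: rulesContext is compared with the data module's schema
    URI as an exact string, and a missing rulesContext means the rules
    apply to every object type.
    """
    selected = []
    for ctx in brex.findall("contextRules"):
        rules_context = ctx.get("rulesContext")
        if rules_context is None or rules_context == dm_schema_uri:
            selected.append(ctx)
    return selected


brex = ET.fromstring(BREX)

print(len(applicable_contexts(brex, PROCED)))  # 2 (procedural + general)
print(len(applicable_contexts(brex, IPD)))     # 1 (general only)
```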

Limitations of the rulesContext attribute

Unfortunately, the rulesContext is very limited, and can be impractical to use. Reasons:

  • A data module's schema URI must match exactly the schema URI specified in the rulesContext attribute for the rules to be applied. If you have been a good IETP author and have been using the standard schema URIs, this is not a problem. However, I have encountered data where schema URIs refer to local pathnames (e.g. "file://C:/data/...") (apparently there are still folks that do not utilize XML Catalogs).

  • Only a single URI can be specified in the rulesContext attribute. This can be very limiting if you have rules that may be applicable to more than one schema type, or applicable to different issues of a given schema type. For example, if you have a rule that is applicable to procedural and fault data modules, you would need to list that same rule twice, once for the procedural context and once for the fault context.

    Another case is if you have a publication with mixed-Issue data modules (e.g. 4.0, 4.0.2, 4.1, 4.1.A, etc) and you have rules that are applicable to all of them.

Note: I have considered submitting a CPF to allow for multiple schema URIs to be specified for rulesContext. Since no one else has bothered to submit one in all this time, it is likely the attribute is rarely used (or used in very limited contexts) since it is more common to define XPath expressions to include whatever context is needed (see below).

Alternative to the rulesContext attribute

IMO, the limitations of rulesContext are pretty severe. Fortunately, with the use of XPath, we can achieve context-based rules without the need of using the rulesContext attribute. I will go into more depth on how to write structure object rules in a future post, but the following should give you a basic idea of how a rule can be constructed so it only applies to a data module of a given type:

  <structureObjectRule>
    <objectPath allowedObjectFlag="0">/dmodule/content/
      illustratedPartsCatalog//partNumber[not(@id)]</objectPath>
    <objectUse>Part numbers in IPDs must have an authored ID.</objectUse>
  </structureObjectRule>

The rule in the example is somewhat bogus, but it illustrates how I was able to contextualize the rule to only IPDs by leveraging the XML schema structure of IPDs. Here is a brief explanation of the expression:

  • The start of the expression, "/dmodule/content/illustratedPartsCatalog" restricts any matches to under the <illustratedPartsCatalog> element. Since no other schema type allows such a structure, I have effectively limited the application of the rule to IPDs.

  • The next part is the expression "//partNumber", which indicates any <partNumber> elements under <illustratedPartsCatalog>, at any depth. The XPath expression "//" indicates descendants of any depth.

  • The "[not(@id)]" says to only match <partNumber> elements with no id attribute.

With the designation of allowedObjectFlag="0" for the rule, any match of the above expression is not allowed. Therefore, if the expression does match a node in a data module, that DM will not pass validation.
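
The check this rule performs can be approximated in Python as a sketch. The standard xml.etree.ElementTree module has no not() function in its XPath subset, so the "[not(@id)]" test is done in plain Python after selecting the <partNumber> elements; a full XPath 1.0 engine would evaluate the expression directly. The data modules below are made-up, heavily simplified fragments, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified IPD: one part number with an id, one without.
IPD_DM = """
<dmodule>
  <content>
    <illustratedPartsCatalog>
      <catalogSeqNumber>
        <itemSeqNumber><partNumber id="pn-001">AAA-111</partNumber></itemSeqNumber>
        <itemSeqNumber><partNumber>BBB-222</partNumber></itemSeqNumber>
      </catalogSeqNumber>
    </illustratedPartsCatalog>
  </content>
</dmodule>
"""

# Hypothetical non-IPD module: the rule's path never reaches its content.
OTHER_DM = """
<dmodule>
  <content>
    <procedure><partNumber>CCC-333</partNumber></procedure>
  </content>
</dmodule>
"""


def disallowed_matches(dm_root):
    """Mimic /dmodule/content/illustratedPartsCatalog//partNumber[not(@id)]."""
    if dm_root.tag != "dmodule":
        return []
    candidates = dm_root.findall("content/illustratedPartsCatalog//partNumber")
    # not(@id): keep only elements that lack an id attribute.
    return [el for el in candidates if el.get("id") is None]


ipd_hits = disallowed_matches(ET.fromstring(IPD_DM))
other_hits = disallowed_matches(ET.fromstring(OTHER_DM))

# allowedObjectFlag="0": any match means the data module fails the rule.
print([el.text for el in ipd_hits])  # ['BBB-222']
print(len(other_hits))               # 0
```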

Equivalent rulesContext-based rule

An equivalent rule that relies on the use of <contextRules>'s rulesContext attribute would be the following:

  <contextRules
      rulesContext="http://www.s1000d.org/S1000D_4-1/xml_schema_flat/ipd.xsd">
    <structureObjectRuleGroup>
      <structureObjectRule>
        <objectPath allowedObjectFlag="0">//partNumber[not(@id)]</objectPath>
        <objectUse>Part numbers in IPDs must have an authored ID.</objectUse>
      </structureObjectRule>
    </structureObjectRuleGroup>
  </contextRules>

Notice how the XPath expression is briefer since I no longer have to establish a context within the expression itself. With the rulesContext attribute setting, I know that the rule will only be evaluated on Issue 4.1 IPD data modules.

You may be satisfied with using the rulesContext version if you know all your data modules will use well-defined schema URIs and you have few or no occurrences of rules that are applicable to multiple schema types. However, IMO, it is better practice to author rules that are less susceptible to external changes that could make the rule "stale", especially when the staleness may go unnoticed. For example, if you change the schema URIs of your modules, the existing rules will no longer be applied by the BREX checker. This can go unnoticed, leading to potential false validation passes of data.

Next

XPath Primer

Monday, November 9, 2015

What is the difference between brDoc and BREX?

At the 2015 S1000D User Forum, information was provided about the new schema type, brDoc, coming to Issue 4.2. One of the audience members asked what the difference is between brDoc and the BREX. I thought the answer the presenter gave was not very clear. I do not recall specifically what the presenter said, but I could tell the audience member was still confused.

During a break, I approached the audience member and said something like the following:

brDoc explains the "Why" of your business rules while the BREX provides the "How".

The BREX DM mainly lists out the rules your data was authored against, but may not provide the reasoning behind the decisions made in establishing the rule. For example, you may have a BREX rule stating, "All procedural steps must have an authored ID," which can be codified in a <structureObjectRule>. The brDoc DM would provide the reasons the rule exists, which may be due to cross-referencing requirements for the project.

In Issue 4.2, the BREX schema has been updated so for a given rule, you can reference back to the brDoc data module that provides reasoning behind the rule. This linking is done with BR decision identifiers, which are specified in the brDoc data module via <brDecision> nodes. In the BREX, you reference the identifiers via <brDecisionRef> nodes.

For example, in the brDoc DM, you may have something like the following:

...
  <brDecision brDecisionIdentNumber="MYPROJ-BRD-00001">
    ...
    <brDecisionExplanation>
      <para>Identifiers on procedural steps are required so specific
        steps can easily be identified in the viewing system when
        the maintainer submits a problem report on a procedure and/or
        specific steps of a procedure. </para>
    </brDecisionExplanation>
  </brDecision>
...

And in your BREX, you have something like the following to refer back to the decision documented in the brDoc data module:

...
  <structureObjectRule...>
    <brDecisionRef brDecisionIdentNumber="MYPROJ-BRD-00001"/>
    <objectPath allowedObjectFlag="0">//proceduralStep[not(@id)]</objectPath>
    <objectUse>All procedural steps must have an authored ID.</objectUse>
  </structureObjectRule>
...
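
As a sketch of how a tool might follow that link, the Python below pairs each rule with the explanation it references. The <brDoc> and <brex> wrapper elements are simplified stand-ins rather than the real schema structure, the explanation text is abridged, and the lookup approach is my own assumption, not something defined by the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged brDoc content (the "why").
BRDOC = """
<brDoc>
  <brDecision brDecisionIdentNumber="MYPROJ-BRD-00001">
    <brDecisionExplanation>
      <para>Identifiers on procedural steps are required.</para>
    </brDecisionExplanation>
  </brDecision>
</brDoc>
"""

# Hypothetical, abridged BREX content (the "how").
BREX = """
<brex>
  <structureObjectRule>
    <brDecisionRef brDecisionIdentNumber="MYPROJ-BRD-00001"/>
    <objectPath allowedObjectFlag="0">//proceduralStep[not(@id)]</objectPath>
    <objectUse>All procedural steps must have an authored ID.</objectUse>
  </structureObjectRule>
</brex>
"""

brdoc = ET.fromstring(BRDOC)
brex = ET.fromstring(BREX)

# Index decisions by identifier: ident number -> explanation text.
why = {
    d.get("brDecisionIdentNumber"):
        d.findtext("brDecisionExplanation/para", default="").strip()
    for d in brdoc.iter("brDecision")
}

# Pair each rule with the explanation it points to (None if unresolved).
linked = []
for rule in brex.iter("structureObjectRule"):
    ref = rule.find("brDecisionRef")
    ident = ref.get("brDecisionIdentNumber") if ref is not None else None
    linked.append((rule.findtext("objectUse"), why.get(ident)))

for how, reason in linked:
    print("HOW:", how)
    print("WHY:", reason)
```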

Friday, November 6, 2015

How not to use applicability

While attending the S1000D User Forum (UF), I sat in on the following presentation, Creating Applicability Statements that Work for the CCT.

I highly recommend not following the guidance in this UF presentation regarding service bulletins:

  • It is mixing technical data with product configuration. The ACT's and CCT's roles are to provide the definitions of your product attributes and conditions. The PCT is designed to contain the actual instances of those attributes. Take the English dictionary as an analogy. The dictionary provides the definitions of the words. The actual instances of those words are in separate books, essays, and other English-based writings. What the UF presentation is advocating is that for specific, special words, we should include all the various writings that use those words in the dictionary itself.

    The PCT is the only module type designed to capture product configuration. It exists to complete the XML-based model of applicability, but in practice, the PCT either is not used (its use is optional), is stubbed just to contain the primary keys (to facilitate product selection in a viewer), or is dynamically generated from the product configuration database, not the CSDB.

    For example, for NAVAIR, we have one program that only stores aircraft identifiers (called "BUNOs") in the PCT. When the viewer is launched, the configuration database is queried for the current state of all the other attributes, including service bulletins (NAVAIR calls them Tech Directives), and the values are passed to the viewer to initialize the applicability state table.

  • PCT-based filtering does NOT require you to assert against the primary product key every time. This is a strawman the UF presentation argues to justify the model it is advocating. When authoring applicability, you only need to assert on those attributes that are relevant. For example, if a given procedural step is only applicable if a given service bulletin is incorporated, all you need to do is the following:

    <applic id="a001">
      <assert applicPropertyIdent="SB-001"
        applicPropertyType="condition" applicPropertyValues="PRE"/>
    </applic>

    There is nothing that requires you to always have an assertion against the product key. If a step is only dependent on certain conditions, you only need to assert against those conditions. Only assert on the entire product if a step is only applicable for entire product instances.

    When maintenance is started, normally the product being worked on would have been selected at the start of maintenance. Therefore, all the product configuration attributes and conditions (represented in the PCT--either statically or dynamically--see above) will have been set into the state table, including the incorporation status of SB-001 for the product.

  • The UF presentation increases the complexity of the viewing system in assigning values into the state table. Instead of just reading all the attribute values from the selected product from the PCT, the viewer now has to parse the <incorporation> section of the CCT, follow several extra ID references, and apply applicability filtering to determine SB values. This introduces unnecessary complexity in the viewing implementation when the same effect can be achieved with the simpler PCT-based model.

  • If you make the assumption that the UF presentation's model for service bulletins is the better one, then why not use it for all attributes? From a data modeling perspective, there is nothing special about service bulletin incorporation state that justifies it being handled differently than any other property (product attribute and/or condition) associated with a product. The UF presentation is basically advocating that for this set of properties, you use a complicated way to assign their values, but for all other properties, you assign them in the PCT.
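
To illustrate the second point in the list above, here is a simplified Python sketch of evaluating that single-condition assertion against a state table. The state table layout (a dict keyed by property type and identifier), the "buno" product attribute name, the BUNO values, and the function name are all assumptions made for the example; a real viewer must also handle value sets, ranges, and nested <evaluate> elements.

```python
import xml.etree.ElementTree as ET

# The annotation from the example above: it asserts only on SB-001.
APPLIC = """
<applic id="a001">
  <assert applicPropertyIdent="SB-001"
    applicPropertyType="condition" applicPropertyValues="PRE"/>
</applic>
"""


def is_applicable(applic, state_table):
    """Evaluate an annotation holding one <assert> with one literal value.

    Assumption: state_table maps (property type, property ident) to the
    current value for the selected product.
    """
    a = applic.find("assert")
    key = (a.get("applicPropertyType"), a.get("applicPropertyIdent"))
    return state_table.get(key) == a.get("applicPropertyValues")


applic = ET.fromstring(APPLIC)

# Hypothetical state tables, filled in when the product was selected.
aircraft_1 = {("prodattr", "buno"): "000001", ("condition", "SB-001"): "PRE"}
aircraft_2 = {("prodattr", "buno"): "000002", ("condition", "SB-001"): "POST"}

# The product key is in the state table but is never asserted on.
print(is_applicable(applic, aircraft_1))  # True
print(is_applicable(applic, aircraft_2))  # False
```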

The danger of that UF presentation is that the presenter was very animated that this is the way you should do service bulletins, and those in the audience who were new to S1000D, and less knowledgeable of how applicability works, could be led down an operational path that is more complex and more expensive. Because of this, I voiced my concerns during the Q&A period.

Afterwards, a gentleman came up to me and thanked me for saying something. He was fairly new to S1000D, and right after the presentation, he worried that the way they were going to use applicability was wrong, because a supposed expert was vocal in saying what was the right way and what was the wrong way. After I voiced my objections, he felt much better that he and his program had made valid choices on how to use applicability.

Sometime later, I was informed this service bulletin applicability model was pushed for by Civil Aviation, hence the changes to the S1000D specification to support it. The push for the change came from a large aircraft provider to reflect how they managed their techdata and product configuration over the years. So, instead of the company changing their operations to use a better model, they basically pushed changes into S1000D so they can continue business as usual and state they are S1000D conformant.

Thursday, November 5, 2015

Issue 4.1 Applicability

My colleague did an admirable job in providing a presentation on Issue 4.1 Applicability at the S1000D User Forum in San Diego. The presentation is available from the s1000d.org site.

We got the idea of doing the presentation during our work in updating NSIV, an S1000D viewer, to support the applicability features in Issue 4.1. Although Issue 4.1 has been published for some time now, we encountered problems in the specification related to 4.1 applicability, which could lead to author confusion and ambiguous implementations of the model.

One thing we unfortunately left out of the presentation was the new externalized applicability capability. Applicability annotations can now be stored in a CIR (common information repository) DM (data module) where they can be referenced in other DMs. With only 30 minutes provided for the presentation, it was hard enough to squeeze the material that was included in the time allotted.

Along with the creation of the presentation for the user forum, we submitted several CPFs (Change Proposal Forms) through the USSMG to "clean up" the new features of the 4.1 applicability model. It was too late to see any changes incorporated into the upcoming Issue 4.2 release, but we hope our proposed changes will be incorporated in the release after that.

I am interested in finding out who else has implemented the 4.1 applicability model in their S1000D processing and rendering tools. I have doubts that those using any 4.1 features have software that supports the entire model. If others had implemented a complete model, I would have expected CPFs to have already been submitted to address the problems I encountered during my fairly complete implementation of the 4.1 model.

The applicability-related CPFs my colleague and I have submitted are as follows:

  • Clarification is needed on Filtering Rules for External references vs Alias attributes
  • Clarification is needed regarding a ‘non string’ type product and/or condition attribute – that it may provide an enumeration label. In addition, there currently is no way to ensure applicability property values conform to its data-typing
  • Formal definitions are needed for applicability attribute value types: Boolean, Integer, and Real.
  • Clarification is needed on the use of external reference applicability attributes and alias attributes within the Applicability cross-reference table catalog.
  • If a Pub Module or Data Module contains a computable branch of an applicability annotation, it must also include a reference to the Applicability Cross-reference Table (ACT) data module <applicCrossRefTableRef>.
  • Clarification is needed on the fact that multiple associations can be made for alias attributes, but only one can be made for external attributes.
  • There is a conflict between chapter text and the schema. The element “Applicability Reference” <applicRef> supports externalized applicability annotations. Specification text identifies its child element <dmRef> as mandatory, yet schema has as optional.
  • Clarification is needed on the defining of a product or condition attribute marked as an alias.

First Post

This blog is an attempt to capture my experiences of working with S1000D, primarily from the perspective of a software developer. In my 20+ years of developing software, the past 7 years have mostly been spent dealing with S1000D and the development of an Interactive Electronic Technical Publication (IETP) viewer supporting publications authored in S1000D. The viewer is called NSIV, which currently stands for NAVAIR Standard IETM Viewer. "IETM" is an older term still used by NAVAIR that stands for Interactive Electronic Technical Manual.

During my time working on the viewer, I have also been involved with the development of the S1000D specification itself by attending and participating in various working group meetings. My current areas of expertise are the S1000D applicability model and BREX (Business Rules Exchange). I have implemented the applicability model in NSIV and developed a BREX validation tool for NAVAIR.

One thing I have noticed during my observations and participation in the evolution of the S1000D specification is an inadequate representation of individuals with direct, hands-on experience in developing S1000D-aware software tools. Yes, there are folks from companies that have developed S1000D software, but those representatives tend to be more on the data side and not directly involved in the software side. This deficiency ends up getting reflected in the specification. I will likely expand more on this in later posts.