In Part 1, I provided a basic introduction to writing BREX context rules and the <contextRules> element. In this second part of the series, I will provide a brief primer on XPath expressions.
A complete guide to writing XPath expressions is beyond the scope of this post, but you will need a basic understanding of XPath to get started writing structured rules for your project. After reading this post, I recommend reviewing any of the numerous XPath tutorials on the web.
What is XPath?
XPath provides a syntax for identifying parts (formally known as
nodes) of an XML document.
An XML document intrinsically defines a tree structure, similar to how files on a file system are organized. For example, the address bar at the top of Windows File Explorer identifies the folder "program" by its absolute location, "C:\Program Files\LibreOffice 5\program".
With XPath, we identify XML elements in a similar manner, but we use forward slashes, "/", instead of backslashes.
For example, say we have the following XML document structure:
<dmodule>
  <content>
    <brex> <!-- We want to identify this element here -->
      ...
    </brex>
  </content>
</dmodule>
We can identify the <brex> element with the following XPath expression:
/dmodule/content/brex
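If you want to experiment with this expression, here is a minimal sketch in Python. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath and evaluates paths relative to an element rather than from the document root. So the first step, /dmodule, is checked against the root element, and the remaining steps, content/brex, are evaluated from there. The sample document simply fills in the example above.

```python
import xml.etree.ElementTree as ET

# Sample document from the example above (the "..." content is omitted).
XML = """
<dmodule>
  <content>
    <brex/>
  </content>
</dmodule>
"""

root = ET.fromstring(XML)  # root is the <dmodule> element

# ElementTree cannot evaluate absolute paths. The first step, /dmodule,
# must match the root element itself; the remaining steps, content/brex,
# are then evaluated relative to it.
matches = root.findall("content/brex") if root.tag == "dmodule" else []

print([elem.tag for elem in matches])  # ['brex']
```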
Unlike a file system, an XML document can have multiple items with the same name at the same level. For example:
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule>...</structureObjectRule> <!-- We want this one -->
          <structureObjectRule>...</structureObjectRule> <!-- Not this one -->
          ...
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
If we use the XPath expression
/dmodule/content/brex/contextRules/structureObjectRuleGroup/structureObjectRule
we are actually identifying all <structureObjectRule> elements under <structureObjectRuleGroup>. If we only want the first <structureObjectRule> element, we append its position, [1], to the last step:
/dmodule/content/brex/contextRules/structureObjectRuleGroup/structureObjectRule[1]
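A minimal Python sketch of the difference, assuming the standard xml.etree.ElementTree module. ElementTree supports positional predicates such as [1], but it evaluates paths relative to the root element, so the leading /dmodule step is dropped. The id values are hypothetical labels added only to tell the two rules apart.

```python
import xml.etree.ElementTree as ET

# Two sibling <structureObjectRule> elements; the id attributes are
# hypothetical and only serve to tell the matches apart.
XML = """
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="first"/>
          <structureObjectRule id="second"/>
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(XML)  # the <dmodule> element, i.e. the /dmodule step

# Relative equivalents of the two absolute XPath expressions in the text.
PATH = "content/brex/contextRules/structureObjectRuleGroup/structureObjectRule"
all_rules = root.findall(PATH)           # every rule in the group
first_rule = root.findall(PATH + "[1]")  # only the first rule in the group

print([r.get("id") for r in all_rules])   # ['first', 'second']
print([r.get("id") for r in first_rule])  # ['first']
```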
Technically, this expression may still identify more than one <structureObjectRule> element, because [1] selects the first <structureObjectRule> within each <structureObjectRuleGroup>, not the first one in the whole document. If you are familiar with the BREX schema, you know that <contextRules> and <structureObjectRuleGroup> are repeatable. (This applies to any XML document; BREX is just the example here.) So if we take
/dmodule/content/brex/contextRules/structureObjectRuleGroup/structureObjectRule[1]
and apply it to the following XML document:
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule>...</structureObjectRule> <!-- MATCH -->
          ...
        </structureObjectRuleGroup>
      </contextRules>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule>...</structureObjectRule> <!-- MATCH -->
          ...
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
We will have identified two <structureObjectRule> elements, one in each <contextRules>. If we only want to identify the very first <structureObjectRule> element in the document, we also give a position to each repeatable parent:
/dmodule/content/brex/contextRules[1]/structureObjectRuleGroup[1]/structureObjectRule[1]
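The same effect can be reproduced with a short Python sketch, again assuming the standard xml.etree.ElementTree module and paths written relative to the <dmodule> root element. The id values A1, A2 and B1 are hypothetical labels. Without positions on the parents, [1] returns one rule per group.

```python
import xml.etree.ElementTree as ET

# Two repeated <contextRules> elements, each with one rule group; the id
# values are hypothetical labels for telling the matches apart.
XML = """
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="A1"/>
          <structureObjectRule id="A2"/>
        </structureObjectRuleGroup>
      </contextRules>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="B1"/>
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(XML)  # the <dmodule> element, i.e. the /dmodule step

# [1] on the last step only picks the first rule of each group...
per_group = root.findall(
    "content/brex/contextRules/structureObjectRuleGroup/structureObjectRule[1]")

# ...so each repeatable step needs its own position to reach a single element.
very_first = root.findall(
    "content/brex/contextRules[1]/structureObjectRuleGroup[1]/structureObjectRule[1]")

print([r.get("id") for r in per_group])   # ['A1', 'B1']
print([r.get("id") for r in very_first])  # ['A1']
```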
Identifying by ID
If your XML documents contain IDs, using them to identify elements is much easier than using full paths. Take the following for example:
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="SOR-001">...</structureObjectRule> <!-- We want this one -->
          <structureObjectRule>...</structureObjectRule>
          ...
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
The element we want to identify can be expressed as follows:
//structureObjectRule[@id="SOR-001"]
The expression contains some components that need further explanation:
//
This is shorthand notation for any descendant node. Because it is at the start of the expression, it matches nodes anywhere within the document.
[@id="SOR-001"]
The square brackets, "[]", contain a conditional expression (in XPath terms, a predicate) on the node that precedes them. In this case, the preceding node is structureObjectRule. For a structureObjectRule element to match the expression, the expression inside the brackets must evaluate to true.
In our example, the conditional expression @id="SOR-001" is only true if the attribute named "id" has the value "SOR-001". In XPath, attribute names are prefixed with the "@" character to distinguish them from element names, hence the use of "@id". If we left out the "@", the name "id" would be interpreted as the name of a child element.
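Both points can be checked with a small Python sketch, assuming the standard xml.etree.ElementTree module. ElementTree requires a relative path, so //structureObjectRule is written as .//structureObjectRule (search every descendant of the root element). The second query drops the "@" to show that id is then treated as the name of a child element.

```python
import xml.etree.ElementTree as ET

XML = """
<dmodule>
  <content>
    <brex>
      <contextRules>
        <structureObjectRuleGroup>
          <structureObjectRule id="SOR-001"/>
          <structureObjectRule/>
        </structureObjectRuleGroup>
      </contextRules>
    </brex>
  </content>
</dmodule>
"""

root = ET.fromstring(XML)

# ".//" is ElementTree's relative form of "//": search all descendants of
# the root element for a <structureObjectRule> whose id attribute is SOR-001.
by_attribute = root.findall('.//structureObjectRule[@id="SOR-001"]')

# Without "@", "id" names a child element, so this looks for a
# <structureObjectRule> with an <id> child whose text is SOR-001; there is none.
by_child = root.findall('.//structureObjectRule[id="SOR-001"]')

print(len(by_attribute), len(by_child))  # 1 0
```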
Identifying by any attribute
You are not limited to ID attributes for identifying elements in an XML document. For example, to identify all elements marked as deleted, I can use the following:
//*[@changeType="delete"]
The special character "*" matches any element, but the attribute test limits the matches to only those elements that have the changeType attribute set to "delete".
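A minimal Python sketch of this query, assuming the standard xml.etree.ElementTree module and a hypothetical document with change markup. ElementTree's .//* only searches below the root element, while XPath's //* also considers the root element itself, so the sketch tests the root separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical content with change markup; only changeType matters here.
XML = """
<dmodule>
  <content>
    <description>
      <para changeType="delete">Removed text</para>
      <para changeType="add">New text</para>
      <para>Unchanged text</para>
      <figure changeType="delete"/>
    </description>
  </content>
</dmodule>
"""

root = ET.fromstring(XML)

# ElementTree form of //*[@changeType="delete"]. ".//*" only searches below
# the root element, so the root itself is tested separately to match XPath's "//*".
deleted = root.findall('.//*[@changeType="delete"]')
if root.get("changeType") == "delete":
    deleted.insert(0, root)

print([(e.tag, e.text) for e in deleted])
# [('para', 'Removed text'), ('figure', None)]
```

Results come back in document order, so the first deleted element in the file is the first item in the list.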
More Information
More complete tutorials on XPath can be found by searching the web.