Zoegond's notes: XPath

Wednesday, June 20, 2012

XPath

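/ path separator; a leading / makes the path absolute - /AAA/BBB selects BBB (not AAA)
// any-depth separator (short for /descendant-or-self::node()/) - //AAA/BBB selects every BBB child of an AAA anywhere in the document (not AAA)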

( ie, / means what it does in file paths, whereas // means sort of /.*/ with any number of hierarchy levels intervening )

( Starting the path with "/" means the path starts at the root node of the document. Even if you supplied a different node to document.evaluate as the context node, / still has this meaning. To search under a context node, which is usually why you supplied one in the first place, start the path with .// , ie "at any position among the descendants of the context node")

.. as in file paths - //AAA/BBB/.. selects AAA (not BBB)
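
A rough sketch of the path rules above, in Python rather than the browser. It uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath and does not accept absolute paths on an element, so every path here starts at a context element. The sample document, its id attributes and the ids() helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the id attributes exist only to label results.
doc = ET.fromstring("""
<root>
  <AAA id="a1"><BBB id="b1"/><BBB id="b2"/></AAA>
  <CCC><AAA id="a2"><BBB id="b3"/></AAA></CCC>
</root>
""")

def ids(elements):
    """Return the id attribute of each element, in result order."""
    return [e.get("id") for e in elements]

# AAA/BBB: only BBB children of AAA children of the context element.
direct = ids(doc.findall("AAA/BBB"))

# .//AAA/BBB: BBB children of an AAA at any depth below the context element.
anywhere = ids(doc.findall(".//AAA/BBB"))

# .. steps back up: the AAA parents of those BBBs, not the BBBs themselves.
parents = ids(doc.findall(".//AAA/BBB/.."))

# A different context node: only the BBBs beneath CCC are found.
ccc = doc.find("CCC")
under_ccc = ids(ccc.findall(".//BBB"))

print(direct, anywhere, parents, under_ccc)
```

Here direct holds b1 and b2, anywhere adds b3, parents holds a1 and a2, and under_ccc holds only b3.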

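* any element (at one level only) - /AAA/* selects all element children of AAA
[n] nth element at that level (counting from 1) - also [last()]. Note //BBB[1] is the first BBB under each parent, not the first BBB in the document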

[otherelement] select elements that have otherelement as a child

@attr select an attribute (not the tag it belongs to)
[@attr] select an element that has an attribute
attr can be *. not() negates, so [not(@*)] = an element without any attrs

[@attr='...'] select an element that has attr set to a given value

[expression] any expression that evaluates to true or false, eg //*[name()='BBB'] is the same as //BBB

text() select all text node children - can be subscripted etc (use a predicate with string(.) or string() to select on the content of the text, eg text()[contains(string(.),'dog')])
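
The predicate forms above ([n], [last()], [otherelement], [@attr], [@attr='...']) can be tried out with a small Python sketch. This again assumes the standard library's xml.etree.ElementTree, which supports these particular predicates but not general expressions such as [not(@*)] or name()='BBB'. The document and the ids() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = ET.fromstring("""
<root>
  <AAA id="a1"><BBB id="b1"/><BBB id="b2" name="x"/><BBB id="b3"/></AAA>
  <AAA id="a2"><CCC id="c1"/></AAA>
</root>
""")

def ids(elements):
    """Return the id attribute of each element, in result order."""
    return [e.get("id") for e in elements]

first = ids(doc.findall(".//BBB[1]"))         # first BBB under each parent
last = ids(doc.findall(".//BBB[last()]"))     # last BBB under each parent
has_child = ids(doc.findall(".//AAA[CCC]"))   # AAA elements that have a CCC child
has_attr = ids(doc.findall(".//BBB[@name]"))  # BBB elements carrying a name attribute
attr_eq = ids(doc.findall(".//*[@id='c1']"))  # any element whose id equals 'c1'

print(first, last, has_child, has_attr, attr_eq)
```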

Axes:
/AAA = /child::AAA (ie, child:: is the default axis)
/AAA/BBB = /child::AAA/child::BBB
//descendant::* = all elements in the document (descendant::* on its own = all elements below the context node)
//parent::* = all parents
//parent::DDD = all DDDs which are parents
ancestor::
following-sibling::, preceding-sibling::
//A/B/following-sibling::* = all following siblings of a B which has an A parent
following::, preceding::
descendant-or-self::, ancestor-or-self:: (the node itself as well as its descendants / ancestors, so the 'or' really means 'and')
self::
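
To make the axis meanings concrete, here is a hand-rolled Python sketch that emulates ancestor-or-self::* and following-sibling::* over an ElementTree document. It is not how a browser evaluates XPath: ElementTree does not implement these axes, so the helper functions (ancestor_or_self, following_sibling) and the parent map are hypothetical stand-ins written for this note. The ancestor helper returns nearest-first order, whereas an XPath node-set has no such ordering guarantee in the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = ET.fromstring("""
<root>
  <A><B/><C/><D/></A>
</root>
""")

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestor_or_self(el):
    """The element itself, then its parent, grandparent, ... up to the root element."""
    chain = [el]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain

def following_sibling(el):
    """Elements that come after el under the same parent, in document order."""
    parent = parent_of.get(el)
    if parent is None:  # the root element has no siblings
        return []
    siblings = list(parent)
    return siblings[siblings.index(el) + 1:]

b = doc.find(".//B")
up = [e.tag for e in ancestor_or_self(b)]
after = [e.tag for e in following_sibling(b)]
print(up, after)
```

For the B element this gives B, A, root going upwards (self is included, which is the point of the -or-self axes) and C, D as following siblings.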

Other functions:
normalize-space() = ltrim(rtrim()), and also collapses internal runs of whitespace to a single space
starts-with(name(), 'BBB')
contains(name(), 'x')
string-length() - when comparing its result in an expression embedded in XML (eg XSLT), remember < and > are represented as the entities &lt; and &gt;
floor(), ceiling()
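
As a sanity check on what the string functions above do, here is a minimal Python sketch of their behaviour. The Python function names are hypothetical helpers for illustration, not part of any XPath library; the whitespace set (space, tab, carriage return, line feed) follows the XPath 1.0 definition.

```python
import re

def normalize_space(s):
    """Sketch of normalize-space(): collapse whitespace runs, then trim both ends."""
    # XPath whitespace is space, tab, carriage return and line feed only.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def starts_with(s, prefix):
    """Sketch of starts-with(s, prefix)."""
    return s.startswith(prefix)

def contains(s, needle):
    """Sketch of contains(s, needle)."""
    return needle in s

cleaned = normalize_space("  big \n\t dog  ")
print(repr(cleaned), starts_with("BBB1", "BBB"), contains("box", "x"))
```

The first result is 'big dog': the internal newline and tab are collapsed as well, which a plain ltrim(rtrim()) would not do.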

Beware of using unfamiliar functions looked up on the net: they may be from XPath 2, which browsers don't implement. A useful list of functions which do work in XPath 1 is here (Appendix C): http://www.w3.org/TR/xpath-functions/#xpath1-compatibility

Other operators:
div, mod (division and remainder - / can't be used for division because it is the path separator)

String value: every node has a string-value, which for an element (the case that matters when parsing HTML) is all the text in itself and its descendant nodes, concatenated. In an expression, . is the context node, and where a string is expected (as in contains()) it is converted to its string-value, so you can specify things like //span[contains(.,'Dalek')] to pull out all spans containing the string "Dalek".
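
A small Python sketch of the string-value idea. ElementTree has no contains(), so this emulates //span[contains(.,'Dalek')] by hand: string_value() is a hypothetical helper that joins all the text inside an element, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the third span splits "Dalek" across a child element.
doc = ET.fromstring(
    "<div>"
    "<span>The <b>Dalek</b> invasion</span>"
    "<span>Cybermen</span>"
    "<span>Da<i>lek</i></span>"
    "</div>"
)

def string_value(el):
    """All text inside el and its descendants, concatenated in document order."""
    return "".join(el.itertext())

# Emulates //span[contains(., 'Dalek')].
hits = [string_value(s) for s in doc.iter("span") if "Dalek" in string_value(s)]
print(hits)
```

The third span matches even though no single text node contains "Dalek", because the test is made against the concatenated string-value and not against individual text() nodes.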
