Using XPath to Select Nodes

XPath is a query language designed specifically for selecting nodes in an XML document. With this language, you don’t have to walk the entire tree of XML nodes yourself. You will learn the basics of this language and apply it to a program. The two methods used for selecting nodes with XPath are XmlNode.SelectNodes() and XmlNode.SelectSingleNode(). The SelectNodes() method returns an XmlNodeList that contains all the nodes that match the XPath string. The SelectSingleNode() method returns only the first matching XmlNode. Consider the following XML document.

<?xml version="1.0" encoding="utf-8" ?>
<Persons>
  <Person name="John Smith">
    <Age>30</Age>
    <Gender>Male</Gender>
  </Person>
  <Person name="Mike Folley">
    <Age>25</Age>
    <Gender>Male</Gender>
  </Person>
  <Person name="Lisa Carter">
    <Age>22</Age>
    <Gender>Female</Gender>
  </Person>
  <Person name="Jerry Frost">
    <Age>27</Age>
    <Gender>Male</Gender>
  </Person>
  <Person name="Adam Wong">
    <Age>35</Age>
    <Gender>Male</Gender>
  </Person>
</Persons>

Figure 1

Suppose you want to get the age of every person. The code for doing that is:

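// Requires: using System.Xml;
XmlDocument document = new XmlDocument();
document.Load("Persons.xml");   // the document shown in Figure 1

// Select every Age element that is a child of a Person element
XmlNodeList nodes = document.DocumentElement.SelectNodes("/Persons/Person/Age");

foreach (XmlNode node in nodes)
{
    // InnerText holds the element's text content, e.g. "30"
    textBoxResult.Text += node.InnerText + "\r\n";
}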

After loading the document, we used the DocumentElement property, which returns the root element of the document as an XmlElement (a class derived from XmlNode). We then called the SelectNodes() method, which accepts a string argument containing the XPath query. The XPath query /Persons/Person/Age selects every Age element that is a child of a Person element, which in turn is a child of the Persons element. All the matching nodes are returned as an XmlNodeList. We used a foreach loop to print each age in a text box. Figure 2 shows some XPath operations that you can use to query specific nodes.

XPath Query              Description
.                        Selects the current node.
..                       Selects the parent of the current node.
*                        Selects all child elements of the current node.
nodename                 Selects all child elements with the given name.
/                        Selects the root node of the document.
//                       Selects matching nodes at any depth. A leading // searches the whole document; .// searches only below the current node.
//*                      Selects all elements in the document.
/element                 Selects the root element named element. Starting a path with / means you are using an absolute path.
/element/*               Selects all child elements of the root element.
element/*                Selects all child elements of the child element named element.
element/child            Selects the elements named child inside the element named element, which is itself a child of the current node.
//element                Selects all elements with the specified name, regardless of where they are in the document.
element//child           Selects all elements named child at any depth inside element.
@attribute               Selects the attribute named attribute of the current node.
//@attribute             Selects all attributes with the specified name, regardless of where they are in the document.
@*                       Selects all attributes of the current node.
element[i]               Selects the i-th child element named element (indices start at 1).
text()                   Selects the text node children of the current element.
//text()                 Selects every text node in the document.
//element/text()         Selects the text nodes of all matching elements.
//element[name='value']  Selects all elements that have a child element named name whose text equals value.
//element[@att='value']  Selects all elements whose attribute att equals value.

Figure 2 – XPath Operations
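To see several rows of Figure 2 in action, here is a minimal console sketch. It assumes .NET 6 or later (top-level statements) and loads an inline copy of the Figure 1 document with LoadXml() instead of reading Persons.xml; the helper Count() and the variable names are illustrative only.

```csharp
using System;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

// Hypothetical helper: how many nodes does a query return from the root element?
int Count(string xpath) => root.SelectNodes(xpath).Count;

int allElements  = Count("//*");        // Persons + 5 Person + 5 Age + 5 Gender = 16
int rootChildren = Count("/Persons/*"); // the 5 Person elements
int allNames     = Count("//@name");    // one name attribute per Person = 5
int textNodes    = Count("//text()");   // the text inside each Age and Gender = 10
int attrsOnFirst = root.SelectSingleNode("Person[1]").SelectNodes("@*").Count; // 1

// Attribute predicate, then a step down to a child element.
string lisaAge = root.SelectSingleNode("//Person[@name='Lisa Carter']/Age").InnerText;

Console.WriteLine($"{allElements} {rootChildren} {allNames} {textNodes} {attrsOnFirst} {lisaAge}");
// prints: 16 5 5 10 1 22
```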

For example, if you want to select the current node, use the . (dot) expression.

XmlNode current = document.DocumentElement.SelectSingleNode(".");

Notice that we used the XmlNode.SelectSingleNode() method to select only one node. If the query matches several nodes, it returns the first matching node; if nothing matches, it returns null. The method accepts the XPath string as its argument. Passing “.” returns the node on which the method was called. Also notice that the returned value is an XmlNode and not an XmlNodeList.
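The three behaviors of SelectSingleNode() described above (returning the calling node for “.”, keeping only the first of several matches, and returning null when nothing matches) can be checked with a short sketch. It assumes .NET 6 or later and an inline copy of the Figure 1 document; the Address element is hypothetical and deliberately absent.

```csharp
using System;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

// "." selects the node the method is called on: here, the root element itself.
XmlNode current = root.SelectSingleNode(".");
bool isSameNode = ReferenceEquals(current, root);

// "Person" matches five elements, but SelectSingleNode keeps only the first.
XmlNode firstPerson = root.SelectSingleNode("Person");
string firstName = firstPerson.Attributes["name"].Value;

// No Address element exists, so SelectSingleNode returns null instead of throwing.
XmlNode missing = root.SelectSingleNode("Address");

Console.WriteLine($"{isSameNode} {firstName} {missing == null}");
// prints: True John Smith True
```

Because a missing match yields null, it is worth checking the result before using it.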

Suppose you want to select all the Person elements that are children of the Persons node. You can use the following XPath query:

XmlNodeList personNodes = 
 document.DocumentElement.SelectNodes("/Persons/Person");

Notice that we started the query with a slash (/). This indicates that we are using an absolute path. We start from the Persons root element, then look at all the Person nodes that are direct children of the root element. To get the age of every person, we can use the following code:

XmlNodeList ageNodes = 
 document.DocumentElement.SelectNodes("/Persons/Person/Age");
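Because an absolute path is resolved from the top of the document, it does not matter which node you call SelectNodes() on. The following sketch (assuming .NET 6 or later and an inline copy of Figure 1; the variable lisa is illustrative) runs the same absolute query from the third Person element and still gets every age.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

// Start from the third Person element (Lisa Carter) instead of the root.
XmlNode lisa = root.ChildNodes[2];

// The absolute path ignores the starting node and matches all five Age elements.
var ages = new List<int>();
foreach (XmlNode age in lisa.SelectNodes("/Persons/Person/Age"))
{
    ages.Add(int.Parse(age.InnerText));
}

Console.WriteLine(string.Join(", ", ages)); // prints: 30, 25, 22, 27, 35
```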

You can also use a relative path, where the search starts from the current node. For example, you can query all the Person nodes from the DocumentElement root node using the following code:

XmlNodeList personNodes = 
 document.DocumentElement.SelectNodes("Person");

or the age of every person using the following code:

XmlNodeList ageNodes = 
 document.DocumentElement.SelectNodes("Person/Age");

Notice that the XPath query does not start with a /, which indicates that it is a relative path.
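A relative path gives different results depending on the node it is called on. This sketch (assuming .NET 6 or later and an inline copy of Figure 1; the variable names are illustrative) runs relative queries from a Person element and from the root element.

```csharp
using System;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;
XmlNode lisa = root.ChildNodes[2]; // the third Person element

// Evaluated from lisa: only her own Age child matches.
XmlNodeList fromLisa = lisa.SelectNodes("Age");
// Evaluated from the root: Age is not a direct child of Persons, so nothing matches.
XmlNodeList fromRootDirect = root.SelectNodes("Age");
// Evaluated from the root with an extra step: every Person's Age matches.
XmlNodeList fromRoot = root.SelectNodes("Person/Age");

Console.WriteLine($"{fromLisa.Count} {fromRootDirect.Count} {fromRoot.Count}"); // prints: 1 0 5
Console.WriteLine(fromLisa[0].InnerText);                                       // prints: 22
```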

We can also query nodes regardless of where they are in the document. This is useful when you want to search the whole document and return all matching nodes. For example, to query all the Gender nodes in the document, you can use the following query:

XmlNodeList genderNodes = 
 document.DocumentElement.SelectNodes("//Gender");

We precede the element name with // to indicate that the whole document should be searched. The query then returns all the matching nodes wherever they are in the document. If you want to limit the area being searched, you can specify the parent node or root node where the search should start.

XmlNodeList genderNodes = 
 document.DocumentElement.SelectNodes("/Persons//Gender");

The code above searches for all Gender elements at any depth below the Persons element.
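The following sketch contrasts a leading // (whole document) with .// (only below the current node) and shows how naming a starting node narrows the search. It assumes .NET 6 or later and an inline copy of Figure 1; the variable names are illustrative.

```csharp
using System;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;
XmlNode mike = root.ChildNodes[1]; // the second Person (Mike Folley)

// A leading // searches the whole document, even when called on mike.
int everywhere = mike.SelectNodes("//Gender").Count;
// Prefixing with . limits the search to mike's own subtree.
int belowMike = mike.SelectNodes(".//Gender").Count;

// Naming a starting node limits the search to what lies below it.
int underPersons = root.SelectNodes("/Persons//Gender").Count;
int underAdam = root.SelectNodes("/Persons/Person[@name='Adam Wong']//*").Count; // Age, Gender
// The start of the path must exist: the root element is Persons, not Person.
int wrongRoot = root.SelectNodes("/Person//Gender").Count;

Console.WriteLine($"{everywhere} {belowMike} {underPersons} {underAdam} {wrongRoot}");
// prints: 5 1 5 2 0
```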

If you want to query a specific element, you can use its index. The following code queries the third Person child of the Persons element.

XmlNode thirdPerson = 
 document.DocumentElement.SelectSingleNode("/Persons/Person[3]");

Since we are querying a single node, we used the SelectSingleNode() method. The third Person element is represented by Person[3], where 3 is the index. Note that XPath indices are 1-based, so counting starts at 1, not at 0 as it does for arrays in C#.
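The 1-based indexing is easy to confirm. In this sketch (assuming .NET 6 or later and an inline copy of Figure 1), Person[3] is Lisa Carter, Person[0] matches nothing because no position is 0, and the standard XPath last() function picks the final Person.

```csharp
using System;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

XmlNode third = root.SelectSingleNode("/Persons/Person[3]");           // position 3
XmlNode zeroth = root.SelectSingleNode("/Persons/Person[0]");          // no position 0, so null
XmlNode lastPerson = root.SelectSingleNode("/Persons/Person[last()]"); // final Person

string thirdName = third.Attributes["name"].Value;
string lastName = lastPerson.Attributes["name"].Value;

Console.WriteLine($"{thirdName} | {zeroth == null} | {lastName}");
// prints: Lisa Carter | True | Adam Wong
```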

If you want to search for all persons whose gender is Male, you can use the following code.

XmlNodeList personNodes = 
 document.DocumentElement.SelectNodes("//Person[Gender='Male']");

Because the query starts with a double slash (//), the whole document is searched. Inside the brackets, we specify the name of the child element, followed by = and the value it must have. A string value must be enclosed in quotes. XPath accepts either single or double quotes, but single quotes are easier to use inside a C# string literal.
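The sketch below (assuming .NET 6 or later and an inline copy of Figure 1; variable names are illustrative) tries several predicates. It shows that double quotes work when escaped, that leaving the quotes off makes XPath read Male as a child element name (so nothing matches), and that a comparison such as Age > 26 is evaluated numerically.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

// Collect the name of every Person whose Gender child is 'Male'.
var males = new List<string>();
foreach (XmlNode p in root.SelectNodes("//Person[Gender='Male']"))
{
    males.Add(p.Attributes["name"].Value);
}

int females = root.SelectNodes("//Person[Gender=\"Female\"]").Count; // escaped double quotes
int unquoted = root.SelectNodes("//Person[Gender=Male]").Count;      // compares with a <Male> child
int olderThan26 = root.SelectNodes("//Person[Age > 26]").Count;      // 30, 27 and 35

Console.WriteLine(string.Join(", ", males));
// prints: John Smith, Mike Folley, Jerry Frost, Adam Wong
Console.WriteLine($"{females} {unquoted} {olderThan26}"); // prints: 1 0 3
```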

When dealing with attributes of elements, you can use the @ operator followed by the name of the attribute. For example, the following code prints the name of every person.

XmlNodeList list = document.DocumentElement.SelectNodes("//Person/@name");

foreach (XmlNode node in list)
{
    // For an attribute node, Value holds the attribute's text
    textBoxResult.Text += node.Value + "\r\n";
}
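As a console version of the loop above, here is a minimal sketch that assumes .NET 6 or later and an inline copy of Figure 1 (a List<string> stands in for the text box). It also uses an attribute inside a predicate to look up one person's age.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Inline copy of the Figure 1 document (assumption: used instead of Persons.xml).
var document = new XmlDocument();
document.LoadXml(
    "<Persons>" +
    "<Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>" +
    "<Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>" +
    "<Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>" +
    "<Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>" +
    "<Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>" +
    "</Persons>");
XmlElement root = document.DocumentElement;

// //Person/@name selects attribute nodes; Value holds each attribute's text.
var names = new List<string>();
foreach (XmlNode attr in root.SelectNodes("//Person/@name"))
{
    names.Add(attr.Value);
}

// An attribute test inside a predicate, followed by a step to the Age child.
string adamAge = root.SelectSingleNode("//Person[@name='Adam Wong']/Age").InnerText;

Console.WriteLine(string.Join(", ", names));
// prints: John Smith, Mike Folley, Lisa Carter, Jerry Frost, Adam Wong
Console.WriteLine(adamAge); // prints: 35
```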