Using XPath to Select Nodes
XPath is a query language used specifically for selecting nodes in an XML document. With this language, you don’t have to walk the entire tree of XML nodes yourself to find what you need. You will learn the basics of this language and apply it to a program. The two methods used for selecting nodes with XPath are XmlNode.SelectNodes() and XmlNode.SelectSingleNode(). The SelectNodes() method returns an XmlNodeList which contains all the nodes that match the XPath string, while SelectSingleNode() returns only the first matching node. Consider the following XML document.
<?xml version="1.0" encoding="utf-8" ?>
<Persons>
<Person name="John Smith">
<Age>30</Age>
<Gender>Male</Gender>
</Person>
<Person name="Mike Folley">
<Age>25</Age>
<Gender>Male</Gender>
</Person>
<Person name="Lisa Carter">
<Age>22</Age>
<Gender>Female</Gender>
</Person>
<Person name="Jerry Frost">
<Age>27</Age>
<Gender>Male</Gender>
</Person>
<Person name="Adam Wong">
<Age>35</Age>
<Gender>Male</Gender>
</Person>
</Persons>
Figure 1
Suppose you want to get the age of every person. The code for doing that is:
XmlDocument document = new XmlDocument();
document.Load("Persons.xml");
XmlNodeList nodes = document.DocumentElement.SelectNodes("/Persons/Person/Age");
foreach(XmlNode node in nodes)
{
textBoxResult.Text += node.InnerText + "\r\n";
}
After loading the document, we used the DocumentElement property, which returns the root element of the document as an XmlElement, a class derived from XmlNode. We then called the SelectNodes() method, which accepts a string argument that contains the XPath query. The XPath query /Persons/Person/Age selects every Age element that is a child of a Person element, which is in turn a child of the Persons root element. All the matching nodes are returned as an XmlNodeList. We used a foreach loop to print each age in a text box. Figure 2 shows you some XPath operations that you can use to query specific nodes.
XPath Query | Description |
---|---|
. | Selects the current node. |
.. | Selects the parent of the current node. |
* | Selects all child elements of the current node. |
nodename | Selects all child elements of the current node with the specified name. |
/ | Selects the root node. |
// | Selects nodes that match the selection expression no matter where they are in the document. |
//* | Selects all elements in the document. |
/element | Selects the root element named element. Starting a path with a / means you are using an absolute path to an element. |
/element/* | Selects all child elements of the root element. |
element/* | Selects all child elements of the specified child element of the current node. |
element/child | Selects the elements named child that are children of the specified child element of the current node. |
//element | Selects all elements with the specified name regardless of where they are in the document. |
element//child | Selects all elements named child nested anywhere inside the specified element, at any depth. |
@attribute | Selects an attribute of the current node where attribute is the name of the attribute. |
//@attribute | Selects all attributes with the specified name regardless of where they are in the document. |
@* | Selects all attributes of the current node. |
element[i] | Selects the element with the specified name at the specified position (counting starts at 1). |
text() | Selects the text nodes that are children of the current node. |
//text() | Selects every text node in the document. |
//element/text() | Selects the text of all the matching elements. |
//element[name='value'] | Selects all elements that have a child element called name containing the specified value. |
//element[@att='value'] | Selects all elements with the specified attribute having the specified value. |
Figure 2 – XPath Operations
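To make a few rows of Figure 2 concrete, here is a minimal console sketch. It assumes a shortened two-person copy of the Figure 1 document embedded as a string (with single-quoted attributes so it fits in a C# string), and the class name XPathOperationsDemo is only illustrative.

```csharp
using System;
using System.Xml;

class XPathOperationsDemo
{
    // Shortened copy of the Figure 1 document (an assumption made to keep the sketch self-contained).
    const string Xml = @"<Persons>
  <Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>
  <Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>
</Persons>";

    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        // Context node for the relative queries below: the first Person element.
        XmlNode firstPerson = document.DocumentElement.SelectSingleNode("Person");

        // ".." selects the parent of the context node.
        Console.WriteLine(firstPerson.SelectSingleNode("..").Name);

        // "*" selects all child elements of the context node.
        foreach (XmlNode child in firstPerson.SelectNodes("*"))
        {
            Console.WriteLine(child.Name + " = " + child.InnerText);
        }

        // "@*" selects all attributes of the context node.
        foreach (XmlNode attribute in firstPerson.SelectNodes("@*"))
        {
            Console.WriteLine(attribute.Name + " = " + attribute.Value);
        }

        // "//Age/text()" selects the text node inside every Age element.
        foreach (XmlNode text in document.SelectNodes("//Age/text()"))
        {
            Console.WriteLine(text.Value);
        }
    }
}
```

The relative queries are evaluated against firstPerson, while the query starting with // is evaluated against the whole document.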
For example, if you want to select the current node, then you will use the . operator.
XmlNode current = document.DocumentElement.SelectSingleNode(".");
Notice that we used the XmlNode.SelectSingleNode() method to select only one node. If there are multiple results, it returns the first matching node. The method accepts the XPath string as its argument. Passing "." returns the very node on which the method was called. Also notice that the returned value is an XmlNode and not an XmlNodeList.
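The following sketch illustrates the two outcomes worth handling when calling SelectSingleNode(): several matches (only the first is returned) and no match (the result is null). The inline document and the Address element name are made up for the illustration.

```csharp
using System;
using System.Xml;

class SingleNodeDemo
{
    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<Persons><Person name='John Smith' /><Person name='Mike Folley' /></Persons>");

        // Two Person elements match, but only the first one in document order is returned.
        XmlNode first = document.DocumentElement.SelectSingleNode("Person");
        Console.WriteLine(first.Attributes["name"].Value);

        // No Address element exists, so the result is null and must be checked before use.
        XmlNode missing = document.DocumentElement.SelectSingleNode("Address");
        Console.WriteLine(missing == null ? "no match" : missing.Name);
    }
}
```

Checking for null before touching the result avoids a NullReferenceException when the query matches nothing.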
Suppose you want to select all the Person elements which are children of the Persons node. You can use the following XPath query.
XmlNodeList personNodes =
document.DocumentElement.SelectNodes("/Persons/Person");
Notice that we started the query with a slash (/). This indicates that we are using an absolute path. We start from the Persons root node, then we look at all the Person nodes which are direct children of the root node. If we want to get the age of each person, then we can use the following code.
XmlNodeList ageNodes =
document.DocumentElement.SelectNodes("/Persons/Person/Age");
You can also use a relative path, where the search starts from the current node. For example, you can query all the Person nodes from the DocumentElement root node using the following code:
XmlNodeList personNodes =
document.DocumentElement.SelectNodes("Person");
or the age of every person using the following code:
XmlNodeList ageNodes =
document.DocumentElement.SelectNodes("Person/Age");
Notice that these XPath queries do not begin with a /, which is what makes them relative paths: they are evaluated starting from the node on which the method is called.
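The difference between the two kinds of path is easiest to see when the context node is not the root. The sketch below assumes a shortened two-person copy of the Figure 1 document and uses the second Person element as the context node; the names RelativePathDemo and lisa are illustrative only.

```csharp
using System;
using System.Xml;

class RelativePathDemo
{
    // Shortened copy of the Figure 1 document (an assumption made to keep the sketch self-contained).
    const string Xml = @"<Persons>
  <Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>
  <Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>
</Persons>";

    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        // Use the second Person element as the context node.
        XmlNode lisa = document.DocumentElement.SelectSingleNode("Person[2]");

        // Relative path: only the Age child of this particular Person is considered.
        Console.WriteLine(lisa.SelectSingleNode("Age").InnerText);         // 22

        // Absolute path: evaluation restarts at the document root, so every Age matches.
        Console.WriteLine(lisa.SelectNodes("/Persons/Person/Age").Count);  // 2
    }
}
```

An absolute path gives the same result no matter which node of the document it is called on, whereas a relative path depends on the calling node.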
We can also query nodes regardless of where they are in the document. This is useful when you want to search the whole document and return all matching nodes. For example, if you want to query all the Gender nodes, no matter how deeply they are nested, then you can use the following query:
XmlNodeList genderNodes =
document.DocumentElement.SelectNodes("//Gender");
We precede the element name with // to indicate that the whole document should be searched. The query will now return all the matching nodes wherever they are in the document. If you want to limit the area that is searched, then you can specify the parent node or root node where the search will start.
XmlNodeList genderNodes =
document.DocumentElement.SelectNodes("/Persons//Gender");
The above code will search for all Gender elements under the Persons node.
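One detail that is easy to miss: a query that begins with // always searches the whole document, even when it is called on a nested node. To search only beneath the calling node, start the query with .// instead. The sketch below shows the difference, assuming a shortened two-person copy of the Figure 1 document; the class and variable names are illustrative.

```csharp
using System;
using System.Xml;

class DescendantSearchDemo
{
    // Shortened copy of the Figure 1 document (an assumption made to keep the sketch self-contained).
    const string Xml = @"<Persons>
  <Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>
  <Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>
</Persons>";

    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        // Context node: the first Person element.
        XmlNode john = document.DocumentElement.SelectSingleNode("Person");

        // "//Gender" ignores the context node and searches the whole document.
        Console.WriteLine(john.SelectNodes("//Gender").Count);               // 2

        // ".//Gender" searches only the descendants of the context node.
        Console.WriteLine(john.SelectNodes(".//Gender").Count);              // 1

        // "/Persons//Gender" searches everything beneath the Persons root element.
        Console.WriteLine(document.SelectNodes("/Persons//Gender").Count);   // 2
    }
}
```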
If you want to query specific elements, then you can use their index. The following queries the third Person child of the Persons element.
XmlNode thirdPerson =
document.DocumentElement.SelectSingleNode("/Persons/Person[3]");
Since we are querying a single node, we used the SelectSingleNode() method. The third Person element is represented by Person[3], where 3 is the index. Note that XPath indices are 1-based, so counting starts at 1 and not 0, unlike array indices in C#.
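As a small illustration of 1-based positions, the sketch below queries the first, third, and last Person of a three-person document. The last() function is part of standard XPath 1.0; the inline document and the NameAt helper are assumptions made for the example.

```csharp
using System;
using System.Xml;

class PositionDemo
{
    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(
            "<Persons>" +
            "<Person name='John Smith' />" +
            "<Person name='Mike Folley' />" +
            "<Person name='Lisa Carter' />" +
            "</Persons>");

        // Positions start at 1, so Person[1] is the first Person element.
        Console.WriteLine(NameAt(document, "/Persons/Person[1]"));       // John Smith
        Console.WriteLine(NameAt(document, "/Persons/Person[3]"));       // Lisa Carter

        // last() evaluates to the position of the final matching element.
        Console.WriteLine(NameAt(document, "/Persons/Person[last()]"));  // Lisa Carter

        // A position past the end matches nothing, so SelectSingleNode returns null.
        Console.WriteLine(NameAt(document, "/Persons/Person[4]"));       // (none)
    }

    // Hypothetical helper: returns the name attribute of the matched Person, or "(none)".
    static string NameAt(XmlDocument document, string xpath)
    {
        XmlNode node = document.SelectSingleNode(xpath);
        return node == null ? "(none)" : node.Attributes["name"].Value;
    }
}
```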
If you want to search for all the persons whose gender is male, then you can use the following code.
XmlNodeList personNodes =
document.DocumentElement.SelectNodes("//Person[Gender='Male']");
Since we used a double slash (//), the whole document will be searched. Inside the brackets, we specify the name of the child element followed by the value it should have. A string value must be enclosed in quotes; single quotes are convenient here because the XPath query itself sits inside a double-quoted C# string.
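Predicates can also test attributes and compare numbers, and conditions can be combined with the XPath keyword and. The sketch below runs three such queries against a compact copy of the Figure 1 document; the PrintNames helper is hypothetical and exists only to keep the example short.

```csharp
using System;
using System.Xml;

class PredicateDemo
{
    // Compact copy of the Figure 1 document (single-quoted attributes keep the C# string readable).
    const string Xml = @"<Persons>
  <Person name='John Smith'><Age>30</Age><Gender>Male</Gender></Person>
  <Person name='Mike Folley'><Age>25</Age><Gender>Male</Gender></Person>
  <Person name='Lisa Carter'><Age>22</Age><Gender>Female</Gender></Person>
  <Person name='Jerry Frost'><Age>27</Age><Gender>Male</Gender></Person>
  <Person name='Adam Wong'><Age>35</Age><Gender>Male</Gender></Person>
</Persons>";

    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        // Attribute predicate: the Age of the person whose name attribute matches.
        XmlNode age = document.SelectSingleNode("//Person[@name='Lisa Carter']/Age");
        Console.WriteLine(age.InnerText);                                   // 22

        // Numeric comparison: the text of Age is converted to a number before comparing.
        PrintNames(document.SelectNodes("//Person[Age > 25]"));
        // John Smith, Jerry Frost, Adam Wong

        // Two conditions combined with "and".
        PrintNames(document.SelectNodes("//Person[Gender='Male' and Age < 30]"));
        // Mike Folley, Jerry Frost
    }

    // Hypothetical helper: prints the name attribute of each Person on one line.
    static void PrintNames(XmlNodeList persons)
    {
        string[] names = new string[persons.Count];
        for (int i = 0; i < persons.Count; i++)
        {
            names[i] = persons[i].Attributes["name"].Value;
        }
        Console.WriteLine(string.Join(", ", names));
    }
}
```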
When dealing with attributes of elements, you can use the @ operator followed by the name of the attribute. For example, the following prints the name of every person.
XmlNodeList list = document.DocumentElement.SelectNodes(@"//Person/@name");
foreach (XmlNode node in list)
{
textBox1.Text += node.Value + "\r\n";
}
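The nodes returned by an attribute query are XmlAttribute objects, so you can navigate from each one back to the element that owns it. As a rough sketch, the console program below shows that, along with an alternative that selects the Person elements and reads the attribute with XmlElement.GetAttribute(). The inline two-person document is an assumption made to keep the example self-contained.

```csharp
using System;
using System.Xml;

class AttributeDemo
{
    static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(
            "<Persons>" +
            "<Person name='John Smith'><Age>30</Age></Person>" +
            "<Person name='Lisa Carter'><Age>22</Age></Person>" +
            "</Persons>");

        // Each node matched by @name is an XmlAttribute; OwnerElement is the Person it belongs to.
        foreach (XmlNode node in document.SelectNodes("//Person/@name"))
        {
            XmlAttribute attribute = (XmlAttribute)node;
            Console.WriteLine(attribute.Value + " (on " + attribute.OwnerElement.Name + ")");
        }

        // Alternative: select the elements and read the attribute from each one.
        foreach (XmlElement person in document.SelectNodes("//Person"))
        {
            Console.WriteLine(person.GetAttribute("name") + ", age " + person.SelectSingleNode("Age").InnerText);
        }
    }
}
```

Selecting the elements rather than the attributes is handy when you need other data from the same Person in the same loop.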