XQuery/XQuery and Python
Over at [1] Cameron Laird has an example of Python code to extract and list the a tags on an XHTML page:
import elementtree.ElementTree
for element in elementtree.ElementTree.parse("draft2.xml").findall("//a"):
    if element.tag == "a":
        attributes = element.attrib
        if "href" in attributes:
            print "'%s' is at URL '%s'." % (element.text,
                                            attributes['href'])
        if "name" in attributes:
            print "'%s' anchors '%s'." % (element.text,
                                          attributes['name'])
The equivalent to this Python code in XQuery is:
for $a in doc("http://en.wikipedia.org/wiki/XQuery")//*:a
return
    if ($a/@href)
    then concat("'", $a, "' is at URL '", $a/@href, "' ")
    else if ($a/@name)
    then concat("'", $a, "' anchors '", $a/@name, "' ")
    else ()
Here the namespace prefix is a wildcard, because we do not know in advance which namespace, if any, the page uses for its HTML elements.
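The same namespace-agnostic match can be sketched in Python 3 with the standard library's xml.etree.ElementTree, which accepts a {*} namespace wildcard in paths from Python 3.8 onwards. This is an illustration rather than a port of the original script: the SAMPLE document and the describe_anchors function are hypothetical names introduced here, and the if/elif mirrors the XQuery version (one line per element) rather than the two independent tests in the Python above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: a small XHTML document in the usual XHTML namespace.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><a href="http://www.w3.org/TR/xquery/">XQuery spec</a></p>
    <p><a name="top">Top of page</a></p>
    <p><a>No attributes</a></p>
  </body>
</html>"""


def describe_anchors(xml_text):
    """Return one description per a element that has an href or a name."""
    root = ET.fromstring(xml_text)
    lines = []
    # "{*}a" matches a elements in any namespace, or in none (Python 3.8+).
    for a in root.iterfind(".//{*}a"):
        # Like the string value of $a in XQuery: all descendant text joined.
        text = "".join(a.itertext())
        if "href" in a.attrib:
            lines.append("'%s' is at URL '%s'" % (text, a.attrib["href"]))
        elif "name" in a.attrib:
            lines.append("'%s' anchors '%s'" % (text, a.attrib["name"]))
    return lines


if __name__ == "__main__":
    for line in describe_anchors(SAMPLE):
        print(line)
```

Without the wildcard, the path would have to spell out the namespace URI, for example ".//{http://www.w3.org/1999/xhtml}a", and would then miss pages that declare no namespace.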
More succinctly but less readably (and with quotes omitted from the output for clarity), the query could be expressed as:
string-join(
    doc("http://en.wikipedia.org/wiki/XQuery")//*:a/
        (if (@href)
         then concat(., " is at URL ", @href)
         else if (@name)
         then concat(., " anchors ", @name)
         else ()
        ),
    ' '
)
More usefully, we might supply the URL of any XHTML page as a parameter and generate an HTML page of its external links:
declare option exist:serialize "method=xhtml media-type=text/html";
let $url := request:get-parameter("url", ())
return
    <html>
        <h1>External links in {$url}</h1>
        {
        for $a in doc($url)//*:a[text()][starts-with(@href, 'http://')]
        return
            <div><b>{string($a)}</b> is at <a href="{$a/@href}"><i>{string($a/@href)}</i> </a></div>
        }
    </html>
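For comparison, a rough Python 3 counterpart of this page generator might look like the sketch below. It is only an approximation under stated assumptions: the function name external_links_page and the SAMPLE document are hypothetical, the page source is passed in as a string instead of being fetched from the URL (the standard library's urllib.request.urlopen could supply it), and the HTML is assembled by string formatting with html.escape, whereas the XQuery version builds real elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample page with one external text link, one local link,
# and one external link that wraps an image and has no text of its own.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a href="http://example.org/?a=1&amp;b=2">Example</a>'
    '<a href="/local">Local</a>'
    '<a href="http://example.org/img"><img src="x.png"/></a>'
    '</body></html>'
)


def external_links_page(xml_text, url):
    """Build an HTML page listing the external links found in xml_text."""
    root = ET.fromstring(xml_text)
    rows = []
    for a in root.iterfind(".//{*}a"):  # any namespace, Python 3.8+
        href = a.get("href", "")
        # Approximates the [text()] predicate: a direct text node child exists.
        has_text = bool(a.text) or any(child.tail for child in a)
        if has_text and href.startswith("http://"):
            label = "".join(a.itertext())  # like string($a)
            rows.append(
                '<div><b>%s</b> is at <a href="%s"><i>%s</i></a></div>'
                % (escape(label), escape(href), escape(href))
            )
    return "<html><h1>External links in %s</h1>%s</html>" % (
        escape(url), "".join(rows))


if __name__ == "__main__":
    print(external_links_page(SAMPLE, "http://example.org/page"))
```

Escaping is the main extra chore here: the XQuery element constructors serialize text and attribute values safely on their own, while the string-based sketch has to call escape explicitly.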