XQuery/XQuery and Python

Over at [1], Cameron Laird has an example of Python code that extracts and lists the a (anchor) tags on an XHTML page:

    import elementtree.ElementTree
    
    for element in  elementtree.ElementTree.parse("draft2.xml").findall("//a"):
        if element.tag == "a":
            attributes = element.attrib
            if "href" in attributes:
                print "'%s' is at URL '%s'." % (element.text,
                                                attributes ['href'])
            if "name" in attributes:
                print "'%s' anchors '%s'." % (element.text,
                                                attributes['name'])
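
The elementtree package used above has since been folded into the Python standard library as xml.etree.ElementTree, and print became a function in Python 3. A minimal sketch of the same loop for Python 3 might look like the following. The describe_anchors helper and the inline sample document are hypothetical stand-ins for draft2.xml and are not part of the original example.

```python
import io
import xml.etree.ElementTree as ET


def describe_anchors(source):
    """Return one line per href/name attribute found on <a> elements.

    `source` is a filename or file object, as accepted by ET.parse.
    (Hypothetical helper, introduced only for this sketch.)
    """
    lines = []
    # ".//a" searches the whole tree; only un-namespaced <a> tags match.
    for element in ET.parse(source).findall(".//a"):
        attributes = element.attrib
        if "href" in attributes:
            lines.append("'%s' is at URL '%s'." % (element.text, attributes["href"]))
        if "name" in attributes:
            lines.append("'%s' anchors '%s'." % (element.text, attributes["name"]))
    return lines


# Assumed sample input standing in for draft2.xml.
sample = io.StringIO(
    '<html><body>'
    '<a href="http://example.org/">Example</a>'
    '<a name="top">Top</a>'
    '</body></html>'
)
for line in describe_anchors(sample):
    print(line)
```

Returning the lines instead of printing them inside the loop keeps the helper easy to reuse; passing a real filename such as "draft2.xml" should work the same way.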

The equivalent to this Python code in XQuery is:

 for $a in doc("http://en.wikipedia.org/wiki/XQuery")//*:a
 return
   if ($a/@href)
   then concat("'", $a, "' is at URL '", $a/@href, "'.&#10;")
   else if ($a/@name)
   then concat("'", $a, "' anchors '", $a/@name, "'.&#10;")
   else ()


Here the namespace prefix is a wildcard, since we do not know in advance which namespace, if any, the HTML elements are in.
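
For comparison, the Python side has a similar wildcard. As a rough sketch, assuming Python 3.8 or later (where ElementTree paths accept the {*} namespace wildcard), the following shows a plain tag name missing namespaced elements while the wildcard form finds them. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample: an XHTML fragment that declares the XHTML namespace.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a href="http://example.org/">Example</a>'
    '</body></html>'
)
root = ET.fromstring(XHTML)

# A bare tag name only matches elements in no namespace.
plain = root.findall(".//a")

# "{*}a" matches <a> in any namespace, or in none (Python 3.8+).
wild = root.findall(".//{*}a")

print(len(plain), len(wild))
```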

More succinctly but less readably (and with quotes omitted from the output for clarity), this could be expressed as:

 string-join(
   doc("http://en.wikipedia.org/wiki/XQuery")//*:a/
     (if (@href)
      then concat(., " is at URL ", @href)
      else if (@name)
      then concat(., " anchors ", @name)
      else ()
     ),
   '&#10;'
 )


More usefully, we might supply the URL of any XHTML page as a parameter and generate an HTML page of its external links:

 declare option exist:serialize "method=xhtml media-type=text/html";
 
 let $url := request:get-parameter("url", ())
 return
   <html>
     <h1>External links in {$url}</h1>
     {
       for $a in doc($url)//*:a[text()][starts-with(@href, 'http://')]
       return
         <div><b>{string($a)}</b> is at <a href="{$a/@href}"><i>{string($a/@href)}</i></a></div>
     }
   </html>
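
To round out the comparison, here is a rough Python 3 sketch of the same report. It is an illustration under stated assumptions, not a port of the eXist script: the external_links_page function is hypothetical, the page text is passed in as a string instead of being fetched from the URL, and the {*} wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from html import escape


def external_links_page(url, xhtml_text):
    """Build an HTML listing of the external links in an XHTML document.

    Hypothetical helper: `url` is only used in the heading, and
    `xhtml_text` is the already-downloaded, well-formed page source.
    """
    root = ET.fromstring(xhtml_text)
    rows = []
    for a in root.findall(".//{*}a"):  # <a> in any namespace
        href = a.get("href", "")
        # Roughly mirrors the [text()] predicate: a direct text node child.
        has_text = bool(a.text) or any(child.tail for child in a)
        if not (has_text and href.startswith("http://")):
            continue
        label = "".join(a.itertext())  # like string($a)
        rows.append(
            '<div><b>%s</b> is at <a href="%s"><i>%s</i></a></div>'
            % (escape(label), escape(href), escape(href))
        )
    return "<html><h1>External links in %s</h1>%s</html>" % (
        escape(url),
        "".join(rows),
    )


# Assumed sample page with one external and one local link.
sample_page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a href="http://example.org/x">Ext</a>'
    '<a href="/local">Local</a>'
    '</body></html>'
)
print(external_links_page("http://example.org/page", sample_page))
```

The Python version has to escape text and attribute values by hand, whereas the XQuery element constructors handle serialization themselves.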
