Tech behind Tech

Raw information. No finesse :)

Parsing XML in Clojure


Problem :

We need to parse an XML string and be able to query it using an XPath-style list of tags.

Example :

<friends>
  <person>
   <name>Siva</name>
  </person>
</friends>

I need a function that, given this XML as a string, can be called like this,

(get-value xml :person :name)

and returns “Siva”.

Solution :

To parse and query XML in Clojure, we need to do the following three things.

1) Convert XML string (file) to Struct Map

Clojure ships with a built-in XML library (clojure.xml http://clojure.github.com/clojure/clojure.xml-api.html). Its parse function takes an InputStream and returns a struct map that represents the XML.

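;; Assumes ByteArrayInputStream is imported and clojure.xml is aliased as xml
;; (see the ns form in the full listing).
(defn get-struct-map [xml]
  ;; Encode explicitly as UTF-8, which the parser assumes when no encoding is declared.
  (let [stream (ByteArrayInputStream. (.getBytes (.trim xml) "UTF-8"))]
    (xml/parse stream)))

user> (get-struct-map xml)
{:tag :friends, :attrs nil, :content [{:tag :person, :attrs nil, :content [{:tag :name, :attrs nil, :content ["Siva"]}]}]}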

This struct map is cumbersome to query directly: reaching a nested tag means walking the :content vectors by hand.
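To illustrate, here is a minimal sketch of fetching the name without a zipper, using only clojure.xml. The names sample-xml, child-with-tag and manual-name are made up for this illustration and are not part of any library.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Same document as the example above, written on one line.
(def sample-xml "<friends><person><name>Siva</name></person></friends>")

(def parsed
  (xml/parse (ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))))

;; Hypothetical helper: first child element of node with the given tag.
(defn child-with-tag [node tag]
  (first (filter #(= tag (:tag %)) (:content node))))

;; Every level of nesting needs its own lookup step.
(def manual-name
  (-> parsed
      (child-with-tag :person)
      (child-with-tag :name)
      :content
      first))
;; manual-name => "Siva"
```

This works, but every query has to spell out each step, and handling several matching children would need even more code.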

2) Convert Struct Map to Zipper Data Structure

“A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages.” http://en.wikipedia.org/wiki/Zipper_%28data_structure%29

To make traversal easier, we will convert the struct map to a zipper data structure. Clojure comes with a zip library (zip is short for zipper): http://clojure.github.com/clojure/clojure.zip-api.html

We will use this zip library to convert the XML struct map to a zipper data structure.

user> (clojure.zip/xml-zip xml-struct)
[{:tag :friends, :attrs nil, :content [{:tag :person, :attrs nil, :content [{:tag :name, :attrs nil, :content ["Siva"]}]}]} nil]
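The result above is a zipper location: the current node paired with the path back to the root (nil at the root). As a small sketch of moving around by hand with clojure.zip, assuming the same struct map as above (the names root-loc and name-loc are chosen just for this example):

```clojure
(require '[clojure.zip :as zip])

(def xml-struct
  {:tag :friends, :attrs nil,
   :content [{:tag :person, :attrs nil,
              :content [{:tag :name, :attrs nil, :content ["Siva"]}]}]})

(def root-loc (zip/xml-zip xml-struct))

;; Each zip/down moves to the leftmost child of the current location.
(def name-loc (-> root-loc zip/down zip/down))

(:tag (zip/node name-loc))           ;; => :name
(zip/node (zip/down name-loc))       ;; => "Siva"
(:tag (zip/node (zip/up name-loc)))  ;; => :person
```

Navigating step by step like this is still manual work, which is where a query layer on top of the zipper helps.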

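3) Use the zip-filter library to query the zipper data structure.

Now that we have our XML in a zipper data structure, we can use the zip-filter library that is present in clojure.contrib. http://clojure.github.com/clojure-contrib/zip-filter-api.html (clojure.contrib has since been retired; the same functions are now published as the separate org.clojure/data.zip library, in the clojure.data.zip.xml namespace.)

Note the trailing text step. Without it, xml-> returns the matching zipper locations rather than their string content.

user> (clojure.contrib.zip-filter.xml/xml-> zipper-struct :person :name clojure.contrib.zip-filter.xml/text)
("Siva")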
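The idea behind this kind of tag query can be sketched with clojure.zip alone. This is not how zip-filter is implemented; child-locs, tagged-children and select-text are hypothetical helpers written for this illustration, and the second person in the data is invented to show multiple matches.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical helper: all child locations of loc, left to right.
(defn child-locs [loc]
  (when (zip/branch? loc)
    (take-while some? (iterate zip/right (zip/down loc)))))

;; Hypothetical helper: children of the given locations whose tag matches.
(defn tagged-children [locs tag]
  (for [loc locs
        child (child-locs loc)
        :when (= tag (:tag (zip/node child)))]
    child))

;; Follow the tags one level at a time, then collect the direct text content.
(defn select-text [root-loc & tags]
  (->> (reduce tagged-children [root-loc] tags)
       (map (fn [loc] (apply str (filter string? (:content (zip/node loc))))))))

(def friends
  (zip/xml-zip
   {:tag :friends, :attrs nil,
    :content [{:tag :person, :attrs nil,
               :content [{:tag :name, :attrs nil, :content ["Siva"]}]}
              {:tag :person, :attrs nil,
               :content [{:tag :name, :attrs nil, :content ["Ravi"]}]}]}))

(select-text friends :person :name)
;; => ("Siva" "Ravi")
```

Each tag narrows the current set of locations to the matching children, which is roughly the XPath-style behaviour the problem statement asks for.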

Putting it all together

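(ns com.sivajag.utils.xml
  (:import (java.io ByteArrayInputStream))
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            [clojure.contrib.zip-filter.xml :as zf]))

;; Parses an XML string into a struct map; returns nil for nil or empty input.
(defn get-struct-map [xml]
  (when-not (empty? xml)
    ;; Encode explicitly as UTF-8, which the parser assumes when no encoding is declared.
    (let [stream (ByteArrayInputStream. (.getBytes (.trim xml) "UTF-8"))]
      (xml/parse stream))))

;; Follows the tags down from the root element and returns the text of the first match.
(defn get-value [xml & tags]
  (apply zf/xml1-> (zip/xml-zip (get-struct-map xml)) (conj (vec tags) zf/text)))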

user> (get-value xml :person :name)
"Siva"

Happy Coding!!!

Written by Siva Jagadeesan

June 25, 2010 at 1:57 pm

Posted in Clojure
