Tech behind Tech

Raw information. No finesse :)

MQL – A Clojure library for querying Nested Maps



I was working on a feature that needed to query a nested map structure, and I wanted to do it in a generic way. Amit pointed me to rql, a library for dealing with collections of records in Clojure. I thought it would be helpful to have something similar to rql for querying maps, so I created mql.

Given a map

(def m
     {:cid
      {
       :visits {
                :id-1 {
                       :last-ts "1284166000040"
                       :first-ts "1274166000000"
                       :duration "40"}
                :id-2 {
                       :last-ts "1274166000040"
                       :first-ts "1274166000000"
                       :duration "40"}
                :id-3 {
                       :last-ts "1264166000040"
                       :first-ts "1274166000000"
                       :duration "40"}}

       :promo{
              :id-1 {:promo "p2"}
              :id-2 {:promo "p1"}}

       :purchase {
                  :id-1
                  {:order-id "order-id-1"
                   :total-dollars "970.00"
                   :purchase? "true",
                   :merchant-total-dollars "1000.00"}
                  :id-3
                  {:order-id "order-id-2"
                   :total-dollars "1000.00"
                   :purchase? ""
                   :merchant-total-dollars "1000.00"}}}})

Select: simple select

mql.core> (select [:cid :promo] m)
{:id-1 {:promo "p2"}, :id-2 {:promo "p1"}}
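For comparison, a fixed key path like this one can also be read with core get-in. A minimal sketch, using a trimmed copy of the map above (the name m-small is only for illustration):

```clojure
;; Trimmed copy of the example map; `m-small` is an illustrative name.
(def m-small
  {:cid {:promo {:id-1 {:promo "p2"}
                 :id-2 {:promo "p1"}}}})

;; Plain core Clojure: walk a fixed key path through nested maps.
(get-in m-small [:cid :promo])
;; => {:id-1 {:promo "p2"}, :id-2 {:promo "p1"}}
```

The two start to differ only once a key in the path is dynamic, which is where the where clause comes in.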

Select: with filtering

mql.core> (select [:cid :purchase] (where [* :total-dollars] :gt 980) m)
([:id-3 {:order-id "order-id-2", :total-dollars "1000.00", :purchase? "", :merchant-total-dollars "1000.00"}])

You can currently use :gt, :ge, :lt, :le and :eq as comparison operators in the where clause. In the where clause's key-seq, use * in place of a key that is dynamic. The last key in the key-seq must not be *.
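To make the filtering idea concrete, here is a rough sketch of the same kind of query in plain Clojure. This is not mql's implementation: the names purchases, ops and where-sketch are made up for this example, it handles only one level of dynamic keys, and it assumes every entry has the field and that the field holds a numeric string (parsed with the JVM's Double/parseDouble).

```clojure
;; Trimmed copy of the :purchase branch; `purchases` is an illustrative name.
(def purchases
  {:id-1 {:order-id "order-id-1" :total-dollars "970.00"}
   :id-3 {:order-id "order-id-2" :total-dollars "1000.00"}})

;; Assumed mapping from operator keywords to core comparison functions.
(def ops {:gt > :ge >= :lt < :le <= :eq ==})

(defn where-sketch
  "Keeps the [key value] entries of map `m` whose value under `field`,
  parsed as a number, satisfies (op parsed-value threshold).
  Assumes `field` is present and holds a numeric string."
  [m field op threshold]
  (filter (fn [[_ v]]
            ((ops op) (Double/parseDouble (get v field)) threshold))
          m))

(where-sketch purchases :total-dollars :gt 980)
;; => ([:id-3 {:order-id "order-id-2", :total-dollars "1000.00"}])
```

Every top-level key (:id-1, :id-3) is treated as dynamic here, which is roughly the role * plays in the key-seq.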

Limitations:

  • There can be only one * in a where clause
  • The select clause cannot contain * yet
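As a rough illustration of what wildcard traversal over several levels involves, here is a sketch in plain Clojure. It is not part of mql and does not show how mql works internally: select-sketch and data are hypothetical names, and the keyword :* is used as the wildcard marker purely as a choice for this sketch.

```clojure
(defn select-sketch
  "Returns a seq of [path value] pairs for every path in `m` that matches
  `ks`. The keyword :* in `ks` matches any key at that level. Branches that
  are not maps, or that lack a named key, are dropped."
  [ks m]
  (reduce (fn [matches k]
            (for [[path v] matches
                  :when (map? v)
                  [child-k child-v] (if (= k :*)
                                      v                    ; wildcard: every entry
                                      (select-keys v [k]))] ; fixed key, if present
              [(conj path child-k) child-v]))
          [[[] m]]   ; start with the empty path pointing at the whole map
          ks))

;; Trimmed copy of the example map; `data` is an illustrative name.
(def data
  {:cid {:visits {:id-1 {:duration "40"}}
         :promo  {:id-1 {:promo "p2"}
                  :id-2 {:promo "p1"}}}})

(select-sketch [:cid :* :id-1] data)
;; => ([[:cid :visits :id-1] {:duration "40"}]
;;     [[:cid :promo :id-1] {:promo "p2"}])
```

Returning the full path with each value keeps track of which dynamic keys were matched, and any number of :* markers can appear in the key-seq.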

This is an early version of the library, and it does what I need for now. If you have any requirements or ideas, send me a patch or let me know, and I will hack on it when I get some free time.

Please feel free to send me your feedback on syntax, code style, etc.

Written by Siva Jagadeesan

September 8, 2010 at 10:43 pm

Posted in Clojure


8 Responses


  1. Hi Siva,

    maybe the author is not aware of it, but rql should also work with maps, e.g. look at the where function:

    (defn where
    [coll & preds]
    (reduce (fn [coll pred] (filter #(= ((first pred) %) (last pred)) coll)) coll (partition 2 preds)))

    or with a bit of destructuring:

    (defn where
    [coll & preds]
    (reduce (fn [coll [k v]] (filter #(= (k %) v) coll)) coll (partition 2 preds)))

    But I don’t think it is always worth an extra library for these kinds of small code snippets.

    Cheers,
    Stefan

    Stefan

    September 13, 2010 at 2:39 am

    • Thanks Stefan,

      When it comes to maps, the keys can be different. I wanted a library that could take wildcard characters.

      Don’t you think that is the beauty of Clojure: most libraries are small code snippets :)

      — Siva Jagadeesan

      Siva Jagadeesan

      September 13, 2010 at 1:57 pm

      • Hi Siva,

        right, keys could be different, but the function is still working for both.

        Coming back to your mql library. Please, don’t use (select [:cid :promo] m) where you can use simple core Clojure: (get-in m [:cid :promo]). And why do you use :gt instead of >? Don’t create your own DSL where it is not necessary. Use the power of Clojure, don’t hide it.
        I would also not use read-string by default, because 980 is not the same as “980”. And don’t use the function * as a wildcard marker.
        The problem with your map is its structure. The id’s should be part of the map entries and map entries of the same type should be stored in (or returned as) a collection (like it’s done in rql, CouchDB, MongoDB, etc). So you can use get-in to get a collection and then you can use all the Clojure magic to do map, reduce, etc.
        Btw. You can use identity instead of #(identity %)

        Cheers,
        Stefan

        Stefan

        September 14, 2010 at 1:11 am

      • Hi Stefan:

        Thanks a lot for your feedback.

        ” Please, don’t use (select [:cid :promo] m) where you can use simple core Clojure: (get-in m [:cid :promo]).”

        The reason for using select rather than get-in is that I want to be able to do something like

        (select [:cid * :id-1])

        With plain get-in, I won’t be able to do that.

        ” And why do you use :gt instead of >. Don’t create your own DSL where it is not necessary. Use the power of Clojure, don’t hide it.”

        Yes, I agree I could just use > instead of :gt. :gt is more of a syntactic sugar. I guess it is just a preference.

        “I would also not use read-string by default, because 980 is not the same as “980”.”

        I will take a look at this.

        “don’t use the function * as a wildcard marker”

        Stefan, is there any reason you are against it? I was thinking that * is used as a wildcard character in Unix, so it would be easy to understand.

        “The problem with your map is its structure. The id’s should be part of the map entries and map entries of the same type should be stored in (or returned as) a collection (like it’s done in rql, CouchDB, MongoDB, etc).”

        Stefan, could you rewrite the example map I have given in the structure you are talking about? It is not very clear to me.

        “You can use identity instead of #(identity %)”

        Stupid me .. changing it now.

        Thanks again Stefan, for your feedback.

        Keep it coming,

        — Siva Jagadeesan

        Siva Jagadeesan

        September 14, 2010 at 6:06 pm

  2. Hi Siva,

    it’s me again ;-)

    The most important question in the first place: Why do you have such a strange data structure?

    So the id-x is not a unique identifier, more something like an index or a group-id.

    The structure I have in mind is pretty simple and well known, e.g. posts and comments:

    (def posts [{:id 1, :title "First Day", :body "whatever", :comments []}
                {:id 2, :title "Another Day", :body "whenever",
                 :comments [{:author "Alex", :comment "wow"}
                            {:author "Stuart", :comment "great"}]}])

    All posts are stored in one collection and each post has its own unique id (primary key). In this case all comments are embedded, but it could also be a collection of references (foreign keys).

    So now we’re back in the cosy world of Clojure collections, e.g.:

    (filter #(.contains (:title %) "Day") posts)

    or using rql:

    (where posts :title "First Day")

    or:

    (where (-> posts (where :id 2) first :comments) :author "Alex")

    Sorry if I have missed something or may have oversimplified the whole story. Or maybe you can shed more light on your chosen map structure and the multi * select.

    Cheers,
    Stefan

    Stefan

    September 15, 2010 at 5:41 am

    • Hi Stefan:

      Thanks for making this discussion a learning experience for me.

      In our project, we store data in Hadoop. After summarizing, this is the easiest data format we were able to extract. Traversing that data structure is also not that difficult, so it was not a big deal for us.

      In the data structure you have, there is only one layer of variable key (:id). If we have more than one level, our filter function is going to be a bit more complicated. I am not saying it is impossible. MQL gives you an abstraction with which you can traverse through multiple layers of variable keys.

      If you ask me whether this can be done using Clojure core: yes, it can. MQL is just a small abstraction to make it easier.

      Regards,

      — Siva Jagadeesan

      Siva Jagadeesan

      September 16, 2010 at 11:41 am

      • Hi Siva,

        it doesn’t matter where the data is coming from (Hadoop, database, file), in Clojure you should work with the sequence abstraction, in this case maps containing vectors of maps containing…

        Sorry, I still don’t see the reason behind the multi level id – keys, and if in doubt I stick to…

        “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.”
        Alan J. Perlis

        Cheers,
        Stefan

        Stefan

        September 23, 2010 at 2:23 am

      • Hi Stefan:

        Thanks for bringing up Alan’s quote:

        “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.”

        This is the exact reason that MQL works on a generic Clojure map structure, instead of expecting a particular data structure.

        As Alan suggested, MQL is a new set of functions that work on a generic Clojure map structure.

        Regards,

        — Siva Jagadeesan

        Siva Jagadeesan

        September 27, 2010 at 10:53 am

