Tuesday, January 27, 2015

Live queries in Comprehend

Someone recently asked me if it's possible to have Comprehend fire events when the database is updated and new matches are encountered. This is indeed possible with not too much effort. The trick is to combine forward matching and mutable indexed sets.

To start, suppose we are maintaining a set of pairs and we are interested in all triples [a b c] such that [a b] and [b c] are in our database. We define a function that prints such matches iff they are new since the function was last called:
(require '[comprehend :as c])
(require '[comprehend.mutable :as cm])

(defn report-new-matches
  "Prints new paths of length 3 since report-new-matches was last
  called on m-s. Prints zero matches on the first invocation."
  [m-s]
  (let [!results (volatile! nil)]
    (dosync ; the transaction (assuming comprehend.mutable is backed by a ref)
      (vreset! !results (c/comprehend :mark :report-new-matches
                                      @m-s ; snapshot of the mutable set
                                      [a b]
                                      [b c]
                                      [a b c]))
      (cm/mark m-s :report-new-matches))
    (doseq [x @!results]
      (println "Matched" x))))
This function runs a transaction that comprises two steps:
  1. Run a query on the mutable indexed set
  2. Update the mutable indexed set with a marker
The transaction ensures that the set is not mutated between step 1 and step 2. After the transaction completes, report-new-matches prints the results.

We now hook up report-new-matches as the 'io' argument of the mutable indexed set constructor:
(declare m-s)
(defn live-reporter
  ([] nil)
  ([s diff] (report-new-matches m-s)))
(def m-s (cm/mutable-indexed-set live-reporter))
(report-new-matches m-s) ; init
(I admit the circular dependency between the definitions of live-reporter and m-s is currently a little ugly. I might tweak the signature of 'io' functions to remedy this problem.)

That's it! Let's see what happens when we add elements:
(reduce cm/conj m-s [[1 2] [2 3] [1 3] [3 4]])
; Matched [1 3 4]
; Matched [1 2 3]
; Matched [2 3 4]

(cm/conj m-s [0 1])
; Matched [0 1 3]
; Matched [0 1 2]
At some point the comprehend.mutable API will be amended with a macro to make this even easier. In the meantime, as you can see, it's straightforward to roll out your own solution.

Thursday, January 1, 2015

Comprehend is now a lot faster

In a previous blog post I announced Comprehend, a Clojure library for pattern matching on indexed sets.

In its original implementation, when an item was added to an indexed set, Comprehend would translate the item’s relevant properties to predicate logic and it would store the resulting relations in an indexed core.logic database. When Comprehend was subsequently given a query, it would translate the query to predicate logic, and then have core.logic unify the database and the (translated) query.

The above approach came with quite a bit of overhead, both in terms of performance and in terms of memory use. That is why I have now rewritten the entire Comprehend engine.

The new engine uses a custom multi-threaded unifier. Indexes are created lazily, as they are needed, and are stored in a cache.

Comprehend currently uses a soft cache by default. This means that the garbage collector is in charge of evicting indexes. This seems to work well in practice, but if that’s not to your liking—don’t worry—it’s trivial to swap in an alternative cache from core.cache. Moreover, Comprehend should have enough hooks to allow for a custom cache that intelligently evicts indexes as items are added or removed from the indexed set.

I tried the new engine on a few queries and found the overhead small enough. Even for relatively small sets, Comprehend returns results quickly and I found it quite smart about what indexes (not) to create.

Sunday, May 18, 2014

Comprehend: A Clojure Library for Pattern Matching on Sets

I've been doing a lot of Objective-C programming lately. Objective-C is not the worst language, but last week I couldn't help but become nostalgic for the Clojure programming I did at Hacker School. At some point I could no longer resist opening Good Ol' Sublime Text and experimenting with core.logic.

My procrastinating was productive because I now have the pleasure of presenting Comprehend to you.

Comprehend is a Clojure library containing a set-like data type that is optimized for the idiom { ∀ patterns ⊆ S : expr }. In Clojure this ends up looking like this:
(c/comprehend S pattern1 pattern2 ... expr) ; => '(result1 result2 result3 ...)
I explain the syntax in quite a bit of detail on the Github page. In this blog post I want to go into some technical aspects of the implementation.

The main ideas

Comprehend uses core.logic.pldb to manage the indexes of indexed sets.

When adding an element X to an indexed set, here is what we do:
  • Add something akin to [:root-element (hash X)] to the logic database
Next, with Y = X initially, we recursively perform the following steps:
  • If Y is sequential then:
    • Add [:vector-length (count Y)] to the logic database
    • For every element Z of Y at position i:
      • Add [:vector-element (hash Y) i (hash Z)] to the logic database
      • If Z is a (non-opaque) collection then repeat this algorithm with Y := Z
  • If Y is a map then we do something similar for both the keys and the values of Y. We don't store the size of Y, however, because in the case of maps we also want to match on supermaps.
  • If Y is a different kind of collection (e.g. a  set) then we do the same but we don't have to deal with both keys and values.
That's it! core.logic.pldb puts indices on all the elements in our tuples and this helps us perform fast queries. Next to the index we also store a dictionary from all the hashes in our tuples to the original objects.
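To make the flattening step concrete, here is a simplified sketch that handles only the sequential case (hypothetical code, not Comprehend's actual implementation; maps and sets would be handled analogously, as described above):

```clojure
;; Recursively flatten a nested vector into index tuples.
;; The relation keywords mirror the tuples described above.
(defn flatten-to-tuples [x]
  (let [!tuples (volatile! [[:root-element (hash x)]])
        walk (fn walk [y]
               (when (sequential? y)
                 (vswap! !tuples conj [:vector-length (count y)])
                 (doseq [[i z] (map-indexed vector y)]
                   (vswap! !tuples conj [:vector-element (hash y) i (hash z)])
                   (walk z))))]
    (walk x)
    @!tuples))

(flatten-to-tuples [1 [2 3]])
;; => [[:root-element …] [:vector-length 2] [:vector-element … 0 …] …]
```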

So about those queries. We follow a similar procedure to translate patterns into tuples. Here are the main differences:
  • At compile time we determine what the unbound symbols in the pattern are.
  • Wherever we would previously put (hash Y) or (hash Z) in a tuple, we now need to be more clever:
    • If Y (or Z) was one of the unbound symbols then we just return it. (Nitpick: This is at runtime and core.logic will have made the symbol resolve to a core.logic 'lvar'.)
    • If Y (or Z) is a collection that contains one of the unbound symbols then we replace it by an lvar. We reuse the same lvar for identical patterns.
    • In all other cases we store the hash value after all.
We then hand core.logic all of the following: (a) the core.logic.pldb index, (b) the list of unbound symbols in the pattern, and (c) the tuples-with-variables. core.logic will do its thing and return a sequence of sequences of hashes. The outermost sequence can be thought of as a list of matches. Every innermost sequence contains hashes of assignments for (b). We map these hashes onto assignments using the dictionary that we stored next to the index.

Recall that we determined (b) at compile time. This means that at compile time we can also generate the following template:
(for [[var1 var2 ...] (do-all-of-the-above)] expr)
Thus, as long as the only unbound symbols in expr also occurred in (one of) the patterns, expr will be grounded.

What is it good for?

I wrote a quick and dirty benchmark, merely to verify that my library is faster than naive approaches for some problems.

In the benchmark I create a graph of N vertices and N random edges. I then compare several approaches to find all paths that consist of four edges.

The first approach I tried, of course, uses Comprehend.
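Going by the comprehend syntax shown earlier, the four-edge query presumably looked something like this (a sketch; s stands for the indexed set of edges and the symbol names are my own):

```clojure
;; All paths [a b c d e] consisting of four consecutive edges
(c/comprehend s
              [a b]
              [b c]
              [c d]
              [d e]
              [a b c d e])
```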

I also implemented an alternative core.logic query that uses a vector representation of the graph as its input source. This code uses core.logic's membero operator and also core.logic's built-in support for unification of vectors with logic variables. At around N=50 Comprehend conclusively overtakes this implementation.

I also implemented a naive algorithm that uses 'for', 'filter', and argument destructuring. At N=50 this implementation is about 50% slower than the previous core.logic query.
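That naive version might be sketched as follows (hypothetical code; since destructuring cannot unify repeated symbols, the joins are expressed with :when clauses):

```clojure
;; Naive sketch using 'for' and destructuring
;; (edges is a collection of [from to] pairs)
(defn naive-paths [edges]
  (for [[a b] edges
        [b2 c] edges
        [c2 d] edges
        [d2 e] edges
        :when (and (= b b2) (= c c2) (= d d2))]
    [a b c d e]))
```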

That is to say, there is some evidence that use cases for Comprehend exist. The exact scope of those use cases is not yet clear to me, however. This warrants further investigation.

Practical uses aside, I had a ton of fun implementing this library and definitely also learned a thing or two along the way.

Alright, now it's back to Objective-C for me!

Wednesday, February 26, 2014

Termcat ClojureScript Port and Live Demo

Recently I spent some time porting Termcat from Clojure to ClojureScript. As a result, it is now possible to embed the Termcat compiler in web pages.

I also went one step further and created a live demo. The demo consists of a simple Termcat editor. As you enter Termcat code, it compiles your document to HTML/MathML on the fly.

Do try the demo if you're interested in alternatives for LaTeX or Markdown!

Monday, December 16, 2013

The Command Line Interface for Termcat

Yesterday I created the long overdue command line interface (CLI) for Termcat.

Usage: java -jar termcat.jar [options] <document[.tc]>

The HTML output is stored in document.html.

  -b, --browse   Open HTML output in browser
  -w, --watch    Watch document for changes and recompile when changed
  -v, --verbose  Use verbose output

(When running from source code, substitute 'lein run' for 'java -jar termcat.jar'.)

The most exciting thing here is the --watch switch. When activated, Termcat will first compile the source document to HTML and then watch for changes in the source file. Whenever the source is modified, Termcat automatically recompiles it.

Convenience aside, --watch has two performance benefits. First, it's not necessary to start Java every time you want to recompile a Termcat document. Second, Termcat keeps a cache of intermediate results around when watching a document. This means that Termcat doesn't need to reevaluate the entire document for every minor update (although it's currently not nearly as effective at this as I reckon it could be).

I'm also happy about the --browse option. When the --browse and --watch features are combined, Termcat automatically reloads the browser every time it recompiles the Termcat document. This means you get live previews of your Termcat documents out of the box!

lein run --verbose --watch --browse blog-post.tc
Compiling... (Elapsed time: 737.621 msecs; cache size: 410 items)
HTML output stored in blog-post.html
Watching for changes...
Recompiling... (Elapsed time: 457.546 msecs; cache size: 500 items)
Recompiling... (Elapsed time: 26.785 msecs; cache size: 500 items)
Recompiling... (Elapsed time: 404.613 msecs; cache size: 502 items)
Recompiling... (Elapsed time: 17.539 msecs; cache size: 502 items)

Termcat uses Java's WatchService API for tracking document modifications. The cache is based on soft references.

Monday, December 2, 2013


Last week was a short week at Hacker School because of the holidays. I had an amazing first Thanksgiving indeed, but I also got some work done.

Last week I started porting my theorem prover to ClojureScript. My intention is to create a web-based front end for my theorem prover using D3.js to draw tableaux and models. I'm cautiously optimistic that this won't be too much work as my tableaux and models are nothing more than labeled DAGs.

My porting efforts came to a temporary halt when I encountered a bug in ClojureScript. This led me to take a closer look at the ClojureScript codebase—a productive distraction that resulted in a patch by yours truly.

I have also started working on a manual for Termcat.

Additionally, I have been working on a new rewriting system for Termcat that should be more powerful, easier to understand, and faster than the system that's in place now. The downside is that it has subtly different semantics and this means I can't just assume my current rules will keep working when I switch rewriting systems. I might be in for a bumpy ride!

Monday, November 25, 2013

Thoughts on the Clojure threading macros

Most programmers who have dabbled in Clojure are probably familiar with the threading macros -> and ->>. For readers who aren't, here's what they do.

The threading macros allow you to write function composition in reversed order. For instance, using the threading macros, (f (g (h x))) can be written as (-> x h g f) or (->> x h g f). Here, the first argument of the macros is evaluated as usual. The other arguments are assumed to be functions.

It's also possible to use -> and ->> for functions that expect multiple arguments. As such, the code (-> a (f b) g) is translated into (g (f a b)) and (->> a (f b) g) is rewritten as (g (f b a)). That is to say, every time the pattern (function arg1 arg2) is encountered, the result of the preceding expression is inserted as the first (->) or last argument (->>) of the function call.
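You can check these rewrites at the REPL with macroexpand:

```clojure
(macroexpand '(-> a (f b) g))  ; => (g (f a b))
(macroexpand '(->> a (f b) g)) ; => (g (f b a))
```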

The macro ->> is perhaps the more popular threading macro. It is very useful in combination with map, reduce, or filter—functions that take a function as their first argument and a collection as their last argument. To wit, you might write the following code:

(->> [-1 0 1 2]
     (filter pos?)
     (map inc)
     (map str))

This expression would first filter all positive elements from the vector [-1 0 1 2] (i.e. delete -1 and 0), then increment the remaining elements, and finally convert them to strings. The result is the list ("2" "3").

The macro -> is useful for functions that perform operations on their first argument. For instance,

(-> { :a [1 2] }
    (get :a)
    (conj 3))

This evaluates to the vector [1 2 3].

Invoking object or protocol methods in Clojure abides by the following syntax: (method obj arg1 arg2 ...). If, in typical functional style, your methods return updates on 'obj' then the macro -> tends to be the more useful threading macro for composing method calls.
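For instance, with plain maps standing in for objects and 'methods' that return updated copies (the names here are my own):

```clojure
(defn deposit [account amount]
  (update account :balance + amount))

(defn close [account]
  (assoc account :open? false))

(-> {:balance 0 :open? true}
    (deposit 10)
    (close))
; => {:balance 10, :open? false}
```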

Combining -> and ->> can be a bit of a pain. Nesting ->> within an -> expression works well:

(-> {:a [0 1 2 3]}
    (get :a)
    (->> (filter pos?)
         (map inc)))
; (2 3 4)

This works because -> inserts the result of (get {:a [0 1 2 3]} :a) as the first argument of the call to ->>. Unfortunately, however, there's no obvious way to nest -> within ->>.

Using only -> and ->> there doesn't seem to be a good general way of dealing with long chains of composed function calls that expect the 'object' in different positions.

Another annoyance is that -> and ->> don't allow for control flow in an obvious way. That is to say, you can't easily use if-statements within an -> expression.

To be fair, you could use lambdas like this

(-> ...
    ((fn [x] (if (pred x)
                 (foo x)
                 (bar x)))))

but this is a less than stellar solution if only because it introduces a lot of extra parentheses.

(Granted, using the special lambda form #() one pair of parentheses can be eliminated, but then the trade-off is that #()-expressions cannot be nested.)

Where the Clojure threading macros appear to fall short, the Swiss arrow macros purport to be a solution. However, there are rather many of them, and if you want other people to understand your code then I feel it's better not to use them.

Enter the macro as->. It's a built-in macro (as of Clojure 1.5) and it's pretty darn awesome. Here's the source code:

(defmacro as->
  "Binds name to expr, evaluates the first form in the lexical context
  of that binding, then binds name to that result, repeating for each
  successive form, returning the result of the last form."
  {:added "1.5"}
  [expr name & forms]
  `(let [~name ~expr
         ~@(interleave (repeat name) forms)]
     ~name))

That code is unusually short, don't you agree? Here's what it does. It transforms the code

(as-> [1 2 3 4] $
      (conj $ 5)
      (map dec $))

into


(let* [$ [1 2 3 4] 
       $ (conj $ 5) 
       $ (map dec $)] $)

and it returns (0 1 2 3 4). I think it's a real gem.

The macro as-> allows you to introduce control flow and it works very well with the -> macro. To wit, you can write things like

(-> ...
    (as-> x (if (pred x)
                (foo x)
                (bar x))))

Nesting in the other direction also works great. For instance, you can write

(as-> [1 2 3] x
      (mapv inc x)
      (-> x
          (conj 5)))
; [2 3 4 5]

Isn't that just swell?