Thursday, January 1, 2015

Comprehend is now a lot faster

In a previous blog post I announced Comprehend, a Clojure library for pattern matching on indexed sets.

In its original implementation, when an item was added to an indexed set, Comprehend would translate the item’s relevant properties to predicate logic and it would store the resulting relations in an indexed core.logic database. When Comprehend was subsequently given a query, it would translate the query to predicate logic, and then have core.logic unify the database and the (translated) query.

The above approach came with quite a bit of overhead, both in terms of performance and in terms of memory use. That is why I have now rewritten the entire Comprehend engine.

The new engine uses a custom multi-threaded unifier. Indexes are created lazily, as they are needed, and are stored in a cache.

Comprehend currently uses a soft cache by default. This means that the garbage collector is in charge of evicting indexes. This seems to work well in practice, but if that’s not to your liking—don’t worry—it’s trivial to swap in an alternative cache from core.cache. Moreover, Comprehend should have enough hooks to allow for a custom cache that intelligently evicts indexes as items are added or removed from the indexed set.

I tried the new engine on a few queries and found the overhead small enough. Even for relatively small sets, Comprehend returns results quickly and I found it quite smart about what indexes (not) to create.