Skip to content

sitr.us

Functional data structures in JavaScript with Mori

Updated 2013-11-12 (originally posted )functional programming14 min read

I have a long-standing desire for a JavaScript library that provides good implementations of functional data structures. Recently I found Mori, and I think that it may be just the library that I have been looking for. Mori packages data structures from the Clojure standard library for use in JavaScript code.

Table of Contents

Functional data structures

A functional data structure (also called a persistent data structure) has two important qualities: it is immutable and it can be updated by creating a copy with modifications (copy-on-write). Creating copies should be nearly as cheap as modifying a comparable mutable data structure in place. This is achieved with structural sharing: pointers to unchanged portions of a structure are shared between copies so that memory need only be allocated for changed portions of the data structure.

A simple example is a linked list. A linked list is an object, specifically a list node, with a value and a pointer to the next list node, which points to the next list node. (Eventually you get to the end of the list where there is a node that points to the empty list.) Prepending an element to the front of such a list is a constant-time operation: you just create a new list element with a pointer to the start of the existing list. When lists are immutable there is no need to actually copy the original list. Removing an element from the front of a list is also a constant-time operation: you just return a pointer to the second element of the list. Here is a slightly more-detailed explanation.

Lists are just one example. There are functional implementations of maps, sets, and other types of structures.

Rich Hickey, the creator Clojure, describes functional data structures as decoupling state and time. (Also available in video form.) The idea is that code that uses functional data structures is easier to reason about and to verify than code that uses mutable data structures.

Clojure, ClojureScript, and Mori

Clojure is a functional language that compiles to JVM bytecode. It is a Lisp dialect. Among Clojure's innovations are implementations of a number of functional data structures, old and new. For example, other Lisps tend to place prime importance on linked lists; but a lot of Clojure code is based on PersistentVector, which supports efficient random-access operations.

ClojureScript is an alternative Clojure compiler that produces JavaScript code instead of JVM bytecode. The team behind ClojureScript has ported Clojure collections to ClojureScript implementations - which are therefore within reach of JavaScript code.

Mori incorporates the ClojureScript build tool and pulls out just the standard library data structures. It builds a JavaScript file that can be used as a standalone library. Mori adds some API customizations to make the Clojure data structures easier to use in JavaScript - such as helpers to convert between JavaScript arrays and Clojure structures. Mori also includes a few helpers to make functional programming easier, like identity, constant, and sum functions. These are the data structures provided in the latest version of Mori, v0.2.4:

constructorMoriClojure
listMori docsClojure docs
vectorMori docsClojure docs
rangeMori docsClojure docs
hash_mapMori docsClojure docs
array_mapClojure docs
sorted_mapClojure docs
sorted_map_byClojure docs
setMori docsClojure docs
sorted_setMori docsClojure docs
sorted_set_byClojure docs

All of these data structures are immutable and can be efficiently updated via copying and structural sharing.

The documentation for Mori is pretty good. But it does skip over some of the available data structures and functions. Since most of the stuff provided by Mori comes from Clojure, if you cannot find information that you need in the Mori docs you can also look at the Clojure docs. I provided links to the Clojure documentation for each data structure where more detailed descriptions are available.

Installing and running

To get Mori, install the npm package mori. Then you can require the module 'mori' in a Node.js project; or copy the file mori.js from the installed package and drop it into a web browser.

Examples

Let's take a look at the particular advantages of some of these structures.

vector

A vector is an ordered, finite sequence that supports efficient random-access lookups and updates. Vectors are created using the vector function from the 'mori' module. Any arguments given to the function become elements of a new vector. In Node.js you can import Mori and create a vector like this:

Mori also works with AMD implementations (such as RequireJS) for use in browser code:

Idiomatic Clojure is not object-oriented. The convention in Clojure is that instead of putting methods on objects / values, one defines functions that take values as arguments. Those functions are organized into modules to group related functions together. This approach makes a lot of sense when values are mostly immutable; and it avoids name collisions that sometimes come up in object-oriented code, since names are scoped by module instead of by object.1

Since Mori is an adaptation of Clojure code, it uses the same convention. Data structures created with Mori do not have methods. Instead all functionality is provided by functions exported by the 'mori' module. That is why the code here uses expressions like mori.count(v) instead of v.count().

Existing vectors can be modified:

conj is an idiom that is particular to Clojure. It inserts one or more values into a collection. It behaves differently with different collection types, using whatever insert strategy is most efficient for the given collection.2

higher-order functions

Mori provides a number of higher-order functions. Here is an example that computes the sum of the even values in a collection:

Or, borrowing from an example in the Mori documentation, one might compute a sum for even values and a separate sum for odd values:

The example above returns a map created with array_map, which is a map implementation that works well with a small number of keys.

sorted_set_by

JavaScript does not have its own set implementation. (Though it looks like one will be introduced in ECMAScript 6.) Sets are a feature that I often miss.

A sorted set is a heavenly blend of a sequence and a set. Any duplicate values that are added are ignored, and there is a specific ordering of elements. Unlike a list or a vector, ordering is not based on insertion, but is determined by comparisons between elements.

One possible use for a sorted set is to implement a priority queue. Consider an example of a calendar application. sorted_set_by takes a comparison function that is used to to maintain an ordering of added values. With the appropriate comparison appointments are added and are automatically sorted by date:

Like the underlying set, this calendar implementation is immutable. When an appointment is added you get a new calendar value.

The comparison function for comparing appointments sorts appointments by date, and uses title as a secondary sort in case there are appointments with the same date and time. The sorted set uses this function to determine equality as well as ordering; so if it made comparisons using only the date field then the calendar would not accept multiple appointments with the same date and time.

Appointments can be added to a calendar and queried in date order:

Looks good! Now let's add an undo feature. In case the user changes her mind about the last appointment that was added, the undo feature should recreate the previous state without that appointment. The implementation of Calendar is the same as before except that the constructor takes an additional optional argument, the add method passes a reference from one calendar value to the next, and there is a new undo method:

Assuming the same set of appointments, we can use the undo method to step back to a state before the fourth appointment was added:

Immutability comes in handy in this scenario. It is trivial to step back in time.

Actually because Calendar is immutable, you don't necessarily need a special method to get undo behavior:

I'm sure that the Synesthesia Bike Tour is a lot of fun. It's just that when demonstrating an undo feature, something has to take the fall. That's just the world that we live in, I suppose.

hash_map

All JavaScript objects are maps. But those can only use strings as keys.3 The hash_map provided by Mori can use any values as keys.

If you use plain JavaScript objects as keys they will be compared by reference identity. If you use Mori data structures as keys they will be compared by value using equality comparisons provided by ClojureScript.

Mori maps are immutable; so there is never a need to create defensive copies. An update operation produces a new map.

The function assoc adds any number key-value pairs to a map; and dissoc removes keys.

All of this also applies to array_map, sorted_map, and sorted_map_by. See Different map and set implementations below for information about those.

There is a common pattern in JavaScript of passing options objects to constructors to avoid having functions that take zillions of arguments. It is also common to have a set of default values for certain options. So code like this is pretty typical:

I usually put in an empty object as the first argument to the Underscore _.extend call so that I get a new object instead of modifying the given options object in place. Modifying the options object could cause problems if it is reused somewhere outside of my constructor. An alternative to the defensive copy could be to use immutable Mori maps:

The function mori.into(coll, from) adds all of the members of from into a copy of coll. Both from and coll can by any Mori collection.

That does still involve copying the input options object into a new Mori map. It is also possible to provide a Mori sequence as input to start with - js_to_clj will accept either a plain JavaScript object or a Mori collection:

There is probably no performance gain from using Mori in this situation. It is unlikely to matter anyway, since the structures involved are small. In situations with larger data structures, or where data is copied many times in a loop, Mori's ability to create copies faster using less memory could make a difference.

But in my opinion applying immutability consistently - even where there are no significant performance gains - can simplify things.

Apples to apples

The transformations that are available in Mori - map, filter, etc.

  • return lazy sequences no matter what the type of the input collection is. (See Laziness below for an explanation of what laziness is). This is advantageous because the other collection implementations are not lazy. But what if you want to do something like create a new set based on an existing set? The answer is that you feed a lazy sequence into a new empty collection using the appropriate constructor or the into function, which dumps all of the content from one collection into another:

Applying map to the first set is lazy - but building the second set with into is not. So a good practice is to avoid building non-lazy collection until the last possible moment.

An odd quirk in Mori is that the set constructor takes a collection argument, but most of the other constructors take individual values or key-value pairs. The into function behaves more consistently across data structure implementations.

You might want to write transformations that are polymorphic - that can operate on any type of collection and that return a collection of the same type. To do that use mori.empty(coll) to get an empty version of a given collection. This makes it possible to build a new collection without having to know which constructor was used to create the original.

Here is a function that removes null values from any Mori collection and that returns a collection of the same type:

If the collection given is a map then the value of elem in the filter callback will be a key-value pair. So compact includes an is_map check and extracts the value from that key-value pair if a map is given.

A nice advantage of empty is that if you use it on a sorted_set_by or on a sorted_map_by, the new collection will inherit the same comparison function that the original uses.

Mori pairs well with Bacon

In a previous post, Functional Reactive Programming in JavaScript, I wrote about functional reactive programming (FRP) using Bacon.js and RxJS. A typical assumption in FRP code is that values contained in events and properties will never be updated in place. The immutable data structures that Mori provides are a perfect fit.

List versus Vector

Linked lists are nice and simple - but become less appealing when it becomes desirable to access elements at arbitrary positions in a sequence, to append elements to the end of a sequence, or to update elements at arbitrary indexes. The running time for all of these operations on lists is linear, and the append and update operations require creating a full copy of the list up to the changed position.

Clojure's PersistentVector is a sequence, like a list. But under-the-hood it is implemented as a tree with 32-way branching. That means that any vector index can be looked up in O(log_32 n) time. Appending and updating arbitrary elements also takes place in logarithmic time. For more details read Understanding Clojure's PersistentVector implementation.

Sequences that are implemented as mutable arrays of contiguous memory support constant-time lookups and modification (not including array resizing when array length grows). If O(log_32 n) does not seem appealing in comparison, consider that if you are using 32-bit integers as keys:

Which means that your keyspace will run out before your vector's tree becomes more than 7 layers deep. Up to that point operations will be O(7) or faster.

If your keys are JavaScript numbers, which are 64-bit floating point values, the largest integer that you can count up to without skipping any numbers is Math.pow(2, 53). The logarithm of that number is also small:

The hash_map and set implementations in Mori are also implemented as trees with 32-way branching. The sorted map and set structures are implemented as binary trees.

Equality, ordering, and hashing

Map and set lookups are based on either hashing or ordering comparisons, depending on the implementation. JavaScript does not have built-in hash functions - at least not that are accessible to library code. It also does not have defined ordering or equality for most non-primitive values. So Mori uses its own functions, which it gets from ClojureScript.

hash

Every Mori value has a hash that identifies its content:

If a value is recreated with the same content, it has the same hash:

Mori's hash function delegates to a specific hash algorithm for each of its data structures, which are ultimately based on internal algorithms that Mori uses to compute hashes for primitive JavaScript values:

Since Mori also accommodates arbitrary JavaScript values as map keys and set values, it also has a scheme for assigning hash values to other JavaScript objects.

It seems that Mori has a monotonically increasing counter for object hash values. The first object that it computes a hash for gets the value 1; the second object gets 2, and so on. To keep track of which object got which hash value, it stores the value on the object itself:

There are obvious hash-collision issues between non-Mori objects, numbers, and true. But Mori data structures can handle hash collisions. If a collection uses all of those types as keys it could end up with one hash bucket with three entries.

equals

Mori has its own equals function, which also comes from ClojureScript. As with hash, any two mori values that have the same content are considered to be equal:

It works on primitive JavaScript values:

When applied to non-Mori JavaScript objects, equals works the same way that the built-in === function does:

compare

ClojureScript has a compare function, which Mori uses in its sorted data structure implementations. Mori does not export the compare function. The function defines an ordering for Mori values and for primitive JavaScript values - but not for other JavaScript objects. So if you want to put non-Mori objects into a sorted structure you will have to use an implementation that accepts a custom comparison function.

Different map and set implementations

hash_map and set use a hash function for lookups and have O(log_32 n) lookup times; the sorted variants use comparison functions for lookups and have O(log_2 n) lookup times; and array_map is just an array of key-value pairs, so it uses only an equality function for lookups and has O(n) lookup times.

constructorinsert timelookup timehow keys/values are checked
hash_maplog_32 nlog_32 nhash(key) === hash(target_key) && equals(key, target_key)
array_map1nequals(key, target_key)
sorted_maplog_2 nlog_2 ncompare(key, target_key)
sorted_map_bylog_2 nlog_2 nlike sorted_map with a user-supplied comparison function
setlog_32 nlog_32 nhash(val) === hash(target_val) && equals(val, target_val)
sorted_setlog_2 nlog_2 ncompare(val, target_val)
sorted_set_bylog_2 nlog_2 nlike sorted_set with a user-supplied comparison function

hash and equals are provided by Mori. compare is part of ClojureScript, but is not exported by Mori.

Note that the sorted structures do not perform equals checks - instead they rely on a comparison function to determine whether values or keys are the same. On the other hand, the hash structures do perform equals checks to handle hash collisions.

array_map is unique among the map and set implementations in that it preserves the order of keys and values based on the order in which they were inserted. If a value is inserted into an array_map and then the map is converted to a sequence (for example, using mori.seq(m)) then the last key-value pair inserted appears last in the resulting sequence. The ordering in a new array_map is determined by the order of key-value pairs given to the constructor.

array_map is a good choice for small maps that are accessed frequently. The linear lookup time looks slower than other map implementations on paper. But there is no hashing involved and only simple vector traversal - so each of those n steps is faster than each of the log_32 n steps in a hash_map lookup.

The array_map implementation has an internal notion of the upper limit on an efficient size. Once the map reaches that threshold, inserting another key-value pair produces a hash_map instead of a larger array_map.4

The information here on implementations and running times mainly comes from An In-Depth Look at Clojure Collections.

Laziness

Many of the functions provided by Mori return what is called a lazy sequence. Being a sequence this is like a list and can be transformed using functions like map, filter, and reduce. What makes a sequence lazy is that it is not actually computed right away. Evaluation is deferred until some non-lazy function accesses one or more elements of the transformed sequence. At that point Mori computes and memoizes transformations.

The results of a lazy evaluation are cached. If the same sequence is forced again the map function will not be called a second time:

Mori runs just enough deferred computation to get whatever result is needed. It is often not necessary to compute an entire lazy sequence:

In the above case there is an intermediate lazy sequence that theoretically contains five values: the results of doubling values in the original set. But only the first two values in that sequence are needed. The other three are never computed.

Laziness means that it is possible to create infinite sequences without needing unlimited memory or time:

If you try this out in a REPL, such as node, be aware that when an expression is entered a JavaScript REPL will usually try to print the value of that expression, which has the effect of forcing evaluation of lazy sequences. If you enter a lazy sequence you will end up in an infinite loop:

The solution to this is to be careful to assign infinite sequences in var statements. That prevents the REPL from trying to print the sequence:

Powers of two are actually a terrible example of a lazy sequence: any element in that sequence could be calculated more quickly using Math.pow(). It just happens that powers of two are simple to demonstrate.

Algorithms that really do benefit from infinite sequences are those where computation of any element requires references to earlier values in the sequence. A good example is computing Fibonacci numbers.

This implementation uses the iterate function, which takes a function and an initial value. It creates an infinite sequence by repeatedly applying the given function. The given starting value is [0, 1], and each invocation of the given function combines the second value in the previous pair with the sum of the values in the previous pair; so the resulting sequence is: ([0, 1] [1, 1] [1, 2] [2, 3] [3, 5] ...). The map function is applied to that, taking the first value from each pair.

Using this sequence allows us to ask the questions:

A lazy sequence might also contain lines from a large file or chunks of data flowing into a network server. At the time of this writing I was not able to write a program that traversed a long sequence in constant space. But I have verified that this is possible in JavaScript. I may find a solution and post an update later.

Efficiency

In procedural code running a sequence through multiple operations that apply to every element would result in multiple iterations of the entire sequence. Because Mori operates lazily it can potentially collect transformations for each element and apply them in a single pass:

If applying map to a collection twice resulted in two iterations we would expect to see:

The fact that the first pass and second pass are interleaved suggests that Mori collects transformations and applies all transformations to a value at once. This is the advantage of lazy evaluation: it encourages writing code in a way that makes most logical sense rather than thinking about performance. You can write what are logically many iterations over a collection and the library will rearrange computations to minimize the actual work that is done.


  1. You might be wondering how Clojure handles polymorphism, since the convention is to use functions instead of methods. Clojure has a feature called protocols that permit multiple implementations for functions depending on the type of a given argument. Elsewhere in the functional world, Haskell and Scala provide a similar, yet more powerful feature, called type classes.
  2. When conj is used on a list it prepends elements (like cons) because prepending is much cheaper than inserting at other possible positions. Given a vector conj appends values. Appending is often desired, and appending to a vector is just as efficient as inserting at any other position. conj works on sets and maps too - but in those cases the idea of insertion position is not usually meaningful.
  3. It does look like ECMAScript 6 will add a Map implementation and a WeakMap to the language spec, both of which will take arbitrary objects as keys (only non-primitives in the WeakMap case). But those implementations will not be immutable!
  4. http://clojure.org/data_structures#Data Structures-ArrayMaps

Revision history

2013-11-12
Added section on installing Mori.