9 December 2012

Are all Map Functions the same?
If you like this content,
here's my book

map is a function that's commonly implemented in functional programming languages But what the hell is map, and is it always the same thing? Map can mean different things in different programming languages, and the differences are both significant and subtle. This post talks about how map is implemented in Haskell, Java 8, F# and Scala and what the differences tell us about the different languages themselves.

What is map?

The basic idea is that if you have a bunch of values, you frequently want to perform an operation on all of them. Now obviously there's a pattern to doing this - outlined in the Java code below - for every value in your input, you do something to it, and add it to your output collection. Since this is a common operation you ideally want to have a way to abstract it, so that you don't have to write the same boilerplate over and over again. What you ideally want to do is take a function that applies to an individual value, and convert it to a function that applies to a bunch of these values. The function that performs that operation is called map.

List output = new ArrayList();
for (Integer value : input) {
    output.add(value + 1);
}

A function that takes another function as an argument, or returns one, is called a higher order function

Map in Haskell

Heavy use of map is a very functional style of programming - you've got a function that takes a function and returns a function! It consequently only seems natural to start our exploration with Haskell - a pure functional programming language. Don't worry if you don't know any haskell - I'll explain what's going on! If you look at the source code [0] this is how its implemented:
map _ [] = []
map f (x:xs) = f x : map f xs

You can read this code in English as somewhat equivalent to:

  1. If you've got an empty list, then return an empty list
  2. Otherwise, take the first element in the list (x) and apply the function f to it. Also, apply yourself to the tail of the list, and then put the new head and the new tail back together.

This is a very simple definition of map, but this description ignores some subtleties! Firstly Haskell is a Lazily Evaluated programming language. What that means in general is that values are only evaluated when they're needed. What it means in this specific case is that your map function will automatically work over infinitely long lists. The other thing to notice is that in Haskell functions are first class citizens - that 'f' is a value, just like 1 or "a string".

One of the downsides of the way that map is defined in haskell is that its specific to the datatype that its defined over.[4] You may have noticed earlier that I used the rather unscientific term 'bunch of values' in my explanation of what map is - well that's because the basic concept can apply to lots of different data structures - sets, hashtables, etc. The way that map is defined here only operates over lists[1]. There is also a lazily evaluated equivalent - that simply returns an object that looks like a collection, but really wraps lazily calling the function on an underlying collection.

Map in F#

F# is a programming language that runs on the .NET platform. It supports both functional and object oriented concepts, and is thus described as 'multi-paradigm'. In order ot implement the map function for the different types of collections that can be found F# has a series of modules with different map implementations. So you end up with Set.map, List.map etc. These map functions all return concrete implementations of the same collection type. That is to say that the Set implementation returns a set, the List implementation returns a list. If you were to chain operations of this type you would end up with intermediate values and conversions at the end. This is less than ideal because you're creating unnecessary intermediate collections and also because it doesn't address the issue of converting around between collection types [2].

There is a way to avoid this, which is to use the sequences module. This lazily evaluates sequences, and offers a way to chain operations, by calling Seq.map, Seq.filter etc, before finally constructing a list or a set from the sequence.

Map in Scala

Scala is another multi-paradigm programming language, but it takes quite a different approach to its collections framework. Firstly, it separates collections into mutable and immutable variants. A datastructure is immutable if you can't change its state after its been created [3]. it also has parallel and serial versions of a lot of common collections algorithms. There is also the overarching design goal of trying to keep common interfaces that operate over the different variants. For example there is a trait called GenSeq which defines the API that lets you operate on a sequence - be it serial or parallel, immutable or mutable.

This has an interesting effect on the way that its map function works. For a start its defined quite high up in the type hierachy - it operates over not just Lists, but anything that can be iterated over. You may also notice something a bit weird in the type declaration: implicit bf: CanBuildFrom[Iterable[A], B, That]). What the heck is that?

CanBuildFrom is a bit like a factory - it tells the function what collection to produce. This means that you can call map on a list, and convince your runtime to produce a set without having to explicitly call Set.map everywhere. In order to avoid passing the CanBuildFrom into every function its defined as an implicit - a parameter that can get passed according to language defined rules without requiring the programmer to write out the argument call. This gives the collection framework a lot of flexibility in terms of use - but adds a significant conceptual burden if you're trying to understand exactly what its type definitions mean.

Map in Java 8

Java has long been a generally object oriented language whose designers and users eschewed functional concepts. However, Java 8 will add lambda expressions and alters the collections framework in order to introduce some functional concepts into the language. Those include an implementation of the map function. Java 8 introduces the concept of a 'Stream' - a sequence of data that can processed. Its higher order collections functions are generally called off of the stream abstraction - this is where map lives. Map itself is lazily evaluated, but this isn't a core language feature as in Haskell. The stream framework is designed such that functions that return another stream are lazy, and functions that return a concrete type are eager. Consider the following example:

int sum = shapes.stream()
                .filter(s -> s.getColor() == BLUE)
                .map(s -> s.getWeight())
                .sum();

Shapes is a collection, and the call to stream gives you a stream instance. Both the filter and map functions return streams and are thus lazily evaluated to enable efficient chaining. The final function call - sum() is eagerly evaluated in order to force a concrete int value as the result. Now if you wished to transform into a concrete List, then you could change the last line to:

.into(new ArrayList<>());

This performs the same basic job as 'CanBuildFrom' in Scala, but is explicit rather than implicit and only gets called at the end of a chained sequence of operations.

Conclusions

In all cases the implementation of map is performing the same underlying operation, however, the difference in implementation reflects the differing philosophies of different programming languages. Haskell has a powerful primitive language features in the form of lazy evaluation. Furthermore the general philosophy within functional programming language is to put less emphasis on abstracting away primitive types and more emphasis on writing simple and direct code.

F# and Scala both offer map implementations over a variety of concrete types. F# is more explicit, but also a lot simpler. Scala picks an almost ideal way of abstracting the concept of what concrete type results from the map function - but at the expense of complexity. Whether this complexity is really an improvement is a question of design principles - are you willing to live with a more complex solution in order for it to be marginally better? Java 8 offers an approach with lazy evaluation, but utilizes an observation that when operating on collections you chain lazy operations and finish with an eager operation. It abstracts the API over parallel and serial collections, but instead of the complexity of implicits uses an explicit .into call to convert to build the desired concrete collection type.



[0] Yes - I know this is technically how 'GHC' implements map, rather than 'Haskell' - but if you're about to complain about that, then you're missing the point.
[1] You could implement a more general map function using a type class, but this isn't how the default map definition in prelude is defined.
[2] There's also Select, but it doesn't seem idiomatic f#.
[3] Or possibly that nothing outside of the object can observe that its state has changed after creation.
[4] There is a function in haskell's prelude that offers abstraction in this regard - and that is fmap. fmap is a generalisation of map which applies to functor instances.

Read More: Why /you/ should talk at conferences



comments powered by Disqus