Practical Clojure

by Luke VanderHart & Stuart Sierra

On Amazon
ISBN: 978-1430272311
My Rating: 5/10

My impressions of Practical Clojure are mixed. On the one hand, the book provides a good introduction to Clojure with many code snippets you can execute in the REPL. But on the other hand, I completely missed the "practical" part. There is no example showing you how you could combine the introduced concepts to solve a "real" problem with Clojure. Plus there is no mention of how you would test your Clojure code.

An additional thing I didn't like about the book is that some parts feel like they were just copied from the API documentation. A link to the API docs would have been enough.

My notes

The Clojure Way

Clojure is a full-fledged dialect of the Lisp programming language.

Clojure is designed from the ground up to run within the Java environment and to easily integrate with Java.

A key characteristic of Clojure is that it is a functional language, which means that functions are the fundamental building-block for programs.

Imperative languages perform complex tasks by executing large numbers of instructions, which sequentially modify a program state until a desired result is achieved. Functional languages achieve the same goal through nested function composition – passing the result of one function as a parameter to the next.

The functional and imperative models of computation are formally equivalent, and therefore equally capable of expressing any computational task.

Pure functions are an important concept in functional programming. A pure function is one that depends upon nothing but its parameters, and does nothing but return a value. If a function reads from anywhere except its parameters, it is not pure. If it changes anything in the program state (known as a side effect), it is not pure either.

Functional programming is largely concerned with the careful management (or elimination) of state and side effects. Both are necessary for programs to do anything useful, but are regarded as necessary evils, and functional programmers do their best to use them as little as possible.

State is any data the program stores that can possibly be changed by more than one piece of code.

Side effects are anything a function does when it is executed, besides just returning a value. If it changes program state, writes to a hard disk, or performs any kind of IO, it has executed a side effect.
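
To make the distinction concrete, here is a minimal sketch. The function names total and total-with-log! are made up for illustration and do not come from the book.

```clojure
;; Pure: the result depends only on the arguments and nothing else is touched.
(defn total
  [price quantity]
  (* price quantity))

;; Not pure: printing is a side effect, even though the return value is the same.
(defn total-with-log!
  [price quantity]
  (println "Computing total for" quantity "items")
  (total price quantity))

(total 3 4)           ; => 12
(total-with-log! 3 4) ; prints a line, then => 12
```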

Clojure's goal is not to prevent programmers from using state or side effects, but to make it safe and straightforward.

Clojure by its nature encourages users to write code that is easy to read and debug. Explicit state and side effects mean that it is extremely easy to read over a program and see what it is doing, without always needing to understand how.

One of the most important ways in which Clojure encourages purely functional style where possible is to provide a set of immutable data structures.

Clojure is not object-oriented.

Clojure strives for a separation between data and behavior.

It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.

Alan Perlis

The Clojure Environment

The fundamental unit of a Clojure program is not the line, the keyword, or the class, but the form. A form is any unit of code that can be evaluated to return a value.

Literals are forms which resolve to themselves. Examples: strings, numbers.

Symbols are forms which resolve to a value. In Clojure, symbols are used to identify function arguments, and globally or locally defined values.

Composite forms use matched parentheses, brackets, or braces to group other forms. When evaluated, their value depends on what type of form they are: brackets evaluate to a vector and braces to a map. Of special interest are composite forms which use parentheses. These indicate a list, and lists in Clojure have a special meaning: they are evaluated as function calls. When a list is evaluated, it is the same as calling a function, and the evaluated value of the form is the return value from that function. The first item in the list is the function to call, and the rest of the items are arguments to pass to the function.

Special forms are a particular type of a composite form. They are used very similarly to a function call. The difference is that the first form of a special form is not a function defined somewhere, but a special system form that's built into Clojure. The first form in the list identifies the special form being used and the other forms in the list are like arguments to the special form.
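
The kinds of forms described above can be tried one by one in the REPL. This is only a sketch; the values are arbitrary.

```clojure
42                    ; a literal evaluates to itself => 42
"hello"               ; so does a string => "hello"
[1 (+ 1 1) 3]         ; brackets: a vector, items are evaluated first => [1 2 3]
{:sum (+ 1 2)}        ; braces: a map => {:sum 3}
(+ 1 2 3)             ; parentheses: a call to + with three arguments => 6
(if true "yes" "no")  ; a special form: if is built in, not a function => "yes"
```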

By convention, Clojure source code files have the extension *.clj.

When you start a Clojure program, either by opening up a new REPL or running a source file directly, you are creating a new global environment. This environment lasts until the program is terminated, and contains all the information the program needs to run. Whenever you use def to define a Var, or define a function, it is added (or interned) to the global environment. After it is interned, it is available for reference from anywhere within the same environment.

Vars can be defined and bound to symbols using the def special form: (def var-name var-value).

Vars are not exactly like variables in other programming languages. Most importantly, once defined, they are not intended to be changed.

A symbol is an identifier that resolves to a value.

Just about anything you see in Clojure code that is not either a literal or a basic syntactic character (quotes, parentheses, braces, brackets, etc.) is probably a symbol.

All function names in Clojure are symbols. When a function is called as part of a composite form, Clojure first resolves the symbol to get the function and then applies it.

By convention, symbol names in Clojure are usually lower-case, with words separated by the dash character (-). If a symbol is a constant or a global program setting, it often begins and ends with the star character (*).

Vars in Clojure are all scoped by namespace. Every Var has a namespace as a (sometimes implicit) part of its name. When using a symbol to refer to a var, you can use a forward slash before the symbol name itself to specify the namespace.

A new namespace is declared with (ns new-namespace). If there is already a namespace of that name, it will just switch to it as the current namespace.
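
A small sketch of namespace-qualified symbols. The namespace names example.greetings and example.main are hypothetical and exist only for this example.

```clojure
;; Hypothetical namespace names, chosen only for this sketch.
(ns example.greetings)

(def greeting "Hello")

;; Switch to a second namespace; greeting is no longer directly visible here.
(ns example.main)

;; namespace/name reaches a Var in another namespace.
example.greetings/greeting                               ; => "Hello"

;; Core functions can be written with their namespace spelled out too.
(clojure.core/str example.greetings/greeting ", world")  ; => "Hello, world"
```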

By convention, each Clojure source file has its own namespace – a ns declaration ought to be the first form within any Clojure file.

Controlling Program Flow

In Clojure, all functions are first-class objects. This means the following:

They can be dynamically created at any point during the execution of the program.
They aren't intrinsically named, but can be bound to symbols or to more than one symbol.
They can be stored as values in any data structure.
They can be passed to, and returned from, other functions.

The most basic way to define a function is with the fn special form, which returns a new first-class function when evaluated. In its simplest form, it takes two arguments: a vector (a bracketed list) of argument symbols and an expression which will be evaluated when the function is called. Example: (fn [x y] (* x y)). To use this function you can either bind it to a var (def my-mult (fn [x y] (* x y))) or use it within the same form: ((fn [x y] (* x y)) 3 4).

Clojure provides the defn form as a shortcut for defining a function and binding it to a symbol. The defn form takes the following arguments: a symbol name, a documentation string (optional), a vector of arguments, and an expression for the function body. Example:

(defn my-mult
  "Multiplies two values"
  [x y]
  (* x y))

You can check the doc-string of any function using the built-in doc function, which prints information on a function to the standard system output.

Arity refers to the number of arguments that a function accepts. In Clojure, it is possible to define alternate implementations for functions based on arity. Instead of passing a single vector for arguments and an expression for the implementation, you can pass multiple vector/expression pairs, each enclosed in parentheses. Example:

(defn square-or-multiply
  "Squares a single argument, multiplies two arguments"
  ([] 0)
  ([x] (* x x))
  ([x y] (* x y)))

To have a function with a variable arity, Clojure provides the special symbol & in the argument definition vector for function definitions. To use it, just add a & and a symbol name after any normal argument definitions in your argument definition vector. When the function is called, any additional arguments will be added to a seq, and the seq will be bound to the provided symbol.
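
As an illustration, a variable-arity function might look like this. The function greet-all is made up for this sketch.

```clojure
(defn greet-all
  "Greets the first person by name and counts everyone else."
  [first-name & others]
  (str "Hello " first-name " and " (count others) " others"))

(greet-all "Ann")            ; others is nil      => "Hello Ann and 0 others"
(greet-all "Ann" "Bob" "Cy") ; others is ("Bob" "Cy") => "Hello Ann and 2 others"
```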

Clojure provides a shorthand form for declaring a function, in the form of a reader macro. To declare a function in shorthand, use the pound sign, followed by an expression. The expression becomes the body of the function, and any percent signs in the body are interpreted as arguments to the function. Example: (def multiply #(* %1 %2))

Reader macros are specialized, shorthand syntax and can usually be identified because they are just about the only forms in Clojure that are not contained by matched parentheses, brackets, or braces. They are resolved as the first step when parsing Clojure code and are transformed into their long form before the code is actually compiled. Reader macros are provided for a few extremely common tasks, and they can't be defined by users.

The most basic conditional form is the if form. Example:

(if (= 1 2)
  "Math is broken!"
  "Math still works.")

To choose between several different options, the cond form can be used. It takes any number of test/expression pairs. It evaluates the first test, and, if true, returns the result of the first expression. If false, it tries the next test expression, and so on. If none of the test expressions evaluate to true, it returns nil, unless you provide an :else keyword as the last test, which serves as a catch-all. Example:

(defn weather-judge
  "Given a temperature in Celsius, comments on the weather"
  [temp]
  (cond
    (< temp 20) "It's cold"
    (> temp 25) "It's hot"
    :else "It's comfortable"))

let allows you to specify bindings for multiple symbols, and a body expression within which those symbols will be bound. The symbols are local in scope – they are only bound within the body of the let. They are also immutable.

The let form consists of a vector of bindings and a body expression. The binding vector consists of a number of name-value pairs. Example: (let [a 2 b 3] (+ a b))

Clojure provides no direct looping syntax. Instead it uses recursion in scenarios where it is necessary to execute the same code multiple times.

Tail-call optimization means that, if certain conditions are met, the compiler can optimize the recursive calls in such a way that they do not consume stack. The only requirement for a recursive call to be optimized is that the call occurs in tail position, i.e. it is the last thing a function does before returning.

In some functional languages tail call optimization happens automatically whenever a recursive call is in tail position. Clojure does not do this. In order to have tail recursion in Clojure, it is necessary to indicate it explicitly using the recur form. To use recur, just call it instead of the function name whenever you want to make a recursive call.

With Clojure, you can tell at a glance if a function is tail recursive, and it's impossible to make a mistake. If something uses recur, it's guaranteed never to run out of stack space due to recursion. And if you try to use recur somewhere other than in correct tail position, the compiler will complain.
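
A sketch of recur inside an ordinary function. The function sum-to and its accumulator argument are my own example, not taken from the book.

```clojure
(defn sum-to
  "Sums the integers from 1 to n, carrying the running total in acc."
  ([n] (sum-to n 0))
  ([n acc]
   (if (zero? n)
     acc
     ;; recur is in tail position, so this call does not consume stack.
     (recur (dec n) (+ acc n)))))

(sum-to 100000) ; => 5000050000
```

Replacing recur with a direct call to sum-to would give the same result for small inputs, but each call would then use a stack frame.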

The loop special form, used in conjunction with recur, provides the capability to make tail recursion even simpler by providing the means to declare and call a function at the same time. loop takes two forms: a vector of initial argument bindings (in name/value pairs) and an expression for the body. Whenever recur is used within the body of the loop, it will recursively "call" the loop again with any passed arguments rebound to the same names as in the loop definition. Example:

(loop [i 0]
  (if (= i 10)
    i
    (recur (+ i 1))))

The most important and basic way to run a side effect is to use the do special form. It takes multiple expressions, evaluates them all and returns the value of the last one. This means that from a functional standpoint, all expressions but the last are ignored; they are present only as a means to execute side effects.

If you have a function that needs to perform side effects, Clojure also provides a way to run side effects directly from a function definition or directly inside the body of a loop without needing to explicitly use a do form. This is accomplished quite simply by providing multiple expressions as the body of a function or loop. The last expression will be evaluated for the return value of the function. All other expressions are evaluated solely for side effects.
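
Both variants, the explicit do and the implicit one in a function body, can be sketched as follows. The name noisy-square is made up.

```clojure
;; Explicit do: every expression is evaluated, only the last value is returned.
(do
  (println "first")
  (println "second")
  :done)                   ; => :done

;; Implicit do: a function body may contain several expressions.
(defn noisy-square
  [x]
  (println "squaring" x)   ; evaluated only for its side effect
  (* x x))                 ; last expression: the return value

(noisy-square 4)           ; prints "squaring 4", then => 16
```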

Functions that take other functions as arguments are known as higher-order functions.

Closures are first-class functions that contain values as well as code. These values are those in scope at function declaration, preserved along with the function. Whenever a function is declared, the values locally bound to symbols it references are stored along with it. They are "closed over" (hence the name) and maintained along with the function itself. This means that they are then available for the function's entire lifespan.
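
A minimal closure sketch, using a made-up function make-adder:

```clojure
(defn make-adder
  "Returns a function that adds n to its argument."
  [n]
  ;; The returned fn closes over n and keeps it after make-adder has returned.
  (fn [x] (+ x n)))

(def add-five (make-adder 5))

(add-five 10) ; => 15
(add-five 1)  ; => 6
```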

Currying refers to the process of transforming a function into a function with fewer arguments by wrapping it in a closure.

In Clojure, any function can be curried using the partial function. partial takes a function as its first argument and any number of additional arguments. It returns a function that is similar to the provided function, but with fewer arguments; it uses the additional arguments to partial instead. Example: (def times-pi (partial * 3.14159))

Using the comp function it is possible to create new functions by combining existing functions instead of specifying an actual function body. comp takes any number of parameters: each parameter is a function. It returns a function that is the result of calling all of its argument functions, from right to left. Starting with the rightmost, it calls the function and passes the result as the argument to the next function and so on. Therefore, the function returned by comp will have the same arity as the rightmost argument to comp, and all the functions passed to comp except for the rightmost must take a single argument. The final return value is the return value of the leftmost function.
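
A short sketch of comp, including one combination with partial. All names defined here are my own.

```clojure
;; Rightmost function runs first: + sums the arguments, then - negates the sum.
(def negated-sum (comp - +))
(negated-sum 1 2 3)          ; => -6

;; rest runs first, then first picks the head of what is left.
(def second-item (comp first rest))
(second-item [:a :b :c])     ; => :b

;; comp and partial combine naturally: double the argument, then increment it.
(def double-then-inc (comp inc (partial * 2)))
(double-then-inc 5)          ; => 11
```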

Data in Clojure

Clojure is a dynamically typed language, which means that you never need to explicitly define the data type of symbols, functions, or arguments in your programs.

Internally every Clojure type is represented by a Java class or interface.

Clojure supports entering literals directly as ratios using the / symbol. Ratios entered as literals will automatically be reduced.
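
A few ratio expressions to try in the REPL:

```clojure
4/8          ; the literal is reduced when read => 1/2
(+ 1/3 1/6)  ; ratio arithmetic stays exact    => 1/2
(/ 1 3)      ; dividing integers yields a ratio => 1/3
(ratio? 1/3) ; => true
```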

Keywords are a special primitive data type unique to Clojure. Their primary purpose is to provide very efficient storage and equality tests. For this reason, their ideal usage is as the keys in a map data structure or other simple "tagging" functionality. They begin with a colon, for example: :keyword.

Clojure's collections adhere strictly to Clojure's philosophy of immutability. Operations which could be considered to "change" them actually return an entirely new immutable object with the changes in place.
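
The effect is easy to see in the REPL. In this sketch the names original, extended and settings are arbitrary.

```clojure
(def original [1 2 3])
(def extended (conj original 4)) ; conj returns a new vector

original ; => [1 2 3]   (unchanged)
extended ; => [1 2 3 4]

(def settings {:a 1})
(assoc settings :b 2) ; => {:a 1, :b 2}
settings              ; => {:a 1}   (unchanged)
```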

To use a list literal as a data structure rather than having it be evaluated as code, just prefix it with a single quote character.

Vectors are similar to lists in that they store an ordered sequence of items. However, they differ in one important way: they support efficient, nearly constant-time access by item index. In general, they should be preferred to lists for most applications as they have no disadvantages compared to lists and are much faster. Vectors are represented as literals in Clojure programs by using square brackets. Example: (def nums [1 2 3 4 5])

Vectors are functions of their indexes. This means you can call the vector like a function, and pass the index you want to retrieve. Example: (nums 0) will return 1.

Maps store a set of key/value pairs. They are represented by curly braces, enclosing an even number of forms. Example: (def my-map {:a 1 :b 2 :c 3}). Because the comma character is equivalent to whitespace in Clojure, it is often used to clarify key-value groupings without any change to the actual meaning of the map definition: (def my-map {:a 1, :b 2, :c 3}).

Struct maps allow you to predefine a specific key structure, and then use it to instantiate multiple maps which conserve memory by sharing their key and lookup information. They are semantically identical to normal maps: the only difference is performance.

To define a structure, use defstruct: it takes a name and a number of keys. Example: (defstruct person :first-name :last-name). To create instances of person, use the struct-map function: (def person1 (struct-map person :first-name "Hans" :last-name "Muster"))

Typically it is best to use normal maps first and refactor your program to use struct-maps only as an optimization.

Sets are collections of unique values and support efficient membership tests as well as common set operations such as union, intersection, and difference. The literal syntax for a set is the pound sign accompanied by the members of the set enclosed in curly braces. Example: (def languages #{"Java" "Lisp" "C++"})
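
The set operations live in the clojure.set namespace. A sketch with two made-up sets:

```clojure
(require '[clojure.set :as set])

(def jvm-languages #{"Java" "Clojure" "Scala"})
(def lisps #{"Lisp" "Clojure" "Scheme"})

(contains? jvm-languages "Java")     ; membership test => true
(set/intersection jvm-languages lisps) ; => #{"Clojure"}
(set/difference jvm-languages lisps)   ; => #{"Java" "Scala"}
(count (set/union jvm-languages lisps)) ; duplicates collapse => 5
```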

Sequences

In Clojure, sequences are a unified way to read, write, and modify any data structure that is logically a collection of items. They are an abstraction, a common programming interface that generalizes behavior common to all collections and exposes it via a library of sequence functions.

Technically, the various types of data structures are not sequences themselves, but rather can be turned into sequences with the seq function. seq takes a single argument and creates a sequence view of it. Since almost all the sequence functions call seq on their arguments internally, there isn't much distinction in practice most of the time.

Every sequence, conceptually, consists of these two parts: the first item in the sequence, accessed by the first function, and another sequence representing all the rest of the items, accessed by the rest function.

The first/rest architecture of sequences is the basis for another important aspect of Clojure sequences: laziness. Lazy sequences provide a conceptually simple and highly efficient way to operate on amounts of data too large to fit in the system memory at once. Laziness is made possible by the observation that logically, the rest of a sequence doesn't need to actually exist, provided it can be created when necessary. Rather than containing an actual, concrete series of values, the rest of a lazy sequence can be implemented as a function which returns a sequence.

Be careful with infinite sequences. They are logically infinite, but care is required not to attempt to realize an infinite number of values. Just because a sequence is infinite doesn't mean you want to take an infinite amount of time to process it, or try to load the whole thing into memory at once.
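
A sketch of working safely with an infinite sequence: always take a finite part of it. The name naturals is my own.

```clojure
;; An infinite lazy sequence: 0, 1, 2, ...
(def naturals (iterate inc 0))

;; Only as many items as requested are ever computed.
(take 5 naturals)                         ; => (0 1 2 3 4)
(take 3 (map #(* % %) naturals))          ; => (0 1 4)
(first (filter #(> % 1000) naturals))     ; => 1001

;; Evaluating something like (count naturals) would never return.
```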

Once a lazy sequence is realized, it will consume memory for all the values it contains, provided there is a reference to the realized sequence, until the reference to the realized sequence is discarded and the sequence is garbage collected.

Ensuring proper memory usage is a responsibility of the developer. Keeping track of memory usage means, primarily, keeping track of references to lazy sequences.

State Management

As much as possible, Clojure advocates eliminating state from programs. In general, data should be passed and returned from functions in a purely functional way. It keeps things clean, protected, and parallelizable.

Clojure introduces a philosophical and conceptual paradigm shift in its treatment of things. It takes the standard notion of a thing (an object or a variable) and decomposes it into two separate concepts – state and identity. Every thing has both a state and an identity. State is a value associated with an identity at a particular moment in time, whereas identity is the part of a thing that does not change, and creates the link between many different states at many different times. The value of each state is immutable and cannot change. Rather, change is modeled by the identity being updated to refer to a different state entirely.

States are simply any of Clojure's data types. The only limitation on values that can be used as states is that they ought to be immutable.

Identities are modeled using one of the three reference types: refs, agents, and atoms. Each represents an identity and points to a state. They differ in the semantics of how they can be updated to refer to new state values and are useful in different situations.

Use refs to manage synchronous, coordinated state
Use agents to manage asynchronous, independent state
Use atoms to manage synchronous, independent state

One requirement common to many systems is that updates to certain identities be coordinated to ensure data integrity. Coordinated updates can't just take one identity into account – they have to manage the states of several interdependent identities to ensure that they are all updated at the same time and that none are left out. Clojure uses refs to provide coordinated state.

The alternative to coordinated state is independent state. Independent identities stand on their own and can have their state updated without concern for other identities. Updates to independent identities are usually faster than updates to coordinated identities; use them in preference to refs unless coordinated access is required. Clojure provides agents and atoms as independent identity reference types.

Synchronous updates to the values of identities occur immediately, in the same thread from which they are invoked. The execution of the code does not continue until the update has taken place. Updates to the values of refs and atoms are both handled synchronously in Clojure.

Asynchronous updates occur at some unspecified point in the (near) future, usually in another thread. The code execution continues immediately from the point at which the update was invoked, without waiting for it to complete. Agents are Clojure's implementation of asynchronously updated identities.

Refs are Clojure's implementation of synchronous, coordinated identities. Each is a distinct identity, but operations on them can be run inside a transaction, guaranteeing that multiple identities whose values depend on each other are always in a consistent state. Refs provide access to Clojure's Software Transactional Memory (STM) system.

To create a ref, use the built-in ref function, which takes a single argument: the initial value of the ref. Example: (def my-ref (ref 5)). To get the current state of the ref, use the deref function or its shorthand, the @ symbol. Examples: (deref my-ref) or @my-ref

All updates within a single transaction are committed to the application state atomically, at the same time. Consistency across ref values is guaranteed. Transactions are also isolated, which means that no transaction can see the effects of any other transaction while it is running. Additionally, transactions nest. If a transaction is initiated while already inside a transaction, the inner transaction simply becomes part of the outer transaction and will not commit until the outer transaction commits.

The most important form when working with refs is the dosync macro. dosync initiates a transaction and takes any number of additional forms. Each provided form is evaluated sequentially within a transaction. For actually updating the state of a ref, the most basic function is ref-set. It must be run within a transaction established by dosync. Example: (dosync (ref-set my-ref 6)). Another common function for updating refs is alter, which takes a ref, a function, and any number of additional arguments. Example: (dosync (alter my-ref + 3))

The function provided to alter must be free of side effects and return a purely functional transformation of the ref value. This is because the function may be executed multiple times as the STM retries the transaction.
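
The classic illustration of coordinated updates is a transfer between two accounts. This is a sketch only; the refs checking and savings and the function transfer! are made up.

```clojure
(def checking (ref 100))
(def savings (ref 50))

(defn transfer!
  "Moves amount from one ref to another inside a single transaction."
  [from to amount]
  (dosync
    ;; Both alters commit together or not at all.
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)

[@checking @savings] ; => [70 80]
```

No other transaction can observe a state in which the money has left checking but not yet arrived in savings.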

The distinction between ref-set and alter is not so much in their actual functionality as in what they imply to someone reading the code. alter usually indicates that the new value of the ref is a function of the old, that it is an update that is related to it in some way. ref-set implies that the old value is being obliterated and replaced with the new.

Atoms are Clojure's implementation of synchronous, uncoordinated identities. When updated, the change is applied before proceeding with the current thread and the update occurs atomically. All future dereferences to the atom from all threads will resolve to the new value. Reads of atoms are guaranteed never to block and updates will retry if the atom's value is updated while they are in progress, just like refs. In practice, atoms are used almost exactly like refs, except that since they are uncoordinated they do not need to participate in transactions.

To create an atom, use the atom function: (def my-atom (atom 5)).

As with refs, there are two ways to update the value of an atom: swap! and reset!. Examples: (swap! my-atom + 3) and (reset! my-atom 1)
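
A slightly longer sketch with a made-up atom named counter, showing what each call returns:

```clojure
(def counter (atom 0))

(swap! counter inc)    ; applies inc to the current value => 1
(swap! counter + 10)   ; extra arguments are passed along => 11
@counter               ; => 11

(reset! counter 0)     ; ignores the old value => 0
@counter               ; => 0
```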

Updates to the values of agents occur asynchronously in a separate system managed thread pool dedicated to managing agent state.

Agents can be created by using the agent function: (def my-agent (agent 5)).

The value of an agent can be updated by dispatching an action function using the send or send-off function. The call to send returns immediately in the current thread. At some undetermined point in the future, in another thread, the action function provided to send will be applied to the agent and its return value will be used as the new value of the agent. Example: (send my-agent + 3)

send-off has the same signature and behavior as send. The only difference is that the two functions hint at different performance implications to the underlying agent runtime. Use send for actions that are mostly CPU-intensive, and send-off for actions that are expected to spend time blocking on IO. This allows the agent runtime to optimize appropriately.

Actions to any individual agent are applied serially, not concurrently. Multiple updates to the same agent won't overwrite each other or encounter race conditions.
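
A sketch of dispatching several actions to one agent. The agent name total is made up, and await is used here only so that the result can be inspected right away.

```clojure
(def total (agent 0))

;; Both calls return immediately; the actions run later on another thread,
;; one after the other, in the order they were sent from this thread.
(send total + 3)
(send total * 2)

;; Block until the actions dispatched so far from this thread have run.
(await total)

@total ; => 6
```

In a standalone script it is usually necessary to call (shutdown-agents) at the end, since the agent thread pool otherwise keeps the JVM alive.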

Because action functions dispatched to agents occur asynchronously in a separate thread, they need a special error-handling mechanism. Agents have one of two possible failure modes: :fail or :continue. If an exception is thrown while processing an action, and the agent's failure mode is :continue, the agent continues as if the action which caused the error had never happened, after calling an optional error-handler function. If its failure mode is :fail, the agent is put into a failed state, and will not accept any more actions until it is restarted.

The important feature of agents is not only that they protect state, but that updates to that state occur concurrently with the thread that initiated the update.

Validators are functions that can be attached to any state type and which validate any update before it is committed as the new value of the identity. If a new value is not approved by the validator function, the state of the identity is not changed.

To add a validator to an identity, use the set-validator! function. It takes two arguments: an identity and a function. The function must not have side effects, must take a single argument, and must return a boolean. Example: (set-validator! my-ref (fn [x] (< 0 x)))
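
A sketch of a validator rejecting an update. It uses an atom named balance instead of a ref to keep the example short; a rejected update throws an IllegalStateException.

```clojure
(def balance (atom 10))

;; Only non-negative balances are allowed.
(set-validator! balance (fn [x] (>= x 0)))

(swap! balance - 5)   ; valid => 5

(try
  (swap! balance - 100)   ; would produce -95, so the validator rejects it
  (catch IllegalStateException e
    (println "Update rejected:" (.getMessage e))))

@balance ; the rejected update left the state untouched => 5
```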

Watches are functions which are called whenever a state changes. To add a watch, use the add-watch function.
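
A sketch of a watch on an atom. add-watch takes the reference, a key identifying the watch, and a function of four arguments (key, reference, old state, new state). The names temperature, changes and :recorder are made up.

```clojure
(def temperature (atom 20))
(def changes (atom []))   ; collects what the watch has seen

(add-watch temperature :recorder
  (fn [watch-key reference old-state new-state]
    (swap! changes conj [old-state new-state])))

(reset! temperature 25)

@changes ; => [[20 25]]

;; The key is used to remove the watch again.
(remove-watch temperature :recorder)
```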

Namespaces and Libraries

Namespaces are the means by which you divide your Clojure code into logical groups, similar to packages in Java. Almost every Clojure source file begins with a namespace declaration using the ns macro.

Fundamentally, a namespace is just a Clojure map. The keys of the map are Clojure symbols and the values are either Clojure Vars or Java classes.

Clojure namespaces follow similar naming conventions to Java packages: they are organized hierarchically with parts separated by periods. When translating between namespace names and file names, periods become directory separators and hyphens become underscores.

Example of creating a new namespace:

(ns com.example.library
  (:require [clojure.contrib.sql :as sql])
  (:use (com.example one two))
  (:import (java.util Date Calendar)
           (java.io File FileInputStream)))

This creates the new namespace com.example.library and automatically refers the clojure.core namespace. It loads the clojure.contrib.sql namespace and aliases it as sql. It loads the namespaces com.example.one and com.example.two and refers all the symbols from them into the current namespace. Finally, it imports the Java classes Date, Calendar, File, and FileInputStream.

By default, all definitions in a namespace are public, meaning they can be referenced from other namespaces.

There are two ways to create a private Var. The first is the defn- macro, which works exactly like defn but creates a private function definition. The second is to add :private metadata to the symbol you are defining: (def #^{:private true} *my-private-value* 123).

Always use the :only option of :use to make it clear which symbols you need from the other namespace.

Metadata

Clojure provides mechanisms to attach metadata to objects, but it has a very specific definition: metadata is a map of data attached to an object that does not affect the value of the object. Two objects with the same value and different metadata are considered equal.

You can attach metadata to a symbol or any of Clojure's built-in data structures with the with-meta function, and retrieve it using the meta function: (with-meta [1 2] {:about "A vector"}). Note that Java classes do not support metadata.

In general, collection functions (conj, assoc, and so on) are supposed to preserve metadata, while sequence functions (cons, take, and so on) are not. Be careful with operations on data structures that have metadata, and don't assume that metadata will be preserved. Always test first.
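
A short sketch of these rules, with arbitrary names plain and annotated:

```clojure
(def plain [1 2])
(def annotated (with-meta [1 2] {:about "A vector"}))

(= plain annotated)        ; metadata does not affect equality => true
(meta annotated)           ; => {:about "A vector"}
(meta plain)               ; => nil
(meta (conj annotated 3))  ; conj on a vector keeps it => {:about "A vector"}
```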

Whenever you consider using metadata, think very carefully about its semantics: metadata is not part of the value of an object. In general, any data that is relevant to users of your application should not be considered metadata. Metadata is information that only you, the programmer, care about.

Multimethods and Hierarchies

Clojure multimethods provide runtime polymorphic dispatch. That is, they permit you to define a function with multiple implementations. At runtime, the implementation that executes is determined based on the arguments to the function.

Clojure multimethods support multiple dispatch, meaning the implementation can be determined by any and all arguments to the function. Also, the dispatch can be based on any feature of the arguments, not just type.

Multimethods are created with defmulti and implemented with defmethod: (defmulti name dispatch-fn) and (defmethod multifn dispatch-value [args...] & body).

You call a multimethod like an ordinary function. When you call it, the dispatch function is immediately called with the same arguments that you gave to the multimethod. The value returned by the dispatch function is called the dispatch value. Clojure then searches for a method (defined with defmethod) with a matching dispatch value.

To support dispatching on multiple arguments, the dispatch function has to return a vector.

To provide a default method implementation, you can use the :default keyword as the dispatch value.
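To make the last few notes concrete, here is a small sketch. The multimethods area and collide and the map keys :shape and :kind are made up for this example; they are not from the book.

```clojure
;; Dispatch on a feature of the argument (the value under :shape),
;; not on its type. A keyword works as the dispatch function.
(defmulti area :shape)

(defmethod area :rectangle [{:keys [width height]}]
  (* width height))

(defmethod area :square [{:keys [side]}]
  (* side side))

;; Fallback used when no other dispatch value matches.
(defmethod area :default [_]
  :unknown-shape)

;; Multiple dispatch: the dispatch function returns a vector
;; built from both arguments.
(defmulti collide (fn [a b] [(:kind a) (:kind b)]))

(defmethod collide [:ship :asteroid] [_ _] :ship-destroyed)
(defmethod collide [:asteroid :asteroid] [_ _] :bounce)
(defmethod collide :default [_ _] :nothing-happens)

(area {:shape :rectangle :width 3 :height 4})   ;=> 12
(area {:shape :blob})                           ;=> :unknown-shape
(collide {:kind :ship} {:kind :asteroid})       ;=> :ship-destroyed
```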

In Clojure, type hierarchies are completely independent from method implementations, so they are more flexible than class-based inheritance. They can support almost any combination of relationships, including multiple inheritance and multiple roots.

(derive child parent) creates an is-a relationship between child and parent. The child and parent are referred to as tags, because they are used to identify a type or category. Tags may be either keywords or symbols.

Once the relationships are defined, they can be queried with the isa? function: (isa? child parent).

Since Clojure's hierarchies permit multiple inheritance, situations may arise in which there is more than one valid choice for a multimethod. Clojure does not know which to choose, so it throws an exception.
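A sketch of hierarchies in action, using invented tags. Note that tags passed to the two-argument derive must be namespace-qualified, hence the :: prefix. The exception type caught at the end is what current Clojure versions throw for an ambiguous dispatch; treat that detail as an observation rather than a guarantee.

```clojure
;; Build a small hierarchy: square is-a rectangle is-a shape.
(derive ::rectangle ::shape)
(derive ::square ::rectangle)

(isa? ::square ::shape)    ;=> true (the relationship is transitive)
(isa? ::shape ::square)    ;=> false

;; Multimethod dispatch uses isa?, so the most specific match wins.
(defmulti shape-name identity)
(defmethod shape-name ::shape [_] "some shape")
(defmethod shape-name ::rectangle [_] "a rectangle")

(shape-name ::square)      ;=> "a rectangle"

;; Multiple inheritance can make dispatch ambiguous.
(derive ::platypus ::mammal)
(derive ::platypus ::egg-layer)

(defmulti birth identity)
(defmethod birth ::mammal [_] :live-birth)
(defmethod birth ::egg-layer [_] :lays-eggs)

;; Neither method is more specific than the other, so the call throws.
(def platypus-result
  (try
    (birth ::platypus)
    (catch IllegalArgumentException e
      :ambiguous)))
```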

Multimethods are very flexible, but that flexibility comes at a cost: they are not very efficient. As a result, multimethods are probably not suitable for "low-level" functions that get called very frequently. That's why hardly any of Clojure's built-in functions are multimethods (print-method is one of the few exceptions).

Java Interoperability

Clojure is built on Java not only because it is a portable, feature-rich platform, but because thousands of libraries are written in Java. Clojure can leverage all this existing code to get past the "library problem" that plagues most new programming languages.

Clojure is designed to make working with Java libraries as seamless as possible.

Clojure uses just three special forms to handle all interactions with Java classes. The new special form creates an instance of a class: (new String "hello"). The . (dot) special form calls Java methods or accesses fields: (. Integer valueOf "42"). To set the value of public fields, you can use the set! special form like this: (set! (. target name) value).

Java method calls can be made to look more like Clojure function calls by putting the method name at the head of a list, prefixed by a period: (.method object arguments).

New instances of Java classes can be constructed by placing the class name at the head of a list, followed by a period: (String. "hello").
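The forms above can be tried directly in the REPL. This sketch uses only java.lang classes; the var names are arbitrary, and set! is left out because it needs a class with a public mutable field.

```clojure
;; Special forms: new and . (dot)
(def s1  (new String "hello"))        ; constructor call
(def n   (. Integer valueOf "42"))    ; static method call
(def len (. s1 length))               ; instance method call

;; The same kinds of calls written with the syntactic sugar
(def s2    (String. "hello"))         ; class name followed by a period
(def upper (.toUpperCase s2))         ; method name prefixed by a period

n      ;=> 42
len    ;=> 5
upper  ;=> "HELLO"
```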

Parallel Programming

Clojure offers a variety of techniques for introducing concurrency, ranging in levels of abstraction from high-level concepts such as agents all the way down to JVM primitives, accessible through the Java interoperability features.

Agents run in thread pools managed by the Clojure runtime. Actions sent to agents will be queued and then executed in one of two thread pools, depending on whether the action was dispatched using the send or send-off function.

The thread pool used by the send function is sized and tuned to match the number of physical processors available to the JVM. This optimizes throughput for CPU-intensive actions: the number of actions executing concurrently will be roughly equal to the number of physical CPUs. If an action is dispatched while all threads in the thread pool are busy, it is queued and will execute in turn.

The thread pool used by the send-off function is not limited to the number of physical processors available, but can contain an arbitrarily large number of threads. The reasoning behind this is that high-latency tasks such as accessing a remote resource will spend most of their time waiting. As such, it's more efficient to allow many threads to time-share on the same processor.
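A small sketch of both dispatch functions. The agent counter and the sleep standing in for a slow resource are assumptions for illustration only.

```clojure
;; An agent holding a number.
(def counter (agent 0))

;; CPU-bound action: runs on the fixed-size pool.
(send counter inc)

;; Potentially blocking action: runs on the expandable pool.
;; Thread/sleep stands in for a high-latency call such as a remote request.
(send-off counter (fn [n]
                    (Thread/sleep 50)
                    (+ n 10)))

;; Block until the actions dispatched from this thread have finished.
(await counter)

@counter   ;=> 11
```

In a standalone script (as opposed to the REPL) you would normally call (shutdown-agents) at the end so that the JVM can exit.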

There are certain functions and macros in the Clojure standard library which initiate parallel processing. They are extremely convenient, because they require no work to set up and are often a drop-in replacement for their serial counterparts. There are three built-in concurrent tools: pmap, pvalues, and pcalls.

Whether using a concurrency function is beneficial depends on the "weight" of the execution involved. If it's lightweight, don't bother. The cost of setting up the parallel execution exceeds the benefit.
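To illustrate the drop-in nature of these tools, here is a sketch with an artificially heavy function. slow-cube and its 100 ms sleep are invented for the example; actual speedups depend on the machine and the workload, so none are claimed here.

```clojure
;; A deliberately "heavy" function: the sleep simulates real work.
(defn slow-cube [n]
  (Thread/sleep 100)
  (* n n n))

;; map and pmap return the same values; only the execution strategy differs.
(def serial-result   (doall (map  slow-cube (range 8))))
(def parallel-result (doall (pmap slow-cube (range 8))))

;; pvalues takes expressions, pcalls takes no-argument functions.
(def from-pvalues (doall (pvalues (slow-cube 2) (slow-cube 3))))
(def from-pcalls  (doall (pcalls #(slow-cube 2) #(slow-cube 3))))

from-pvalues   ;=> (8 27)
```

Wrapping each call in (time ...) is an easy way to check whether the parallel version actually pays off for a given function.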

A Clojure future represents a computation, running in a single thread. As soon as the future is created, a new thread is created and starts executing the computation. When the computation finishes, the thread is recycled and the resulting value can be retrieved from the future by dereferencing it. Alternatively, if the computation is not yet finished when the future is dereferenced, the dereferencing thread will block until the computation is complete. A future is created by using the future macro: (def my-future (future (* 100 100))).

A promise is a value that may not yet exist. If a promise is dereferenced before its value is set, the dereferencing thread blocks until a value is delivered to the promise. When a promise's value is set, all threads waiting on that promise get the value and are released. You can create a promise by calling the promise function with no arguments: (def my-promise (promise)). To deliver a value to a promise, use the deliver function: (deliver my-promise 5).
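Futures and promises combine naturally. In this sketch the sleeps only simulate slow work, and delivering the promise from a second future is just one possible arrangement.

```clojure
;; The computation starts on another thread as soon as the future is created.
(def my-future (future
                 (Thread/sleep 100)
                 (* 100 100)))

;; Dereferencing blocks until the computation has finished.
@my-future    ;=> 10000

;; A promise starts out without a value.
(def my-promise (promise))

;; Deliver the value from another thread after a short delay.
(future
  (Thread/sleep 100)
  (deliver my-promise 5))

;; This deref blocks until deliver has been called.
@my-promise   ;=> 5
```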

Macros and Metaprogramming

Metaprogramming is the use of code to modify or create other code.

In Clojure, all code is data and all data is code. This property is called homoiconicity, which means that the language's code is represented in terms of the language's data structures.

Macros are the primary means of metaprogramming in Clojure. A Clojure macro is a construct which can be used to transform or replace code before it is compiled.

When you use a macro in your code, what you are really telling Clojure to do is to replace your macro expression with the expression returned by the macro.

To create a macro, use the defmacro macro. This defines a function and registers it as a macro with the Clojure compiler. From then on, when the compiler encounters the macro, it will call the function and use the return value instead of the original expression. Example:

(defmacro triple-do [form]
  (list 'do form form form))

Using macros can be somewhat mind-bending, since you have to keep in mind not only the code you're writing, but the code you're generating. Clojure provides two functions that help debug macros as you write them: macroexpand and macroexpand-1. They both take a single quoted form as an argument. If the form is a macro expression, they return the expanded result of the macro without evaluating it, making it possible to inspect and see exactly what a macro is doing. macroexpand expands the given form repeatedly until it is no longer a macro expression, while macroexpand-1 expands the expression only once.
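The difference between the two functions shows up once one macro expands into another. triple-do is the macro from the notes; twice-triple is a made-up macro added only to demonstrate nested expansion.

```clojure
(defmacro triple-do [form]
  (list 'do form form form))

;; Hypothetical macro whose expansion is itself a macro call.
(defmacro twice-triple [form]
  (list 'triple-do (list 'do form form)))

;; One step of expansion: the result is still a macro call.
(macroexpand-1 '(twice-triple (f)))
;=> (triple-do (do (f) (f)))

;; Repeated expansion until the head of the form is no longer a macro.
(macroexpand '(twice-triple (f)))
;=> (do (do (f) (f)) (do (f) (f)) (do (f) (f)))
```

Because the forms are quoted, (f) is never evaluated, so it does not matter that no function f exists.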

The best way to use macros is to use them as little as possible, because they are more difficult to reason about than normal code, and if a problem occurs, it can be much trickier to debug.

Datatypes and Protocols

Datatypes and protocols are roughly analogous to Java's classes and interfaces, but they are more flexible.

A protocol is a set of methods. The protocol has a name and an optional documentation string. Each method has a name, one or more argument vectors, and an optional documentation string. There are no implementations, no actual code. Example:

(defprotocol MyProtocol
  "This is my new protocol"
  (method-one [x] "First method")
  (method-two [x] [x y] "Second method"))

A protocol is a contract, a set of capabilities. An object or a datatype can declare that it supports a particular protocol, meaning that it has implementations for the methods in that protocol.

Conceptually, a protocol is similar to a Java interface. But there is an important difference: protocols have no inheritance, so you cannot create "subprotocols" like Java's subinterfaces.

A datatype is a named record type with a set of named fields; it can implement protocols and interfaces. Datatypes are created with defrecord. Example: (defrecord Employee [name room])

You can construct an instance of a datatype by adding a dot to the end of its name: (def emp (Employee. "Joe" 304)). Datatype instances behave like Clojure maps.
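What "behave like Clojure maps" means in practice can be sketched as follows. The :phone key is an invented extra field.

```clojure
(defrecord Employee [name room])

(def emp (Employee. "Joe" 304))

;; Fields are read like map entries.
(:name emp)                           ;=> "Joe"
(get emp :room)                       ;=> 304

;; assoc returns a new record; the original is unchanged.
(def moved (assoc emp :room 305))
(:room moved)                         ;=> 305
(:room emp)                           ;=> 304

;; Keys that are not declared fields can still be added.
(def with-phone (assoc emp :phone "555-0100"))
(:phone with-phone)                   ;=> "555-0100"

;; Records with the same type and field values are equal.
(= emp (Employee. "Joe" 304))         ;=> true
```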

A datatype, by itself, just stores data. A protocol, by itself, doesn't do anything at all. Together they form a powerful abstraction. Once a protocol has been defined, it can be extended to support any datatype. We say the datatype implements the protocol. At that point, the protocol's methods can be called on instances of that datatype.

When creating a datatype with defrecord, you can supply method implementations for any number of protocols. The syntax is:

(defrecord name [fields...]
  SomeProtocol
    (method-one [args] ... method body ...)
    (method-two [args] ... method body ...)
  AnotherProtocol
    (method-three [args] ... method body ...))

A datatype is not required to provide implementations for every method of its protocols or interfaces. Methods lacking an implementation will throw an AbstractMethodError when called on instances of that datatype.
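Putting protocols and datatypes together might look like the sketch below. The Describable protocol and its methods are hypothetical. The extend-type call at the end is one way to extend a protocol to a type that already exists, which these notes mention but do not show.

```clojure
(defprotocol Describable
  "Hypothetical protocol for things that can describe themselves."
  (describe [this] "Returns a short description.")
  (rename [this new-name] "Returns a copy with a different name."))

;; Implements describe inline but deliberately leaves out rename.
(defrecord Employee [name room]
  Describable
  (describe [this]
    (str name " in room " room)))

(def emp (Employee. "Joe" 304))

(describe emp)   ;=> "Joe in room 304"

;; Calling the method that was left out fails at runtime.
(def rename-result
  (try
    (rename emp "Jane")
    (catch AbstractMethodError e
      :not-implemented)))

;; Extending the protocol to an existing type, here java.lang.String.
(extend-type String
  Describable
  (describe [this]
    (str "a string of length " (count this))))

(describe "hi")  ;=> "a string of length 2"
```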

Performance

The number one rule when evaluating the performance of any programming language or algorithm is: test! Do not assume that one technique will necessarily be faster because it appears to have fewer steps or uses fewer variables.

The best rule-of-thumb is this: write your code in the simplest, most direct way possible, then test to see if it meets your performance expectations. If it does not, use profiling tools to identify the critical sections that matter most to performance, and tweak or rewrite those sections until they meet your performance goals.

One simple technique for speeding up large, complex functions is memoization, which is a form of caching. Each time it is called, a memoized function will store its return value in a table, along with the input arguments. If that function is called again with the same arguments, it can return the value stored in the table without repeating the calculation. Clojure has built-in support for memoization with the memoize function, which takes a function as its argument and returns a memoized version of that function.

Memoization is a classic example of trading increased memory usage for faster execution time.
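A quick way to see memoization at work is to count how often the underlying function really runs. expensive-square and call-count are invented for this sketch; the sleep simulates an expensive calculation.

```clojure
;; Counts how many times the real function body executes.
(def call-count (atom 0))

(defn expensive-square [n]
  (swap! call-count inc)
  (Thread/sleep 100)
  (* n n))

;; memoize returns a new function that caches results by argument list.
(def memo-square (memoize expensive-square))

(memo-square 12)   ;=> 144 (computed, takes about 100 ms)
(memo-square 12)   ;=> 144 (served from the cache)

@call-count        ;=> 1
```

The cache is never emptied, which is the memory cost mentioned above, and memoization only makes sense for pure functions.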

Clojure allows you to add type hints to symbols and expressions to help the compiler avoid reflective method calls. In the book, a type-hinted symbol is written as #^{:tag hint} symbol, which is usually abbreviated as #^hint symbol. Since Clojure 1.2 the preferred spelling is ^{:tag hint} symbol or ^hint symbol; the #^ form is deprecated. The type hint is a Java class name.

In general, you should write your code first without any type hints, then set *warn-on-reflection* to true and add them only where the compiler reports reflection and performance matters.
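The workflow can be sketched like this. The function names are made up, and the hint uses the ^String spelling, which is the current form of the #^String syntax.

```clojure
;; Ask the compiler to report every reflective call it has to emit.
(set! *warn-on-reflection* true)

;; No hint: the compiler cannot know the type of s, so it prints a
;; reflection warning and looks the method up at runtime.
(defn len-reflective [s]
  (.length s))

;; With a hint: the call is resolved at compile time, no warning.
(defn len-hinted [^String s]
  (.length s))

;; Both return the same result; only the call mechanism differs.
(len-reflective "hello")   ;=> 5
(len-hinted "hello")       ;=> 5
```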