Clojure Programming
by Chas Emerick, Brian Carper & Christophe Grand
- On Amazon
- ISBN: 978-1449394707
Of the three Clojure books I have read so far – Practical Clojure, The Joy of Clojure, and this one – I found Clojure Programming to be the best. I especially liked that the book not only covers the theory but also has an entire part dedicated to practical matters like working with relational databases or using Clojure for web development. This is something I missed in the other two books.
My notes
Preface
Clojure is a dynamically and strongly typed programming language hosted on the Java Virtual Machine (JVM).
Down the Rabbit Hole
Clojure code can use any Java library, Clojure libraries can in turn be used from Java, and Clojure applications can be packaged just like any Java application and deployed anywhere other Java applications can be deployed.
Clojure is part of the Lisp family.
Clojure has a REPL too, but it differs from many other languages' REPLs in that it is not an interpreter or otherwise using a limited or lightweight subset of Clojure: all code entered into a Clojure REPL is compiled to JVM bytecode as part of its evaluation, with the same result as when code is loaded from a Clojure source file.
All Clojure code is made up of expressions, each of which evaluates to a single value. The rules for that evaluation are simple:
- Lists (denoted by parentheses) are calls, where the first value in the list is the operator and the rest of the values are parameters. The first element in a list is often referred to as being in function position. Call expressions evaluate to the value returned by the call.
- Symbols evaluate to the named value in the current scope – which can be a function, a named local, a Java class, a macro, or a special form.
- All other expressions evaluate to the literal values they describe.
Lists in Lisps are often called s-expressions or sexprs – short for symbolic expressions due to the significance of symbols in identifying the values to be used in calls denoted by such lists. Valid s-expressions that can be successfully evaluated are often referred to as forms: e.g. (if condition then else) is an if form, [60 80 100] is a vector form. Not all s-expressions are forms: (1 2 3) is a valid s-expression but evaluating it will produce an error because the first value in the list is an integer, which is not callable.
Clojure code is composed of literal representations of its own data structures and atomic values; this characteristic is formally called homoiconicity, or more casually, code-as-data.
Rather than defining a syntax that will be transformed into an AST, Clojure programs are written using Clojure data structures that represent that AST directly.
The Clojure code you write in the REPL or in a text source file isn't text at all: you are programming using Clojure data structure literals.
Keywords evaluate to themselves, and are often used as accessors for the values they name in Clojure collections and types, such as hash maps and records. They are functions that look themselves up in collections passed to them. Syntactically, keywords are always prefixed with a colon, and can otherwise consist of any nonwhitespace character. A slash character (/) denotes a namespaced keyword, while a keyword prefixed with two colons (::) is expanded by the reader to a namespaced keyword in the current namespace. Example:
(def person {:firstname "Daniel"
:lastname "Hofstetter"})
(:lastname person)
;= "Hofstetter"
Like keywords, symbols are identifiers, but they evaluate to values in the Clojure runtime they name. Example: (average [1 2 3]). Here, average is a symbol, referring to the function held in the var named average. Symbols must begin with a non-numeric character, and can contain *, +, !, -, _, and ? in addition to any alphanumeric characters. Symbols that contain a slash (/) denote a namespaced symbol and will evaluate to the named value in the specified namespace.
Clojure directly supports rational numbers, also called ratios. Rational number literals must always be two integers separated by a slash (/
).
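A minimal sketch of ratio literals in use (the name one-third is made up for this illustration): arithmetic on ratios stays exact until you explicitly convert to a floating-point number.

```clojure
;; `one-third` is a hypothetical name used only for this example.
(def one-third 1/3)

(+ one-third 1/6)   ;= 1/2
(/ 4 6)             ;= 2/3, integer division yields a ratio rather than truncating
(ratio? one-third)  ;= true
(double one-third)  ;= 0.3333333333333333
```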
The Clojure reader treats strings prefixed with a hash character as regular expression literals.
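For illustration, a small sketch of regular expression literals used with re-seq and re-find from clojure.core; the sample strings are invented.

```clojure
;; #"..." is read as a compiled java.util.regex.Pattern.
(class #"\d+")
;= java.util.regex.Pattern

;; re-seq returns every match as a seq of strings.
(re-seq #"\d+" "order 66, aisle 7")
;= ("66" "7")

;; With capturing groups, re-find returns the whole match followed by the groups.
(re-find #"(\w+)@(\w+)" "user@example")
;= ["user@example" "user" "example"]
```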
There are three comment types:
- Single-line comments are indicated by prefixing the comment with a semicolon (;).
- Form-level comments are available using the #_ reader macro. This cues the reader to elide the next Clojure form following the macro: (read-string "(+ 1 2 #_(* 2 2) 8)") ;= (+ 1 2 8).
- The third way to comment code in Clojure is the comment macro: (comment (println "hello")). comment forms are not elided from the reader's output in the way that #_ elides the forms following it. Thus, comment forms always evaluate to nil.
Whether you use commas or not is entirely a question of personal style and preference. They are generally used only when doing so enhances the human readability of the code in question. This is most common in cases where pairs of values are listed, but more than one pair appears per line: (create-user {:name new-username, :email email})
.
The reader provides syntax for the most commonplace Clojure data structures:
'(a b :name 12.5) ;; list
['a 'b :name 12.5] ;; vector
{:name "Chas" :age 31} ;; map
#{1 2 3} ;; set
Since lists are used to denote calls in Clojure, you need to quote ('
) the list literal in order to prevent the evaluation of the list as a call.
All Clojure code is defined and evaluated within a namespace. Fundamentally, they are dynamic mappings between symbols and either vars or imported Java classes. One of Clojure's reference types, vars are mutable storage locations that can hold any value. Within the namespace where they are defined, vars are associated with a symbol that other code can use to look up the var, and therefore the value it holds. Vars are defined using the def
special form: (def x 1)
.
The current namespace is always bound to *ns*
.
All classes in the java.lang
package are imported by default into each Clojure namespace, and so can be referred to without package qualification; to refer to unimported classes, a package-qualified symbol must be used. In addition, namespaces by default alias all of the vars defined in the primary namespace of Clojure's standard library, clojure.core
.
Special forms are Clojure's primitive building blocks of computation, on top of which all the rest of Clojure is built.
quote
suppresses evaluation of a Clojure expression. The most obvious impact of this relates to symbols, which, if they name a var, evaluate to that var's value. With quote
, evaluation is suppressed, so symbols evaluate to themselves. There is reader syntax for quote
; prefixing any form with a quote character ('
) will expand into a usage of quote
.
do
evaluates all of the expressions provided to it in order and yields the last expression's value as its value. All other values are discarded, although their side effects do occur.
let
allows you to define named references that are lexically scoped to the extent of the let
expression. Said another way, let
defines locals. All locals are immutable. Example:
(defn hypot
[x y]
(let [x2 (* x x)
y2 (* y y)]
(Math/sqrt (+ x2 y2))))
Clojure provides the convenience functions first, second, and last for accessing the corresponding values of a sequential collection.
The nth function allows you to pluck any value from a sequential collection using an index into that collection: (nth coll 3).
Vectors are functions of their indices: (coll 3).
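A quick sketch of these accessors side by side, assuming a sample vector named coll:

```clojure
;; `coll` is sample data for this illustration.
(def coll [10 20 30 40])

(first coll)   ;= 10
(second coll)  ;= 20
(last coll)    ;= 40
(nth coll 3)   ;= 40, indices are zero-based
(coll 3)       ;= 40, calling the vector itself with an index
```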
Clojure destructuring provides a concise syntax for declaratively pulling apart collections and binding values contained therein as named locals within a let
. There are two flavors of destructuring: one that operates over sequential collections, and another that works with maps.
Example of sequential destructuring:
(def v [42 "foo" 99.2 [5 12]])
(let [[x y z] v]
(+ x z))
;= 141.2
By using & you can gather values that lie beyond the positions you've named in your destructuring form into a sequence:
(let [[_ & rest] v]
rest)
;= ("foo" 99.2 [5 12])
You can establish a local binding for the original collection being destructured by specifying its name via the :as
option within the destructuring form:
(let [[x :as original] v]
(conj original x))
;= [42 "foo" 99.2 [5 12] 42]
Example of map destructuring:
(def m {:a 5 :b 6
:c [7 8 9]
"foo" 88})
(let [{a :a b :b} m]
(+ a b))
;= 11
Indices into vectors, strings, and arrays can be used as keys in a map destructuring form: (let [{x 1} [1 2 3]] x) ;= 2
You can use an :or
pair to provide a defaults map; if a key specified in the destructuring form is not available in the source collection, then the defaults map will be consulted:
(let [{k :unknown x :a
:or {k 50}} m]
(+ k x))
;= 55
You can use the :keys
, :strs
, and :syms
options to specify keyword, string, and symbol keys into the source map and the names the corresponding values should be bound to in the let
form without repetition:
(let [{:keys [a b]} m]
(+ a b))
;= 11
Functions are created with the fn
special form: (fn [x] (+ 10 x))
Functions with multiple arities can be created as well:
(def square-or-multiply (fn self
([x] (self x x))
([x y] (* x y))))
(square-or-multiply 3)
;= 9
(square-or-multiply 2 3)
;= 6
defn
is a macro that encapsulates the functionality of def
and fn
so that you can concisely define functions that are named and registered in the current namespace with a given name: (defn plus-10 [x] (+ 10 x))
Functions can optionally gather all additional arguments used in calls to it into a seq. Such functions are called variadic, with the gathered arguments usually called rest arguments or varargs: (defn sum [& varargs] (apply + varargs))
.
if is Clojure's sole primitive conditional operator. Its syntax is simple: if the value of the first expression in an if form is logically true, then the result of the if form is the value of the second expression. Otherwise, the result is the value of the third expression, if provided. Clojure conditionals determine logical truth to be anything other than nil or false.
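To make the truthiness rule concrete, here is a small sketch; truthy? is a hypothetical helper written only to expose how if judges a value.

```clojure
;; `truthy?` is a hypothetical helper, not part of clojure.core.
(defn truthy? [x]
  (if x true false))

(truthy? nil)    ;= false
(truthy? false)  ;= false
(truthy? 0)      ;= true, unlike in some other languages
(truthy? "")     ;= true
(truthy? [])     ;= true

(if false :then) ;= nil, because no else expression was provided
```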
Clojure provides a number of useful imperative looping constructs, including doseq and dotimes, all of which are built upon recur. recur transfers control to the local-most loop head, which is defined either by loop or by a function, without consuming stack space.
(loop [x 5]
(if (neg? x)
x
(recur (dec x))))
;= -1
Functional Programming and Concurrency
Functional Programming
In Clojure, functional programming means:
- A preference for working with immutable values; this includes the use of immutable data structures that satisfy simple abstractions, rather than mutable bags of state, and the treatment of functions as values themselves, enabling higher-order functions
- A preference for declarative processing of data over imperative control structures and iteration
- The natural incremental composition of functions, higher-order functions, and immutable data structures in order to solve complex problems by working with higher-level (or, right-level) abstractions
Because the state held by mutable objects can change, and potentially be changed without your knowledge, using them when an immutable alternative is available can only be described as dangerous.
map accepts a single function argument, followed by one or more collections, and returns a sequence of the results of applying that function to successive members of the provided collections. More formally, any usage of map of the form (map f [a b c]) is equivalent to [(f a) (f b) (f c)], usage of the form (map f [a b c] [x y z]) is equivalent to [(f a x) (f b y) (f c z)], and so on.
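A short sketch of both forms with concrete functions; note that with several collections map stops as soon as the shortest one is exhausted.

```clojure
;; One collection: the function is applied to each item.
(map inc [1 2 3])
;= (2 3 4)

;; Two collections: the function receives one item from each.
(map + [1 2 3] [10 20 30])
;= (11 22 33)

;; The result is only as long as the shortest collection.
(map + [1 2 3] [10 20])
;= (11 22)
```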
Flexibly producing any value from the application of a function to a collection is called a reduction in many circles. Clojure implements this concept via a higher-order function called reduce. Given a function and a collection to operate over, reduce applies the function to each of the items of the collection, accumulating and returning a single result value. On the first "iteration" of reduce's operation, it applies the provided function to the first two items in the collection, obtaining a result. After that, reduce applies the function to the previous result and the next item in the collection to obtain its next result, and so on. You can also optionally provide an initial value to "seed" the reduction: (reduce + 50 [1 2 3]) ;= 56.
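A few reductions as a sketch. The last one shows that the accumulated value does not have to be a number; here a map is built up from an empty seed.

```clojure
(reduce + [1 2 3 4])
;= 10

(reduce max [3 9 2])
;= 9

;; Seeded with {}, each step adds one entry mapping a number to its square.
(reduce (fn [m v] (assoc m v (* v v)))
        {}
        [1 2 3])
;= {1 1, 2 4, 3 9}
```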
Function application is the invocation of a function with a sequence of arguments, in contrast to function calls indicated by syntactic convention. It is available in Clojure via apply
. As a convenience, apply
allows you to prefix the argument sequence with any number of explicit arguments: (apply * 2 [1 2 3]) ;= 12
Whereas function application is when a function is applied to its arguments held in a sequential collection – apply
must be provided with all arguments to that function – partial application is where you can provide only some of the arguments to a function, yielding a new function that can be called with the remainder of the arguments to the original function later. partial
provides for partial application in Clojure:
(def apply-vat (partial * 1.08))
(apply-vat 10)
;= 10.8
Function composition has a very distinct meaning in functional programming circles: given any number of functions, it is the creation of a function that applies its arguments to one of the provided functions, using each successive result as the argument to the next provided function, usually in the reverse order that the functions were originally specified. It is implemented in Clojure via comp
: (def negated-sum-str (comp str - +))
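As a sketch of what this composition does when called: the arguments go to + first, the sum goes to -, and the negated sum goes to str.

```clojure
(def negated-sum-str (comp str - +))

(negated-sum-str 1 2 3)
;= "-6"

;; The same computation written out as nested calls.
(str (- (+ 1 2 3)))
;= "-6"
```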
Expressions that involve only pure functions are said to be referentially transparent; meaning, such expressions are semantically indistinguishable from their results.
Collections and Data Structures
All data structures in Clojure participate in the common collection abstraction. A collection is a value that you can use with the set of core collection functions:
- conj to add an item to a collection
- seq to get a sequence of a collection
- count to get the number of items in a collection
- empty to obtain an empty instance of the same type as a provided collection
- = to determine value equality of a collection compared to one or more other collections
The sequence abstraction defines a way to obtain and traverse sequential views over some source of values: either another collection, or successive values that are the result of some computation. Sequences – often called "seqs" – involve a couple of operations in addition to the base provided by the collection abstraction:
- first, rest and next provide ways to consume sequences
- lazy-seq produces a lazy sequence that is the result of evaluating an expression
The results of rest and next are identical for most values. They only differ in their treatment of sequences containing zero or one value: rest will always return an empty sequence, whereas next will return nil if the resulting sequence is empty.
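The difference is easiest to see on small inputs, sketched here:

```clojure
;; Identical when more than one value is present.
(rest [1 2 3])  ;= (2 3)
(next [1 2 3])  ;= (2 3)

;; They differ once nothing is left after the first item.
(rest [1])      ;= ()
(next [1])      ;= nil
(rest [])       ;= ()
(next [])       ;= nil
```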
Generally, a seq is produced by a collection, either explicitly via seq or via another function (like map) calling seq on its argument(s) implicitly. However, there are also two ways to create a seq directly: cons and list*. cons accepts two arguments, a value to serve as the head of the new seq, and another collection, the seq of which will serve as its tail: (cons 0 (range 1 5)) ;= (0 1 2 3 4). list* is just a helper for producing seqs with any number of head values, followed by a sequence: (list* 0 1 2 (range 3 5)) ;= (0 1 2 3 4).
The process of accessing a lazy sequence is called realization; when all values in a lazy sequence have been computed, it is said that the sequence has been fully realized.
Lazy sequences are persistent: an item is computed once, but is still retained by the sequence. This means that as long as you maintain a reference to a sequence, you'll prevent its items from being garbage-collected. This type of fault is called head retention and can put pressure on the VM that will impact performance, potentially even causing an out of memory error if the realized portion of a sequence grows too large.
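Realization and caching can be observed with a counter, as in this sketch. The names computed, squares-from and squares are invented for the illustration; the atom is only there to count how often the body of lazy-seq runs.

```clojure
;; Counts how many items have actually been computed.
(def computed (atom 0))

;; A hypothetical infinite lazy sequence of squares, built with lazy-seq.
(defn squares-from [n]
  (lazy-seq
    (swap! computed inc)
    (cons (* n n) (squares-from (inc n)))))

(def squares (squares-from 1))

@computed                 ;= 0, nothing has been realized yet
(doall (take 3 squares))  ;= (1 4 9)
@computed                 ;= 3, only the items that were asked for
(doall (take 3 squares))  ;= (1 4 9)
@computed                 ;= 3, realized items are cached, not recomputed
```

Because squares is held in a var, the realized items stay reachable for as long as that var does, which is the retention described above.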
The associative abstraction is shared by data structures that link keys and values in some way. It is defined by four operations:
- assoc, which establishes new associations between keys and values within the given collection
- dissoc, which drops associations for given keys from the collection
- get, which looks up the value for a particular key in a collection
- contains?, which is a predicate that returns true only if the collection has a value associated with the given key
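The four operations can be sketched on a map and on a vector; settings is made-up sample data. The vector calls show that contains? checks keys (indices), not values.

```clojure
;; `settings` is sample data for this illustration.
(def settings {:host "localhost" :port 8080})

(assoc settings :tls true)    ;= {:host "localhost", :port 8080, :tls true}
(dissoc settings :port)       ;= {:host "localhost"}
(get settings :port)          ;= 8080
(get settings :user "guest")  ;= "guest", the optional default for a missing key
(contains? settings :host)    ;= true

;; Vectors are associative too, keyed by index.
(assoc [:a :b :c] 1 :x)       ;= [:a :x :c]
(contains? [:a :b :c] 1)      ;= true, index 1 exists
(contains? [:a :b :c] :a)     ;= false, :a is a value, not a key
```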
In many cases, dealing with indices induces complexity: either through index arithmetic and bounds-checking or through unneeded indirections. Outside of special circumstances, we feel safe in saying that excessive indexed lookup or modification is a code smell.
The indexed abstraction consists of a single function, nth, which is a specialization of get. They differ in how they deal with out-of-bounds indices: nth throws an exception while get returns nil.
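A small sketch of the out-of-bounds behavior on a vector. The three-argument form of nth, which takes a default value, avoids the exception.

```clojure
(get [:a :b :c] 5)
;= nil

(nth [:a :b :c] 5 :none)
;= :none

;; Without a default, nth throws for an index past the end of a vector.
(try
  (nth [:a :b :c] 5)
  (catch IndexOutOfBoundsException _
    :out-of-bounds))
;= :out-of-bounds
```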
Stacks are collections that classically support last-in, first-out semantics. Clojure doesn't have a distinct stack data structure, but it does support a stack abstraction via three operations:
- conj, for pushing a value onto the stack
- pop, for obtaining the stack with its top value removed
- peek, for obtaining the value on the top of the stack
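Both lists and vectors can serve as stacks, as this sketch shows; the only difference is which end counts as the top.

```clojure
;; For a list, the top of the stack is the front.
(conj '(1 2) 3)  ;= (3 1 2)
(peek '(3 1 2))  ;= 3
(pop '(3 1 2))   ;= (1 2)

;; For a vector, the top of the stack is the end.
(conj [1 2] 3)   ;= [1 2 3]
(peek [1 2 3])   ;= 3
(pop [1 2 3])    ;= [1 2]
```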
The set abstraction requires disj
, which removes values from the given set: (disj #{1 2 3} 3 1) ;= #{2}
Collections that participate in the sorted abstraction guarantee that their values will be maintained in a stable ordering that is optionally defined by a predicate or implementation of a special comparator interface. This allows you to efficiently obtain in-order and reverse-order seqs over all or a subrange of such collections' values. These operations are provided by:
- rseq, which returns a seq of a collection's values in reverse, with the guarantee that doing so will return in constant time
- subseq, which returns a seq of a collection's values that fall within a specified range of keys
- rsubseq, the same as subseq, but the seq is in reversed order
A comparator is a two-argument function that returns a positive integer when the first argument is greater than the second one, a negative integer when the first is less than the second, and zero when both arguments are equal.
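A sketch of the sorted operations on a sorted set, followed by a custom comparator that reverses the natural order. The name s and the sample numbers are invented.

```clojure
;; `s` is sample data for this illustration.
(def s (sorted-set 10 20 30 40))

(rseq s)                ;= (40 30 20 10)
(subseq s >= 20)        ;= (20 30 40)
(subseq s > 10 <= 30)   ;= (20 30)
(rsubseq s < 30)        ;= (20 10)

;; A comparator that swaps its arguments yields descending order.
(seq (sorted-set-by (fn [a b] (compare b a)) 10 30 20))
;= (30 20 10)
```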
Clojure collections and the most common types of keys used in associative collections are also functions with the semantics of get
or nth
(as appropriate for the concrete type of collection involved).
Two characteristics shared by all of Clojure's data structures: they are immutable, and they are persistent.
To achieve persistence without sacrificing performance, Clojure's data structures implement structural sharing. That is, they never perform deep copies to satisfy an operation; rather, only the portions of the data structure affected by a change are swapped out, while references are retained to those parts that are uninvolved.
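Persistence and sharing can be observed directly, as in this sketch with made-up names: the old value is untouched after an "update", and parts that were not involved are the very same objects in both versions.

```clojure
(def v1 [1 2 3])
(def v2 (conj v1 4))

v1  ;= [1 2 3], the original is unchanged
v2  ;= [1 2 3 4]

(def m1 {:tags [:a :b]})
(def m2 (assoc m1 :id 7))

;; The :tags vector was not copied; both maps refer to the same object.
(identical? (:tags m1) (:tags m2))
;= true
```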
Transient collections are the dual to those that are persistent: while persistent collections make guarantees about the integrity of prior revisions of a value, transient collections do not. After modification, any reference to an old revision of a transient collection cannot be relied upon; it may be valid, it may be the new value, or it may be garbage. In stark contrast to nearly everything else in Clojure, transient collections are mutable:
(def x (transient []))
(def y (conj! x 1)) ; conj! is the transient analog to conj
(count y)
;= 1
(count x)
;= 1
All transient functions end with an exclamation mark to denote their invalidating behavior with regard to the collection passed as their first argument. Once you use any of these functions on a transient collection, that collection should never be touched again – even for read-only purposes.
Because transients are solely an optimization, they should be used with discretion and strictly locally, usually local to a single function.
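A typical shape for such local use, sketched with a hypothetical function squares-vec: the transient never leaves the function, each conj! result is threaded on to the next step, and persistent! from clojure.core turns the result back into an ordinary immutable vector.

```clojure
;; `squares-vec` is a hypothetical example function.
(defn squares-vec [n]
  (persistent!
    (reduce (fn [acc i] (conj! acc (* i i)))  ; always use the value conj! returns
            (transient [])
            (range n))))

(squares-vec 5)
;= [0 1 4 9 16]
```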
Metadata can be attached to any Clojure data structure, sequence, record, symbol, or reference type, and always takes the form of a map:
(def a ^{:created (System/currentTimeMillis)}
[1 2 3])
(meta a)
;= {:created 1335770913770}
As a convenience, metadata that contains only slots whose keys are keywords and whose value is Boolean true
can be provided in a short form: (def a ^:private [1 2 3])
.
Generally, when you find the code you are writing painful or awkward, you are most certainly fighting the language, working against its grain. Chances are that in such cases you'll be able to find a more pleasant solution by rethinking your data structures.
Since immutability precludes back references, you cannot rely on them to navigate trees. A typical functional solution to this problem is zippers, an implementation of which can be found in the clojure.zip
namespace.
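A minimal sketch of zipper navigation over nested vectors, using functions from clojure.zip; the tree and the alias zip are chosen for this illustration.

```clojure
(require '[clojure.zip :as zip])

;; `tree` is sample data for this illustration.
(def tree [1 [2 3] 4])

(-> (zip/vector-zip tree)
    zip/down          ; at 1
    zip/right         ; at [2 3]
    zip/down          ; at 2
    (zip/edit * 10)   ; replace 2 with (* 2 10)
    zip/root)         ; rebuild the whole tree
;= [1 [20 3] 4]

tree
;= [1 [2 3] 4], the original value is unchanged
```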
Concurrency and Parallelism
A delay is a construct that suspends some body of code, evaluating it only upon demand, when it is dereferenced. Delays only evaluate their body of code once, caching the return value. Thus, subsequent accesses using deref will return instantly, and not reevaluate that code:
(def d (delay (println "Running...")
:done!))
(deref d)
; Running...
;= :done!
(deref d)
;= :done!
The deref
abstraction is defined by Clojure's clojure.lang.IDeref
interface; any type that implements it acts as a container for a value. It may be dereferenced, either via deref
, or the corresponding reader syntax, @
.
A Clojure future evaluates a body of code in another thread. future
returns immediately, allowing the current thread of execution to carry on. The result of the evaluation will be retained by the future, which you can obtain by dereferencing it. Just like a delay, dereferencing a future will block if the code it is evaluating has not completed yet.
(def long-calc (future (apply + (range 1e8))))
@long-calc
;= 4999999950000000
A promise is similar to a one-time, single-value pipe: data is inserted at one end via deliver
and retrieved at the other end by deref
. Such things are sometimes called dataflow variables and are the building blocks of declarative concurrency.
(def p (promise))
(deliver p 42)
@p
;= 42
Concurrency is the coordination of multiple, usually interleaved threads of execution that are accessing or modifying some shared state.
Parallelism involves state as well, but usually in the inverse. Being an optimization technique used to efficiently utilize all of the available resources to improve the performance of an operation, approaches to parallelization generally aim to maximize the window of exclusive access to state (or, often, chunks of state) so as to minimize coordination overhead. Rather than involving interleaved threads of execution, the multiple evaluations of a parallelized operation run simultaneously.
There is a cousin of map
– pmap
– that will parallelize the application of a function across a sequence of values, returning a lazy seq of results just like map
. Use pmap
when the operation you're performing is parallelizable in the first place, and is significant enough for each value in the seq that its workload will eclipse the process coordination inherent in its parallelization.
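A sketch of the kind of workload where pmap can pay off. slow-square is a hypothetical function whose Thread/sleep stands in for real work; the results are the same as with map, only the elapsed time differs.

```clojure
;; `slow-square` is a hypothetical stand-in for an expensive computation.
(defn slow-square [x]
  (Thread/sleep 50)
  (* x x))

(map slow-square (range 8))
;= (0 1 4 9 16 25 36 49)

(pmap slow-square (range 8))
;= (0 1 4 9 16 25 36 49)
```

Wrapping each call in (time (doall ...)) at the REPL is one way to compare the two on your own machine.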
In Clojure, there is a clear distinction between state and identity.
Identities are represented in Clojure using four reference types: vars, refs, agents, and atoms.
All references always contain some value (even if that value is nil
); accessing one is always done using deref
or @
.
One critical guarantee of deref
within the context of Clojure's reference types is that deref
will never block, regardless of the change semantics of the reference type being dereferenced or the operations being applied to it in other threads of execution.
A coordinated operation is one where multiple actors must cooperate in order to yield correct results. In contrast, an uncoordinated operation is one where multiple actors cannot impact each other negatively because their contexts are separated.
Synchronous operations are those where the caller's thread of execution waits or blocks or sleeps until it may have exclusive access to a given context, whereas asynchronous operations are those that can be started or scheduled without blocking the initiating thread of execution.
Atoms are identities that implement synchronous, uncoordinated, atomic compare-and-set modification. They are created using atom
. swap!
is the most common modification operation used with them, which sets the value of an atom to the result of applying some function to the atom's value.
(def sarah (atom {:name "Sarah" :age 25}))
(swap! sarah update-in [:age] + 3)
;= {:age 28, :name "Sarah"}
One thing you must keep in mind when using swap!
is that, because atoms use compare-and-set semantics, if the atom's value changes before your update function returns (as a result of action by another thread of execution), swap!
will retry, calling your update function again with the atom's newer value. swap!
will continue to retry the compare-and-set until it succeeds.
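The practical consequence can be sketched with several threads incrementing one atom. The names counter and workers are invented. Individual swap! calls may be retried under contention, but no increment is lost; because the update function can run more than once, it should be free of side effects.

```clojure
(def counter (atom 0))

;; Ten futures, each incrementing the shared atom 1000 times.
(def workers
  (doall
    (for [_ (range 10)]
      (future
        (dotimes [_ 1000]
          (swap! counter inc))))))

;; Dereferencing each future blocks until that worker has finished.
(doseq [w workers] @w)

@counter
;= 10000
```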
Watches are functions that are called whenever the state of a reference has changed. A watch function must take four arguments: a key, the reference that's changed, the old state of the reference, and its new state.
(defn echo-watch
[key identity old new]
(println key old "=>" new))
(def sarah (atom {:name "Sarah" :age 25}))
(add-watch sarah :echo echo-watch)
(swap! sarah update-in [:age] inc)
; :echo {:age 25, :name Sarah} => {:age 26, :name Sarah}
;= {:age 26, :name "Sarah"}
Validators enable you to constrain a reference's state however you like. A validator is a function of a single argument that is invoked just before any proposed new state is installed into a reference. If the validator returns logically false or throws an exception, then the state change is aborted with an exception: (def n (atom 1 :validator pos?))
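A sketch of that validator in action: a change that keeps the value positive goes through, while one that does not is rejected with an exception and leaves the atom as it was.

```clojure
(def n (atom 1 :validator pos?))

(swap! n + 5)
;= 6

;; 6 - 100 is negative, so the validator rejects the new state.
(try
  (swap! n - 100)
  (catch IllegalStateException _
    :rejected))
;= :rejected

@n
;= 6
```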
Refs are Clojure's coordinated reference type. Using them, you can ensure that multiple identities can participate in overlapping, concurrently applied operations with:
- No possibility of the involved refs ever being in an observable inconsistent state
- No possibility of race conditions among the involved refs
- No manual use of locks, monitors, or other low-level synchronization primitives
- No possibility of deadlocks
All modifications of refs must occur within a transaction, whose scope is established by dosync
.
The only operations that should ever be performed within the scope of a transaction are things that are safe to retry, which rules out many forms of I/O.
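A minimal sketch of a coordinated change across two refs. The account names and transfer! are hypothetical; alter is the clojure.core function that applies a function to a ref's value inside a transaction.

```clojure
;; Hypothetical account balances.
(def checking (ref 100))
(def savings (ref 50))

;; Both alterations happen together or not at all.
(defn transfer! [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)

[@checking @savings]
;= [70 80]
```

Calling alter outside of dosync throws an exception, which is how the transaction requirement is enforced.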
A var is made private if the symbol that names it has a :private
slot in its metadata map: (def ^:private everything 42)
. A private function is declared by using the defn-
form.
Clojure allows you to add documentation to top-level vars via docstrings, which are usually string literals that immediately follow the symbol that names the var:
(def a
"A sample value"
5)
(doc a)
;-------------------------
;user/a
; A sample value
You can add ^:const
metadata to a var's name symbol in order to declare it as a constant to the compiler: (def ^:const everything 42)
. Any references to a constant var aren't resolved at runtime (as per usual); rather, the value held by the var is retained permanently by the code referring to the var when it is compiled.
For the most part, Clojure is lexically scoped: that is, names have values as defined by the forms that circumscribe their usage and the namespace within which they are evaluated. The exception to this rule is dynamic scope, a feature provided by vars. Vars have a root binding; this is the value bound to a var when it is defined using def
, and the one to which references to that var will evaluate in general. However, if you define a var to be dynamic, then the root binding can be overridden and shadowed on a per-thread basis using the binding
form:
(def ^:dynamic *max-value* 255)
(defn valid-value?
[v]
(<= v *max-value*))
(binding [*max-value* 500]
(valid-value? 299))
;= true
The naming convention for dynamic vars intended to be rebound with binding
is to surround them with asterisks (also known as "earmuffs").
Agents are an uncoordinated, asynchronous reference type. An agent's state may be altered via two functions, send
and send-off
. The sole difference between send
and send-off
is the type of action that may be provided to each. Actions queued using send
are evaluated within a fixed-size thread pool that is configured to not exceed the parallelizability of the current hardware. Thus, send
must never be used for actions that might perform I/O or other blocking operations. In contrast, actions queued using send-off
are evaluated within an unbounded thread pool, which allows any number of potentially blocking, non-CPU-bound actions to be evaluated concurrently.
(def a (agent 0))
(send a inc)
(await a) ; block until the queued action has been applied
@a
;= 1
By default, encountering an error will cause an agent to fail silently: you'll still be able to dereference its last state, but further actions will fail to queue up. A failed agent can be salvaged with restart-agent
, which will reset the agent's state to the provided value and enable it to receive actions again. Entering a failed status is one of two failure modes supported by agents. agent
accepts an :error-mode
option of :fail
(the default) or :continue
; an agent with a failure mode of :continue
will simply ignore an error thrown by the evaluation of an agent action, carrying on with processing any actions in its queue and receiving new actions without difficulty.
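Both failure modes can be sketched as follows. The agent names are invented, await blocks until the actions sent so far have run, and the polling loop on agent-error is just one simple way to wait until the failure has been recorded.

```clojure
;; :continue mode: the failing action is ignored, later actions still run.
(def resilient (agent 0 :error-mode :continue))
(send resilient (fn [_] (throw (ex-info "boom" {}))))
(send resilient inc)
(await resilient)
@resilient
;= 1

;; Default :fail mode: the agent stays failed until it is restarted.
(def fragile (agent 0))
(send fragile (fn [_] (throw (ex-info "boom" {}))))
(while (nil? (agent-error fragile))   ; wait for the failure to be recorded
  (Thread/sleep 10))
(restart-agent fragile 42)
(send fragile inc)
(await fragile)
@fragile
;= 43
```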
Building Abstractions
Macros
When a language lacks proper means of abstraction, the result is boilerplate and repetition, both signs of fundamental weaknesses in that language. Macros are powerful because they give you a way to define entirely new levels of abstraction within the language itself.
Macros are called by the compiler with their unevaluated data structures as arguments and must return a Clojure data structure that can itself be evaluated.
Macros being a tool of abstraction, each macro call generally produces code with a larger footprint than the macro call itself. Thus, the process of replacing macro calls with the code they produce is called macroexpansion.
The most fundamental tool in debugging macros is macroexpand-1. This function takes a data structure and taps into the Clojure compiler to return the code that would be executed in its place if it were evaluated. macroexpand-1 only expands the macro once; if your macro produces another macro call, and you want to continue expanding until the top-level form is no longer a macro, use macroexpand instead.
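The difference between the two can be sketched with a hypothetical macro unless that expands into a call to when, which is itself a macro:

```clojure
;; `unless` is a hypothetical macro defined only for this illustration.
(defmacro unless [test & body]
  `(when (not ~test) ~@body))

;; One step: the result is still a macro call (to when).
(macroexpand-1 '(unless done? (println "working")))
;= (clojure.core/when (clojure.core/not done?) (println "working"))

;; Repeated until the top-level form is no longer a macro call.
(macroexpand '(unless done? (println "working")))
;= (if (clojure.core/not done?) (do (println "working")))
```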
Syntax-quoting looks similar to quoting, except it uses a backtick (`
) instead. There are two differences between quotes and syntax-quotes: the latter fully qualifies unqualified symbols with the current namespace, and it allows unquoting: some elements of the list can be selectively unquoted, causing them to be evaluated within the scope of the syntax-quoted form. You can unquote elements with ~
:
(def foo 123)
`(map println [~foo])
;= (clojure.core/map clojure.core/println [123])
To unpack the contents of one list into another list you can use the unquote-splicing operator ~@
:
(let [defs '((def x 123)
(def y 456))]
`(do ~@defs))
;= (do (def x 123) (def y 456))
Macros should be used only when you need your own language constructs; it follows that they shouldn't be used where a function can be as effective.
When setting up a binding in a macro, we'd like to dynamically generate a name that will never collide with a name outside the macro's scope or inside of bodies of code provided as arguments to our macro. Fortunately, Clojure has a mechanism and a simple syntax for generating such names: gensyms. The gensym
function returns a symbol that is guaranteed to be unique. Every time it's called, it returns a new symbol.
Gensyms in macros are common enough that there is a shorthand way of using them. Any symbol ending in #
inside a syntax-quote form will be expanded automatically into a gensym, and will expand to the same gensym every time it appears. This is called an auto-gensym.
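A sketch of an auto-gensym in a hypothetical macro square-once. The local v# guarantees that the argument expression is evaluated only once and that the generated name cannot clash with a local named v at the call site.

```clojure
;; `square-once` is a hypothetical macro defined only for this illustration.
(defmacro square-once [expr]
  `(let [v# ~expr]
     (* v# v#)))

;; Counts how often the argument expression is evaluated.
(def calls (atom 0))

(square-once (do (swap! calls inc) 4))
;= 16

@calls
;= 1

;; A caller's own local named v is not captured by the macro's binding.
(let [v 10]
  (square-once (+ v 1)))
;= 121
```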
Macros should delegate most of their work to functions and keep for themselves only what can't be done more easily in functions: controlling evaluation.
defmacro
introduces two implicit local bindings: &env
and &form
. &env
contains a map whose keys are the names of all the current locals (the values of this map are unspecified). &form
holds the whole form currently being macro-expanded, that is, a list containing the name of the macro as a symbol and the arguments to the macro.
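A small sketch of what these two bindings contain; show-form and local-names are hypothetical macros written only to expose them:

```clojure
;; Returns the macro call itself, unevaluated.
(defmacro show-form [& args]
  `(quote ~&form))

(show-form 1 (+ 2 3))
;= (show-form 1 (+ 2 3))

;; Returns the names of the locals visible at the call site, as a set of symbols.
(defmacro local-names []
  `(quote ~(set (keys &env))))

(let [a 1 b 2]
  (local-names))
;= #{a b}
```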
Datatypes and Protocols
The dynamic Expression Problem is a new name for an old problem. The goal is to have interfaces and types where one can create new types implementing existing interfaces and provide implementation of a new interface for an existing type, without recompiling existing code.
The Clojure corollaries to interfaces are called protocols. A protocol consists of one or more methods, where each method can have multiple arities. All methods have at least one argument, corresponding to the privileged this in Java and self in Ruby and Python.
(defprotocol ProtocolName
"documentation"
(a-method [this arg1 arg2] "docstring")
(another-method [x] [x arg] "docstring"))
Unlike most other names in Clojure, the names of protocols and types are usually written in CamelCase because they compile down to native JVM interfaces and classes. This allows you to easily distinguish protocols and types from other entities in Clojure, and it allows for idiomatic usage from other JVM languages.
From a user standpoint, protocol methods are functions.
A protocol should not be designed with the user but with the protocol implementer in mind. A good protocol should consist of a small set of methods with no overlapping concerns; a good protocol is one that is easy to implement.
An example of a protocol and its implementation:
(defprotocol FriendlyGreeter
(hello [this]))
(extend-protocol FriendlyGreeter
clojure.lang.IPersistentVector
(hello [v]
(println "hello vector:" v))
clojure.lang.IPersistentMap
(hello [m]
(println "hello map:" m)))
(def v [1 2 3])
(def m {:key "value"})
(hello v)
; hello vector: [1 2 3]
(hello m)
; hello map: {:key value}
A noteworthy aspect of protocols is that you are not required to implement all methods: Clojure will simply throw an exception if you try to call an unimplemented method.
extend-type
is the dual to extend-protocol
: where extend-protocol
allows you to extend one protocol to several types, extend-type
allows you to extend several protocols to one type.
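For comparison with the extend-protocol example, here is a sketch of extend-type attaching two protocols to one existing type. Describable and Sized are hypothetical protocols invented for this illustration:

```clojure
;; Two hypothetical protocols.
(defprotocol Describable
  (describe [this]))

(defprotocol Sized
  (size [this]))

;; One type (java.lang.String), several protocols.
(extend-type String
  Describable
  (describe [s] (str "string: " s))
  Sized
  (size [s] (count s)))

(describe "hi") ;= "string: hi"
(size "hi")     ;= 2
```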
A Clojure type is a Java class, although a Clojure type definition is as easy as (defrecord Point [x y])
or (deftype Point [x y])
. Both of these forms define a new Java class Point
, with two public
and final
fields named x
and y
. Just like protocols, type names are usually written in CamelCase because they do compile down to Java classes. Creating a new Point
instance is as simple as calling its constructor: (Point. 3 4)
Records are designed to be used to model and represent application-level data, whereas deftype
types are intended to define low-level infrastructure types, such as when you are implementing a new data structure. You should find yourself using records far more than deftype
types for most classes of programs and problems.
Records participate in the associative abstraction, so you can use with records all of the facilities you're used to using with maps.
In addition to a constructor that accepts the defined fields of the type, records also offer a constructor that reflects some of their additional capabilities; namely, field extensibility and metadata support. This second constructor expects two extra arguments after the fields: a map of metadata to attach to the created record, followed by a map containing slots beyond the fields specified when the record type was defined: (Point. 3 4 {:metakey :metavalue} {:z 5}).
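A short sketch of records behaving like maps and of the longer constructor; the :source and :z keys are arbitrary values chosen for illustration:

```clojure
(defrecord Point [x y])

(def p (Point. 3 4))

;; Records support the usual map operations.
(:x p)                              ;= 3
(:z (assoc p :z 5))                 ;= 5
(instance? Point (assoc p :z 5))    ;= true

;; Longer constructor: fields, then metadata map, then extra-slots map.
(def p2 (Point. 3 4 {:source :demo} {:z 5}))
(meta p2) ;= {:source :demo}
(:z p2)   ;= 5
```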
There are two ways to implement a protocol for any given type:
- Provide implementations for protocol methods when a type is initially defined via deftype or defrecord; this is called inline implementation.
- Use the extend* functions to register implementations for a type with the protocol.
Example of an inline implementation:
(defrecord Point [x y]
FriendlyGreeter
(hello [pt]
(println "hello point:" pt)))
(def pt (Point. 3 4))
(hello pt)
; hello point: #user.Point{:x 3, :y 4}
Multimethods
A multimethod is created using a defmulti
form, and implementations of a multimethod are provided by defmethod
forms.
(defmulti fill
(fn [node value] (:tag node)))
(defmethod fill :div
[node value]
(assoc node :content [(str value)]))
(defmethod fill :input
[node value]
(assoc-in node [:attrs :value] (str value)))
(fill {:tag :div} "hello")
;= {:content ["hello"], :tag :div}
(fill {:tag :input} "hello")
;= {:attrs {:value "hello"}, :tag :input}
Clojure's multimethods allow us to define hierarchies to support whatever relationships your domain requires, including multiple inheritance. These hierarchies are defined in terms of relations between named objects (keywords or symbols) and classes. There's the global (and default) hierarchy and ones you can create as needed via make-hierarchy
.
Non-namespaced keywords (or symbols) cannot be used in the global hierarchy.
You define a hierarchical relation with derive
: (derive ::checkbox ::checkable)
.
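A minimal sketch of a multimethod dispatching through the global hierarchy; the ::radio keyword and the toggle multimethod are assumptions added for illustration:

```clojure
(derive ::checkbox ::checkable)
(derive ::radio ::checkable)

;; Dispatch on the :type key; one method covers everything that is ::checkable.
(defmulti toggle :type)

(defmethod toggle ::checkable
  [widget]
  (assoc widget :checked (not (:checked widget))))

(isa? ::radio ::checkable)
;= true
(:checked (toggle {:type ::checkbox :checked false}))
;= true
```

Dispatch values are matched with isa?, which is why a method registered for the parent keyword also handles its descendants.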
Each time you envision writing a bunch of nested conditionals or a big cond
or defining a plethora of types just to link your data with particular functionality, you should ask yourself whether a multimethod wouldn't serve you better.
Tools, Platform, and Projects
Organizing and Building Clojure Projects
All Clojure code is defined within namespaces. If you neglect to define your own, any vars you define will be mapped into the default user
namespace.
def
and all of its variants define vars within the current namespace, which is always bound in *ns*
.
Using in-ns
, we can switch to other namespaces (creating them if they don't already exist), thereby allowing us to define vars in those other namespaces.
Assuming a namespace is already loaded, we can use refer
to add mappings to its vars for our namespace:
(def a "hello")
; #'user/a
(in-ns 'example)
(clojure.core/refer 'user)
a
;= "hello"
refer
can be used to do more than a simple "import" though: you can specify that certain vars be excluded, included, or renamed when they are mapped into the current namespace by using optional keyword args of :exclude
, :only
, and :rename
, respectively:
(clojure.core/refer 'clojure.core
:exclude '(range)
:rename '{+ add
- sub})
When some code needs to make use of functions or data defined in public vars in another namespace, require and use are used to: 1) ensure that the namespaces in question are loaded, 2) optionally establish aliases for those namespaces' names, and 3) trigger the implicit use of refer to allow code to refer to other namespaces' vars without qualification. require provides (1) and (2); use is built on top of it and refer to provide (3) in a succinct way.
Having to use fully qualified symbols to name vars can be a pain, hence require allows you to specify an alias for a namespace: (require '[clojure.set :as set])
When you need to require multiple namespaces that share a common prefix, you can provide to require
a sequential collection where the first element is the namespace prefix and the remaining elements are the remaining segments specifying the namespaces you'd like to load: (require '(clojure string [set :as set]))
use
provides all of the capabilities of require
, except that by default, it refer
s the given namespace after it is loaded. In addition, use
passes along all of its arguments to refer
, so you can leverage the latter's :exclude
, :only
, and :rename
options to their fullest.
It is generally good practice to avoid unconstrained usages of use, that is, those that do not include an :only option to explicitly name the functions that should be referred into the current namespace. Including :only makes it clear what parts of other namespaces your code makes use of.
While Clojure namespaces primarily map symbols to vars, they also map symbols to Java classes and interfaces. You can use import
to add such mappings to the current namespace: (import 'java.util.Date)
. Importing a class makes its "short name" ("Date" in the example) available for use within the current namespace.
Whenever you are working on code you would like to reuse outside of a REPL, you should use the ns macro to define your namespaces. It allows you to declaratively specify a namespace's name along with its top-level documentation and what it needs to have required, referred, used, and imported to load successfully and work properly.
Each namespace should be defined in a separate file, and this file's location within your project's Clojure source root must correspond with the namespace's segments.
Use underscores in filenames when namespaces contain dashes. Only the filename and directories corresponding to the namespace's segments are affected – you would continue to refer to the namespace in Clojure code using its declared name. This is necessary because the JVM does not allow for dashes in class or package names, but it is generally idiomatic to use dashes instead of underscores when naming Clojure entities.
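Putting the ns macro and the file-naming rule together, a sketch might look like this. The namespace my-app.text-utils, its file path, and the shout function are all hypothetical:

```clojure
;; Hypothetical file: src/my_app/text_utils.clj
;; (underscores on disk, dashes in the namespace name)
(ns my-app.text-utils
  "Small text helpers (illustrative only)."
  (:require [clojure.string :as str]
            [clojure.set :as set])
  (:use [clojure.pprint :only [pprint]])
  (:import (java.util Date)))

(defn shout
  "Upper-cases s using the aliased clojure.string namespace."
  [s]
  (str/upper-case s))

(shout "hello")
;= "HELLO"
```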
Clojure code is always compiled – there is no Clojure interpreter. Compilation, which involves generating bytecode for a given chunk of Clojure code and loading that bytecode into the host JVM, can happen in two different ways:
- At runtime; this is what happens when you use the REPL, or when you load a Clojure source file from disk. The contents of source files are compiled into bytecode, and loaded into the JVM. This bytecode and the classes it defines are not retained after the host JVM has been terminated.
- "Ahead-of-time" (AOT) compilation is the same as the runtime compilation, but the resulting bytecode is saved to disk as JVM class files.
Outside of a few select circumstances, AOT compilation is entirely optional. Since Clojure libraries and applications generally have no technical reason to demand AOT compilation, it's reasonable to prefer to distribute Clojure sources when you can.
Java and JVM Interoperability
The deep level of integration means that using Java libraries from Clojure (and vice versa) generally requires no special wrappers, conversions, or other subterfuge, and carries no performance penalty compared to equivalent Java code.
A key consideration in the design of the interop forms was that they are entirely consistent with function position: the "operation" being performed is always the first symbol in each form. Thus, in addition to being functionally analogous to "native" Java method and constructor calls, interop forms are entirely consistent with Clojure's syntax.
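A few interop forms, as a quick sketch of how the operation always sits in function position; the Java equivalents in the comments are for orientation only:

```clojure
;; Instance method call: "hello".toUpperCase()
(.toUpperCase "hello")
;= "HELLO"

;; Static method call: Math.max(3, 7)
(Math/max 3 7)
;= 7

;; Constructor call: new java.util.ArrayList(collection)
(.size (java.util.ArrayList. [1 2 3]))
;= 3

;; Static field access: Integer.MAX_VALUE
Integer/MAX_VALUE
;= 2147483647
```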
proxy
produces an instance of an anonymous class that implements any number of Java interfaces and/or a single concrete base class. This anonymous class is generated only once, at compile time, based on the class and interface(s) specified. After that, the cost of each runtime proxy
invocation is only that of a single call of the constructor of the generated class.
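A minimal proxy sketch implementing java.lang.Runnable; the calls atom and the task name are assumptions made for this example:

```clojure
;; Counts how many times the proxied run method has been invoked.
(def calls (atom 0))

;; An instance of an anonymous class implementing Runnable.
(def task
  (proxy [Runnable] []
    (run [] (swap! calls inc))))

(.run task)

(instance? Runnable task) ;= true
@calls                    ;= 1
```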
gen-class
allows you to define Java classes whose method implementations are backed by regular Clojure functions. It is intended exclusively for interop contexts, and supports a broad subset of the Java object model that makes it possible to fulfill framework and library API requirements with few exceptions.
gen-class
is the only form in Clojure that must be ahead-of-time (AOT) compiled. Without AOT compilation, gen-class
forms are no-ops, as gen-class
does not define a class at runtime like all of Clojure's other class-definition forms.
Clojure recognizes metadata attached to any of its class-generation forms as annotations of the resulting classes, methods, or fields.
If you are using the results of any class-generating Clojure form from Java, you must AOT compile the namespaces containing those forms. Java's compiler needs to have those class files available on disk in order to compile Java code that uses Clojure-generated classes. This is in contrast to the use of defrecord, et al., in a Clojure-only scenario; in that case, Clojure simply generates the necessary classes at runtime and loads them into the JVM without ever producing a file on disk.
REPL-Oriented Programming
There are a number of vars that are typically only bound within a REPL session that provide conveniences necessary in an interactive environment. *1
, *2
, and *3
hold the values of the most recently evaluated expressions. *e
provides the last uncaught exception that occurred in the REPL session.
clojure.repl
provides a bunch of utilities that are very handy at the REPL. apropos
shows you which functions in loaded namespaces match a given regular expression or string: (apropos #"^ref")
. find-doc
does much the same, except it searches within documentation and prints all of the information associated with matching vars. There's also source
, which prints the source code of the specified function. Finally, there's doc
, which prints the documentation for a given var; and dir
, which prints a list of the public vars declared in the specified namespace.
Practicums
Numerics and Mathematics
Unlike most dynamic languages, Clojure supports primitive numerics (e.g. long
and double
) alongside boxed numerics. Primitives are not objects; they are value types that correspond directly to a machine-level type. In contrast, boxed numbers are objects that are defined by classes; for example java.lang.Long
is a box class whose sole purpose is to contain a primitive long
. Being objects, they incur a cost for each allocation, and so operations over them are necessarily slower: they must often be unboxed first (to obtain the primitive value within), and then the result of each operation may need to be boxed again (requiring an allocation of the box class appropriate for the primitive result value).
Object identity, implemented by identical? in Clojure, is used to determine if two (or more) objects are exactly the same instance. In general, numbers will never be identical?, even if provided as literals. The exception is that the JVM provides for a limited range of fixnums. Fixnums are a pool of boxed integer values that are always used in preference to allocating a new integer. The JVM fixnum range is -128 to 127, and so only results of operations that return integers within this range may be identical? to other integers of the same value.
Reference equality (=
) is what is most commonly referred to as "equality": a (potentially) type-sensitive, deep comparison of values to determine if they are structurally the same. =
will never return true for comparisons of equivalent numbers of different categories: (= 1 1.0) ;= false
.
Clojure's ==
predicate implements numerical equivalence, a measure of equality that is aligned with our intuitive understanding, unfettered by the artificial categories we use to delineate numeric representations. Where =
returns false
when comparing numbers of different implementation categories, ==
may return true
if the values represented by the numbers provided are numerically equivalent.
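The contrast between the two predicates can be sketched with a few literals (integer, floating-point, and arbitrary-precision decimal categories):

```clojure
;; = is category-sensitive.
(= 1 1.0)      ;= false
(= 1 1N)       ;= true, both are in the integer category
(= 1.0 1.0M)   ;= false

;; == compares numeric value across categories.
(== 1 1.0)     ;= true
(== 1.0 1.0M)  ;= true
```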
To maximize the performance of numerics-intensive code in Clojure, use primitives because they are not burdened by the allocation and garbage-collection costs associated with boxed numbers, and most operations over them are implemented at very low levels. A corollary to the notion of using primitives whenever possible where performance is a concern is that you should avoid using collections and sequences in such circumstances because when a primitive is added to a collection, it is automatically "promoted" to an instance of its corresponding boxed class.
If you want to create an array from an existing collection, you can use either into-array
or to-array
. The latter always returns an array of objects; the former will return an array of the type of the first value in the provided collection, or an array of a specified supertype.
make-array
is used to create new empty arrays of any size or dimensionality, which are initialized to the default value for the type in question (nil
for object types, false
for Boolean arrays, and zero for primitive numeric arrays).
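A sketch of the three array-creating functions side by side; the element values are arbitrary:

```clojure
;; to-array always yields Object[].
(class (to-array [1 2 3]))           ; Object[]

;; into-array infers the component type from the first element,
;; or uses an explicitly supplied supertype.
(class (into-array [1 2 3]))         ; Long[]
(class (into-array Number [1 2.5]))  ; Number[]

;; make-array fills new arrays with the type's default value.
(seq (make-array Integer/TYPE 3))    ;= (0 0 0)
(seq (make-array Boolean/TYPE 2))    ;= (false false)
(seq (make-array String 2))          ;= (nil nil)

;; Extra size arguments create multidimensional arrays.
(count (make-array Double/TYPE 2 3)) ;= 2
```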
Design Patterns
In Clojure, a superset of the functionality of aspect-oriented programming is easy to obtain thanks to vars and first-class functions. Functions can easily be passed as arguments to other functions, and Clojure allows vars to be redefined at runtime; these two features combine to allow easy "wrapping" of functions with other functions to modify behavior or results however you like.
The Robert Hooke library provides a simple and powerful way to define advices (referred to as hooks) for functions.
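The wrapping idea can be sketched in plain Clojure, without Robert Hooke, by redefining a var's root value. The greet, wrap-logging, and call-log names are hypothetical:

```clojure
;; Records the arguments of every call to the wrapped function.
(def call-log (atom []))

(defn greet [name]
  (str "Hello, " name))

;; Takes a function and returns a new one that logs, then delegates.
(defn wrap-logging [f]
  (fn [& args]
    (swap! call-log conj (vec args))
    (apply f args)))

;; Replace the function held by the var with its wrapped version.
(alter-var-root #'greet wrap-logging)

(greet "Ada") ;= "Hello, Ada"
@call-log     ;= [["Ada"]]
```

This is only the underlying mechanism; a hooks library adds conveniences such as removing a wrapper again.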
Testing
clojure.test
is the "official" Clojure test framework.
Assertions in clojure.test
use the is
macro. is
evaluates a single expression, tests whether the result is logically true, and returns that value. is
will report any failures, including the actual values that were obtained, along with the (optional) message provided with the assertion: (is (= 5 (+ 4 2)))
You can document tests and augment failure reports by using the testing
macro, which causes the description of a failing test's scope to be included in the error report.
Tests can be defined as standalone functions via the deftest
macro. This macro simply defines a zero-argument function for you with some metadata attached to its var that marks it as a test. deftest
tests are otherwise identical to normal functions.
are
is a helper macro in clojure.test
that templatizes assertions:
(deftest test-addition
(are [x y z] (= x (+ y z))
10 7 3
20 10 10
100 89 11))
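For completeness, a sketch that combines deftest, testing, and is, and then runs the tests in the current namespace; the test name and descriptions are made up for illustration:

```clojure
(require '[clojure.test :refer [deftest is testing run-tests]])

(deftest test-arithmetic
  (testing "addition"
    (is (= 4 (+ 2 2)) "two plus two"))
  (testing "string concatenation"
    (is (= "ab" (str "a" "b")))))

;; Prints a report and returns a summary map with keys such as :pass, :fail, and :error.
(run-tests)
```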
fn
(and therefore derivatives like defn
) has direct support for writing assertions to test inputs for preconditions and test output for postconditions. Preconditions are evaluated before the body of the function; postconditions are evaluated after the body of the function has been executed, but before the return value is delivered to the caller. If any condition evaluates logically false, then an error is thrown.
If the first value in the body of a Clojure function is a map with :pre
or :post
keys, this map is considered a map of pre- and/or postcondition expressions, and will be expanded to calls to assert
when the function is compiled.
The values for :pre
and :post
should be vectors, where each item in the vector is a separate assertion. Function parameters can be referred to in the preconditions. The value being returned from the function is bound to %
in postconditions.
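A minimal sketch of a function with both kinds of conditions; safe-div is a hypothetical name:

```clojure
(defn safe-div
  "Divides n by d; rejects a zero divisor via a precondition."
  [n d]
  {:pre  [(number? n) (not (zero? d))]
   :post [(number? %)]}
  (/ n d))

(safe-div 4 2)
;= 2

;; A failed condition throws java.lang.AssertionError.
(try
  (safe-div 1 0)
  (catch AssertionError e
    :precondition-failed))
;= :precondition-failed
```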
Using Relational Databases
We have many options for interacting with relational databases from Clojure. clojure.java.jdbc
is a simple yet powerful library that acts as a thin layer between Clojure and JDBC. Korma is another Clojure library that provides a more Clojure-native interface. And finally, if Clojure's libraries do not suit your style or if you are looking to mix Clojure into an existing Java-based application, you can always fall back to one of the many mature and robust Java database libraries or frameworks.
Using Nonrelational Databases
-
Clojure and the Web
There is no definitive "Clojure stack", at least compared to the monolithic framework bonanza that often constitutes web "stacks" in other languages. Instead, the Clojure community has grown a number of modular libraries over the years that collectively satisfy all of the requirements of the web developer, but that leverage Clojure's fundamental abstractions and emphasis on functional programming.
Whereas many other frameworks define fixed APIs for accessing web request data – like the requested URI, request headers, query and post parameters, body content, and so on – and still other APIs for sending web responses, Ring represents both requests and responses as regular Clojure maps. In both cases, these maps must contain certain slots, may contain others, and can be used to hold any other data you require in the course of processing them.
A Ring adapter provides a bridge between a Ring application and the local implementation details of the HTTP protocol and/or server. In short, when an HTTP request is received, an adapter deconstructs it into a request map and passes it to the Ring application to be processed. That invocation must return a response map, which the adapter uses to send an HTTP response back to the client.
The core contribution of Ring is not any of its particular implementations of anything; rather, it is the well-considered definition of the request/response data schema and key concepts of adapters, middleware, and handlers that makes it important. Indeed, thanks to these abstractions, Ring – the canonical implementation of which is fundamentally synchronous in nature to suit the synchronous nature of most web applications – can itself be swapped out for other Ring-compatible implementations. A notable example is Aleph, which offers a Ring-compatible adapter that uses Netty to serve responses to clients asynchronously, without requiring any changes to your Ring application.
A Ring handler is just a function that accepts a request map and returns a response map. All Ring applications consist of a bunch of handler functions, chained and composed and delegated to as necessary to support the desired behavior and functionality.
Middleware is any enhancement or modification of the effect of handlers.
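Because handlers and middleware are ordinary functions over maps, both can be sketched without loading Ring at all. The handler, the wrap-served-by middleware, and the X-Served-By header are hypothetical; a real application would hand app to an adapter:

```clojure
;; A handler: request map in, response map out.
(defn hello-handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "Hello from " (:uri request))})

;; Middleware: takes a handler and returns an enhanced handler.
(defn wrap-served-by [handler]
  (fn [request]
    (assoc-in (handler request) [:headers "X-Served-By"] "sketch")))

(def app (wrap-served-by hello-handler))

;; Handlers can be exercised directly with a hand-built request map.
(app {:request-method :get :uri "/greeting"})
;= {:status 200,
;   :headers {"Content-Type" "text/plain", "X-Served-By" "sketch"},
;   :body "Hello from /greeting"}
```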
Routing is the selection of a handler that should be used to respond to a web request, and routes are patterns of incoming request attributes that are used to drive that selection process. The most popular library for defining routes for Ring applications is Compojure.
Enlive proposes a radical way to decouple code from templates: rather than defining a special local syntax for interpolating values into templates, Enlive templates are plain HTML files with no special tags, no special attributes, no special classes, and no special syntax. Instead, content is injected into templates by Clojure code that uses selectors (heavily inspired by CSS selectors) to specify what to modify, and Clojure functions that define what transformations to apply.
Deploying Clojure Web Applications
Almost without exception, Clojure web applications are packaged and deployed as servlets, the same fundamental architecture used by web applications written in Java.
Miscellanea
Choosing Clojure Type Definition Forms Wisely
-
Introducing Clojure into Your Workplace
-
What's Next?
ClojureCLR is a port of Clojure to the .NET CLR.
ClojureScript targets JavaScript, and therefore produces code that can be run in all modern browsers, as well as in other JavaScript execution environments such as Node.js.