Creating JavaBeans with Clojure

Introduction

JavaBeans have been around since forever in the Java world. They're well supported, but not well designed, as you can see from the list of disadvantages on Wikipedia. Unfortunately, we're stuck with them. Frameworks like Apache Spark and others give us nice things in return for those beans. To create a library which is usable from Java and Scala and is compatible with Spark we must be able to create proper JavaBeans (perhaps - we'll get back to this later).

Hence, we may need to follow the JavaBean standard. The requirements are simple.

  • Getters and setters, also known as accessors, for all fields. Those are the typically verbose Java methods like long getSomeLongField() and void setSomeLongField(long value) which we perhaps wanted to escape from by moving to Clojure. The setters imply that our instance fields must be mutable.
  • JavaBean classes need to implement java.io.Serializable but this is usually easy. We just need to specify that we implement this interface and make sure the types of our fields also implement the interface.
  • A nullary constructor, i.e. a constructor that takes zero arguments. This one is a bit difficult for Clojure.
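The first two requirements can be checked mechanically from a REPL. The following is a minimal sketch, not part of the original code: serializable? and read-write-properties are hypothetical helper names, and java.util.Date is used only as a stand-in class from the JDK that happens to have getters and setters.

```clojure
(import '(java.beans Introspector PropertyDescriptor)
        '(java.io Serializable))

(defn serializable?
  "True if class c implements java.io.Serializable."
  [^Class c]
  (.isAssignableFrom Serializable c))

(defn read-write-properties
  "Names of the properties of class c that have both a getter and a setter,
  as seen by the standard JavaBeans introspector."
  [^Class c]
  (->> (.getPropertyDescriptors (Introspector/getBeanInfo c Object))
       (filter (fn [^PropertyDescriptor pd]
                 (and (.getReadMethod pd) (.getWriteMethod pd))))
       (map (fn [^PropertyDescriptor pd] (.getName pd)))
       set))

;; java.util.Date has getTime() and setTime(long), so "time" shows up
;; as a read-write property.
(println (serializable? java.util.Date))
(println (contains? (read-write-properties java.util.Date) "time"))
```

Frameworks that consume beans typically discover properties in a similar way, which is why the accessor naming convention matters.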

JavaBeans through deftype

There are several ways to create Java classes from Clojure. This can be confusing, but in our case we can directly rule out defrecord since it only supports immutable fields.

Our options are deftype and gen-class. We'll start with deftype since it's easier to work with. Mutable fields can be created by specifying the :volatile-mutable true metadata, and we need to use definterface to specify our accessor methods.

We can use deftype to implement a JavaBean for edits to Wikipedia (we'll have more to say about this in the future).

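(ns bean.deftype-bean)

;; Accessor signatures for the bean. The interface name must match the
;; one implemented by the deftype below. Boxed Long is used throughout;
;; primitive types are discussed in the benchmarks.
(definterface IEditEvent
  (^Long getTimestamp [])
  (setTimestamp [^Long timestamp])
  (^String getTitle [])
  (setTitle [^String title])
  (^Long getByteDiff [])
  (setByteDiff [^Long byte-diff]))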

(deftype DeftypeEditEvent
    [^{:volatile-mutable true
       :tag java.lang.Long} timestamp
     ^{:volatile-mutable true
       :tag java.lang.String} title
     ^{:volatile-mutable true
       :tag java.lang.Long} byteDiff]
  java.lang.Object
  (toString [_] (str "DeftypeEditEvent; title=" title ", byteDiff=" byteDiff))
  IEditEvent
  (getTimestamp [_] timestamp)
  (setTimestamp [_ v] (set! timestamp v))
  (getTitle [_] title)
  (setTitle [_ v] (set! title v))
  (getByteDiff [_] byteDiff)
  (setByteDiff [_ v] (set! byteDiff v))
  java.io.Serializable)

We can then create DeftypeEditEvent instances by using the factory method or by calling the associated Java class constructor directly.

(ns bean.deftype-bean-test
  (:require [bean.deftype-bean :as dt]))

(println "This is a bean:" (dt/->DeftypeEditEvent 1483138282 "hi" 123))
(println "This too:" (bean.deftype_bean.DeftypeEditEvent. 1483138282 "hi" 123))

Unfortunately this is not a true JavaBean, because it doesn't have a nullary constructor. deftype only creates a single constructor which takes as many arguments as there are fields. Can we create real JavaBeans in Clojure?
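We can confirm the missing constructor with plain Java reflection. This is a small sketch for the REPL: Point is a hypothetical two-field type used only for the check, and constructor-arities and nullary-constructor? are helper names made up for this example.

```clojure
;; A throwaway type with two fields.
(deftype Point [x y])

(defn constructor-arities
  "Sorted parameter counts of the public constructors of class c."
  [^Class c]
  (sort (map (fn [^java.lang.reflect.Constructor ctor]
               (alength (.getParameterTypes ctor)))
             (.getConstructors c))))

(defn nullary-constructor?
  "True if class c has a public constructor taking zero arguments."
  [^Class c]
  (try
    (.getConstructor c (into-array Class []))
    true
    (catch NoSuchMethodException _
      false)))

;; Only one constructor, with one parameter per field.
(println (constructor-arities Point))
(println (nullary-constructor? Point))
;; For comparison, a JDK class that does have one.
(println (nullary-constructor? String))
```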

gen-class to the rescue

gen-class supports a lot of features, including nullary constructors. Due to its complexity gen-class is generally not recommended, but if we want nullary constructors for our JavaBeans it's the only way, at least if we want to stick to pure Clojure. Keep in mind that gen-class only generates a class when the namespace is AOT-compiled. We'll take care to avoid reflection by using type hints.

(gen-class
 :name bean.gen_class_bean.EditGenClass
 :implements [java.io.Serializable]
 :init init
 :state state
 :prefix "edit-"
 :constructors {[] []
                [Long String Long] []}
 :methods [[getTimestamp [] Long]
           [setTimestamp [Long] void]
           [getTitle [] String]
           [setTitle [String] void]
           [getByteDiff [] Long]
           [setByteDiff [Long] void]])

(defn edit-init
  ([] (edit-init 0 nil 0))
  ([timestamp title byteDiff]
   [[] (object-array [timestamp title byteDiff])]))

(defn edit-getTimestamp [this]
  (aget ^objects (.state ^bean.gen_class_bean.EditGenClass this) 0))
(defn edit-setTimestamp [this v]
  (aset ^objects (.state ^bean.gen_class_bean.EditGenClass this) 0 ^Long v))
(defn edit-getTitle [this]
  (aget ^objects (.state ^bean.gen_class_bean.EditGenClass this) 1))
(defn edit-setTitle [this v]
  (aset ^objects (.state ^bean.gen_class_bean.EditGenClass this) 1 ^String v))
(defn edit-getByteDiff [this]
  (aget ^objects (.state ^bean.gen_class_bean.EditGenClass this) 2))
(defn edit-setByteDiff [this v]
  (aset ^objects (.state ^bean.gen_class_bean.EditGenClass this) 2 ^Long v))

In Java, instance methods live in the body of the class definition. With gen-class we define them as ordinary functions in the namespace containing the gen-class form, named with the prefix given by :prefix. The :init option names the function that runs when a constructor is called, which together with the prefix of our choice means that every constructor will invoke our edit-init function.

We must support two constructors. In addition to the nullary constructor, we also need a constructor as in the deftype case where we accept one argument for each field. In order for Clojure to figure out how to call the constructor of the super class, we use :constructors.

 :constructors {[] []
                [Long String Long] []}

This map goes from the argument types of each of our constructors to the argument types of the superclass constructor it should call. Called with zero arguments, our constructor calls the superclass constructor with zero arguments. Called with three arguments of the given types, one for each field, it also calls the superclass constructor with zero arguments.

Our init function must return a vector of two elements. The first element holds the arguments for the superclass constructor (empty in this case) and the second element is the initial value of the field named by :state. We use a Java array of objects as state since we must support mutable state.
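Since edit-init is an ordinary Clojure function, its return value can be inspected at a REPL without AOT-compiling anything. The sketch below repeats the function so that it can be pasted on its own, and then takes the two-element vector apart.

```clojure
;; Same shape as the edit-init function above, repeated here so the
;; snippet is self-contained.
(defn edit-init
  ([] (edit-init 0 nil 0))
  ([timestamp title byte-diff]
   [[] (object-array [timestamp title byte-diff])]))

(let [[super-args state] (edit-init)]
  ;; First element: arguments for the superclass constructor.
  (println "superclass constructor args:" super-args)
  ;; Second element: the initial state, one array slot per field.
  (println "initial state:" (vec state))
  ;; A setter mutates the array in place, which is why an array is used.
  (aset ^objects state 1 "Main Page")
  (println "state after a set:" (vec state)))
```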

Benchmarks

Performance is important since we may create a very large number of JavaBeans when using data processing frameworks. We'll benchmark our implementations and compare them with their Java counterparts by creating 500000 JavaBeans, saving the beans to disk, reading them back and summing their byteDiff values. We'll also look at the size of the JavaBeans persisted on disk.
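The round trip can be sketched roughly as follows. This is not the actual benchmark code: write-beans! and sum-times are hypothetical helpers, java.util.Date stands in for our beans because it is a serializable JDK class with a getter, and the count is kept small.

```clojure
(import '(java.io File FileInputStream FileOutputStream
                  ObjectInputStream ObjectOutputStream)
        '(java.util Date))

(defn write-beans!
  "Serializes each object in beans to file, one after another."
  [^File file beans]
  (with-open [out (ObjectOutputStream. (FileOutputStream. file))]
    (doseq [b beans]
      (.writeObject out b))))

(defn sum-times
  "Reads n Date objects back from file and sums their getTime values."
  [^File file n]
  (with-open [in (ObjectInputStream. (FileInputStream. file))]
    (loop [i 0, total 0]
      (if (< i n)
        (recur (inc i) (+ total (.getTime ^Date (.readObject in))))
        total))))

(let [file  (doto (File/createTempFile "beans" ".ser") (.deleteOnExit))
      beans (map #(Date. (long %)) (range 1000))]
  (write-beans! file beans)
  (println "bytes on disk:" (.length file))
  ;; 0 + 1 + ... + 999 = 499500
  (println "sum:" (sum-times file 1000)))
```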

You may have noticed that we have been using Long and Boolean instead of the primitive types long and boolean in our JavaBeans so far. To be fair to Java we should also consider an implementation taking advantage of primitive types. You can find the benchmark implementation on GitHub if you want to run the benchmarks yourself. Let's look at the results.
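As a rough illustration of why primitive types matter for the size on disk, we can compare how Java serialization treats boxed and primitive values. This sketch measures two arrays rather than beans, so the numbers are not comparable to the benchmark; serialized-size is a hypothetical helper.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream))

(defn serialized-size
  "Number of bytes Java serialization produces for obj."
  [obj]
  (let [buffer (ByteArrayOutputStream.)]
    (with-open [out (ObjectOutputStream. buffer)]
      (.writeObject out obj))
    (.size buffer)))

(let [numbers (range 1000 2000)]
  ;; Each boxed Long is written as a separate object.
  (println "boxed Long values:   " (serialized-size (object-array numbers)))
  ;; A long[] is written as one block of 8-byte values.
  (println "primitive long array:" (serialized-size (long-array numbers))))
```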

Benchmark results

Implementation      Size on disk   Mean execution time   Standard deviation
Java                87554 KB       50.5 s                0.410 s
Java (primitives)   70536 KB       26.0 s                0.156 s
deftype             87554 KB       50.5 s                0.204 s
gen-class           90067 KB       39.6 s                0.106 s
clj-bean            85185 KB       36.2 s                0.480 s

Each benchmark was executed 10 times to calculate the mean and standard deviation. deftype matches Java without primitive types in both size on disk and mean execution time. Java with primitive types beats everything else, but clj-bean is a good second on both measures.
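For reference, this is one way to compute such statistics from repeated runs. It is a sketch under assumptions: the helper names are made up, the timed workload is a placeholder, and the sample standard deviation (dividing by n - 1) is used, which may or may not be what the benchmark code does.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-std-dev
  "Sample standard deviation of xs (divides by n - 1)."
  [xs]
  (let [m (mean xs)
        squared-diffs (map #(let [d (- % m)] (* d d)) xs)]
    (Math/sqrt (/ (reduce + squared-diffs) (dec (count xs))))))

(defn time-seconds
  "Runs the zero-argument function f once and returns the elapsed seconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e9)))

;; Ten runs of a placeholder workload.
(let [runs (doall (repeatedly 10 #(time-seconds (fn [] (reduce + (range 1000000))))))]
  (println "mean:" (mean runs) "s")
  (println "standard deviation:" (sample-std-dev runs) "s"))
```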

What's clj-bean, you ask? Good question.

So what's the point? Macros.

If our Clojure implementation looked as above and performed this much worse than the optimal Java implementation there wouldn't be much point to Clojure for this specific use case. We could just implement the JavaBeans as Java classes and include them in our Clojure project.

Macros are perfect for eliminating the verbosity of gen-class. With an appropriate macro we only need to specify the name of the JavaBean, its fields, and their types. The macro will take care of generating all the required constructors, state, setters and getters.
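To get a feeling for the idea without AOT compilation, here is a much simplified sketch that generates accessors on top of deftype instead of gen-class. It is not how clj-bean works: defmutable and accessor-name are hypothetical names, fields are untyped, and the resulting class still lacks a nullary constructor.

```clojure
(require '[clojure.string :as str])

(defn accessor-name
  "Builds an accessor symbol from a prefix and a field symbol,
  turning get and title into getTitle."
  [prefix field]
  (let [s (name field)]
    (symbol (str prefix (str/upper-case (subs s 0 1)) (subs s 1)))))

(defmacro defmutable
  "Hypothetical macro: defines an interface named I<type-name> and a deftype
  with volatile-mutable fields plus a getter and a setter per field."
  [type-name fields]
  (let [iface (symbol (str "I" type-name))]
    `(do
       ;; One getter and one setter signature per field.
       (definterface ~iface
         ~@(mapcat (fn [f]
                     [(list (accessor-name "get" f) [])
                      (list (accessor-name "set" f) ['value])])
                   fields))
       ;; The type itself, with every field marked as mutable.
       (deftype ~type-name
         [~@(map #(vary-meta % assoc :volatile-mutable true) fields)]
         ~iface
         ~@(mapcat (fn [f]
                     [`(~(accessor-name "get" f) [~'_] ~f)
                      `(~(accessor-name "set" f) [~'_ v#] (set! ~f v#))])
                   fields)))))

;; Must be used at the top level so the interface exists before the
;; deftype form is compiled.
(defmutable Edit [title byteDiff])

(let [edit (Edit. "hi" 123)]
  (.setTitle edit "Main Page")
  (println (.getTitle edit) (.getByteDiff edit)))
```

Two fields or thirteen, the call site stays a single form, which is the whole appeal.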

We can also reduce the size and improve the performance of our JavaBeans by changing the way we deal with state. Instead of a single array of boxed objects as the state, we'll use an object array where each element is either an object or an array of a primitive type. Non-primitive fields will be stored directly in the object array while primitive fields are stored in arrays of their primitive types.
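For the three-field example this could look roughly like the sketch below. The exact layout is an assumption made for illustration (one long[] shared by both long fields, the String stored directly), and the function names are hypothetical; the real layout in clj-bean may differ.

```clojure
;; Assumed layout for the three-field example:
;;   slot 0: long[] holding timestamp (index 0) and byteDiff (index 1)
;;   slot 1: title, a String stored directly
(defn make-state [timestamp title byte-diff]
  (object-array [(long-array [timestamp byte-diff]) title]))

(defn get-byte-diff
  "Reads byteDiff as a primitive long, without boxing."
  ^long [^objects state]
  (aget ^longs (aget state 0) 1))

(defn set-byte-diff!
  "Writes byteDiff into the primitive array in place."
  [^objects state ^long v]
  (aset ^longs (aget state 0) 1 v))

(defn get-title [^objects state]
  (aget state 1))

(let [state (make-state 1483138282 "hi" 123)]
  (set-byte-diff! state 456)
  (println (get-title state) (get-byte-diff state)))
```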

The full WikipediaEditEvent has 13 fields, so it would be very tedious to write out all the required accessors. Using a macro it's easy, which brings us back to clj-bean and its (currently) sole purpose, the defbean macro. This is what the full definition of a Wikipedia edit event looks like:

(defbean WikipediaEditEvent
  [[long timestamp]
   [String channel]
   [String title]
   [String diffUrl]
   [String user]
   [long byteDiff]
   [String summary]
   [boolean minor]
   [boolean new]
   [boolean unpatrolled]
   [boolean botEdit]
   [boolean special]
   [boolean talk]])

You can find clj-bean on GitHub. There are good tutorials out there for learning how to write macros. If you are new to macros and want to understand clj-bean you may want to go through a macro tutorial and look at the tests before moving on to the implementation.

The beauty of macros is that you as a user of them only need to understand what they do, as explained in this post, and not necessarily how they do it with all the strange quoting rules and code generation under the hood - as long as they work without any issues, anyway.

Conclusion

The optimal Java implementation still has better performance, but the final Clojure implementation isn't far off and wins big time in the number of lines required to specify the JavaBean.

This pattern of starting off with a naive implementation and refining it until it does what you want but is overly verbose is a useful tool for creating macros. By first writing down the desired end goal as plain code (the constructors, the state, and all the accessors) we get some useful tests. Once our macro can replicate this end goal we can think about optimizations, using the tests to ensure we keep the same behavior.

In the end we have a great API for generating JavaBeans with good performance. Ideally we'd get all the way to Java's performance with primitive types, but this will be difficult without generating bytecode. Perhaps we'll do that one day, but the API would remain the same since it contains only the essentials for the definition of a JavaBean.