What's the (hidden) cost of Scala's lazy val?

PerformanceScalaLazy Evaluation

Performance Problem Overview


One handy feature of Scala is lazy val, where the evaluation of a val is delayed until it's necessary (at first access).

Of course, a lazy val must have some overhead - somewhere Scala must keep track of whether the value has already been evaluated and the evaluation must be synchronized, because multiple threads might try to access the value for the first time at the same time.

What exactly is the cost of a lazy val - is there a hidden boolean flag associated with a lazy val to keep track if it has been evaluated or not, what exactly is synchronized and are there any more costs?

In addition, suppose I do this:

class Something {
    lazy val (x, y) = { ... }
}

Is this the same as having two separate lazy vals x and y or do I get the overhead only once, for the pair (x, y)?

Performance Solutions


Solution 1 - Performance

This is taken from the scala mailing list and gives implementation details of lazy in terms of Java code (rather than bytecode):

class LazyTest {
  lazy val msg = "Lazy"
}

is compiled to something equivalent to the following Java code:

class LazyTest {
  public int bitmap$0;
  private String msg;

  public String msg() {
    if ((bitmap$0 & 1) == 0) {
        synchronized (this) {
            if ((bitmap$0 & 1) == 0) {
                synchronized (this) {
                    msg = "Lazy";
                }
            }
            bitmap$0 = bitmap$0 | 1;
        }
    }
    return msg;
  }

}

Solution 2 - Performance

It looks like the compiler arranges for a class-level bitmap int field to flag multiple lazy fields as initialized (or not) and initializes the target field in a synchronized block if the relevant xor of the bitmap indicates it is necessary.

Using:

class Something {
  lazy val foo = getFoo
  def getFoo = "foo!"
}

produces sample bytecode:

 0  aload_0 [this]
 1  getfield blevins.example.Something.bitmap$0 : int [15]
 4  iconst_1
 5  iand
 6  iconst_0
 7  if_icmpne 48
10  aload_0 [this]
11  dup
12  astore_1
13  monitorenter
14  aload_0 [this]
15  getfield blevins.example.Something.bitmap$0 : int [15]
18  iconst_1
19  iand
20  iconst_0
21  if_icmpne 42
24  aload_0 [this]
25  aload_0 [this]
26  invokevirtual blevins.example.Something.getFoo() : java.lang.String [18]
29  putfield blevins.example.Something.foo : java.lang.String [20]
32  aload_0 [this]
33  aload_0 [this]
34  getfield blevins.example.Something.bitmap$0 : int [15]
37  iconst_1
38  ior
39  putfield blevins.example.Something.bitmap$0 : int [15]
42  getstatic scala.runtime.BoxedUnit.UNIT : scala.runtime.BoxedUnit [26]
45  pop
46  aload_1
47  monitorexit
48  aload_0 [this]
49  getfield blevins.example.Something.foo : java.lang.String [20]
52  areturn
53  aload_1
54  monitorexit
55  athrow

Values initialed in tuples like lazy val (x,y) = { ... } have nested caching via the same mechanism. The tuple result is lazily evaluated and cached, and an access of either x or y will trigger the tuple evaluation. Extraction of the individual value from the tuple is done independently and lazily (and cached). So the above double-instantiation code generates an x, y, and an x$1 field of type Tuple2.

Solution 3 - Performance

With Scala 2.10, a lazy value like:

class Example {
  lazy val x = "Value";
}

is compiled to byte code that resembles the following Java code:

public class Example {

  private String x;
  private volatile boolean bitmap$0;

  public String x() {
    if(this.bitmap$0 == true) {
      return this.x;
    } else {
      return x$lzycompute();
    }
  }

  private String x$lzycompute() {
    synchronized(this) {
      if(this.bitmap$0 != true) {
	    this.x = "Value";
	    this.bitmap$0 = true;
	  }
	  return this.x;
    }
  }
}

Note that the bitmap is represented by a boolean. If you add another field, the compiler will increase the size of the field to being able to represent at least 2 values, i.e. as a byte. This just goes on for huge classes.

But you might wonder why this works? The thread-local caches must be cleared when entering a synchronized block such that the non-volatile x value is flushed into memory. This blog article gives an explanation.

Solution 4 - Performance

Scala SIP-20 proposes a new implementation of lazy val, which is more correct but ~25% slower than the "current" version.

The proposed implementation looks like:

class LazyCellBase { // in a Java file - we need a public bitmap_0
  public static AtomicIntegerFieldUpdater<LazyCellBase> arfu_0 =
    AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
  public volatile int bitmap_0 = 0;
}
final class LazyCell extends LazyCellBase {
  import LazyCellBase._
  var value_0: Int = _
  @tailrec final def value(): Int = (arfu_0.get(this): @switch) match {
    case 0 =>
      if (arfu_0.compareAndSet(this, 0, 1)) {
        val result = 0
        value_0 = result
        @tailrec def complete(): Unit = (arfu_0.get(this): @switch) match {
          case 1 =>
            if (!arfu_0.compareAndSet(this, 1, 3)) complete()
          case 2 =>
            if (arfu_0.compareAndSet(this, 2, 3)) {
              synchronized { notifyAll() }
            } else complete()
        }
        complete()
        result
      } else value()
    case 1 =>
      arfu_0.compareAndSet(this, 1, 2)
      synchronized {
        while (arfu_0.get(this) != 3) wait()
      }
      value_0
    case 2 =>
      synchronized {
        while (arfu_0.get(this) != 3) wait()
      }
      value_0
    case 3 => value_0
  }
}

As of June 2013 this SIP hasn't been approved. I expect that it's likely to be approved and included in a future version of Scala based on the mailing list discussion. Consequently, I think you'd be wise to heed Daniel Spiewak's observation:

> Lazy val is *not* free (or even cheap). Use it only if you absolutely > need laziness for correctness, not for optimization.

Solution 5 - Performance

I've written a post with regard to this issue https://dzone.com/articles/cost-laziness

In nutshell, the penalty is so small that in practice you can ignore it.

Solution 6 - Performance

given the bycode generated by scala for lazy, it can suffer thread safety problem as mentioned in double check locking http://www.javaworld.com/javaworld/jw-05-2001/jw-0525-double.html?page=1

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJesperView Question on Stackoverflow
Solution 1 - Performanceoxbow_lakesView Answer on Stackoverflow
Solution 2 - PerformanceMitch BlevinsView Answer on Stackoverflow
Solution 3 - PerformanceRafael WinterhalterView Answer on Stackoverflow
Solution 4 - PerformanceLeif WicklandView Answer on Stackoverflow
Solution 5 - PerformanceRomanView Answer on Stackoverflow
Solution 6 - PerformanceHuy LeView Answer on Stackoverflow