Sunday, January 3, 2016

Computational complexity of a .size() method of set and map views in Java

Even though I’ve been programming in Java for many years, there are still some small bits that are new to me. Today I will write about the time complexity of the .size() method of set and map views in Java (based on the OpenJDK implementation, which is, in fact, the reference implementation).

In Java SE, we have an interface called SortedSet which according to the documentation is:
“A Set that further provides a total ordering on its elements. (...) Several additional operations are provided to take advantage of the ordering.”
Those additional operations let us get a view of some subset of the current set. There are 3 such methods available, whose names are self-explanatory:
SortedSet<E> headSet(E toElement)
SortedSet<E> subSet(E fromElement, E toElement)
SortedSet<E> tailSet(E fromElement)
An important part here is that the returned set is backed by the original set, so changes to either of them are reflected in the other one as well.
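For illustration, a quick sketch (class and variable names are mine):
import java.util.SortedSet;
import java.util.TreeSet;

public class ViewDemo {
    public static void main(String[] args) {
        SortedSet<Integer> set = new TreeSet<>();
        for (int i = 0; i < 10; i++) set.add(i);

        SortedSet<Integer> head = set.headSet(5); // view of elements < 5
        System.out.println(head.size());          // 5

        set.add(-1);                              // modify the original set...
        System.out.println(head.size());          // 6 - the view sees the change

        head.remove(0);                           // ...and vice versa
        System.out.println(set.contains(0));      // false
    }
}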

So what about the time complexity of the .size() method of such a returned set? Well, we might think it is the same as .size() of a TreeSet, one implementation of the SortedSet interface, which is constant-time (O(1)):
public int size() {
       return size;
}
Source 

It turns out that in the OpenJDK implementation that’s not the case: the time complexity of the .size() method of such a view is linear in the number of its elements (O(n)). The details can be found in the EntrySetView class:
abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
           private transient int size = -1, sizeModCount;

           public int size() {
               if (fromStart && toEnd)
                   return m.size();
               if (size == -1 || sizeModCount != m.modCount) {
                   sizeModCount = m.modCount;
                   size = 0;
                    Iterator<?> i = iterator();
                   while (i.hasNext()) {
                       size++;
                       i.next();
                   }
               }
               return size;
           }
...
}
Source

So what is happening here? The first condition checks whether the view spans the entire original set and, if so, delegates the call to the original set’s .size().
Then we check whether the size of the view has not been calculated before (size == -1) or whether the number of structural modifications recorded by the view differs from that of the original tree. If either condition is true, we iterate over all the view’s elements and count them. Obviously, that operation takes time linear in the number of elements.
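We can observe this caching behaviour with a rough timing sketch (not a rigorous benchmark; the class name and numbers are mine):
import java.util.SortedSet;
import java.util.TreeSet;

public class SubSetSizeDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 0; i < 1_000_000; i++) set.add(i);
        SortedSet<Integer> view = set.subSet(0, 999_999);

        long t0 = System.nanoTime();
        view.size();        // first call: iterates over the whole view, O(n)
        long t1 = System.nanoTime();
        view.size();        // no modifications in between: cached, O(1)
        long t2 = System.nanoTime();

        set.add(2_000_000); // any structural change bumps modCount...
        long t3 = System.nanoTime();
        view.size();        // ...so the view has to count again, O(n)
        long t4 = System.nanoTime();

        System.out.printf("first: %d ns, cached: %d ns, after add: %d ns%n",
                t1 - t0, t2 - t1, t4 - t3);
    }
}
Note that even an element added outside the view’s range invalidates the cached size, because the check compares against the modCount of the whole backing tree.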

How could this be solved? One possible solution would be to keep the views and the original set linked to each other via references, i.e. a view would maintain a reference to the set from which it has been derived, and the original set would maintain references to all its views. Then a change in either of them could be propagated to the others.
That would solve the problem of the .size() time complexity, but it would increase the memory footprint and also affect the computational complexity of the add/remove operations. That’s because a set can have many distinct views - for a set of n elements we can easily create n of them - so the add/remove operations would become O(n), as the cached size of every affected view would have to be updated.
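To make that trade-off concrete, here is a hypothetical sketch of such a linked design - all names are mine and this is in no way OpenJDK code:
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical design: views keep a cached size that the original set maintains.
class LinkedViewSet<E extends Comparable<E>> {
    private final TreeSet<E> elements = new TreeSet<>();
    private final List<View> views = new ArrayList<>();

    boolean add(E e) {
        if (!elements.add(e)) return false;
        for (View v : views)              // extra O(#views) work on every add
            if (v.inRange(e)) v.cachedSize++;
        return true;
    }

    View subSetView(E from, E to) {
        return new View(from, to);
    }

    class View {
        final E from, to;                 // half-open range [from, to)
        int cachedSize;

        View(E from, E to) {
            this.from = from;
            this.to = to;
            this.cachedSize = elements.subSet(from, to).size();
            views.add(this);
        }

        boolean inRange(E e) {
            return e.compareTo(from) >= 0 && e.compareTo(to) < 0;
        }

        int size() { return cachedSize; } // O(1), at the cost of slower add/remove
    }
}
Every successful add now pays an extra cost linear in the number of live views (remove would be symmetric) - exactly the shift in cost described above.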

All in all, different approaches have different trade-offs. The one chosen in OpenJDK seems sensible, as add/remove are likely more common operations than .size() on a view, and it has a smaller memory footprint than the solution depicted above. Still, it is good to be aware of such details when developing high-throughput, low-latency applications.

UPDATE

After a while, I got curious about this case in other languages. I decided to check C#, as it has a class with similar functionality, and in addition .NET Core has been open sourced, so we can investigate the official code.
It turns out that they took a slightly different approach to solving this problem and ensure that getting the size of a view is a constant-time operation when the view itself is modified (but not when the original set is).

In C# we have a class called SortedSet<T> which has the following method:
public virtual SortedSet<T> GetViewBetween(
    T lowerValue,
    T upperValue
)
which corresponds to subSet in the Java SortedSet interface.

Source

In order to get the size of such a SortedSet<T> in C# we need to access a property called "Count". According to the documentation, retrieval of this value is an O(1) operation.

Source

If we dive into the code of the SortedSet<T> class, we will quickly see that GetViewBetween() returns a subclass of SortedSet<T> called TreeSubSet, which only maintains its own "count" variable; the "Count" property is inherited unchanged from its superclass SortedSet<T>:
        public int Count {
            get {
                VersionCheck();
                return count;
            }
        }
Source

Now we should have a look at how VersionCheck() is implemented in TreeSubSet:
            internal override void VersionCheck() {
                VersionCheckImpl();
            }
 
            private void VersionCheckImpl() {
                Debug.Assert(underlying != null, "Underlying set no longer exists");
                if (this.version != underlying.version) {
                    this.root = underlying.FindRange(min, max, lBoundActive, uBoundActive);
                    this.version = underlying.version;
                    count = 0;
                    InOrderTreeWalk(delegate(Node n) { count++; return true; });
                }
            }
Source

The key point here is the check whether the version of the subset differs from that of the underlying set. The version of a set changes on any structural mutation (add/remove operations). If the versions differ, the operation gets more expensive: first we find a new subset root with FindRange, which is an O(log n) operation where n is the number of elements in the original set - thanks to the red-black tree implementation - and then we perform an in-order traversal, which is an O(n) operation where n is the size of the subset.

When will the versions differ? The modifying operations of TreeSubSet already call VersionCheck() themselves, so the versions stay in sync and a subsequent call to Count is a constant-time operation.

Here is an example, the add operation:
            internal override bool AddIfNotPresent(T item) {
 
                if (!IsWithinRange(item)) {
                    ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.collection);
                }
 
                bool ret = underlying.AddIfNotPresent(item);
                VersionCheck();
#if DEBUG
                Debug.Assert(this.versionUpToDate() && this.root == this.underlying.FindRange(min, max));
#endif
 
                return ret;
            }
Source

There is a possibility to optimize this method: to avoid the O(n) counting in VersionCheck() (n being the number of elements in the subset), we could update the count variable right away - if ret is true, we simply increase it by 1, as we have already checked that the new value falls within the subset’s range. The removal operation is implemented similarly. Maybe I will submit a pull request on GitHub for that :)

Now, if we modify the original set, its version will also change. A subsequent call to its view’s Count property will then detect the version change, and the expensive branch will be executed.

To summarize, the Java and C# implementations of subset size computation are very similar in terms of time complexity. Even though in C# we get a constant-time size operation when modifying a subset, the add/remove operations get more expensive in return. So it is all about shifting the additional cost of counting the elements from one method to another.
The shift in Java, though, looks more reasonable to me, especially in the case of massive modifications of a view without a need to know its size in between. In C#, the size will be recalculated after each modification.

Sunday, September 21, 2014

Memento design pattern in Java

Recently, when recapping my design patterns, I came across the Memento. In principle, it provides a means for capturing and externalizing an object’s internal state without violating encapsulation - only the Originator can manipulate the state saved in the Memento. That requires the Memento class to implement two interfaces: a wide one for the Originator, with appropriate get and set methods, and a narrow one for other objects, so they can pass the Memento around where necessary.

The GoF book presents one possible implementation of that pattern in C++ using a “friend” class. It defines two classes - an Originator and a Memento. The latter defines two private methods - getState and setState. In addition, it declares the Originator class as a friend, thus letting it access its private members.

Java being my primary programming language, this made me think about how we could implement the Memento in it, especially given the fact that Java does not support the friend class concept.

The most straightforward solution is to put the Memento, its interfaces and the Originator in the same package. The wide interface will be package-private and the narrow one public:
package memento1;

public interface NarrowMemento {
}

interface WideMemento {
  int getState();
}

class MementoImpl implements WideMemento, NarrowMemento {

  private int state;

  MementoImpl(int state) {
    this.state = state;
  }

  @Override
  public int getState() {
    return state;
  }

}

public class Originator {
  private int state;

  public Originator(final int state) {
    this.state = state;
  }

  public NarrowMemento getMemento() {
    return new MementoImpl(state);
  }

  public void setMemento(NarrowMemento memento) {
    this.state = ((WideMemento) memento).getState();
  }
}

The important bits:
  1. The Originator class is public and can be used from any package. 
  2. The MementoImpl class is package-private, thus visible only within its package.
  3. The WideMemento interface is package-private. 
  4. The NarrowMemento marker interface is public.
Therefore only Originator and NarrowMemento are visible outside the package, and as long as the Originator encapsulates its state properly, it won’t leak through the Memento.
This solution works fine as long as we can ensure that the Originator is used from a different package. Otherwise, WideMemento is visible and can be used to extract the Originator’s state.
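For illustration, a caretaker in a different package (a hypothetical client class) can only shuttle the memento around:
package client;

import memento1.NarrowMemento;
import memento1.Originator;

public class Caretaker {
    public static void main(String[] args) {
        Originator originator = new Originator(42);
        NarrowMemento snapshot = originator.getMemento();
        // ((WideMemento) snapshot).getState(); // won't compile: WideMemento is not visible here
        originator.setMemento(snapshot);        // only the Originator can read the state back
    }
}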

There is another approach to this problem that mitigates the packaging concerns. If we want to make sure that, regardless of the packaging, only the Originator can call the methods defined in the wide interface, we can use method signature security. We will alter the method from the WideMemento interface to take a dummy class that can be constructed only by the Originator:

package memento2;

import java.util.Objects;
import memento2.originator.Originator;

public interface Memento {
  int getState(Originator.Stamp stamp);
}

public class MementoImpl implements Memento {

  final int state;

  public MementoImpl(final int state) {
    this.state = state;
  }

  @Override
  public int getState(Originator.Stamp stamp) {
    Objects.requireNonNull(stamp);
    return state;
  }
}

And, in a different package, the Originator:

package memento2.originator;

import memento2.Memento;
import memento2.MementoImpl;

public class Originator {
  private final static Stamp stamp = new Stamp();

  private int state;

  public Originator(final int state) {
    this.state = state;
  }

  public Memento getMemento() {
    return new MementoImpl(state);
  }

  public void setMemento(Memento memento) {
    this.state = memento.getState(stamp);
  }

  public static class Stamp {
    private Stamp() {
    }
  }
}
The first thing you will probably notice is that the Memento interface and the implementing class are public. Still, only the Originator is able to make a successful call to the exposed method. That is because the method’s parameter is a nested class of Originator whose constructor is private, so only the Originator can create a Stamp instance and make the call. Of course, it is possible to pass null as the argument, but then such a client will be smacked with an NPE.
The only way this solution can be broken is by publishing an instance of the Stamp from either the Memento or the Originator, which requires modifying a class. Therefore it is less likely to be abused than the first approach, which relies only on proper packaging.
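Again, a hypothetical client illustrates the point:
package client;

import memento2.Memento;
import memento2.originator.Originator;

public class Caretaker {
    public static void main(String[] args) {
        Originator originator = new Originator(42);
        Memento snapshot = originator.getMemento();
        // snapshot.getState(new Originator.Stamp()); // won't compile: Stamp() is private
        // snapshot.getState(null);                   // compiles, but throws NullPointerException
        originator.setMemento(snapshot);              // works: the Originator holds a Stamp
    }
}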

The last solution gets as close as possible to the behaviour of the Memento implemented using a C++ friend class - i.e. it ensures that only the class whose state we externalize can access it.

Sunday, March 10, 2013

Peterson’s locking algorithm in Java

Recently I started reading a new book - “The Art of Multiprocessor Programming”. At the very beginning, the author brings up the topic of lock implementations and presents Peterson’s algorithm - an elegant two-thread mutual exclusion algorithm which can be generalized for n threads, but let’s start slow. After a short description, which can be found on Wikipedia as well, an implementation follows:
class Peterson implements Lock {
 private boolean[] flag = new boolean[2];
 private int victim;

 public void lock() {
  int i = ThreadID.get();
  int j = 1 - i;
  flag[i] = true;
  victim = i;
  while (flag[j] && victim == i) {};
 }

 public void unlock() {
  int i = ThreadID.get();
  flag[i] = false;
 }
}
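A note on the helpers: Lock is the simplified interface used throughout this post (just lock() and unlock()), and ThreadID is the book’s helper that assigns ids 0 and 1 to the two competing threads. A minimal stand-in might look like this (my sketch, not the book’s exact code):
import java.util.concurrent.atomic.AtomicInteger;

// Hands out thread ids 0, 1, ... - the first two threads get 0 and 1.
class ThreadID {
    private static final AtomicInteger nextID = new AtomicInteger(0);
    private static final ThreadLocal<Integer> threadID = new ThreadLocal<Integer>() {
        @Override
        protected Integer initialValue() {
            return nextID.getAndIncrement();
        }
    };

    static int get() {
        return threadID.get();
    }
}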
Now, if we take this Peterson class as a sample Java implementation, it is seriously flawed. We have two problems here: the first is a data race (mentioned in my previous post), and the second is the possibility of instruction reordering, as no happens-before relationship is established between the two threads in this case.

How can we solve it? We have two possible ways:

1) Mark each shared variable as volatile. That alone is actually insufficient in the case of the ‘flag’ array, since declaring it volatile only ensures that the reference value is visible to other threads; the values of the array elements may still land only in a CPU’s cache and never be flushed to the main memory. To fix that, we have two options: we can use either AtomicBoolean[] or AtomicIntegerArray (as there is no AtomicBooleanArray) and assume 0/1 for false/true. In the former case we need an object per array element; in the latter, only the AtomicIntegerArray object and its backing array.
Here is the code using AtomicIntegerArray:
class Peterson1 implements Lock {
 private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
 private volatile int victim;
 
 public void lock() {
  int i = ThreadID.get();
  int j = 1 - i;
  flag.set(i, 1);
  victim = i;
  while(flag.get(j) == 1 && victim == i) {};
 }

 public void unlock() {
  int i = ThreadID.get();
  flag.set(i, 0);
 }
}
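For completeness, the AtomicBoolean[] variant mentioned above could look like this (a sketch):
import java.util.concurrent.atomic.AtomicBoolean;

class Peterson1b implements Lock {
    // One AtomicBoolean object per flag - more objects than the AtomicIntegerArray version.
    private final AtomicBoolean[] flag =
            { new AtomicBoolean(false), new AtomicBoolean(false) };
    private volatile int victim;

    public void lock() {
        int i = ThreadID.get();
        int j = 1 - i;
        flag[i].set(true);
        victim = i;
        while (flag[j].get() && victim == i) {}
    }

    public void unlock() {
        int i = ThreadID.get();
        flag[i].set(false);
    }
}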
2) Another approach is to use the technique that I have described in my article "Flushing with a volatile!". In the code above (the Peterson class) we can declare ‘victim’ volatile. That makes writes to the victim variable visible to other threads. Now what about the visibility of writes to the ‘flag’ variable? If we ensure that each write to ‘flag’ is followed by a write to the volatile ‘victim’, then when a thread sees an up-to-date ‘victim’ value, it will also see an up-to-date ‘flag’ value.
There are 2 places where a write to a 'flag' happens:
In the lock() method:
flag[i] = true;
victim = i;
Here the write to the flag is followed by a write to the volatile victim, so all is good. Then in the unlock() method we have:
int i = ThreadID.get();
flag[i] = false;
If we leave the unlock() method as it is, the write of the false value to the ‘flag’ variable may not be seen by the other thread. To ensure visibility, we should add a write to the ‘victim’ variable:
victim = 1 - i; // (this is the current victim value anyway)

But that’s not all. What about reading the ‘flag’ variable? It must be preceded by a read of a volatile field. In the lock() method we have the only read of the ‘flag’ variable:
while (flag[j] && victim == i) {};
To take advantage of the Java memory model and make the ‘flag’ value visible, we have to reorder the above condition to:
while (victim == i && flag[j]) {};
Here is the whole code:
class Peterson2 implements Lock {
 private final boolean[] flag = new boolean[2];
 private volatile int victim;
 
 public void lock() {
  int i = ThreadID.get();
  int j = 1 - i;
  flag[i] = true;
  victim = i;
  while (victim == i && flag[j]) {};
 }

 public void unlock() {
  int i = ThreadID.get();
  flag[i] = false;
  victim = 1 - i;
 }
} 
What about a microbenchmark? I have measured the execution of the following task:
class IncrementingTask implements Runnable {
 Lock lock;
 int a;
 IncrementingTask(Lock lock) {
  this.lock = lock;
 }
 @Override
 public void run() {
  // 5*10^7 iterations when 2 threads execute this task, or 10^8 iterations for 1 thread,
  // so that the amount of work and the end result are the same: 'a' must equal 10^8
  for(int i = 0; i < 50000000; i++) {
   lock.lock();
   a++;
   lock.unlock();
  }
 }
} 
with the Peterson1 and Peterson2 implementations, each executed with one and with two threads, on:

$ cat /proc/cpuinfo | egrep "core id|physical id" | tr -d "\n" | sed s/physical/\\nphysical/g | grep -v ^$ | sort | uniq | wc -l
12
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
$ cat /proc/version
Linux version 2.6.32-33-server (buildd@allspice) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #72-Ubuntu SMP Fri Jul 29 21:21:55 UTC 2011 

Below are the average times (ns) for each case from 10 runs:
             1 thread      2 threads
Peterson1    1840693030    42213715991
Peterson2    1836106908    35781376968
I find the results quite interesting. First off, when there is no contention (the “1 thread” column) the performance of both implementations is very similar, with Peterson2 slightly ahead. When we introduce a second thread and thus increase the contention, Peterson2 is significantly (15%) faster. But what surprises the most is the difference between executing the task on 1 versus 2 threads: 1 thread does the job more than 20 times faster than 2 threads! We can see clearly how expensive it is to coordinate 2 threads when they want to access a shared variable. And what if we use the Java synchronized mechanism and an intrinsic object lock? Then the average time (again over 10 runs) for 2 threads is 2678088766 ns. It is way better than our lock implementation, but still slower than 1 thread carrying the Peterson lock acquisition overhead.

Ok, so that’s it for today. Hope you enjoyed the article and feel free to leave a comment!

Thursday, February 28, 2013

Flushing with a volatile!

Core Java provides different ways of ensuring the visibility of actions on memory performed by one thread to other threads. You can probably think of synchronization, marking a variable volatile or using a thread-safe collection from the java.util.concurrent package.

Today we will explore another, less obvious approach: we will use a volatile variable to ensure the visibility of another, non-volatile variable. Let’s start with a simple but flawed example:
class BadTask implements Runnable {
 boolean keepRunning = true;
 
 @Override
 public void run() {
  while(keepRunning) {
  }
  System.out.println("Done.");
 }
}

public class VolatileExp {
 public static void main(String[] args) throws InterruptedException {
  BadTask r = new BadTask();
  new Thread(r).start();
  Thread.sleep(1000);
  r.keepRunning = false;
  System.out.println("keepRunning is false");
 }
}
The intention of this code is to let BadTask run for 1 second and then stop it by setting the keepRunning boolean to false. As simple as it may look, this code is doomed to fail - BadTask won’t stop after 1 second and will run until you terminate the program manually. If it works fine in your environment, try a different one. In my case, the above code fails consistently on the following:
$ cat /proc/cpuinfo | egrep "core id|physical id" | tr -d "\n" | sed s/physical/\\nphysical/g | grep -v ^$ | sort | uniq | wc -l
12
$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
$ cat /proc/version
Linux version 2.6.32-33-server (buildd@allspice) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #72-Ubuntu SMP Fri Jul 29 21:21:55 UTC 2011
If the program does not stop, you may wonder what happened. In short - the main thread and the thread running BadTask have been executed on different cores. Each core has its own set of registers and caches. The new value of keepRunning has been written to one of these without being flushed to the main memory, and thus it is not visible to the code running on the other core.

OK, how can we fix it? The simplest and most correct way is to mark this variable volatile. Another approach would be to acquire a common lock when accessing it, but that would definitely be overkill.
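For reference, the straightforward fix is a one-word change:
class GoodTask implements Runnable {
    volatile boolean keepRunning = true; // writes by other threads are now guaranteed to be seen

    @Override
    public void run() {
        while (keepRunning) {
        }
        System.out.println("Done.");
    }
}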

So what will we do today? We will introduce another variable, marked with the volatile keyword! In the above code it does not make much sense and serves only to demonstrate some aspects of the Java memory model, but think about a scenario where there are more variables of keepRunning’s nature. Have a look at the code below, which does not have the visibility problem anymore:
class BadTask implements Runnable {
        boolean keepRunning = true;
        volatile boolean flush = true;

        @Override
        public void run() {
                while(keepRunning) {
                        if(flush); // volatile read: makes everything written before the write to 'flush' visible
                }
                System.out.println("BadTask is done.");
        }
}

public class VolatileExp {

        public static void main(String[] args) throws InterruptedException {
                BadTask r = new BadTask();
                new Thread(r).start();
                Thread.sleep(1000);
                r.keepRunning = false;
                r.flush = false;
                System.out.println("keepRunning is false");
        }
}
So, as already mentioned, we have introduced a new volatile variable, “flush”. We do two things with it: first, we write to it in the main thread, right after modifying the non-volatile keepRunning variable; second, in the thread running BadTask, we read it.
Now, how come the value of keepRunning is flushed to the main memory? This is guaranteed by the current Java memory model. According to JSR-133, “writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire”. Thus, actions on memory done by one thread before writing to a volatile variable will be visible to another thread after it reads that variable.

This is an advanced technique which should be used sparingly, only when performance has the highest priority. If you are looking for a real-life application of it, have a look at ConcurrentHashMap from the java.util.concurrent package.

Saturday, January 28, 2012

Proxy and Adapter design patterns.

Recently, when speaking to some of my colleagues, I have noticed a lot of confusion between these two design patterns. Thus, today I'm going to clarify all the doubts :).

First off, both Proxy and Adapter belong to the same group of design patterns – the structural one. What are structural design patterns? In simple words, they are patterns that describe ways of combining different entities with each other, thus forming larger structures.

And now a definition from Wikipedia:

“Structural design patterns are design patterns that ease the design by identifying a simple way to realize relationships between entities.”

Right. First we will scrutinize the adapter pattern. In a nutshell, it transforms the interface of one class into the interface that is expected. It enables two classes to work together even though their interfaces are not compatible with each other. There are two basic variations of the adapter design pattern: the class adapter and the object adapter.

  • Class adapter.

In this case, the adapter class subclasses the class we want to adapt and at the same time implements the desired interface that the client expects. When a call is made, the appropriate inherited methods are invoked.

  • Object adapter.

On the other hand, in the object adapter, the adapter class does not subclass the adaptee but instead holds a reference to it (composition). It also implements the expected interface.

There are pros and cons to each of these versions, but in this article let’s focus on the more general concepts of the adapter pattern.
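Still, a tiny object adapter in Java makes the idea concrete (the Target and LegacyService names are made up for this example):
// The interface the client expects.
interface Target {
    String request();
}

// The adaptee: useful behaviour behind an incompatible interface.
class LegacyService {
    String legacyRequest() {
        return "legacy result";
    }
}

// The object adapter: holds the adaptee and translates calls.
class LegacyServiceAdapter implements Target {
    private final LegacyService adaptee;

    LegacyServiceAdapter(LegacyService adaptee) {
        this.adaptee = adaptee;
    }

    @Override
    public String request() {
        return adaptee.legacyRequest();
    }
}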

Now, what about the proxy pattern?

In general, it provides a surrogate for a given object in order to control access to it in some way. Based on the way the access is controlled, we can distinguish different types of the proxy design pattern:

  • Remote proxy – it handles the interaction between remote objects.
  • Virtual proxy – it handles the situation when an object is expensive to create (lazy initialization).
  • Protection proxy - based on who calls the proxy, it controls the access to different methods.
  • Smart proxy – adds some extra operations when the object is accessed.

A very important thing about the proxy pattern is the fact that the surrogate has the same interface as the real object.
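A classic illustration is a virtual proxy for an expensive-to-load image (the names are, again, made up for the example):
// The common interface shared by the real object and its surrogate.
interface Image {
    void display();
}

class RealImage implements Image {
    RealImage(String path) {
        // imagine expensive loading from disk here
    }

    @Override
    public void display() {
        System.out.println("Displaying image");
    }
}

class ImageProxy implements Image {
    private final String path;
    private RealImage real; // created lazily, on first use

    ImageProxy(String path) {
        this.path = path;
    }

    @Override
    public void display() {
        if (real == null) {
            real = new RealImage(path); // lazy initialization
        }
        real.display();
    }
}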

To summarize: the adapter bridges the gap between two classes that are not compatible with each other, while the proxy provides a surrogate to control the access to the real object.

Saturday, November 26, 2011

Re-throwing exceptions.

Today we will talk about re-throwing exceptions. In many cases, after examining a caught exception, we find that we cannot handle it and must re-throw it. This is a very basic and simple action, but for Java developers writing C# there is a trap. In the aforementioned scenario, in Java we would just write:
throw e;
where e is the exception we caught.
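The stack trace stays intact, because in Java the trace is captured when the exception object is constructed, not at the throw site. A quick sketch to convince ourselves:
public class RethrowDemo {
    static void m4() throws Exception {
        throw new Exception("A sample exception.");
    }

    public static void main(String[] args) throws Exception {
        try {
            m4();
        } catch (Exception e) {
            throw e; // the printed trace still begins at m4(), where e was created
        }
    }
}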
Why don't we do the same in C# and see what happens? Below is sample code that illustrates the situation:
class Program
{
    void m1() { m2(); }
    void m2() { m3(); }
    void m3() { m4(); }
    void m4()
    {
        throw new Exception("A sample exception.");
    }

    static void Main(string[] args)
    {
        try
        {
            new Program().m1();
        }
        catch (Exception e)
        {
            throw e;
        }
    }
}
Now let's execute it and examine the stack trace:
System.Exception was unhandled
Message=A sample exception.
Source=LearningCsharp
StackTrace:
at ConsoleApplication1.Program.Main(String[] args) in C:\Path\Program.cs:line 220
What happened here? We somehow lost the whole stack trace. Now it looks as if the exception was created within the Main method – in a real-life scenario that may cause many problems when troubleshooting.
How do we do it correctly in C# and preserve the whole stack trace? Here is the answer:
class Program
{
    void m1() { m2(); }
    void m2() { m3(); }
    void m3() { m4(); }
    void m4()
    {
        throw new Exception("A sample exception.");
    }

    static void Main(string[] args)
    {
        try
        {
            new Program().m1();
        }
        catch (Exception e)
        {
            throw;
        }
    }
}
Yes! A simple
throw;
solves the issue. Now our stack trace will look as follows:
System.Exception was unhandled
Message=A sample exception.
Source=LearningCsharp
StackTrace:
at ConsoleApplication1.Program.m4() in C:\Path\Program.cs:line 210
at ConsoleApplication1.Program.m3() in C:\Path\Program.cs:line 207
at ConsoleApplication1.Program.m2() in C:\Path\Program.cs:line 206
at ConsoleApplication1.Program.m1() in C:\Path\Program.cs:line 205
at ConsoleApplication1.Program.Main(String[] args) in C:\Path\Program.cs:line 220
...
Looks much better – we can see which method threw the exception and thus find the source of the issue.
So, Java developers – bear that difference in mind and avoid confusion when analyzing a half-empty stack trace! :)

Saturday, November 19, 2011

Exception handling – multiple catch.

Let's talk about exception handling. Just to ensure that we are standing on the same planet – a multiple catch of exceptions is a situation where we have a method that throws more than one checked exception and we want to handle some of them separately. The following Java code shows such a situation:

class Exception1 extends Exception {}
class Exception2 extends Exception {}
class Exception3 extends Exception {}

public class ExceptionHandling {
    public static void throwsExceptions() throws Exception1, Exception2, Exception3 {
        return;
    }

    public static void testMethod() {
        try {
            throwsExceptions();
        }
        catch (Exception1 e) {
            // handle Exception1
        }
        catch (Exception2 e) {
            // handle Exception2
        }
        catch (Exception e) {
            // handle other exceptions.
        }
    }
}

But what if the logic for handling Exception1 and Exception2 is the same? In that case we get dirty code duplication. There is, however, a simple remedy: we can catch the base Exception class and then branch appropriately with an if-else statement, as the below code snippet presents:

public static void testMethod2() {
    try {
        throwsExceptions();
    }
    catch (Exception e) {
        if (e instanceof Exception1 || e instanceof Exception2) {
            // Do the common handling logic for Exception1 and Exception2
        }
        else {
            // Handle other exceptions
        }
    }
}

So far so good – we don't have the code duplication anymore. Just a note: the above examples use Java, but they also apply to C# – we would face the same issue and the same remedy when coding in it.
Now you may wonder what more can be done. As for C#, not much, really. In Java, though, there is still some room for improvement: Java 7 introduced a new multi-catch mechanism, so we can catch multiple exceptions in one catch block! The following code sample shows the solution using Java 7:

public static void testMethod3() {
    try {
        throwsExceptions();
    }
    catch (Exception1 | Exception2 e) {
        // Do the common handling logic for Exception1 and Exception2
    }
    catch (Exception e) {
        // Handle other exceptions.
    }
}

Here we can see how exceptions of type Exception1 and Exception2 are handled in one catch block using the '|' operator.
Summing up – a point for Java :)