Monday, March 25, 2013

A work sharing scheduler prototype for unbalanced workloads

During the past week Jan and Adam have been working on a work sharing scheduler and thread pool for Acumen's parallel interpreter. The main issue the new scheduler is designed to address is that the current scheduler is limited to a static assignment of work to the threads that are managed by the thread pool. The following model illustrates the main problem with this approach.

class A(n, mode)
  private x := 0 end
  switch mode
    case "Spread"
      if n > 0
        create A(n-1, "Spread");
        create A(n-1, "Spread")
      end;
      mode := "Work"
    case "Work"
      x = sum i*i for i in 1:100 if true
  end
end
       
class B()
end

class Main(simulator)
  private
    mode := "Init"
  end
  switch mode
    case "Init"
      create A(8, "Spread");
      create B();
      mode := "Persist"
    case "Persist"
  end
end


The above model is an example of a highly unbalanced workload, where one of the children (the A instance) of the root object corresponds to practically all the work done during the simulation. However, with only two threads to distribute among the root's children (on a dual-core machine) the static scheduler will waste one of the threads on the B instance, leaving the entire computation of the A instance (and all its descendants) sequential. There are many ways in which such an unbalanced workload could be scheduled more efficiently. For instance, one could increase the number of threads in the thread pool above the number of cores available on the machine. Such an approach will suffer from the overhead incurred by the creation and garbage collection of threads. The strategy pursued this week tries to reallocate the remaining work as evenly possible among free threads as they finish their work. The current implementation can handle unbalanced workloads reasonably well, but additional benchmarking is needed to understand how it performs on a variety of models and simulator parameter settings.

A considerable amount of time during the implementation of this scheduler was spent on debugging deadlocks in the implementation of the thread pool and scheduler. Aside from using the Eclipse debugger and good old fashioned print statements, Oracle's VisualVM tool proved to be very useful. It collects a number of tools from the JDK, including a thread inspector and most notably a way to dump the state of all threads currently running inside a given JVM. Whenever a deadlock occurs, it suffices to look in the thread dump to see what each thread is up to at the moment.