Point Pelewin: 2009-12

2009-12-18

Merge sort in Scala and Processing (2/3)

(Part 1 2 3)

For my merge sort demonstration, I wanted a factory object producing snapshots of progressive sorting in an array of integers. It would essentially be a (partial) function with the signature
(Array[Int]) ⇒ RepeatingIterator[Int].

The argument to the factory is an array of integers that represents the initial state, before sorting.

The class RepeatingIterator is derived from Iterator. It repeats infinitely by yielding the first item after the last, and so on.

  object SortStates {
    def apply (init: Array[Int]) = {
       Create stages 

       Define conversion function 

      new RepeatingIterator(Seq(init) ++
        (stages map convFn))
    }
  }

Create stages

The states of partial to complete ordering are mapped from an instance of Stages (which is a structure that will be described in part 3; suffice to say that it provides a method def map (f: List[Int] ⇒ Array[Int])).

      val size   = init.size
      val stages = new Stages(size)

Define conversion function

The function passed to map is an object whose apply method takes two arguments: a copy of the initial state, and one of the stages mapped over.

      val convFn = StateFunc(init take size) _

The StateFunc object

This object is a function (Array[Int])(List[Int]) ⇒ Array[Int] that takes as its first argument the previous state of the values. The second argument is a list that describes one stage in the merging process. After building a state with buildState, it is written back to the data parameter and finally returned.

object StateFunc {
   Define merge method 

   Define splitMerge method 

   Define buildState method 

  def apply (data: Array[Int])(stage: List[Int]) = {
    val state =
      buildState(data, stage).flatten.toArray

    for (i ← 0 until data.size)
      data(i) = state(i)

    state
  }
}

Define buildState method

The buildState method builds a new state by merging the sorted lists of the previous state. One merge operation is done for each element of the stage data structure. During each merge operation, the following constants are defined:

n: the length of the resulting sublist
s: the total length of the sublists that are already merged
r: a list of n values past the s first values of data

If the sublist length (n) is 1, it is already sorted and r is simply added to the result. Otherwise, r is passed to the method splitMerge.

  private def buildState (data: Array[Int], stage: List[Int]) = {
    for {
      i ← 0 until stage.size
      n = stage(i)
      s = (stage take i sum)
      r = (data drop s take n).toList
    } yield if (n == 1) r else splitMerge(r)
  }

Define splitMerge method

The splitMerge method splits its parameter (which has n values) into two sublists with n / 2 and n - n / 2 values, respectively. The subroutine merge is then applied to the two lists.

  private def splitMerge (r: List[Int]) = {
    val n   = r.size
    val n_2 = n / 2
    val xs1 = r take n_2
    val xs2 = r drop n_2 take (n - n_2)
    merge(xs1, xs2)
  }

Define merge method

The merge method is the same as in the mergeSort function in part 1.

  private def merge (a: List[Int], b: List[Int]): List[Int] =
    (a, b) match {
      case (_, Nil) ⇒ a
      case (Nil, _) ⇒ b
      case (ah :: at, bh :: _) if (ah <= bh) ⇒
        ah :: merge(at, b)
      case (_, bh :: bt) ⇒ bh :: merge(a, bt)
    }

And that's it for SortStates and StateFunc. In the final part I will show Stages and RepeatingIterator.

2009-12-17

Merge sort in Scala and Processing (1/3)

(Part 1 2 3)

Someone asked a question about merge sort the other day, and I was inspired to write a Scala implementation of a merge sort routine.

Assuming that we are sorting a list of integers, a recursive solution can be drawn up as follows.

in case the list has no elements, it is already sorted
in case the list has one element, it is already sorted
otherwise, we can make the list sorted by splitting it in two halves, sort each part, and the merge the parts back together

  def mergeSort (xs: List[Int]): List[Int] = {
    xs match {
      case Nil     ⇒ xs
      case List(_) ⇒ xs
      case _       ⇒
        val (xs1, xs2) = xs splitAt (xs.size/2)
        merge(mergeSort(xs1), mergeSort(xs2))
    }
  }

In this context, merging two sorted lists means (X_h means the first element, or head, of X, X_t means the rest, or tail, of X):

in case B is empty, take A
in case A is empty, take B
in case A_h ≤ B_h, take A_h and the result of merging A_t and B
otherwise, take B_h and the result of merging A and B_t

  def merge (a: List[Int], b: List[Int]): List[Int] =
    (a, b) match {
      case (_, Nil) ⇒ a
      case (Nil, _) ⇒ b
      case (ah :: at, bh :: _) if (ah <= bh) ⇒
        ah :: merge(at, b)
      case (_, bh :: bt) ⇒ bh :: merge(a, bt)
    }

This can be generalized to a sequence of anything that can be ordered:

  def mergeSort[A<%Ordered[A]] (xs:Seq[A]):Seq[A] = {
  def merge (xs1: Seq[A], xs2: Seq[A]): Seq[A] =
    (xs1, xs2) match {
      case (_, Seq()) ⇒ xs1
      case (Seq(), _) ⇒ xs2
      case (Seq(h1, t1 @ _*), Seq(h2, _*))
        if (h1<=h2) ⇒
          Seq(h1) ++ merge(t1, xs2)
      case (_, Seq(h2, t2 @ _*)) ⇒
        Seq(h2) ++ merge(xs1, t2)
  }
  xs match {
    case Seq()  ⇒ xs
    case Seq(_) ⇒ xs
    case _      ⇒
      val (xs1, xs2) = xs splitAt (xs.size/2)
      merge(mergeSort(xs1), mergeSort(xs2))
  }
}

scala> mergeSort("Hello world")
res1: Seq[Char] = List( , H, d, e, l, l, l, o, o, r, w)

Yawn.

Well, that's not very interesting, is it?

Then I thought, how about using Processing to visualize the sorting process?

A Scala/Processing applet could display a sequence of values as a dot-spread diagram, with position on the x-axis and value on the y-axis. It would basically look like this (n is the number of values, s is the pixel size of each dot):

    for (x ← 0 until n) {
      val y = f(x)
      ellipse(x * s, y * s, s-1, s-1)
    }

In Scala, objects and functions are interchangeable, so f acts as a partial function that maps x (for values in the range 0 ≤ x < n) to y values. With this definition, the loop in draw will display n dots on the screen, with a single dot in each column and row.

However, my goal was to show unsorted values getting more and more sorted, in effect an animated view of the sorting process. To do this, I need more than one state. While I'm at it, I'll define a state as an Array of integers, and the state structure as a Sequence (produced by the Seq object):

  val fStates: Seq[Array[Int]] = Seq(
    Array(4, 8, 7, 6, 2, 5, 9, 3, 1, 0),
    Array(4, 8, 7, 2, 6, 5, 9, 3, 0, 1),
    Array(4, 8, 2, 6, 7, 5, 9, 0, 1, 3),
    Array(2, 4, 6, 7, 8, 0, 1, 3, 5, 9),
    Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
  )

  var i = 0

  // ...

    val f = fStates(i)
    for (x ← 0 until f.size) {
      val y = f(x)
      ellipse(x * s, y * s, s-1, s-1)
    }
    i = (i + 1) % fStates.size

This works, but there's room for improvement. For one thing, I don't want to have to specify the values myself, and for another, the indexing exposes too much of the implementation. In the next part, I'll define a factory that produces states and provides them through an iterator, which will let me write the applet this way:

  val numVals = 70
  val dotSize =  6
  val fStates =
    SortStates(Array.range(numVals-1, -1, -1))

  override def setup {
    size(numVals * dotSize, numVals * dotSize)
    frameRate(1)
    smooth
    noStroke
    ellipseMode(CORNER)
  }

  override def draw {
    background(255)
    fill(0)
    val f = fStates.next
    for (x ← 0 until numVals) {
      val y = f(x)
      ellipse(
        x * dotSize,
        y * dotSize,
        dotSize-1,
        dotSize-1
      )
    }
  }

(In this example, the initial state has the values in reverse order.)

2009-12-03

Catamorphism, an exercise by Tony Morris

Tony Morris posted a Scala exercise where the object was to reimplement the Option API using a catamorphism. An abstract method cata takes two parameters; two predefined instances of the defining class implement the method to return either of them. The whole API can then be rewritten as applications of that method. I haven't checked his solution yet, but I've compared mine to Kris Nuttycombe's posted here (and actually nicked one or two more elegant forms from it).

Anyway, here's my version:

trait MyOption[+A] {
  import MyOption._ 

  // single abstract method
  def cata[X] (some: A => X, none: => X): X
  def map[B] (f: A => B) =
    cata(a => some(f(a)), none)
  def flatMap[B] (f: A => MyOption[B]) =
    cata(f, none)
  def getOrElse[AA >: A] (e: => AA) = cata(a => a, e)
  def filter (p: A => Boolean) =
    cata(a => if (p(a)) this else none, this)
  def foreach (f: A => Unit) { cata(f, ()) }
  def isDefined = cata(_ => true, false)
  def isEmpty = cata(_ => false, true)

  // WARNING: not defined for None
  def get = cata(a => a, error("MyOption.none.get"))

  def orElse[AA >: A] (o: MyOption[AA]) =
    cata(_ => this, o)
  def toLeft[X] (right: => X) =
    cata(a => Left(a), Right(right))
  def toRight[X] (left: => X) =
    cata(a => Right(a), Left(left))
  def toList = cata(a => List(a), Nil)
  def iterator =
    cata(a => Iterator(a), Iterator.empty)
}

object MyOption {
  def none[A] = new MyOption[A] {
    def cata[X](s: A => X, n: => X) = n
  }

  def some[A](a: A) = new MyOption[A] {
    def cata[X](s: A => X, n: => X) = s(a)
  }
}

Point Pelewin