Toys and Ranges

Posted on May 10th, 2009 by Jonathan | No Comments

Started playing around with toy languages and ended up starting a series on it on the blog. This should give me a good sandbox to play with features until I bring them into Minnow.

Speaking of new features, I just got done reading Andrei Alexandrescu’s talk on iterators. What a solid talk. It’s nice to see them moving forward, thinking about better and better abstractions. You might not like C++’s syntax, but it’s hard to argue that they haven’t incubated lots of ideas in high performance generic programming over the years(mostly via Boost and others).

This Week in Minnow Git (5/1/2009)

Posted on May 2nd, 2009 by Jonathan | No Comments

Aqualisp Continues

More aqualisp bits made it in this week, including improvements to the parser and the start of the analysis phase. Still have a bit further to go until it sputters to life, but we’re getting there.

Toy Languages

After seeing my wishlist for Minnow 1.0, it quickly dawned on me that I don’t know how to implement everything I want. Instead of muddling the main Minnow code, after watching a spunky interview with Jason Olson last night, I’ve realized the best way to toy with a language feature is to literally implement it as a toy… a toy language that is. The projects will likely be very tiny in size, not much actual utility, but they should be good sandboxes to see what the features look like in practice. Anyone interested in playing along is welcome.

The current number of toy features is currently hovering around 30, but that’s for minnow as a whole, not specifically for the 1.0 release. Lots to choose from.

This Week in Minnow Git (4/24/2009)

Posted on April 24th, 2009 by Jonathan | No Comments

Aqualisp Begins

Looking over the alpha 3 codebase, a few months on, you get the distinct impression the code works like this:

Minnow Source -> [Black Box] -> C code

With little way to get at or test the internals of the black box. As many of you TDD stars know, this is not the way to have a happy development experience. At the time I didn’t have a good way to break things up, as there was a considerable amount of setup and shared state between the various parts (even parts of trees referenced other branches).

At the same time, spending time learning Clojure and Lisp in general, the thought struck me that S-expressions are about the easiest grammar in the world to parse. A light went on. If I break up the compiler into clean stages, I could transfer the tree between stages using S-expression syntax. Testing harnesses could work on the S-expressions and check for equivalence. At the very least, it would take us from:

Minnow -> [Black Box] -> C code

to:

Minnow -> [1/2 Black Box] -> Lisp -> [1/2 Black Box] -> C code

Which is a step in the right direction. Once tested, the modules could be recombined for performance reasons, while gaining the benefit of a nice code separation (and testing!)

Though still in transition, this is what the current syntax looks like (again, of our venerable threadring benchmark):

(defactor Passer
 (def Passer__id int)
 (def Passer__next Passer))
 
(defaction Passer pass [token:int]
 (if (== token 0)
  (block
   (printi Passer__id)
   (exit))
  (block
   (msg Passer__next pass [(- token 1)]))))
 
(defaction Passer set_id_and_next [(def id int) (def next Passer)]
 (= Passer__id id)
 (= Passer__next next))
 
(defaction __Program__ main [(def args Array__string)]
 (def head Passer)
 (def curr Passer)
 (def tmp_i int)
 (def next Passer)
 
 (= head (spawn Passer []))
 (= curr head)
 (= tmp_i 1)
 (while (<= tmp_i 502)
  (cont head curr tmp_i)
  (block
   (= next (spawn Passer []))
   (msg curr set_id_and_next [tmp_i next])
   (= curr next)
   (= tmp_i (- tmp_i 1))))
 (msg curr set_id_and_next [503 head])
 (msg head pass (. (index args 0) to_int [])))

This Week in Minnow Git (4/17/2009) [late]

Posted on April 20th, 2009 by Jonathan | No Comments

A few days late, but still warm and gooey…

Aquarium Rewrite Status

After wrestling with CAS for a bit too long, and swearing at the lack of DCAS (or my own laziness for not restructuring to allow fake DCAS with paired CAS), we’re now switched over to using very tiny locks. The two areas with locks are: actor message queue and actor work queue. Speed-wise the actor message queue is nearly identical to using the CAS method, since the locked regions are only a couple instructions each. Later, as we go through optimization phases, I suspect there’s a good chance that the actor work queue will move back to CAS for performance reasons (attempting to steal from work queues shouldn’t incur any locking penalties, esp for manycore systems). Heck, I might try that out today on a working branch.

As you can tell, we’re at the polish stage, but if you’re like me, you’re wanting to know what the point of the rewrite was with some numbers.

First, the gist:

  • No kernel. What it did — cross-scheduler messaging, load balancing, actor isolation — is now done using other techniques.
  • Actors are messaged directly using safe message queues. Sending to the queue will tell you if it’s safe to enqueue/dequeue that actor for work. If you notice the actor is idle, and you are the first person messaging it, you become its owner. With this actors have no origin scheduler, and effectively are drawn from a shared idle pool
  • Once active and on a scheduler, any actor can be stolen by idle schedulers. This gives us our load balancing.
  • Because actors can be stolen around a scheduler, it becomes less important to have explicitly isolated actors. If the scheduler is busy with a long-running actor, other schedulers can steal off the starving actors behind the active task.

Now, the numbers:

Threadring (8-way Ubuntu Linux)

10m:
Alpha 3: .38s
Alpha 4: 3.0s

50m:
Alpha 3: 1.8s
Alpha 4: 15.0s

Explanation: We knew going in that threadring would suffer the most in the rewrite, as it gained a lot of speed from assuming that idle actors were paired to the local scheduler. We take a hit for not being able to make this assumption anymore.

Big bang:

1500:
Alpha 3: 1.0s
Alpha 4: 0.6s

4000:
Alpha 3: 108.0s
Alpha 4: 4.5s

Explanation: On the flip side, big bang actually is doing “real work”, and gets a performance boost for doing so. The active load balancer in alpha 4 if far superior to the alpha 3 one when there is work that can be shared.

Like I said earlier, we might be able to shave some of the fat during optimization later on, but I’m pleased with where we are. These are some pretty solid results and come from a much simpler architecture. The Aquarium library could also reasonably be used as an add-on library for C/C++ applications at this point. The code wouldn’t be idiomatic C, since sending messages to actors won’t look like function calls, but once you’re over the bump you’ve got an active load-balancing system.