Monday, August 29, 2011

We have moved to

We have moved!! You can find us here at

We have packed up and moved our blog onto a subdomain of our main site. This blog will probably stay up for a good long time, but we will not be writing anything new here, nor will we be moving the posts from this blog to the new one.

Erlware Team

Tuesday, October 12, 2010

Console Applications in Erlang

This tutorial is brought to you by ErlangCamp 2010 - Chicago, October 23 and 24 - already at 95% capacity! It's gonna be totally sweet.
Erlang is probably not the first language you'd think of for building console applications. Here's a typical "Hello World" application in Erlang:


-module(hello).
-export([hello/0]).

hello() ->
    io:format("Hello World!~n").
After compiling the module, you'd run it as an application like this:
$ erl -run hello hello -run init stop -noshell
Hello World!
Whoa, that's a lot of work just to print a simple message to standard output!

Here's the same thing in Python:
print "Hello World!"
And using it:
Now that's more like it! It's no surprise that script languages like Python and Perl are used extensively to build console applications.

So why bother using Erlang for this sort of thing? Erlang's core strength is handling extreme concurrency problems and building long running, fault tolerant applications. Surely you'd be better off sticking to Python, Perl, or even bash!

Actually, that thinking is basically correct. If you're competent in a scripting language, you probably want to start there. But consider Erlang for these reasons:
  • You're using Erlang for other applications and want to avoid introducing another runtime dependency for your console apps
  • You're a True Believer in functional languages and want to extend the goodness to your scripts
  • You need to communicate with Erlang nodes or work with Erlang persisted terms (e.g. config files)
  • You want to brag to your chums that you're an Erlang hipster and have entered the ranks of the cool kids!
Pretty powerful arguments. The good news is that there's no reason to forgo Erlang just because its reputed strengths lie elsewhere.

Improving Erlang's Hello World

Enough strategy - let's fix that fugly "Hello World" app! Create a file named "hello" that looks like this:
#!/usr/bin/env escript
main(_) ->
    io:format("Hello World!~n").
Not as drop-dead simple as the Python version, but pretty close.

Let's run it:
$ escript hello
Hello World!
If you want to execute it directly, change its permission:
$ chmod 755 hello
$ ./hello
Hello World!

The secret here is escript, which is installed with the standard Erlang distribution. If you can run erl, you can run escript. By using the shebang as the first line of the script, we turned this simple file into a bona fide executable Erlang application!

At this point, we have all we need to write Erlang console applications. If you want to automate something on a system and have a hankering to use some Erlang, this is how you'd do it.

Super Charging Erlang's Hello World

Let's take things further and carve out a full-fledged console application framework. It's simple!

Grab the latest getopt Erlang source from github:
This is a terrific module that lets you parse command line arguments into validated Erlang terms using the nearly ubiquitous getopt convention.

What's getopt? If you've ever run an application from a command line, you've probably already seen the convention. Here's a quick summary:
  • Command line options are differentiated from command line arguments
  • By convention, options are always, well, optional
  • Arguments may be optional but are frequently required
  • Options are designated by either a leading single-dash "-" (short form) or a double-dash "--" (long form)
  • Short form options are always a single character
  • Options may have values, which follow the option name and a space (or alternatively an equals sign "=" for long forms)
Ah heck, it's probably easier to just look at an example. Here's classic getopt:
$ man ls
Now that's the sort of high quality interface we want for our Erlang console apps!
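To make those conventions a little more concrete, here is a tiny hand-rolled classifier. It is a hypothetical helper (the argclass module is made up for illustration), not part of getopt: it only tags each token as a long option, a short option, or a plain argument. It does not pair "-w" with the value that follows it and does not expand bundled short options like "-ah"; getopt does that real work for us.

```erlang
-module(argclass).
-export([classify/1]).

%% Hypothetical helper, not part of getopt: tag each raw argument
%% string according to the getopt conventions listed above.
classify(Args) ->
    [classify_one(Arg) || Arg <- Args].

%% Long form: "--name" or "--name=value".
classify_one("--" ++ Long) ->
    case lists:splitwith(fun(C) -> C =/= $= end, Long) of
        {Name, "=" ++ Value} -> {long, Name, Value};
        {Name, ""} -> {long, Name}
    end;
%% Short form: a single dash followed by exactly one character.
classify_one([$-, Short]) ->
    {short, Short};
%% Everything else (including "-ah", which we don't expand) is a
%% plain, non-option argument.
classify_one(Arg) ->
    {arg, Arg}.
```

For example, argclass:classify(["--align=center", "--help", "-w", "40", "hi"]) tags the first two tokens as long options, "-w" as a short option, and "40" and "hi" as plain arguments, because deciding that "40" is the value of -w requires knowing the option spec.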

Let's tweak our "Hello World" app with some new features:
  • Support for a custom message
  • Align the message - left, right, or center - within a particular number of spaces
  • Print help/usage info if we ask for it
This is what our help screen should look like after we're done:
$ ./hello --help
Usage: hello [-a <align>] [-w <width>] [-h <help>] [message]

  -a, --align   alignment: left (default) | right | center
  -w, --width   width used by alignment (79)
  -h, --help    display this help and exit
  message       message to print (Hello World!)

Prints a message to standard output, aligning it with a
particular number of spaces (width).
"Too much work" you say! Fear not - with the getopt module, it's really simple.

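First, we need a specification that getopt will use to parse the command line arguments. We'll add the following Erlang macro to the hello script, along with a compile directive that the error/1 function defined later needs:
%% hello defines its own error/1 below; without this directive it clashes
%% with the auto-imported erlang:error/1 BIF and the script won't compile.
-compile({no_auto_import, [error/1]}).

-define(SPEC,
        [{align, $a, "align", atom,
          "alignment: left (default) | right | center"},
         {width, $w, "width", {integer, 79},
          "width used by alignment (79)"},
         {help, $h, "help", boolean,
          "display this help and exit"}]).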
You can read more about specifications in the getopt module documentation. The spec tells getopt what to expect in terms of arguments. This is used for both parsing the arguments and for printing usage documentation.

In our case, we have three options: one specifying the alignment, one for the width, and another for printing the program help. Each option has:
  • A name, used to identify parsed values
  • A short form (char) and long form (string) of the option
  • A type and, optionally, a default value for the option
  • Help text
With our spec in hand, let's modify the script's main function:
main(Args) ->
    case getopt:parse(?SPEC, Args) of
        {ok, {Opts, MsgParts}} ->
            %% Print the full help and exit if --help was given.
            maybe_help(Opts),
            Msg = case MsgParts of
                      [] -> "Hello World!";
                      _ -> string:join(MsgParts, " ")
                  end,
            Align = proplists:get_value(align, Opts),
            Width = proplists:get_value(width, Opts),
            io:format("~s~n", [format(Msg, Align, Width)]);
        {error, _} ->
            usage(1)
    end.
There are several functions that we still need to define, but the core application logic is all there.

The function first parses the arguments passed on the command line. getopt:parse/2 returns {ok, {Options, Arguments}} if the command line args comply with the specification. Otherwise, it returns {error, Reason}. In main/1, we print a message given validated input or display the program usage if there are problems.

Once the arguments are parsed, getting the user input is a simple matter of reading from Options (a property list; see Erlang's proplists module for details) and from Arguments (a list of non-option arguments). Values provided by the user are converted to the expected type, and missing values are filled in with default values from the spec.
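To see how the parsed values are consumed, here is a sketch that pulls main/1's message and option handling into plain functions you can poke at from the shell (the opts_demo module is hypothetical). The option lists in the example are stand-ins for what getopt:parse/2 hands back, not captured output; in particular, whether a bare --help arrives as the atom help or as {help, true} is an assumption, and proplists:get_bool/2 accepts either form.

```erlang
-module(opts_demo).
-export([message/1, settings/1]).

%% Same fallback as main/1: join the non-option arguments into one
%% message, or use "Hello World!" when none were given.
message([]) -> "Hello World!";
message(MsgParts) -> string:join(MsgParts, " ").

%% Read the values main/1 and maybe_help/1 care about from an
%% already-parsed options proplist. Missing keys give undefined
%% (for get_value) or false (for get_bool).
settings(Opts) ->
    {proplists:get_value(align, Opts),
     proplists:get_value(width, Opts),
     proplists:get_bool(help, Opts)}.
```

With Opts = [{align, center}, {width, 40}], settings/1 yields {center, 40, false}; with no align option at all, the alignment comes back as undefined, which is why format/3 has an undefined clause.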

Next, let's define usage/1.
usage(Exit) ->
    getopt:usage(?SPEC, "hello", "[message]",
                 [{"message", "message to print (Hello World!)"}]),
    case Exit of
        0 ->
            %% Detailed help only when the user explicitly asked for it.
            io:format("Prints a message to standard output, "
                      "aligning it with a\nparticular number "
                      "of spaces (width).\n");
        _ -> ok
    end,
    erlang:halt(Exit).
Here we use getopt:usage/4 to print the expected usage of the program to standard output. The 4-arity variant lets us specify additional help text for the usage. We also print detailed help text if the application is exiting normally (exit code is 0). If the application is exiting abnormally (e.g. the input from the user is invalid), we just display the usage. Finally, we terminate the application using erlang:halt/1.

Our next function is maybe_help/1:
maybe_help(Opts) ->
    case proplists:get_bool(help, Opts) of
        true -> usage(0);
        false -> ok
    end.
This function checks for the help option and calls usage/1 if it was specified.

Here's how we format the message:
format(_, _, Width) when Width < 0 -> error("invalid width");
format(Msg, undefined, Width) -> string:left(Msg, Width);
format(Msg, left, Width) -> string:left(Msg, Width);
format(Msg, right, Width) -> string:right(Msg, Width);
format(Msg, center, Width) -> string:centre(Msg, Width);
format(_, _, _) -> error("invalid align option").
This is a dense bit of code, but it's very simple. It formats a message using one of the alignment functions in the string module. If there are problems with the input, it uses error/1 to complain:
error(Msg) ->
    io:format("ERROR: ~s~n", [Msg]),
    erlang:halt(1).
Pretty straightforward - print a message and exit with a non-zero value, indicating that an error occurred.
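Because format/3 halts the script on bad input, it's awkward to experiment with directly. Here is a sketch of the same dispatch (in a hypothetical align_demo module) that returns {error, Reason} instead of halting, so you can see what each alignment produces from the shell.

```erlang
-module(align_demo).
-export([align/3]).

%% Same clauses as format/3 in the hello script, except that bad input
%% yields {error, Reason} rather than printing and halting.
align(_, _, Width) when Width < 0 -> {error, "invalid width"};
align(Msg, undefined, Width) -> string:left(Msg, Width);
align(Msg, left, Width) -> string:left(Msg, Width);
align(Msg, right, Width) -> string:right(Msg, Width);
align(Msg, center, Width) -> string:centre(Msg, Width);
align(_, _, _) -> {error, "invalid align option"}.
```

One thing worth knowing: string:left/2 and friends pad short strings but also truncate long ones, so align("hello", left, 3) gives "hel". In other words, --width caps the message length as well as positioning it.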

That's it! We have a strangely sophisticated "Hello World" application - and it's written in Erlang! Who'd have thunk?

Here's the complete hello source.

Let's try it out.
$ ./hello --align=center --width=40 You looking at me?
You looking at me?
Worked as advertised! You're encouraged to try it out for yourself. Can you handle the power??

Feel free to use this script as a template for your own console applications.


We started with the user interface. It's a good idea to build your console applications around "usage" documentation. We kept the required inputs to a minimum (zero, actually), relying on defaults to fill in values the user doesn't care about. The getopt scheme works perfectly for this.

We always handle the --help option when provided (i.e. maybe_help/1) by printing the full usage, including any detailed documentation, and exiting.

We leveraged the goodness of functional decomposition and Erlang's pattern matching to write clean and maintainable code.

One final point: escript applications have access to all of Erlang's core modules and any user-defined modules that are in the Erlang code path. You're also free to modify the path dynamically to pick up your custom modules (see the code module). Take full advantage of this by building the core of your application as compiled Erlang modules and using escript code to call into them.

Now, using your new powers, go out, write and deploy Erlang console applications throughout the world - may our jobs as Erlang developers be duly secured!

Tuesday, September 7, 2010

Flymake and Erlang

Flymake is a really useful tool for programming Erlang, or for programming in general, that gives you on-the-fly error detection in source files. It does this by compiling your source code in the background and showing the results in the file you are editing. It doesn't actually know anything about languages or how to compile them; it is built in a very abstract way so that it can be used with any language. In this case we use it for Erlang. It will only catch the errors and warnings the Erlang compiler can report, but even so it vastly shortens the edit/compile/debug loop.

Flymake is available out of the box in Emacs, so all we need to do is add a script to do the compile and tell flymake where it is.

I have a directory in my home directory called ~/.erlang_code. In this directory I have a file called eflymake, whose contents are listed below (make sure the file is executable). This isn't my code; I pulled the snippet from the net a while ago and have forgotten the source, so unfortunately I can't give proper attribution.

#!/usr/bin/env escript

main([File_Name]) ->
compile:file(File_Name, [warn_obsolete_guard, warn_unused_import,
warn_shadow_vars, warn_export_vars,
strong_validation, report,
{i, "../include"}]).

This will do the compilation for you, and you can see how to change the compile flags to suit your needs. Now you need to tell flymake where the script is and how to use it. You can do this by adding the following code to your Emacs configuration.

(require 'flymake)
(setq flymake-log-level 3)

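(defun flymake-erlang-init ()
  ;; Copy the buffer to a temp file next to the original and hand the
  ;; temp file's relative path to the eflymake escript.
  (let* ((temp-file (flymake-init-create-temp-buffer-copy
                     'flymake-create-temp-inplace))
         (local-file (file-relative-name
                      temp-file
                      (file-name-directory buffer-file-name))))
    (list "~/.erlang_code/eflymake" (list local-file))))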

(add-to-list 'flymake-allowed-file-name-masks
'("\\.erl\\'" flymake-erlang-init))

(defun my-erlang-mode-hook ()
(flymake-mode 1))

(add-hook 'erlang-mode-hook 'my-erlang-mode-hook)

You can learn this and other pro Erlang and OTP programming tips at the ErlangCamp. These kinds of simple improvements can seriously improve the efficiency of your Erlang programming workflow.

Saturday, August 14, 2010

ErlangCamp - Erlang and OTP Workshop in Chicago Oct 23 and 24

ErlangCamp is here! You may have already seen the announcement. ErlangCamp is a two-day, hands-on workshop for anyone, from novice to experienced Erlang programmer, who wants to be able to confidently write production-grade Erlang/OTP services.

ErlangCamp is an opportunity to learn from those who have done a ton with OTP and Erlang and put many, many lines of code into production at companies ranging from huge to small. The curriculum will roughly follow the progression of the book "Erlang and OTP in Action" but present a ton of new material from a different angle. When you leave you will know how to confidently put massively parallel, fault tolerant, distributed Erlang/OTP applications into production and then manage them from there. You can see a summary of what we will cover on the ErlangCamp sessions page.

Keep up with the Camp via Twitter or Email

To keep up with what is going on with the Camp, you can follow it on Twitter or register for email updates on the ErlangCamp official site home page.

Meet and greet

ErlangCamp will feature a great meet and greet session at a local Chicago establishment the first evening. This will be a great chance to meet other engineers and people from companies that are looking to hire folks who can put Erlang into production! As always when Erlangers get together, there will be a lot of fun and lively conversation over good refreshments.

International Registration Help

Registrations are coming in from all over the world right now, so if you are not from the Chicago area and would like to attend but have questions about logistics, or just need some extra help, please feel free to contact the ErlangCamp coordinators through the ErlangCamp website.

Wednesday, July 7, 2010

A Brief Overview of Concurrency.


Over the last few weeks I have had several conversations with people about concurrency, more specifically the ways in which shared information is handled in concurrent languages. I have gotten the impression that there isn't really a good understanding of what's out there in the world of concurrency. That being the case, it seems like a good idea to give a quick overview of some of the mechanisms that are gaining mind share in the world of concurrency.

Aside from some engineers who are currently so deep in their rut that they can't see sunlight, it's been accepted that the current mainstream approach to concurrency just won't work. The idea of using mutexes and locks to take advantage of the up-and-coming massively multi-core chips is really just laughable. We can't ignore this topic. As software engineers we don't really have a choice about supporting large numbers of CPUs; that's the direction hardware is going, and it's up to us to figure out how to make it work in software. Fortunately, a bunch of really smart folks have been thinking about this problem for a really long time, and a few of the things they have been working on are slowly making their way into the gestalt of the software world.

We are going to talk about three things: Software Transactional Memory (STM), Dataflow Programming (specifically Futures and Promises), and Message Passing Concurrency. There are others, but these currently have the most mind share and the best implementations. I have limited space and limited time, so I am going to stick to these three topics. You may have noticed that up to this point I have only talked about concurrency, more specifically the communication between processes in concurrent languages. That's intentional; I am not going to talk about parallelism at all. That's a different subject, only faintly related to concurrency but often conflated with it, so if you're looking for that you are looking in the wrong place. On another note, I am going to use processes and threads pretty much interchangeably. I know that in certain languages the two terms have very different meanings; however, in many other languages they mean the same thing or one of the terms doesn't apply. What I am getting at is that if you open the scope of the discussion to languages at large, the meanings become ambiguous. When I use the term process or thread I am talking about a single concurrent activity that may or may not communicate with other concurrent activities in some way. That's about the best I can do for you.

Traditional Shared Memory Communication

Shared Memory Communication is the GOTO of our time. Like the GOTO of years past, it's the current mainstream concurrent communication technique and has been for a long, long time. Just like GOTO, there are so many gotchas and ways to shoot yourself in the head that it's scary. It's so bad that this approach to concurrency has tainted an entire generation of engineers with a deeply abiding fear of concurrency in general. This great fear has crippled the ability of our industry to adapt to the new reality of multi-core systems. The sooner shared memory dies the horrible death it deserves, the better for us all. Having said that, I must now admit that, just like GOTO, shared memory has a small niche where it probably can't be replaced. If you work in that niche then you already know you need shared memory, and if you don't, you don't. Just a hint: implementing business logic in services is not that niche. OK, I am all done with my rant and I feel much better; now on to the show.

Shared memory typically refers to a large block of random access memory that can be accessed by several different concurrently running processes. This block of memory is protected by some type of guard that makes sure it isn't being accessed by more than one process at any particular time. These guards usually take the form of locks, mutexes, semaphores, etc. There are a bunch of problems with this approach. There is complexity in managing serial access to the block of memory. There is complexity in managing lock contention for heavily used resources. There is a very real possibility of creating deadlocks in your code in a way that isn't easily visible to you as a developer, for example two threads that each hold one lock while waiting for the other's. There is just all kinds of nastiness here. This type of concurrency is found in all the current mainstream programming and scripting languages: C, C++, Java, Perl, Python, etc. For whatever reason it's ubiquitous, and we have all been exposed to it, but that doesn't mean we have to accept it as the status quo.

Software Transactional Memory (STM)

The first non-traditional concurrency mechanism we are going to look at is Software Transactional Memory, or STM for short. The most popular embodiment of STM is currently available in the GHC implementation of Haskell. As for the description I will let wikipedia handle it.

Software transactional memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It functions as an alternative to lock-based synchronization, and is typically implemented in a lock-free way. A transaction in this context is a piece of code that executes a series of reads and writes to shared memory. These reads and writes logically occur at a single instant in time; intermediate states are not visible to other (successful) transactions.

STM has a few benefits that aren't immediately obvious. First and foremost, STM is optimistic. Every thread does what it needs to do without knowing or caring whether another thread is working with the same data. At the end of a manipulation, if everything is good and nothing has changed, the changes are committed. If problems or conflicts occur, the change is rolled back and retried. The cool part of this is that there isn't any waiting for resources: threads can write to different parts of a structure without sharing any lock. The bad part is that you have to retry failed commits. There is also some, not insignificant, overhead involved with the transaction subsystem itself that causes a performance hit. Additionally, in certain situations there may be a memory hit; i.e., if n processes are modifying the same region of memory, you would need O(n) memory to support the transactions. This is a million times better than the mainstream shared memory approach, and if it's the only alternative available to you, you should definitely use it. I still consider it shared memory at its core. That's an argument I have had many, many times.
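Erlang has no STM, but the optimistic read, compute, commit, retry cycle described above can be sketched with a single process guarding one value plus a version number. Treat this as an analogy under heavy simplification, not a real STM: it covers exactly one variable, whereas GHC's STM tracks the read and write sets of transactions spanning many TVars. The tvar module and its functions are made up for illustration.

```erlang
-module(tvar).
-export([new/1, read/1, atomically/2, stop/1]).

%% A single "transactional variable": a process holding {Version, Value}.
%% Analogy only: real STM tracks read/write sets across many variables.
new(Value) ->
    spawn(fun() -> loop(0, Value) end).

loop(Version, Value) ->
    receive
        {read, From} ->
            From ! {self(), Version, Value},
            loop(Version, Value);
        {commit, From, Version, NewValue} ->
            %% Version is already bound, so this clause only matches when
            %% the writer's snapshot is current: accept and bump the version.
            From ! {self(), ok},
            loop(Version + 1, NewValue);
        {commit, From, _Stale, _NewValue} ->
            %% Someone else committed first: reject, the caller retries.
            From ! {self(), conflict},
            loop(Version, Value);
        stop ->
            ok
    end.

%% Returns the current {Version, Value} snapshot.
read(TVar) ->
    TVar ! {read, self()},
    receive {TVar, Version, Value} -> {Version, Value} end.

%% Optimistically apply Fun to a snapshot, then try to commit. No lock is
%% held while Fun runs; on conflict we simply start over.
atomically(TVar, Fun) ->
    {Version, Value} = read(TVar),
    NewValue = Fun(Value),
    TVar ! {commit, self(), Version, NewValue},
    receive
        {TVar, ok} -> NewValue;
        {TVar, conflict} -> atomically(TVar, Fun)
    end.

stop(TVar) ->
    TVar ! stop,
    ok.
```

If 50 processes each run tvar:atomically(T, fun(X) -> X + 1 end) concurrently, no increment is lost, but many of them will have recomputed Fun after a conflict. That is the retry cost from the paragraph above, in miniature.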

Dataflow - Futures and Promises

Another approach to concurrency is the use of Futures and Promises. Its most visible implementation is in Mozart-Oz. Once again I will let Wikipedia provide the description for me.

In computer science, futures and promises are closely related constructs used for synchronization in some concurrent programming languages. They both refer to an object that acts as a proxy for a result that is initially not known, usually because the computation of its value has not yet completed.
Let's lay down the difference between Futures and Promises before we get started. A future is a contract that a specific thread will, at some point in the future, provide a value to fill that contract. A promise is, well, a promise that at some point some thread will provide the promised value. This is very much a dataflow style of programming and is mostly found in languages that support that style, like Mozart-Oz and Alice ML.

Futures and Promises are conceptually pretty simple. They make passing around data in concurrent systems pretty intuitive. They also serve as a good foundation on which to build more complex structures like channels. Languages that support Futures and Promises usually support advanced ideas like unification, and in that context Futures and Promises work really well. However, although Futures and Promises remove the most egregious possibilities for deadlock, it is still possible to deadlock in some cases.
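Erlang has no built-in futures, but the future style (a specific process committing to deliver a value later) is easy to sketch on top of processes. This is a hypothetical helper module, not part of the standard library, and it assumes await/1 is called by the same process that called new/1. The monitor is there because of the deadlock caveat above: without it, a caller would wait forever on a future whose process crashed.

```erlang
-module(future).
-export([new/1, await/1]).

%% Start computing Fun() in a separate process right away and return a
%% handle. The unique Ref tags the result message; the monitor lets
%% await/1 notice if the computing process dies before delivering.
new(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    {_Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    {Ref, MRef}.

%% Block until the value is available. Returns {ok, Value}, or
%% {error, Reason} if the computation crashed.
await({Ref, MRef}) ->
    receive
        {Ref, Value} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Value};
        {'DOWN', MRef, process, _Pid, Reason} ->
            {error, Reason}
    end.
```

Several futures can be started up front and awaited in any order, since each await/1 selectively receives only the message tagged with its own Ref.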

In the end both of these approaches involve shared memory. They both do a reasonably good job of mitigating the insidious problems of using shared memory, but they only mitigate those problems; they don't eliminate them. The next mechanism takes a completely different approach to the problem, and for that reason it does manage to eliminate most of the problems involved with shared memory concurrency. Of course, there is always a trade-off, and in this case the trade-off is additional memory usage and copying costs. I am getting ahead of myself, so let me begin at the beginning and then proceed to the end in a linear fashion.

Message Passing Concurrency

The third form of concurrency is built around message passing. Once again I will let Wikipedia describe the system, as it tends to be better at it than I am.
Concurrent components communicate by exchanging messages (exemplified by Erlang and Occam). The exchange of messages may be carried out asynchronously (sometimes referred to as "send and pray", although it is standard practice to resend messages that are not acknowledged as received), or may use a rendezvous style in which the sender blocks until the message is received. Message-passing concurrency tends to be far easier to reason about than shared-memory concurrency, and is typically considered a more robust, although slower, form of concurrent programming. A wide variety of mathematical theories for understanding and analyzing message-passing systems are available, including the Actor model, and various process calculi.
Message passing concurrency is about processes communicating by sending messages to one another. Semantically these messages are completely separate entities, unrelated to whatever data they were built from. This means that when you are writing code that uses this form of concurrency you don't need to worry about shared state at all; you just need to worry about how the messages will flow through your system. Of course, you don't get this for free. In many cases, message passing concurrency is implemented by making a deep copy of the message and sending the copy instead of the original. The problem is that the copy can be quite expensive for sufficiently large structures. This additional memory usage may have negative implications for your system if you are in any way memory constrained or are sending a lot of large messages. In practice, this means that you must be aware of, and manage, the size and complexity of the messages you send and receive. Much like Futures and Promises, message passing removes the most egregious 'shoot yourself in the head' possibilities for deadlock, but deadlock is still possible. You must be aware of that in your design and implementation.
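Here's a minimal Erlang sketch of the style just described (the counter module and its functions are made up for illustration). The count lives only inside the server process; other processes can change or read it solely by sending messages, so there is nothing to lock. increment/1 is the asynchronous "send and pray" style from the quote, while value/1 is a request/reply exchange in which the caller blocks until the answer arrives.

```erlang
-module(counter).
-export([start/0, increment/1, value/1]).

%% The count exists only as the loop argument of the server process.
start() ->
    spawn(fun() -> loop(0) end).

loop(N) ->
    receive
        {increment, By} ->
            loop(N + By);
        {value, From, Ref} ->
            From ! {Ref, N},
            loop(N)
    end.

%% Asynchronous: send the message and carry on without waiting.
increment(Pid) ->
    Pid ! {increment, 1},
    ok.

%% Synchronous request/reply: the unique Ref pairs the answer with
%% this particular request, and the caller blocks until it arrives.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive {Ref, N} -> N end.
```

Because Erlang preserves message order between a given sender and receiver, three increment/1 calls followed by value/1 from the same process return 3. On the BEAM, message data is copied onto the receiving process's heap (large binaries are the notable exception; they are reference counted and shared), which is exactly the copying cost discussed above.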


In the end any one of these approaches is so much better than the shared memory approach that it almost doesn't matter which one you choose for your next project. However, each takes a very different philosophical approach to concurrency, and that greatly affects how you go about designing systems. You should explore each one so that you can make an informed decision about which one to use for your next project. That said, opinions are like, umm, I can't really complete that, but you get my drift. My opinion is that message passing concurrency is by far the best of the three, where best is defined as most conceptually simple and scalable. In the end the industry will decide which direction is right and head in that direction. We are still too early in the multi-core age to get any good impression of which will win out.

Wednesday, June 16, 2010

Build Process Integration


This post isn't going to be Erlang or language oriented at all. One of my other hobbies revolves around the build process and build process tools. Over the last few years I have spent a lot of time thinking about improving them, making the build process more transparent, etc. I believe I have a way to do that. Unfortunately, it would take the cooperation of build tool implementors to get it off the ground, or a new implementation of the existing tools. In any case, let me describe what I am talking about.

Many of you may be familiar with a product called Trac. This is an open source project management tool. Actually, it calls itself an 'enhanced wiki and issue tracking system for software development projects', but it includes facilities for project management, source control artifact integration, authorization and authentication, etc. It's a very interesting and reasonably complete tool. However, there is one single feature that makes this product especially interesting: the ability to link between the various kinds of information that Trac keeps track of. Usually this linking is done with very simple, easy to remember micro-formats. For example, you can easily link a ticket to a changeset with the notation of 'changeset:' in the ticket. Trac understands this format and will produce a clickable link wherever the ticket is rendered. This type of linking works between any of the artifacts stored in Trac. The other features are really just there to support that single killer feature. Unfortunately, you only get to use this feature while you are in Trac.

That's a problem. Because this linking only works with information owned by a Trac system, Trac is forced into an 'own the world' mentality. That is, if the designers want to give you the ability to link between a wiki page and an issue, the product must provide functionality that implements a wiki and an issue tracking system. If it wants to give you the ability to link to artifacts in a build system, it must provide a build system. This is true of any artifact that Trac, or systems like Trac, want to provide. This forces the implementors to spread their time and effort over a range of products instead of sticking to a single product and getting it right. It also means that a consumer of this product doesn't have the ability to swap out parts of the product for something he may like better. This is why Trac implements a wiki, a project management system, and a source control display system, even though quite good systems already address this space in the open source world. It also means that if the developers of Trac or similar systems want to add linking to a new type of artifact, they must implement the artifact in their system.

A Better Approach

There is a better approach, although it will take a bit of effort to see it realized. Fundamentally, this better approach is to let each system focus on its purpose and provide some system-agnostic way to tie these artifacts together. That is, some other system should exist that understands how artifacts are related to one another. This system would then allow users to create 'links' or relationships and query those relationships at will.

We can do this by utilizing REST based services (other ways probably work as well) and REST semantics. By vending an artifact at a specific, unchanging URL we can provide a universally unique identifier that we can use for linking. For example, let's say we were building a Trac-like system. We put our issue artifacts in an issue tracking service at a specific URL, say ''. We also put our projects at a specific URL, say ''. Both of these APIs vend data according to some predetermined format. With these in place it becomes very easy to link these two artifacts together. This implicitly indicates the existence of a couple of things. First and foremost, that consumable formats exist for issues and projects. Secondly, that there exists some means of resolving these relationships; basically, that you have somewhere to put the fact that these artifacts are related. I think that a separate type of system should be set up to manage these relationships. For now I will call that a relationship management service. So with these facts established, if we wanted to associate issue 13 with project Foo we would just create a relationship between and in our relationship management service. With this service in place each individual system wouldn't need to understand linking at all, nor store link information. Only clients that wanted to consume the information would need to understand linking.

There are a few advantages here and some disadvantages. The biggest advantage is that systems don't need to understand how linking occurs with other systems. We can drop any system we want into the mix and get reasonable linking semantics. The second big advantage is that we can manipulate the links in one location, traversing the entire graph of links without jumping around to each service that stores information. The disadvantage is that we have yet another service that we must manage.
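To make the relationship management idea a bit more concrete, here is a minimal in-memory sketch of its data model. Everything in it is an assumption for illustration: the relations module, the choice of undirected links, and the example.com URIs in the usage note are hypothetical, and a real service would sit behind a REST interface rather than be a local module. The key property it shows is that the store only ever sees URIs, so it knows nothing about issues, projects, or any other artifact type.

```erlang
-module(relations).
-export([new/0, link/3, related/2, reachable/2]).

%% Map from an artifact URI to the list of URIs it is linked to.
new() -> #{}.

%% Record an undirected link between two artifact URIs (idempotent).
link(A, B, Store) ->
    add(B, A, add(A, B, Store)).

add(From, To, Store) ->
    Old = maps:get(From, Store, []),
    case lists:member(To, Old) of
        true -> Store;
        false -> Store#{From => [To | Old]}
    end.

%% URIs directly linked to Uri.
related(Uri, Store) ->
    lists:sort(maps:get(Uri, Store, [])).

%% Every URI reachable from Uri by following links, excluding Uri
%% itself: the "traverse the whole graph in one place" advantage.
reachable(Uri, Store) ->
    lists:sort(lists:delete(Uri, walk([Uri], Store, []))).

walk([], _Store, Seen) ->
    Seen;
walk([U | Rest], Store, Seen) ->
    case lists:member(U, Seen) of
        true -> walk(Rest, Store, Seen);
        false -> walk(maps:get(U, Store, []) ++ Rest, Store, [U | Seen])
    end.
```

For example, after linking a hypothetical issue URI to a project URI, and that project URI to one of its milestones, reachable/2 on the issue returns both the project and the milestone, even though the issue was never linked to the milestone directly.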

How To Do It

Of course, actually accomplishing the task of building these relationships is not so simple in the pragmatic world. Several prerequisites have to be met before we can get started.

Systems must vend their data in some generally consumable, system-agnostic way.

Systems must actually vend the artifacts that other systems may find interesting.

There must be a uniform, unchanging way to resolve artifacts so that they can be linked to.

Each of these prerequisites is more complex than it may first appear. For example, the first prerequisite says that systems must vend their artifacts in some system-agnostic way. However, in reality you want them to vend it in as simple a way as possible. You also want them to vend their data with either some type of schema or self-describing data. Finally, you want all of your systems to vend data in a similar way to reduce the complexity of linking and consumption. Similar issues exist with the second prerequisite. Artifacts that should be exposed may be sub-artifacts of previously exposed artifacts. For example, in a project management system, projects are probably an exposed artifact, but so are milestones, which are lower in the hierarchy than projects. Exposing these in a consistent, consumable way will require some forethought. Finally and most importantly, artifacts must be resolvable for linking to have any meaning at all. That means that each artifact must have a consistent and unchanging URI that can be used to identify and consume that artifact. Designing such an API will take some significant forethought on the part of the system implementors.

What we are essentially talking about is evolving a set of tools from a broad array of isolated stacks into a set of components that can plug into a generic, flexible component architecture. 'Plugging in' in this case just means forming consumable relationships between the artifacts vended by each component. This is very similar to the way the familiar World Wide Web is designed. In our case, we are linking artifacts instead of documents and storing the links outside of the artifacts, but the design philosophy is very similar.

To that end, we need to design a set of principles that inform the creation and management of components and of the relationships between the artifacts those components vend. This must not be some heavy-handed attempt to mandate interoperability. It should be a set of general guidelines about what types of services a system must vend. The relationship management should occur within a dedicated system as well.

This approach would free product developers to concentrate on a specific feature or set of features. It would then allow individuals or entities to consume artifacts as they see fit and to form relationships between arbitrary artifacts. This is an important meme that I think should be propagated along with the REST approach to web services.

Thursday, May 27, 2010

Why Not To Use Distributed Supervision

Let's take the example of distributed supervision. Say we have one supervisor on node A that supervises children on node B and node C. What happens if network throughput slows down on the connection from node A to node B? This isn't transparent to the supervisor on node A. What should that supervisor do in this case? It has no contact with node B, so it's unsure whether or not the child is still running. Even if the child isn't running, the supervisor couldn't start a new child on node B. Should it start a child on node C (assuming it can still communicate with node C)? What should it do if node B comes back and the child is still running there? This is the simplest of the possible distributed supervision problems. It gets more subtle from here on out.

The fact that this kind of distributed supervision is built into Erlang at all has more to do with Erlang's historical platforms than with its general usefulness. In the early days, Erlang was designed to run on big telecom machines. These machines contained a bunch of separate computers in the same cabinet, all directly and tightly connected via a hardware backplane. The TCP pipe was integrated into this hardware backplane. If the backplane went down, the entire system was screwed. They didn't have to worry about network partitioning, slowdowns, and so on, so they didn't. Given that context, the approach to distribution that they took makes sense, but it also means that these mechanisms don't really work in more typical real-world network scenarios.

This does not mean that you shouldn't use the distribution primitives that are provided. Not at all. In many situations it's perfectly acceptable to base a distributed system on the built in node to node message passing. You just need to be aware of and handle the types of network failure cases that your distributed application is likely to encounter.
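As one example of handling these failure cases explicitly, the sketch below subscribes to node status changes with `net_kernel:monitor_nodes/1`, which delivers `{nodeup, Node}` and `{nodedown, Node}` messages. The policy is a hypothetical one chosen for illustration: a `nodedown` marks that node's children as suspect instead of restarting them elsewhere, because from node A a crashed node, a partition, and a link so slow that the tick timeout expires all look the same. When the node comes back, its suspect children are flagged for reconciliation so that a duplicate is not started. The module name `node_watch` and its state layout are assumptions, not part of OTP's supervisor.

```erlang
-module(node_watch).
-export([start/1, handle/2]).

%% Children is a map of ChildId => Node. Subscribes this process to
%% nodeup/nodedown messages and runs the watch loop.
start(Children) ->
    ok = net_kernel:monitor_nodes(true),
    loop(#{children => Children, suspect => []}).

loop(State) ->
    receive
        {Tag, _Node} = Event when Tag =:= nodeup; Tag =:= nodedown ->
            {Actions, NewState} = handle(Event, State),
            %% A real implementation would act on Actions; here we only log them.
            io:format("node_watch: ~p~n", [Actions]),
            loop(NewState)
    end.

%% nodedown only means "unreachable from here", so mark the node's children
%% as suspect rather than restarting them on another node.
handle({nodedown, Node}, #{children := Children, suspect := Suspect} = State) ->
    Lost = lists:sort([Id || {Id, N} <- maps:to_list(Children), N =:= Node]),
    {[{suspect, Id} || Id <- Lost],
     State#{suspect := lists:usort(Suspect ++ Lost)}};
%% When the node reappears, its suspect children may still be running, so
%% ask for reconciliation instead of starting anything new.
handle({nodeup, Node}, #{children := Children, suspect := Suspect} = State) ->
    Back = [Id || Id <- Suspect, maps:get(Id, Children) =:= Node],
    {[{reconcile, Id} || Id <- Back], State#{suspect := Suspect -- Back}}.
```

Keeping `handle/2` a pure function makes the policy easy to test without a cluster; only `start/1` touches the network.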