Saturday, November 5, 2011

How Node.js Multiprocess Load Balancing Works

As of version 0.6.0 of node, load multiple process load balancing is available for node. The concept of forking and child processes isn't new to me. Yet, it wasn't obvious to me how this was implemented at first. It's quite easy to use however:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
// Fork workers.
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}

cluster.on('death', function(worker) {
console.log('worker ' + worker.pid + ' died');
});
} else {
// Worker processes have a http server.
http.Server(function(req, res) {
res.writeHead(200);
res.end("hello world\n");
}).listen(8000);
}

This is quite a beautiful api, but it hides some clever details. For instance, how is possible for all child processes to each open up port 8000 and start listening?

The answer is that the cluster module is just a wrapper around child_process.fork and net.Server.listen is aware of the cluster module. Specifically, cluster.fork uses child_process.fork to create child processes with special variable set in their environments to indicate cluster was used. Specifically process.env.NODE_WORKER_ID is non-zero for such children.

envCopy['NODE_WORKER_ID'] = id;
var worker = fork(workerFilename, workerArgs, { env: envCopy });

Then net.Server.listen checks to see if process.env.NODE_WORKER_ID is set. If so, then the current process is a child created cluster. Instead of trying to start accepting connections on this port, a file handle is requested from the parent process:



//inside net.js
require('cluster')._getServer(address, port, addressType, function(handle) {
self._handle = handle;
self._listen2(address, port, addressType);
});

//inside cluster.js
cluster
._getServer = function(address, port, addressType, cb) {
assert(cluster.isWorker);

queryMaster({
cmd: "queryServer",
address: address,
port: port,
addressType: addressType
}, function(msg, handle) {
cb(handle);
});
};

Finally, this handle is listened on instead of creating a new handle. At which point you have another process that is listening on the same port. Quite clever, though I think the beauty of the api comes at the cost of some not so hideous hacks inside the api.