Number of Web Workers Limit

Javascript Problem Overview


PROBLEM

I've discovered that there is a limit on the number of Web Workers that can be spawned by a browser.

Example

main HTML / JavaScript

<script type="text/javascript">
$(document).ready(function(){
	var workers = new Array();
	var worker_index = 0;
	for (var i=0; i < 25; i++) {
		workers[worker_index] = new Worker('test.worker.js');
		workers[worker_index].onmessage = function(event) {
			$("#debug").append('worker.onmessage i = ' + event.data + "<br>");
		};
		workers[worker_index].postMessage(i); // start the worker.		
		
		worker_index++;
	}	
});
</script>
</head>
<body>
<div id="debug">
</div>

test.worker.js

self.onmessage = function(event) {
	var i = event.data;	

	self.postMessage(i);
};

This will generate only 20 output lines in the "debug" container when using Firefox (version 14.0.1, Windows 7).

QUESTION

Is there a way around this? The only two ideas I can think of are:

  1. Daisy chaining the web workers, i.e., making each web worker spawn the next one

Example:

<script type="text/javascript">
$(document).ready(function(){
	createWorker(0);
});

function createWorker(i) {
	
	var worker = new Worker('test.worker.js');
	worker.onmessage = function(event) {
		var index = event.data;
				
		$("#debug").append('worker.onmessage i = ' + index + "<br>");
		
		if ( index < 25) {
			index++;
			createWorker(index);
		} 
	};
	worker.postMessage(i); // start the worker.
}
</script>
</head>
<body>
<div id="debug"></div>

  2. Limit the number of web workers to a finite number and modify my code to work with that limit (i.e., share the work load across a finite number of web workers) - something like this: http://www.smartjava.org/content/html5-easily-parallelize-jobs-using-web-workers-and-threadpool

Unfortunately #1 doesn't seem to work (only a finite number of web workers will get spawned on a page load). Are there any other solutions I should consider?

Javascript Solutions


Solution 1 - Javascript

Old question, let's revive it! *readies epinephrine*

I've been looking into using Web Workers to isolate 3rd party plugins since web workers can't access the host page. I'll help you out with your methods which I'm sure you've solved by now, but this is for teh internetz. Then I'll give some relevant information from my research.

Disclaimer: In the examples where I used your code, I've modified and cleaned it up to provide full source code without jQuery so that you and others can run it easily. I've also added a timer which alerts the time in ms taken to execute the code.

In all examples, we reference the following genericWorker.js file.

genericWorker.js

self.onmessage = function(event) {
	self.postMessage(event.data);
};

## Method 1 (Linear Execution)

Your first method is nearly working. The reason why it still fails is that you aren't deleting any workers once you finish with them. This means the same result (crashing) will happen, just slower. All you need to fix it is to add worker.terminate(); before creating a new worker to remove the old one from memory. Note that this will cause the application to run much slower as each worker must be created, run, and be destroyed before the next can run.

Linear.html

<!DOCTYPE html>
<html>
<head>
	<title>Linear</title>
</head>
<body>
	<pre id="debug"></pre>
	<script type="text/javascript">
		var debug = document.getElementById('debug');
		var totalWorkers = 250;
		var index = 0;
		var start = (new Date).getTime();
		
		function createWorker() {
			var worker = new Worker('genericWorker.js');
			worker.onmessage = function(event) {
				debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
				worker.terminate();
				if (index < totalWorkers) createWorker();
				else alert((new Date).getTime() - start);
			};
			worker.postMessage(index++); // start the worker.
		}
		
		createWorker();
	</script>
</body>
</html>

## Method 2 (Thread Pool)

Using a thread pool should greatly increase running speed. Instead of using some library with complex lingo, let's simplify it. All the thread pool means is having a set number of workers running simultaneously. We can actually just modify a few lines of code from the linear example to get a multi-threaded example. The code below will find how many cores you have (if your browser supports this), or default to 4. I found that this code ran about 6x faster than the original on my machine with 8 cores.

ThreadPool.html

<!DOCTYPE html>
<html>
<head>
	<title>Thread Pool</title>
</head>
<body>
	<pre id="debug"></pre>
	<script type="text/javascript">
		var debug = document.getElementById('debug');
		var maxWorkers = navigator.hardwareConcurrency || 4;
		var totalWorkers = 250;
		var index = 0;
		var start = (new Date).getTime();
		
		function createWorker() {
			var worker = new Worker('genericWorker.js');
			worker.onmessage = function(event) {
				debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
				worker.terminate();
				if (index < totalWorkers) createWorker();
				else if(--maxWorkers === 0) alert((new Date).getTime() - start);
			};
			worker.postMessage(index++); // start the worker.
		}
		
		for(var i = 0; i < maxWorkers; i++) createWorker();
	</script>
</body>
</html>

# Other Methods

## Method 3 (Single worker, repeated task)

In your example, you're running the same worker script over and over again. I know you're simplifying a probably more complex use case, but some people viewing this will apply the one-worker-per-task method when they could be using just one worker for all the tasks.

Essentially, we'll instantiate a worker, send data, wait for data, then repeat the send/wait steps until all data has been processed.

On my computer, this runs at about twice the speed of the thread pool. That actually surprised me. I expected the overhead of creating and terminating workers to make the thread pool even slower than half the speed.

RepeatedWorker.html

<!DOCTYPE html>
<html>
<head>
	<title>Repeated Worker</title>
</head>
<body>
	<pre id="debug"></pre>
	<script type="text/javascript">
		var debug = document.getElementById('debug');
		var totalWorkers = 250;
		var index = 0;
		var start = (new Date).getTime();
		var worker = new Worker('genericWorker.js');
		
		function runWorker() {
			worker.onmessage = function(event) {
				debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
				if (index < totalWorkers) runWorker();
				else {
					alert((new Date).getTime() - start);
					worker.terminate();
				}
			};
			worker.postMessage(index++); // start the worker.
		}
		
		runWorker();
	</script>
</body>
</html>

## Method 4 (Repeated Worker w/ Thread Pool)

Now, what if we combine the previous method with the thread pool method? Theoretically, it should run quicker than the previous. Interestingly, it runs at just about the same speed as the previous on my machine.

Maybe it's the extra overhead of passing the worker reference each time runWorker is called. Maybe it's the extra workers being terminated during execution (only one worker won't be terminated before we get the time). Who knows. Finding this out is a job for another time.

RepeatedThreadPool.html

<!DOCTYPE html>
<html>
<head>
	<title>Repeated Thread Pool</title>
</head>
<body>
	<pre id="debug"></pre>
	<script type="text/javascript">
		var debug = document.getElementById('debug');
		var maxWorkers = navigator.hardwareConcurrency || 4;
		var totalWorkers = 250;
		var index = 0;
		var start = (new Date).getTime();
		
		function runWorker(worker) {
			worker.onmessage = function(event) {
				debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
				if (index < totalWorkers) runWorker(worker);
				else {
					if(--maxWorkers === 0) alert((new Date).getTime() - start);
					worker.terminate();
				}
			};
			worker.postMessage(index++); // start the worker.
		}
		
		for(var i = 0; i < maxWorkers; i++) runWorker(new Worker('genericWorker.js'));
	</script>
</body>
</html>

# Now for some real world shtuff

Remember how I said I was using workers to implement 3rd party plugins into my code? These plugins have a state to keep track of. I could start the plugins and hope they don't load too many for the application to crash, or I could keep track of the plugin state within my main thread and send that state back to the plugin if the plugin needs to be reloaded. I like the second one better.

I had written out several more examples of stateful, stateless, and state-restore workers, but I'll spare you the agony and just do some brief explaining and some shorter snippets.

First-off, a simple stateful worker looks like this:

StatefulWorker.js

var i = 0;

self.onmessage = function(e) {
	switch(e.data) {
		case 'increment':
			self.postMessage(++i);
			break;
		case 'decrement':
			self.postMessage(--i);
			break;
	}
};

It does some action based on the message it receives and holds data internally. This is great. It allows for mah plugin devs to have full control over their plugins. The main app instantiates their plugin once, then will send messages for them to do some action.

The problem comes in when we want to load several plugins at once. We can't do that, so what can we do?

Let's think about a few solutions.

## Solution 1 (Stateless)

Let's make these plugins stateless. Essentially, every time we want to have the plugin do something, our application should instantiate the plugin then send it data based on its old state.

data sent

{
	action: 'increment',
	value: 7
}

StatelessWorker.js

self.onmessage = function(e) {
	switch(e.data.action) {
		case 'increment':
			e.data.value++;
			break;
		case 'decrement':
			e.data.value--;
			break;
	}
	self.postMessage({
		value: e.data.value,
		i: e.data.i // undefined unless the caller also sends an `i` field (absent from the sample message above)
	});
};

This could work, but if we're dealing with a good amount of data this will start to seem like a less-than-perfect solution. Another similar solution could be to have several smaller workers for each plugin and send only a small amount of data to and from each, but I'm uneasy with that too.
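On the main-thread side, the stateless approach might look roughly like the sketch below. `runStateless` and its `createWorker` factory parameter are hypothetical names that are not part of the answer; the factory is passed in only so the helper does not hard-code a script path.

```javascript
// Hypothetical helper: spin up a fresh worker for a single action, hand it
// the caller-held state, and dispose of it as soon as it replies.
function runStateless(createWorker, action, value, onResult) {
	var worker = createWorker();
	worker.onmessage = function(event) {
		worker.terminate(); // nothing to keep; the state lives in the main thread
		onResult(event.data.value);
	};
	worker.postMessage({ action: action, value: value });
}

// Browser usage (assumes StatelessWorker.js is served next to the page):
// var state = 7;
// runStateless(function() { return new Worker('StatelessWorker.js'); },
// 	'increment', state, function(next) { state = next; });
```

Every call pays the full cost of creating and terminating a worker, which is the trade-off described above.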

## Solution 2 (State Restore)

What if we try to keep the worker in memory as long as possible, but if we do lose it, we can restore its state? We can use some sort of scheduler to see what plugins the user has been using (and maybe some fancy algorithms to guess what the user will use in the future) and keep those in memory.

The cool part about this is that we aren't looking at one worker per core anymore. Since a worker will be idle for most of the time it is loaded, we just need to worry about the memory it takes up. For a good number of workers (10 to 20 or so), this won't be substantial at all. We can keep the primary plugins loaded while the ones not used as often get switched out as needed. All the plugins will still need some sort of state restore.
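As a stand-in for the scheduler mentioned above, here is a minimal sketch that uses plain least-recently-used eviction rather than any predictive algorithm. `WorkerCache`, `limit`, and the `createWorker` factory are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: keep at most `limit` plugin workers alive and
// terminate the least recently used one when a new plugin needs a slot.
function WorkerCache(limit, createWorker) {
	this.limit = limit;
	this.createWorker = createWorker; // e.g. function(name) { return new Worker(name + '.js'); }
	this.workers = new Map();         // a Map iterates in insertion order, oldest first
}

WorkerCache.prototype.get = function(name) {
	var worker = this.workers.get(name);
	if (worker) {
		this.workers.delete(name); // re-inserted below to mark it most recently used
	} else {
		if (this.workers.size >= this.limit) {
			var oldest = this.workers.keys().next().value;
			this.workers.get(oldest).terminate();
			this.workers.delete(oldest);
		}
		worker = this.createWorker(name);
	}
	this.workers.set(name, worker);
	return worker;
};
```

A plugin that gets evicted this way comes back as a brand-new worker, so it still needs its state restored before it can be used again.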

Let's use the following worker and assume we either send 'increment', 'decrement', or an integer containing the state it's supposed to be at.

StateRestoreWorker.js

var i = 0;

self.onmessage = function(e) {
	switch(e.data) {
		case 'increment':
			self.postMessage(++i);
			break;
		case 'decrement':
			self.postMessage(--i);
			break;
		default:
			i = e.data;
	}
};
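The main thread's half of this could be sketched as follows. `RestorablePlugin`, `send`, `unload`, and the `createWorker` factory are hypothetical names; the sketch assumes the worker behaves like StateRestoreWorker.js above, replying with its counter after each action and accepting an integer to reset it.

```javascript
// Hypothetical wrapper: remembers the last value the plugin reported so a
// replacement worker can be brought back to the same state.
function RestorablePlugin(createWorker) {
	this.createWorker = createWorker; // e.g. function() { return new Worker('StateRestoreWorker.js'); }
	this.lastState = 0;               // matches the worker's initial `var i = 0`
	this.worker = null;
}

RestorablePlugin.prototype.send = function(message) {
	var self = this;
	if (!this.worker) {
		this.worker = this.createWorker();
		this.worker.onmessage = function(event) {
			self.lastState = event.data; // the worker replies with its new counter value
		};
		this.worker.postMessage(this.lastState); // an integer hits the worker's `default` branch
	}
	this.worker.postMessage(message);
};

// Call when the plugin has to be swapped out to free a worker slot.
RestorablePlugin.prototype.unload = function() {
	if (this.worker) {
		this.worker.terminate();
		this.worker = null;
	}
};
```

One caveat of this sketch: if `unload` runs while a reply is still in flight, that last update is lost, so a real scheduler would want to wait for pending replies first.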

These are all pretty simple examples, but I hope I helped you understand some methods of using multiple workers efficiently! I'll most likely be writing a scheduler and optimizer for this stuff, but who knows when I'll get to that point.

Good luck, and happy coding!

Solution 2 - Javascript

My experience is that too many workers (> 100) decrease performance. In my case FF became very slow and Chrome even crashed. I compared variants with different numbers of workers (1, 2, 4, 8, 16, 32). The worker performed an encryption of a string. It turned out that 8 was the optimal number of workers, but that may differ depending on the problem the worker has to solve.

I built up a small framework to abstract from the amount of workers. Calls to the workers are created as tasks. If the maximum allowed number of workers is busy, a new task is queued and executed later.

It turned out that it's very important to recycle the workers in such an approach. You should hold them in a pool when they are idle, but don't call new Worker(...) too often. Even if the workers are terminated by worker.terminate() it seems that there is a big difference in the performance between creating/terminating and recycling of workers.
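The framework itself isn't shown, so the following is only a minimal sketch of the idea under stated assumptions: a fixed set of workers created once, an idle list, and a task queue. `WorkerPool`, `run`, `next`, and the `createWorker` factory are hypothetical names, and each worker is assumed to post exactly one reply per task.

```javascript
// Hypothetical sketch of a recycling pool: workers are created once, parked
// in `idle` between tasks, and tasks wait in `queue` while all are busy.
function WorkerPool(size, createWorker) {
	this.idle = [];
	this.queue = [];
	for (var i = 0; i < size; i++) this.idle.push(createWorker());
}

WorkerPool.prototype.run = function(data, onDone) {
	this.queue.push({ data: data, onDone: onDone });
	this.next();
};

WorkerPool.prototype.next = function() {
	if (this.idle.length === 0 || this.queue.length === 0) return;
	var pool = this;
	var worker = this.idle.pop();
	var task = this.queue.shift();
	worker.onmessage = function(event) {
		pool.idle.push(worker); // recycle instead of calling terminate()
		task.onDone(event.data);
		pool.next();            // pick up any task that was waiting
	};
	worker.postMessage(task.data);
};

// Browser usage (assumes genericWorker.js from Solution 1):
// var pool = new WorkerPool(8, function() { return new Worker('genericWorker.js'); });
// pool.run('some string', function(result) { console.log(result); });
```

Because `new Worker(...)` is only called in the constructor, the number of live workers never exceeds `size`, however many tasks are submitted.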

Solution 3 - Javascript

Old question, but it comes up on a search, so... there is a configurable limit in Firefox. If you open about:config (type it into Firefox's address bar) and search for 'worker', you will see several settings, including this one:

dom.workers.maxPerDomain

Set at 20 by default. Double-click the line and change the setting. You will need to restart the browser.
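The same preference can also be set from a `user.js` file in the Firefox profile directory, which Firefox reads at startup. This is only a sketch: the value 50 is an arbitrary example, and it assumes your Firefox version still exposes this preference.

```javascript
// user.js in the Firefox profile directory (read at startup).
// 50 is an arbitrary example value, not a recommendation.
user_pref("dom.workers.maxPerDomain", 50);
```

Either way, this only changes the limit in your own browser; visitors to your page keep whatever limit their browser uses.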

Solution 4 - Javascript

The way you're chaining your Workers in solution #1 prevents the garbage collector from terminating the Worker instances, because you still hold a reference to them in the scope of your onmessage callback function.

Give this code a try:

<script type="text/javascript">
var worker;
$(document).ready(function(){
    createWorker(0);
});
function createWorker(i) {
    worker = new Worker('test.worker.js');
    worker.onmessage = handleMessage;
    worker.postMessage(i); // start the worker.
}
function handleMessage(event) {
    var index = event.data;
    $("#debug").append('worker.onmessage i = ' + index + "<br>");

    if (index < 25) {
        index++;
        createWorker(index);
    }
}
</script>
</head>
<body>
<div id="debug"></div>

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bill | View Question on Stackoverflow
Solution 1 - Javascript | Evan Kennedy | View Answer on Stackoverflow
Solution 2 - Javascript | Björn Weinbrenner | View Answer on Stackoverflow
Solution 3 - Javascript | N Kearns Mills | View Answer on Stackoverflow
Solution 4 - Javascript | nfroidure | View Answer on Stackoverflow