Web Audio API for live streaming?

Javascript, Html, Html5 Audio, Audio Streaming, Web Audio-Api

Javascript Problem Overview


We need to stream live audio (from a medical device) to web browsers with no more than 3-5 s of end-to-end delay (assume 200 ms or less of network latency). Today we use a browser plugin (NPAPI) for decoding, filtering (high, low, band), and playback of the audio stream (delivered via WebSockets).

We want to replace the plugin.

I was looking at various Web Audio API demos, and most of our required functionality (playback, gain control, filtering) appears to be available in the Web Audio API. However, it is not clear to me whether the Web Audio API can be used for streamed sources, as most of the examples make use of short sounds and/or audio clips.
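
For reference, a minimal sketch of how the playback, gain, and filter stages might map onto Web Audio API nodes (assuming `decodedBuffer` is an AudioBuffer obtained elsewhere; the frequency and gain values are placeholders):

//Minimal sketch: play a decoded buffer through a band-pass filter and a gain stage.
//Assumes `decodedBuffer` is an AudioBuffer you have already decoded somewhere else.
const context = new AudioContext();

const source = context.createBufferSource();
source.buffer = decodedBuffer;

const filter = context.createBiquadFilter();
filter.type = 'bandpass';       //or 'highpass' / 'lowpass'
filter.frequency.value = 1000;  //placeholder center frequency in Hz

const gain = context.createGain();
gain.gain.value = 0.8;          //placeholder gain

//source -> filter -> gain -> speakers
source.connect(filter);
filter.connect(gain);
gain.connect(context.destination);
source.start();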

Can Web Audio API be used to play live streamed audio?

Update (11-Feb-2015):

After a bit more research and local prototyping, I am not sure live audio streaming with the Web Audio API is possible. The Web Audio API's decodeAudioData isn't really designed to handle random chunks of audio data (in our case, delivered via WebSockets); it appears to need the whole 'file' in order to process it correctly.

See stackoverflow:

Now it is possible with createMediaElementSource to connect an <audio> element to the Web Audio API, but in my experience the <audio> element induces a huge amount of end-to-end delay (15-30 s), and there doesn't appear to be any way to reduce the delay to below 3-5 seconds.
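
For reference, wiring an <audio> element into the Web Audio API graph looks roughly like this (the stream URL is a placeholder; the delay comes from the element's own internal buffering, not from this code):

//Rough sketch: route an <audio> element through the Web Audio API.
const context = new AudioContext();
const audioElement = new Audio('https://example.com/live-stream'); //placeholder URL
audioElement.crossOrigin = 'anonymous';

const source = context.createMediaElementSource(audioElement);
source.connect(context.destination); //gain/filter nodes could be inserted here
audioElement.play();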

I think the only solution is to use WebRTC with Web Audio API. I was hoping to avoid WebRTC as it will require significant changes to our server-side implementation.

Update (12-Feb-2015) Part I:

I haven't completely eliminated the <audio> tag (need to finish my prototype). Once I have ruled it out, I suspect createScriptProcessor (deprecated but still supported) will be a good choice for our environment, as I could 'stream' (via WebSockets) our ADPCM data to the browser and then (in JavaScript) convert it to PCM, similar to what Scott's library (see below) does with createScriptProcessor. This method doesn't require the data to be in properly sized 'chunks' with critical timing, as the decodeAudioData approach does; a rough sketch follows.
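
Roughly, that approach might look like the sketch below; decodeAdpcmToFloat32 is a hypothetical stand-in for whatever ADPCM decoder is used, and the WebSocket endpoint is a placeholder:

//Sketch of the createScriptProcessor approach: WebSocket chunks are decoded to PCM in
//JavaScript and pulled out of a FIFO each time the processor asks for more samples.
const context = new AudioContext();
const processor = context.createScriptProcessor(4096, 1, 1); //bufferSize, inputs, outputs
let pcmQueue = new Float32Array(0);

const socket = new WebSocket('wss://example.com/audio'); //placeholder endpoint
socket.binaryType = 'arraybuffer';
socket.onmessage = (event) => {
  const decoded = decodeAdpcmToFloat32(event.data); //hypothetical ADPCM -> PCM decoder
  const merged = new Float32Array(pcmQueue.length + decoded.length);
  merged.set(pcmQueue);
  merged.set(decoded, pcmQueue.length);
  pcmQueue = merged;
};

processor.onaudioprocess = (event) => {
  const output = event.outputBuffer.getChannelData(0);
  const available = Math.min(output.length, pcmQueue.length);
  output.set(pcmQueue.subarray(0, available));
  output.fill(0, available); //underrun: pad the rest with silence
  pcmQueue = pcmQueue.subarray(available);
};

processor.connect(context.destination);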

Update (12-Feb-2015) Part II:

After more testing, I eliminated the <audio>-to-Web Audio API interface because, depending on source type, compression, and browser, the end-to-end delay can be 3-30 s. That leaves the createScriptProcessor method (see Scott's post below) or WebRTC. After discussing with our decision makers, it has been decided we will take the WebRTC approach. I assume it will work, but it will require changes to our server-side code.

I'm going to mark the first answer, just so the 'question' is closed.

Thanks for listening. Feel free to add comments as needed.

Javascript Solutions


Solution 1 - Javascript

Yes, the Web Audio API (along with AJAX or WebSockets) can be used for streaming.

Basically, you pull down (or send, in the case of WebSockets) chunks of some length n. Then you decode them with the Web Audio API and queue them up to be played, one after the other.

Because the Web Audio API has high-precision timing, you won't hear any "seams" between the playback of each buffer if you do the scheduling correctly.
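
A minimal sketch of that approach, assuming each WebSocket message is a self-contained chunk that decodeAudioData can decode on its own (the endpoint is a placeholder):

//Each incoming chunk is decoded and scheduled to start exactly where the previous one ends.
const context = new AudioContext();
let nextStartTime = 0;

const socket = new WebSocket('wss://example.com/audio'); //placeholder endpoint
socket.binaryType = 'arraybuffer';
socket.onmessage = (event) => {
  context.decodeAudioData(event.data, (buffer) => {
    const source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);

    //Keep a small safety margin the first time (or after an underrun), then chain seamlessly.
    nextStartTime = Math.max(nextStartTime, context.currentTime + 0.05);
    source.start(nextStartTime);
    nextStartTime += buffer.duration;
  });
};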

Solution 2 - Javascript

I wrote a streaming Web Audio API system in which web workers do all the WebSocket management and communicate with node.js, so that the browser thread simply renders audio. It works just fine on laptops; since mobiles are behind on their implementation of WebSockets inside web workers, you need no less than Lollipop for it to run as coded. I posted the full source code here.
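
A rough outline of that split, with a hypothetical worker file name and a placeholder endpoint (the real implementation is in the posted source code):

//--- audio-worker.js (hypothetical file name) ---
//The worker owns the WebSocket so the browser/UI thread only has to render audio.
const socket = new WebSocket('wss://example.com/audio'); //placeholder endpoint
socket.binaryType = 'arraybuffer';
socket.onmessage = (event) => {
  //Transfer the ArrayBuffer to the main thread without copying it.
  postMessage(event.data, [event.data]);
};

//--- main thread ---
const worker = new Worker('audio-worker.js');
worker.onmessage = (event) => {
  //event.data is an ArrayBuffer of audio; decode and schedule it for playback here.
};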

Solution 3 - Javascript

To elaborate on the comments about how to play a bunch of separate buffers stored in an array by shifting the next one out every time:

If you create a source through createBufferSource(), it has an onended event to which you can attach a callback, which will fire when the buffer has reached its end. You can do something like this to play the various chunks in the array one after the other:

//Assumes `context` is an AudioContext and `audiobuffer` is an array of decoded AudioBuffers.
function play() {
  //end of stream has been reached
  if (audiobuffer.length === 0) { return; }
  let source = context.createBufferSource();

  //get the next buffer that should play
  source.buffer = audiobuffer.shift();
  source.connect(context.destination);

  //add this function as a callback to play next buffer
  //when current buffer has reached its end 
  source.onended = play;
  source.start();
}

Hope that helps. I'm still experimenting with how to get this all smooth and ironed out, but this is a good start, and it's something that's missing from a lot of the online posts.
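
For completeness, here is one way the audiobuffer array above might get filled, assuming each WebSocket message is an independently decodable chunk (the endpoint is a placeholder):

//Decode each incoming chunk and push it onto the queue consumed by play() above.
const context = new AudioContext();
const audiobuffer = [];

const socket = new WebSocket('wss://example.com/audio'); //placeholder endpoint
socket.binaryType = 'arraybuffer';
socket.onmessage = (event) => {
  context.decodeAudioData(event.data, (decoded) => {
    const wasEmpty = audiobuffer.length === 0;
    audiobuffer.push(decoded);
    //Start (or restart) playback once data is available again.
    if (wasEmpty) { play(); }
  });
};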

Solution 4 - Javascript

You have to create both a new AudioBuffer and a new AudioBufferSourceNode (or at least the latter) for every piece of data that you want to buffer... I tried looping the same AudioBuffer, but once you set the buffer on the AudioBufferSourceNode, any modifications you make to the AudioBuffer become irrelevant.

(NOTE: These classes have base/parent classes you should look at as well (referenced in the docs).)


Here's my preliminary solution that I got working (forgive me for not feeling like commenting everything, after spending hours just getting this working), and it works beautifully:

class MasterOutput {
  constructor(computeSamplesCallback) {
    this.computeSamplesCallback = computeSamplesCallback.bind(this);
    this.onComputeTimeoutBound = this.onComputeTimeout.bind(this);

    this.audioContext = new AudioContext();
    this.sampleRate = this.audioContext.sampleRate;
    this.channelCount = 2;

    this.totalBufferDuration = 5;
    this.computeDuration = 1;
    this.bufferDelayDuration = 0.1;

    this.totalSamplesCount = this.totalBufferDuration * this.sampleRate;
    this.computeDurationMS = this.computeDuration * 1000.0;
    this.computeSamplesCount = this.computeDuration * this.sampleRate;
    this.buffersToKeep = Math.ceil((this.totalBufferDuration + 2.0 * this.bufferDelayDuration) /
      this.computeDuration);

    this.audioBufferSources = [];
    this.computeSamplesTimeout = null;
  }

  startPlaying() {
    if (this.audioBufferSources.length > 0) {
      this.stopPlaying();
    }

    //Start computing indefinitely, from the beginning.
    let audioContextTimestamp = this.audioContext.getOutputTimestamp();
    this.audioContextStartOffset = audioContextTimestamp.contextTime;
    this.lastTimeoutTime = audioContextTimestamp.performanceTime;
    for (this.currentBufferTime = 0.0; this.currentBufferTime < this.totalBufferDuration;
      this.currentBufferTime += this.computeDuration) {
      this.bufferNext();
    }
    this.onComputeTimeoutBound();
  }

  onComputeTimeout() {
    this.bufferNext();
    this.currentBufferTime += this.computeDuration;

    //Readjust the next timeout to have a consistent interval, regardless of computation time.
    let nextTimeoutDuration = 2.0 * this.computeDurationMS - (performance.now() - this.lastTimeoutTime) - 1;
    this.lastTimeoutTime = performance.now();
    this.computeSamplesTimeout = setTimeout(this.onComputeTimeoutBound, nextTimeoutDuration);
  }

  bufferNext() {
    this.currentSamplesOffset = this.currentBufferTime * this.sampleRate;

    //Create an audio buffer, which will contain the audio data.
    this.audioBuffer = this.audioContext.createBuffer(this.channelCount, this.computeSamplesCount,
      this.sampleRate);

    //Get the audio channels, which are float arrays representing each individual channel for the buffer.
    this.channels = [];
    for (let channelIndex = 0; channelIndex < this.channelCount; ++channelIndex) {
      this.channels.push(this.audioBuffer.getChannelData(channelIndex));
    }

    //Compute the samples.
    this.computeSamplesCallback();

    //Creates a lightweight audio buffer source which can be used to play the audio data. Note: This can only be
    //started once...
    let audioBufferSource = this.audioContext.createBufferSource();
    //Set the audio buffer.
    audioBufferSource.buffer = this.audioBuffer;
    //Connect it to the output.
    audioBufferSource.connect(this.audioContext.destination);
    //Start playing when the audio buffer is due.
    audioBufferSource.start(this.audioContextStartOffset + this.currentBufferTime + this.bufferDelayDuration);
    while (this.audioBufferSources.length >= this.buffersToKeep) {
      this.audioBufferSources.shift();
    }
    this.audioBufferSources.push(audioBufferSource);
  }

  stopPlaying() {
    if (this.audioBufferSources.length > 0) {
      for (let audioBufferSource of this.audioBufferSources) {
        audioBufferSource.stop();
      }
      this.audioBufferSources = [];
      clearTimeout(this.computeSamplesTimeout);
      this.computeSamplesTimeout = null;
    }
  }
}

window.onload = function() {
  let masterOutput = new MasterOutput(function() {
    //Populate the audio buffer with audio data.
    let currentSeconds;
    let frequency = 220.0;
    for (let sampleIndex = 0; sampleIndex < this.computeSamplesCount; ++sampleIndex) {
      currentSeconds = (sampleIndex + this.currentSamplesOffset) / this.sampleRate;

      //For a sine wave.
      this.channels[0][sampleIndex] = 0.005 * Math.sin(currentSeconds * 2.0 * Math.PI * frequency);

      //Copy the right channel from the left channel.
      this.channels[1][sampleIndex] = this.channels[0][sampleIndex];
    }
  });
  masterOutput.startPlaying();
};

Some details:

  • You can create multiple MasterOutputs and play multiple simultaneous things this way; though you may want to extract the AudioContext and share a single one amongst all your code.
  • This code sets up 2 channels (L + R) with the default sample rate from the AudioContext (48000 for me).
  • This code buffers a total of 5 seconds in advance, computing 1 second of audio data at a time, and delaying the playing and stopping of audio both by 0.1 seconds. It also keeps track of all of the audio buffer sources in case it needs to stop them if the output is to be paused; these are put into a list, and when they should be expired (that is, they no longer need to be stop()ped), they're shift()ed out of the list.
  • Note how I use audioContextTimestamp; that's important. The contextTime property lets me know exactly when the audio was started (each time), and then I can use that time (this.audioContextStartOffset) later on when audioBufferSource.start() is called, in order to schedule every audio buffer at the exact time it should be played.

Edit: Yep, I was right (in the comments)! You can reuse the expired AudioBuffers if you want. In many cases this is going to be the more "proper" way to do things.

Here are the parts of the code that would have to change for that:

...
        this.audioBufferDatas = [];
        this.expiredAudioBuffers = [];
...
    }

    startPlaying() {
        if (this.audioBufferDatas.length > 0) {

...

    bufferNext() {
...
        //Create/Reuse an audio buffer, which will contain the audio data.
        if (this.expiredAudioBuffers.length > 0) {
            //console.log('Reuse');
            this.audioBuffer = this.expiredAudioBuffers.shift();
        } else {
            //console.log('Create');
            this.audioBuffer = this.audioContext.createBuffer(this.channelCount, this.computeSamplesCount,
                this.sampleRate);
        }

...

        while (this.audioBufferDatas.length >= this.buffersToKeep) {
            this.expiredAudioBuffers.push(this.audioBufferDatas.shift().buffer);
        }
        this.audioBufferDatas.push({
            source: audioBufferSource,
            buffer: this.audioBuffer
        });
    }

    stopPlaying() {
        if (this.audioBufferDatas.length > 0) {
            for (let audioBufferData of this.audioBufferDatas) {
                audioBufferData.source.stop();
                this.expiredAudioBuffers.push(audioBufferData.buffer);
            }
            this.audioBufferDatas = [];
...

Here was my starting code, if you want something simpler, and you don't need live audio streaming:

window.onload = function() {
  const audioContext = new AudioContext();
  const channelCount = 2;
  const bufferDurationS = 5;

  //Create an audio buffer, which will contain the audio data.
  let audioBuffer = audioContext.createBuffer(channelCount, bufferDurationS * audioContext.sampleRate,
    audioContext.sampleRate);

  //Get the audio channels, which are float arrays representing each individual channel for the buffer.
  let channels = [];
  for (let channelIndex = 0; channelIndex < channelCount; ++channelIndex) {
    channels.push(audioBuffer.getChannelData(channelIndex));
  }

  //Populate the audio buffer with audio data.
  for (let sampleIndex = 0; sampleIndex < audioBuffer.length; ++sampleIndex) {
    channels[0][sampleIndex] = Math.sin(sampleIndex * 0.01);
    channels[1][sampleIndex] = channels[0][sampleIndex];
  }

  //Creates a lightweight audio buffer source which can be used to play the audio data.
  let audioBufferSource = audioContext.createBufferSource();
  audioBufferSource.buffer = audioBuffer;
  audioBufferSource.connect(audioContext.destination);
  audioBufferSource.start();
};

Unfortunately this ^ particular code is no good for live audio, because it only uses one AudioBuffer and one AudioBufferSourceNode, and like I said, turning looping on doesn't let you modify it... But if all you want to do is play a sine wave for 5 seconds and then stop (or loop it by setting audioBufferSource.loop = true), this will do just fine.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tony | View Question on Stackoverflow
Solution 1 - Javascript | Kevin Ennis | View Answer on Stackoverflow
Solution 2 - Javascript | Scott Stensland | View Answer on Stackoverflow
Solution 3 - Javascript | Jan Swart | View Answer on Stackoverflow
Solution 4 - Javascript | Andrew | View Answer on Stackoverflow