Vertically or horizontally stack (mosaic) several videos using ffmpeg?

Ffmpeg Problem Overview


I have two videos of the same exact length, and I would like to use ffmpeg to stack them into one video file.

How can I do this?

Ffmpeg Solutions


Solution 1 - Ffmpeg

Use the vstack (vertical), hstack (horizontal), or xstack (custom layout) filters. They are easier and faster than older approaches such as combining pad and overlay.

Combine/stack two videos or images

Vertical

Using the vstack filter.

ffmpeg -i input0 -i input1 -filter_complex vstack=inputs=2 output

Videos must have the same width.

Horizontal

Using the hstack filter.

ffmpeg -i input0 -i input1 -filter_complex hstack=inputs=2 output

Videos must have the same height.

With a border

Using the pad filter. This example creates a 5px black border between the two sides.

ffmpeg -i input0 -i input1 -filter_complex "[0]pad=iw+5:color=black[left];[left][1]hstack=inputs=2" output
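The same idea should carry over to a vertical stack by padding the height of the first input instead of its width. Below is a minimal sketch, assuming a POSIX shell; border_stack_filter is a hypothetical helper name, not part of ffmpeg, and it only prints the filtergraph string so you can inspect it before running anything.

```shell
# Hypothetical helper: print a filtergraph that stacks two inputs with a
# solid border between them. It only builds the string; it does not run ffmpeg.
#   $1 = direction: h (side by side) or v (one above the other)
#   $2 = border size in pixels (default 5)
#   $3 = border color (default black)
border_stack_filter() (
  dir=$1
  px=${2:-5}
  color=${3:-black}
  case $dir in
    # Widen the first input, then place the second to its right.
    h) printf '[0]pad=iw+%s:color=%s[left];[left][1]hstack=inputs=2\n' "$px" "$color" ;;
    # Make the first input taller, then place the second below it.
    v) printf '[0]pad=iw:ih+%s:color=%s[top];[top][1]vstack=inputs=2\n' "$px" "$color" ;;
    *) echo "direction must be h or v" >&2; exit 1 ;;
  esac
)
```

Possible usage: ffmpeg -i input0 -i input1 -filter_complex "$(border_stack_filter v 5)" output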

With audio

Downmix and use original channel placements

Add the amerge filter to combine the audio channels from both inputs:

ffmpeg -i input0 -i input1 -filter_complex "[0:v][1:v]vstack=inputs=2[v];[0:a][1:a]amerge=inputs=2[a]" -map "[v]" -map "[a]" -ac 2 output
  • This assumes each input contains a stereo audio stream.

  • -ac 2 is included to downmix to stereo in case both inputs contain multi-channel audio. For example, if both inputs are stereo, you would get a 4-channel output audio stream instead of stereo if you omit -ac 2.

Put all audio from each input into separate channels

Use amerge (or amix) and pan filters:

ffmpeg -i input0 -i input1 -filter_complex "[0:v][1:v]vstack=inputs=2[v];[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c1|c1<c2+c3[a]" -map "[v]" -map "[a]" output
  • This assumes each input contains a stereo audio stream.

Using audio from one particular input

This example will use the audio from input1:

ffmpeg -i input0 -i input1 -filter_complex "[0:v][1:v]vstack=inputs=2[v]" -map "[v]" -map 1:a output

Adding silent audio / If one input does not have audio

If only some of your inputs have audio, amerge will fail because it needs an audio stream from every input. You can add silent audio with the anullsrc filter to prevent this (the example assumes input1 is the one without audio):

ffmpeg -i input0 -i input1 -filter_complex "[0:v][1:v]vstack=inputs=2[v];anullsrc[silent];[0:a][silent]amerge=inputs=2[a]" -map "[v]" -map "[a]" -ac 2 output.mp4

3 videos or images

ffmpeg -i input0 -i input1 -i input2 -filter_complex "[0:v][1:v][2:v]hstack=inputs=3[v]" -map "[v]" output

For a vertical stack, use vstack instead of hstack.
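For more than a handful of inputs, typing the input labels by hand gets error-prone. The sketch below, assuming a POSIX shell, generates the same kind of filtergraph for any number of inputs; stack_filter is a hypothetical helper name and it only prints the string.

```shell
# Hypothetical helper: print a filtergraph that stacks the first N inputs.
# It only builds the string; it does not run ffmpeg.
#   $1 = hstack or vstack
#   $2 = number of inputs (at least 2)
stack_filter() (
  filter=$1
  n=$2
  case $filter in
    hstack|vstack) ;;
    *) echo "filter must be hstack or vstack" >&2; exit 1 ;;
  esac
  if [ "$n" -lt 2 ]; then
    echo "need at least 2 inputs" >&2
    exit 1
  fi
  # Build the input labels [0:v][1:v]...[N-1:v].
  labels=""
  i=0
  while [ "$i" -lt "$n" ]; do
    labels="${labels}[$i:v]"
    i=$((i + 1))
  done
  printf '%s%s=inputs=%s[v]\n' "$labels" "$filter" "$n"
)
```

Possible usage: ffmpeg -i input0 -i input1 -i input2 -filter_complex "$(stack_filter vstack 3)" -map "[v]" output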


2x2 grid

Using xstack

ffmpeg -i input0 -i input1 -i input2 -i input3 -filter_complex "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" -map "[v]" output

Using hstack and vstack

ffmpeg -i input0 -i input1 -i input2 -i input3 -filter_complex "[0:v][1:v]hstack=inputs=2[top];[2:v][3:v]hstack=inputs=2[bottom];[top][bottom]vstack=inputs=2[v]" -map "[v]" output

This syntax is easier to understand, but less efficient than using xstack as shown above.


2x2 grid with text

Using the drawtext filter:

ffmpeg -i input0 -i input1 -i input2 -i input3 -filter_complex \
"[0]drawtext=text='vid0':fontsize=20:x=(w-text_w)/2:y=(h-text_h)/2[v0];
 [1]drawtext=text='vid1':fontsize=20:x=(w-text_w)/2:y=(h-text_h)/2[v1];
 [2]drawtext=text='vid2':fontsize=20:x=(w-text_w)/2:y=(h-text_h)/2[v2];
 [3]drawtext=text='vid3':fontsize=20:x=(w-text_w)/2:y=(h-text_h)/2[v3];
 [v0][v1][v2][v3]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" \
-map "[v]" output

4x4

Use the xstack filter. Example for a total of 16 videos:

ffmpeg -i input0 -i input1 -i input2 -i input3 -i input4 -i input5 -i input6 -i input7 -i input8 -i input9 -i input10 -i input11 -i input12 -i input13 -i input14 -i input15 -filter_complex "[0:v][1:v][2:v][3:v][4:v][5:v][6:v][7:v][8:v][9:v][10:v][11:v][12:v][13:v][14:v][15:v]xstack=inputs=16:layout=0_0|w0_0|w0+w1_0|w0+w1+w2_0|0_h0|w4_h0|w4+w5_h0|w4+w5+w6_h0|0_h0+h4|w8_h0+h4|w8+w9_h0+h4|w8+w9+w10_h0+h4|0_h0+h4+h8|w12_h0+h4+h8|w12+w13_h0+h4+h8|w12+w13+w14_h0+h4+h8" output.mp4

If you need to scale the inputs first:

ffmpeg -i input0 -i input1 -i input2 -i input3 -i input4 -i input5 -i input6 -i input7 -i input8 -i input9 -i input10 -i input11 -i input12 -i input13 -i input14 -i input15 -filter_complex "[0:v]scale=iw/4:-1[v0];[1:v]scale=iw/4:-1[v1];[2:v]scale=iw/4:-1[v2];[3:v]scale=iw/4:-1[v3];[4:v]scale=iw/4:-1[v4];[5:v]scale=iw/4:-1[v5];[6:v]scale=iw/4:-1[v6];[7:v]scale=iw/4:-1[v7];[8:v]scale=iw/4:-1[v8];[9:v]scale=iw/4:-1[v9];[10:v]scale=iw/4:-1[v10];[11:v]scale=iw/4:-1[v11];[12:v]scale=iw/4:-1[v12];[13:v]scale=iw/4:-1[v13];[14:v]scale=iw/4:-1[v14];[15:v]scale=iw/4:-1[v15];[v0][v1][v2][v3][v4][v5][v6][v7][v8][v9][v10][v11][v12][v13][v14][v15]xstack=inputs=16:layout=0_0|w0_0|w0+w1_0|w0+w1+w2_0|0_h0|w4_h0|w4+w5_h0|w4+w5+w6_h0|0_h0+h4|w8_h0+h4|w8+w9_h0+h4|w8+w9+w10_h0+h4|0_h0+h4+h8|w12_h0+h4+h8|w12+w13_h0+h4+h8|w12+w13+w14_h0+h4+h8" output.mp4
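The layout string in the 4x4 commands follows a regular pattern: each cell's x offset is the sum of the widths of the cells to its left in the same row, and its y offset is the sum of the heights of the first cell in each row above it. The sketch below, assuming a POSIX shell, generates that string for any grid size; xstack_layout is a hypothetical helper name, and inputs are assumed to be ordered row by row, left to right.

```shell
# Hypothetical helper: print an xstack layout string for a rows x cols grid.
# It only builds the string; it does not run ffmpeg.
#   $1 = number of rows
#   $2 = number of columns
xstack_layout() (
  rows=$1
  cols=$2
  layout=""
  r=0
  while [ "$r" -lt "$rows" ]; do
    # y offset: heights of the first cell of every row above this one.
    y=""
    i=0
    while [ "$i" -lt "$r" ]; do
      y="${y:+$y+}h$((i * cols))"
      i=$((i + 1))
    done
    c=0
    while [ "$c" -lt "$cols" ]; do
      # x offset: widths of the cells to the left in the same row.
      x=""
      i=0
      while [ "$i" -lt "$c" ]; do
        x="${x:+$x+}w$((r * cols + i))"
        i=$((i + 1))
      done
      # An empty offset means the cell sits at position 0 on that axis.
      layout="${layout:+$layout|}${x:-0}_${y:-0}"
      c=$((c + 1))
    done
    r=$((r + 1))
  done
  printf '%s\n' "$layout"
)
```

Calling xstack_layout 4 4 reproduces the layout used in the 4x4 commands above. For a 2x2 grid it prints 0_0|w0_0|0_h0|w2_h0, which differs from the hand-written 2x2 layout only in using w2 rather than w0 for the last cell; the two are equivalent when all inputs share the same size.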

Resize/scale an input

Since both videos need to have the same width for vstack, and the same height for hstack, you may need to scale one video to match the other:

Simple scale filter example to set width of input0 to 640 and automatically set height while preserving the aspect ratio:

ffmpeg -i input0 -i input1 -filter_complex "[0:v]scale=640:-1[v0];[v0][1:v]vstack=inputs=2" output
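If you do not know the input widths in advance, one option is to scale every input to the same width before stacking. The sketch below, assuming a POSIX shell, prints such a filtergraph; vstack_scaled_filter is a hypothetical helper name. It uses -2 rather than -1 for the height so that the computed value is divisible by 2, which some encoders and pixel formats require.

```shell
# Hypothetical helper: print a filtergraph that scales the first N inputs
# to a common width and stacks them vertically.
# It only builds the string; it does not run ffmpeg.
#   $1 = target width in pixels
#   $2 = number of inputs
vstack_scaled_filter() (
  width=$1
  n=$2
  chains=""
  labels=""
  i=0
  while [ "$i" -lt "$n" ]; do
    # Height -2 keeps the aspect ratio and rounds to an even number.
    chains="${chains}[$i:v]scale=${width}:-2[v$i];"
    labels="${labels}[v$i]"
    i=$((i + 1))
  done
  printf '%svstack=inputs=%s\n' "${chains}${labels}" "$n"
)
```

Possible usage: ffmpeg -i input0 -i input1 -filter_complex "$(vstack_scaled_filter 640 2)" output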

Delaying/pausing videos

This example will play the top left video while pausing the others. Once the top left video ends the top right video will play and so on.

Use the tpad, adelay, xstack, and amix filters:

ffmpeg -i top-left.mp4 -i top-right.mp4 -i bottom-left.mp4 -i bottom-right.mp4 -filter_complex "[1]tpad=start_mode=clone:start_duration=5[tr];[2]tpad=start_mode=clone:start_duration=10[bl];[3]tpad=start_mode=clone:start_duration=15[br];[0][tr][bl][br]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v];[1:a]adelay=5s:all=true[a1];[2:a]adelay=10s:all=true[a2];[3:a]adelay=15s:all=true[a3];[0:a][a1][a2][a3]amix=inputs=4[a]" -map "[v]" -map "[a]" output.mp4
  • This example assumes each input is 5 seconds duration. Adjust start_duration and adelay values as needed.

  • This command requires FFmpeg 4.3 or newer.

  • If you don't like the complexity of xstack you can use several hstack/vstack instead as shown in the 2x2 grid section.
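Because the start_duration and adelay values are all multiples of the clip length, they can be derived from one number. The sketch below, assuming a POSIX shell and a whole number of seconds, prints the filtergraph for a given clip duration; staggered_grid_filter is a hypothetical helper name, and it assumes all four inputs have the same duration and an audio stream.

```shell
# Hypothetical helper: print the 2x2 "play one after another" filtergraph
# for clips that each last $1 seconds (whole seconds only).
# It only builds the string; it does not run ffmpeg.
staggered_grid_filter() (
  d=$1
  d2=$((d * 2))
  d3=$((d * 3))
  # Video: hold the first frame of inputs 1-3 until their turn comes.
  printf '[1]tpad=start_mode=clone:start_duration=%s[tr];' "$d"
  printf '[2]tpad=start_mode=clone:start_duration=%s[bl];' "$d2"
  printf '[3]tpad=start_mode=clone:start_duration=%s[br];' "$d3"
  printf '[0][tr][bl][br]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v];'
  # Audio: delay inputs 1-3 by the same amounts, then mix all four.
  printf '[1:a]adelay=%ss:all=true[a1];' "$d"
  printf '[2:a]adelay=%ss:all=true[a2];' "$d2"
  printf '[3:a]adelay=%ss:all=true[a3];' "$d3"
  printf '[0:a][a1][a2][a3]amix=inputs=4[a]\n'
)
```

Possible usage: ffmpeg -i top-left.mp4 -i top-right.mp4 -i bottom-left.mp4 -i bottom-right.mp4 -filter_complex "$(staggered_grid_filter 8)" -map "[v]" -map "[a]" output.mp4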

Solution 2 - Ffmpeg

The vstack and hstack filters (see Solution 1) are a newer, simpler way to do this.


Old version:
You should be able to do this using the pad, movie and overlay filters in FFmpeg. The command will look something like this:

ffmpeg -i top.mov -vf 'pad=iw:2*ih [top]; movie=bottom.mov [bottom];
  [top][bottom] overlay=0:main_h/2' stacked.mov

First the movie that should be on top is padded to twice its height. Then the bottom movie is loaded. Then the bottom movie is overlaid on the padded top movie at an offset of half the padded movie's height.

Solution 3 - Ffmpeg

For 2 videos:

ffmpeg -i 1.mp4 -i 2.mp4 -filter_complex hstack out.mp4

For more videos (3 in this example):

ffmpeg -i 1.mp4 -i 2.mp4 -i 3.mp4 -filter_complex hstack=3 out.mp4

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Joseph Turian | View Question on Stackoverflow
Solution 1 - Ffmpeg | llogan | View Answer on Stackoverflow
Solution 2 - Ffmpeg | blahdiblah | View Answer on Stackoverflow
Solution 3 - Ffmpeg | mrgloom | View Answer on Stackoverflow