How is TeamViewer so fast?

Performance Network Programming Operating System Udp Remote Desktop

Performance Problem Overview

Sorry about the length, it's kinda necessary.

Introduction

I'm developing a remote desktop software (just for fun) in C# 4.0 for Windows Vista/7. I've gotten through basic obstacles: I have a robust UDP messaging system, relatively clean program design, I've got a mirror driver (the free DFMirage mirror driver from DemoForge) up and running, and I've implemented NAT traversal for all NAT types except Symmetric NATs (present in corporate firewall situations).

Regarding screen transfer/sharing, thanks to the mirror driver, I'm automatically notified of changed screen regions and I can simply marshal the mirror driver's ever-changing screen bitmap to my own bitmap. Then I compress the screen region as a PNG and send it off from the server to my client. Things are looking pretty good, but it's not fast enough. It's just as slow as VNC (btw, I don't use the VNC protocol, just a custom amateur protocol).

From the slowest remote desktop software to the fastest, the list usually begins at all VNC-like implementations, then climbs up to Microsoft Windows Remote Desktop...and then...TeamViewer. Not quite sure about CrossLoop, LogMeIn - I haven't used them, but TeamViewer is insanely fast. It's quite literally live. I ran a tree command on Command Prompt and it updated with 20 ms delay. I can browse the web just a few milliseconds slower than on my laptop. Scrolling code vertically in Visual Studio has 50 ms lag time. Think about how robust TeamViewer's screen-transfer solution must be to accomplish all this.

VNCs use poll-based hooks for detecting screen change and brute force screen capturing/comparing at their worst. At their best, they use a mirror driver like DFMirage. I'm at this level. And they use something called the RFB protocol.

Microsoft Windows Remote Desktop apparently goes one step higher than VNC. I heard, from somewhere on StackOverflow, that Windows Remote Desktop doesn't send screen bitmaps, but actual drawing commands. That's quite brilliant, because it can just send simple text (draw this rectangle at this coordinate and color it with this gradient)! Remote Desktop really is pretty fast - and it's the standard way of working from home. And it uses something called the RDP protocol.

Now TeamViewer is a complete mystery to me. Apparently, they released their source code for Version 2 (TeamViewer is Version 7 as of February 2012). People have read it and said that Version 2 is useless - that it's just a few improvements over VNC with automatic NAT traversal.

But Version 7...it's ridiculously fast now. I mean, it's actually faster than Windows Remote Desktop. I've streamed DirectX 3D games with TeamViewer (at 1 fps, but Windows Remote Desktop doesn't even allow DirectX to run).

By the way, TeamViewer does all this without a mirror driver. There is an option to install one, and it gets just a bit faster.

The Question

My question is, how is TeamViewer so fast? It must not be possible. If you've got 1920 by 1080 resolution at even 24 bit depth (16 bit depth would be noticeably ugly), thats still 6,220,800 bytes raw. Even using libjpeg-turbo (one of the fastest JPG compression libraries used by large corporations), compressing it down to 30KB (let's be extremely generous), would take time to route through TeamViewer's servers (TeamViewer bypasses corporate Symmetric NATs by simply proxying traffic through their servers). And that libjpeg-turbo compression would take time to compress. High-quality JPG compression takes 175 milliseconds for a full 1920 by 1080 screenshot for me. And that number goes up if the host's computer runs an Atom processor. I simply don't understand how TeamViewer has optimized their screen transfer so well. Again, small-size images might be highly compressed, but take at least tens of milliseconds to compress. Large-size images take no time to compress, but take a long time to get through. Somehow, TeamViewer completes this entire process to get roughly 20-25 frames per second. I've used a network monitor, and TeamViewer is still lagless at speeds of 500 Kbps and 1 Mbps (VNC software lag for a few seconds at that transfer rate). During my tree Command Prompt test, TeamViewer was receiving inbound data at a rate of 1 Mbps and still running 5-6 fps. VNC and remote desktop don't do that. So, how?

The answers will be somewhat complicated and intricate, so please don't post your $0.02 if you're only going to say it's because they use UDP instead of TCP (would you believe they actually do use TCP just as successfully though).

I'm hoping there's a TeamViewer developer somewhere here on StackOverflow.

Potential Answers

Will update this once people reply.

My thoughts are, first of all, that TeamViewer has very fine network control. For example, they split large packets to just under the MTU size and never waste a trip. They probably have all sorts of fancy hooks to detect screen changes along with extremely fast XOR image comparisons.

Performance Solutions

Solution 1 - Performance

The most fundamental thing here probably is that you don't want to transmit static images but only changes to the images, which essentially is analogous to video stream.

My best guess is some very efficient (and heavily specialized and optimized) motion compensation algorithm, because most of the actual change in generic desktop usage is linear movement of elements (scrolling text, moving windows, etc. opposed to transformation of elements).

The DirectX 3D performance of 1 FPS seems to confirm my guess to some extent.

Solution 2 - Performance

> would take time to route through TeamViewer's servers (TeamViewer bypasses corporate Symmetric NATs by simply proxying traffic through their servers)

You'll find that TeamViewer rarely needs to relay traffic through their own servers. TeamViewer penetrates NAT and networks complicated by NAT using NAT traversal (I think it is UDP hole-punching, like Google's libjingle).

They do use their own servers to middle-man in order to do the handshake and connection set-up, but most of the time the relationship between client and server will be P2P (best case, when the hand-shake is successful). If NAT traversal fails, then TeamViewer will indeed relay traffic through its own servers.

I've only ever seen it do this when a client has been behind double-NAT, though.

Solution 3 - Performance

It sounds indeed like video streaming more than image streaming, as someone suggested. JPEG/PNG compression isn't targeted for these types of speeds, so forget them.

Imagine having a recording codec on your system that can realtime record an incoming video stream (your screen). A bit like Fraps perhaps. Then imagine a video playback codec on the other side (the remote client). As HD recorders can do it (record live and even playback live from the same HD), so should you, in the end. The HD surely can't deliver images quicker than you can read your display, so that isn't the bottleneck. The bottleneck are the video codecs. You'll find the encoder much more of a problem than the decoder, as all decoders are mostly free.

I'm not saying it's simple; I myself have used DirectShow to encode a video file, and it's not realtime by far. But given the right codec I'm convinced it can work.

Solution 4 - Performance

My random guess is: TV uses x264 codec which has a commercial license (otherwise TeamViewer would have to release their source code). At some point (more than 5 years ago), I recall main developer of x264 wrote an article about improvements he made for low delay encoding (if you delay by a few frames encoders can compress better), plus he mentioned some other improvements that were relevant for TeamViewer-like use. In that post he mentioned playing quake over video stream with no noticeable issues. Back then I was kind of sure who was the sponsor of these improvements, as TeamViewer was pretty much the only option at that time. x264 is an open source implementation of H264 video codec, and it's insanely good implementation, it's the best one. At the same time it's extremely well optimized. Most likely due to extremely good implementation of x264 you get much better results with TV at lower CPU load. AnyDesk and Chrome Remote Desk use libvpx, which isn't as good as x264 (optimization and video quality wise).

However, I don't think TeamView can beat microsoft's RDP. To me it's the best, however it works between windows PCs or from Mac to Windows only. TV works even from mobiles.

Update: article was written in January 2010, so that work was done roughly 10 years ago. Also, I made a mistake: he played call of duty, not quake. When you posted your question, if my guess is correct, TeamViewer had been using that work for 3 years. Read that blog post from web archive: x264: the best low-latency video streaming platform in the world. When I read the article back in 2010, I was sure that the "startup–which has requested not to be named" that the author mentions was TeamViewer.

Solution 5 - Performance

Oddly. but in my experience TeamViewer is not faster/more responsive than VNC, only easier to setup. I have a couple of win-boxen that I VNC over OpenVPN into (so there is another overhead layer) and that's on cheap Cable (512 up) and I find properly setup TightVNC to be much more responsive than TeamViewer to same boxen. RDP (naturally) even more so since by large part it sends GUI draw commands instead of bitmap tiles.

Which brings us to:

Why are you not using VNC? There are plethora of open source solutions, and Tight is probably on top of it's game right now.
Advanced VNC implementations use lossy compression and that seems to achieve better results than your choice of PNG. Also, IIRC the rest of the payload is also squashed using zlib. Bothj Tight and UltraVNC have very optimized algos, especially for windows. On top of that Tight is open-source.
If win boxen are your primary target RDP may be a better option, and has an opensource implementation (rdesktop)
If *nix boxen are your primary target NX may be a better option and has an open source implementation (FreeNX, albeit not as optimised as NoMachine's proprietary product).

If compressing JPEG is a performance issue for your algo, I'm pretty sure that image comparison would still take away some performance. I'd bet they use best-case compression for every specific situation ie lossy for large frames, some quick and dirty internall losless for smaller ones, compare bits of images and send only diffs of sort and bunch of other optimisation tricks.

And a lot of those tricks must be present in Tight > 2.0 since again, in my experience it beats the hell out of TeamViewer performance wyse, YMMV.

Also the choice of a JIT compiled runtime over something like C++ might take a slice from your performance edge, especially in memory constrained machines (a lot of performance tuning goes to the toilet when windows start using the pagefile intensively). And you will need memory to keep previous image states for internal comparison atop of what DF mirage gives you.

Content Type	Original Author	Original Content on Stackoverflow
Question	Jason	View Question on Stackoverflow
Solution 1 - Performance	Kimvais	View Answer on Stackoverflow
Solution 2 - Performance	Jamie Edwards	View Answer on Stackoverflow
Solution 3 - Performance	Ruud van Gaal	View Answer on Stackoverflow
Solution 4 - Performance	Pavel P	View Answer on Stackoverflow
Solution 5 - Performance	Bojan Markovic	View Answer on Stackoverflow