Meaning of Leaky Abstraction?


Programming Languages Problem Overview


What does the term "Leaky Abstraction" mean? (Please explain with examples. I often have a hard time grokking a mere theory.)

Programming Languages Solutions


Solution 1 - Programming Languages

Here's a meatspace example:

Automobiles have abstractions for drivers. In its purest form, there's a steering wheel, accelerator and brake. This abstraction hides a lot of detail about what's under the hood: engine, cams, timing belt, spark plugs, radiator, etc.

The neat thing about this abstraction is that we can replace parts of the implementation with improved parts without retraining the user. Let's say we replace the distributor cap with electronic ignition, and we replace the fixed cam with a variable cam. These changes improve performance but the user still steers with the wheel and uses the pedals to start and stop.

It's actually quite remarkable... a 16 year old or an 80 year old can operate this complicated piece of machinery without really knowing much about how it works inside!

But there are leaks. The transmission is a small leak. With an automatic transmission you can feel the car lose power for a moment as it switches gears, whereas with a CVT you feel smooth torque all the way up.

There are bigger leaks, too. If you rev the engine too fast, you may do damage to it. If the engine block is too cold, the car may not start or it may have poor performance. And if you crank the radio, headlights, and AC all at the same time, you'll see your gas mileage go down.

Solution 2 - Programming Languages

It simply means that your abstraction exposes some of the implementation details, or that you need to be aware of the implementation details when using the abstraction. The term is attributed to Joel Spolsky, circa 2002. See the Wikipedia article for more information.

A classic example is network libraries that allow you to treat remote files as local. The developer using this abstraction must be aware that network problems may cause this to fail in ways that local files do not. You then need to write code that specifically handles errors outside the abstraction that the network library provides.
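A minimal Python sketch of that situation. It uses hypothetical names (`FlakyRemoteFile`, `read_first_line`, `read_first_line_safely`) and a simulated outage rather than any real network library:

```python
import io


class FlakyRemoteFile:
    """Hypothetical stand-in for a remote file exposed through a file-like API."""

    def __init__(self, content: str, network_up: bool = True):
        self._buffer = io.StringIO(content)
        self._network_up = network_up

    def readline(self) -> str:
        # A local file never fails this way; the network detail leaks through.
        if not self._network_up:
            raise ConnectionError("simulated network outage")
        return self._buffer.readline()


def read_first_line(file_like) -> str:
    """Written against the 'it is just a file' abstraction."""
    return file_like.readline().rstrip("\n")


def read_first_line_safely(file_like, default: str = "") -> str:
    """The caller now has to know the file might live across a network."""
    try:
        return read_first_line(file_like)
    except ConnectionError:
        return default
```

`read_first_line` works for both a local `io.StringIO` and a healthy `FlakyRemoteFile`, but only the second variant survives an outage, and it can do so only because its author knew what sits behind the file-like interface.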

Solution 3 - Programming Languages

Wikipedia has a pretty good definition for this:

>A leaky abstraction refers to any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden.

Or, in other words, for software it's when you can observe implementation details of a feature via limitations or side effects in the program.

A quick example would be C# / VB.Net closures and their inability to capture ref / out parameters. The reason they cannot be captured is due to an implementation detail of how the lifting process occurs. This is not to say though that there is a better way of doing this.

Solution 4 - Programming Languages

Here's an example familiar to .NET developers: ASP.NET's Page class attempts to hide the details of HTTP operations, particularly the management of form data, so that developers don't have to deal with posted values (because it automatically maps form values to server controls).

But if you wander beyond the most basic usage scenarios, the Page abstraction begins to leak and it becomes hard to work with pages unless you understand the class's implementation details.

One common example is dynamically adding controls to a page - the value of dynamically-added controls won't be mapped for you unless you add them at just the right time: before the underlying engine maps the incoming form values to the appropriate controls. When you have to learn that, the abstraction has leaked.

Solution 5 - Programming Languages

Well, in a way it is a purely theoretical thing, though not unimportant.

We use abstractions to make things easier to comprehend. I may operate on a string class in some language to hide the fact that I'm dealing with an ordered set of characters that are individual items. I deal with an ordered set of characters to hide the fact that I'm dealing with numbers. I deal with numbers to hide the fact that I'm dealing with 1s and 0s.

A leaky abstraction is one that doesn't hide the details it's meant to hide. If I call string.Length on a 5-character string in Java or .NET, I could get any answer from 5 to 10, because what those languages call characters are really UTF-16 code units, each of which can represent either a whole character or half of one. The abstraction has leaked. Not leaking it, though, would mean that finding the length either requires more storage space (to store the real length) or changes from O(1) to O(n) (to work out what the real length is). If I care about the real answer (often I don't), I need to work from knowledge of what is really going on.

More debatable are cases where a method or property lets you get at the inner workings: whether these are abstraction leaks or well-defined ways to move to a lower level of abstraction is something people can disagree on.
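The 5-versus-10 gap can be reproduced in a few lines of Python. Python's `len` counts code points, so the sketch below computes the UTF-16 code-unit count separately; that count is what a UTF-16 based length property reports. The helper name `utf16_length` is made up for this illustration:

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids the byte-order mark.
    return len(text.encode("utf-16-le")) // 2


plain = "hello"
emoji = "\U0001F600" * 5  # five characters outside the Basic Multilingual Plane

print(len(plain), utf16_length(plain))  # 5 5
print(len(emoji), utf16_length(emoji))  # 5 10
```

Both strings are five characters long, yet the second needs ten code units because each of its characters is stored as a surrogate pair.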

Solution 6 - Programming Languages

I'll continue in the vein of giving examples by using RPC.

In the ideal world of RPC, a remote procedure call should look like a local procedure call (or so the story goes). It should be completely transparent to the programmer, such that when they call SomeObject.someFunction() they have no idea if SomeObject (or just someFunction for that matter) is locally stored and executed or remotely stored and executed. The theory goes that this makes programming simpler.

The reality is different because there's a HUGE difference between making a local function call (even if you're using the world's slowest interpreted language) and:

  • calling through a proxy object
  • serializing your parameters
  • making a network connection (if not already established)
  • transmitting the data to the remote proxy
  • having the remote proxy restore the data and call the remote function on your behalf
  • serializing the return value(s)
  • transmitting the return values to the local proxy
  • reassembling the serialized data
  • returning the response from the remote function

In time alone that's about three orders (or more!) of magnitude difference. Those three+ orders of magnitude are going to make a huge difference in performance that will make your abstraction of a procedure call leak rather obviously the first time you mistakenly treat an RPC as a real function call. Further, a real function call, barring serious problems in your code, will have very few failure points outside of implementation bugs. An RPC has all of the following possible problems that will get slathered on as failure cases over and above what you'd expect from a regular local call:

  • you might not be able to instantiate your local proxy
  • you might not be able to instantiate your remote proxy
  • the proxies may not be able to connect
  • the parameters you send may not make it intact or at all
  • the return value the remote sends may not make it intact or at all

So now your RPC call which is "just like a local function call" has a whole buttload of extra failure conditions you don't have to contend with when doing local function calls. The abstraction has leaked again, even harder.
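Here is a rough Python sketch of the steps and failure cases listed above. Everything in it is hypothetical (`FakeTransport`, `RemoteAdder`), and the "network" is simulated in-process, so it shows the shape of the leak rather than real latency:

```python
import json


def add(a: int, b: int) -> int:
    """An ordinary local function."""
    return a + b


class FakeTransport:
    """Hypothetical in-process stand-in for a network link to a remote server."""

    def __init__(self, handler, connected: bool = True):
        self._handler = handler
        self._connected = connected

    def send(self, payload: bytes) -> bytes:
        if not self._connected:
            raise ConnectionError("simulated: proxies could not connect")
        request = json.loads(payload)             # remote side restores the data
        result = self._handler(*request["args"])  # and calls the real function
        return json.dumps({"result": result}).encode()  # serialized return value


class RemoteAdder:
    """Local proxy whose add() looks just like the local function."""

    def __init__(self, transport: FakeTransport):
        self._transport = transport

    def add(self, a: int, b: int) -> int:
        payload = json.dumps({"args": [a, b]}).encode()  # serialize parameters
        reply = self._transport.send(payload)            # cross the "network"
        return json.loads(reply)["result"]               # reassemble the response
```

`RemoteAdder(FakeTransport(add)).add(2, 3)` returns the same value as `add(2, 3)`, but with `connected=False` the identical-looking call raises `ConnectionError`, a failure the local function cannot produce.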

In the end RPC is a bad abstraction because it leaks like a sieve at every level -- when successful and when failing both.

Solution 7 - Programming Languages

What is abstraction?

Abstraction is a way of simplifying the world. It means you don't have to worry about what is actually happening under the hood, or behind the curtain. It means something is idiot proof.

Example of Abstraction: The Complexities of Flying a 737/747 are "abstracted" away

Planes are very complicated pieces of machinery. You have jet engines, oxygen systems, electrical systems, landing gear systems, etc., but the pilot doesn't have to worry about the intricacies of the jet engine: all that is "abstracted away". This means that the pilot need only worry about steering the plane: left to go left, and right to go right, pull up to gain elevation, and push down to descend.

It's simple enough... actually I lied: controlling the steering wheel is a little bit more complicated. In an ideal world, that's the only thing the pilot should be worried about. But this isn't the case in real life: if you fly a plane like a monkey, without any real understanding of how a plane operates, or of any of the implementation details, then you'll likely crash and kill everyone on board.

Leaky Abstractions in 737 Example

In reality, a pilot does have to worry about a LOT of important things - not everything has been abstracted away: pilots have to worry about wind speed, thrust, angles of attack, fuel, altitude, weather problems, angles of descent, and whether the pilot is going in the right direction. Computers can help the pilot in these tasks, but not everything is automated / simplified.

e.g. If the pilot pulls up too hard on the column - the plane will obey, but then the pilot will risk stalling the plane, and once stalled, it is mighty difficult to regain control of it, before it comes crashing back down to the ground.

In other words, it is not enough for the pilot to simply control the steering wheel without knowing anything else. The pilot must know about the underlying risks and limitations of the plane before flying one. The pilot must know how the plane works and how it flies; the pilot must know implementation details: that pulling up too hard will lead to a stall, or that landing too steeply will destroy the plane.

Those things are not abstracted away. A lot of things are abstracted away, but not everything. Ideally the pilot would need to worry only about the steering column and perhaps one or two other things, but that is not the reality. The abstraction is "leaky".

Leaky Abstractions in Code

It's the same thing in your code. If you don't know the underlying implementation details, then more often than not, you'll work yourself into a corner.

Here is an example in coding:

ORMs abstract a lot of the hassle in dealing with database queries, but if you've ever done something like:

User.all.each do |user|
   puts user.name # let's print each user's name
end

Then you will realise that's a nice way to kill your app if you've got more than a couple of million users. Not everything is abstracted away. You need to know that calling User.all with 25 million users is going to spike your memory usage, and is going to cause problems. You need to know some underlying details. The abstraction is leaky.
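To make the memory point concrete, here is a toy Python model, not a real ORM. `FakeUserTable` and its methods are invented for illustration; it only tracks how many rows are held in memory at once. In Rails itself, the usual remedy is batch iteration such as `find_each`.

```python
from typing import Iterator, List


class FakeUserTable:
    """Hypothetical stand-in for a database table; rows are generated on demand."""

    def __init__(self, row_count: int):
        self.row_count = row_count
        self.max_rows_in_memory = 0  # high-water mark of rows materialized at once

    def _load(self, start: int, stop: int) -> List[str]:
        rows = [f"user{i}" for i in range(start, stop)]
        self.max_rows_in_memory = max(self.max_rows_in_memory, len(rows))
        return rows

    def all(self) -> List[str]:
        # Looks harmless, but materializes every row before the loop starts.
        return self._load(0, self.row_count)

    def in_batches(self, batch_size: int) -> Iterator[List[str]]:
        # Holds at most `batch_size` rows at a time.
        for start in range(0, self.row_count, batch_size):
            yield self._load(start, min(start + batch_size, self.row_count))
```

Iterating `table.all()` on a 25-million-row table would push `max_rows_in_memory` to 25 million, while `table.in_batches(1000)` keeps it at 1000. Choosing between them requires knowing what the ORM does underneath.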

Solution 8 - Programming Languages

An example from the Django ORM's many-to-many documentation:

Notice in the Sample API Usage that you need to .save() the base Article object a1 before you can add Publication objects to the many-to-many attribute. And notice that updating the many-to-many attribute saves to the underlying database immediately, whereas updating a singular attribute is not reflected in the db until .save() is called.

The abstraction is that we are working with an object graph, where single-value attributes and multi-value attributes are just attributes. But the implementation as a relational-database-backed data store leaks, as the integrity system of the RDBMS appears through the thin veneer of an object interface.
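The behaviour described above can be imitated with a toy model in plain Python. This is not Django code: `ToyDatabase`, `ToyArticle` and `add_publication` are hypothetical names, and the sketch only mirrors the two rules from the answer (save before adding, and relation writes happening immediately).

```python
class ToyDatabase:
    """Hypothetical in-memory stand-in for the relational store."""

    def __init__(self):
        self.articles = {}         # article id -> headline
        self.article_pubs = set()  # (article id, publication name) join rows


class ToyArticle:
    def __init__(self, db: ToyDatabase, headline: str):
        self._db = db
        self.id = None             # no primary key until save()
        self.headline = headline

    def save(self) -> None:
        if self.id is None:
            self.id = len(self._db.articles) + 1
        self._db.articles[self.id] = self.headline

    def add_publication(self, name: str) -> None:
        # The join row needs the article's primary key, so the article must exist.
        if self.id is None:
            raise ValueError("save the article before adding publications")
        self._db.article_pubs.add((self.id, name))  # written immediately
```

Changing `headline` touches only the Python object until `save()` runs, while `add_publication` writes straight through. Both rules come from the foreign-key bookkeeping underneath, not from anything an "object with attributes" would suggest.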

Solution 9 - Programming Languages

It is the fact that at some point, which will be determined by your scale and execution, you will need to become familiar with the implementation details of your abstraction framework in order to understand why it behaves the way it does.

For example, consider this SQL query:

SELECT id, first_name, last_name, age, subject FROM student_details;

And its alternative:

SELECT * FROM student_details;

Now, they look like logically equivalent solutions, but the first one can perform better because it names the individual columns it needs (this matters most when the table holds more columns than the ones listed).

It's a trivial example, but eventually it comes back to Joel Spolsky's quote:

> All non-trivial abstractions, to some degree, are leaky.

At some point, when you reach a certain scale in your operation, you will want to optimize the way your DB (SQL) works. To do that, you will need to know how relational databases work. That was abstracted away from you in the beginning, but the abstraction is leaky. You need to learn it at some point.
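One way to watch the column list change what the database does is to ask SQLite for its query plan. The sketch below uses Python's built-in `sqlite3` module and makes assumptions that are not in the original example: the column types, an index named `idx_student_name`, and a `WHERE` clause on `last_name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_details ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
    "age INTEGER, subject TEXT)"
)
# Assumed index, not part of the original example.
conn.execute(
    "CREATE INDEX idx_student_name ON student_details (last_name, first_name)"
)


def query_plan(sql: str) -> str:
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


narrow_plan = query_plan(
    "SELECT first_name, last_name FROM student_details WHERE last_name = 'Smith'"
)
star_plan = query_plan(
    "SELECT * FROM student_details WHERE last_name = 'Smith'"
)

print(narrow_plan)
print(star_plan)
```

Under these assumptions the narrow query can be answered from the index alone (a covering index), whereas `SELECT *` also has to visit the table rows for `age` and `subject`. Nothing in the SQL text reveals that difference; you have to know how the engine stores and looks up data.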

Solution 10 - Programming Languages

Assume, we have the following code in a library:

Object[] fetchDeviceColorAndModel(String serialNumberOfDevice)
{
    //fetch Device Color and Device Model from DB.
    //create new Object[] and set 0th field with color and 1st field with model value.
    return new Object[] { color, model };
}

When the consumer calls the API, they get an Object[]. The consumer has to understand that the first field of the object array holds the color value and the second field holds the model value. Here the abstraction has leaked from the library into the consumer code.

One of the solutions is to return an object which encapsulates Model and Color of the Device. The consumer can call that object to get the model and color value.

DeviceColorAndModel fetchDeviceColorAndModel(String serialNumberOfTheDevice)
{
    //fetch Device Color and Device Model from DB.
    return new DeviceColorAndModel(color, model);
}
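The same contrast in a short Python sketch. The hard-coded `"red"` and `"X100"` values stand in for the database lookup, and the function names are made up for this illustration:

```python
from dataclasses import dataclass
from typing import List


def fetch_color_and_model_leaky(serial_number: str) -> List[str]:
    # Hard-coded values stand in for a database lookup.
    return ["red", "X100"]  # caller must know: index 0 is color, index 1 is model


@dataclass(frozen=True)
class DeviceColorAndModel:
    color: str
    model: str


def fetch_color_and_model(serial_number: str) -> DeviceColorAndModel:
    # Same stand-in data, but the field names travel with the value.
    return DeviceColorAndModel(color="red", model="X100")
```

With the first function the caller writes `result[0]` and has to remember the ordering convention; with the second, `result.color` carries its meaning and the library is free to change how the value is laid out internally.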

Solution 11 - Programming Languages

Leaky abstraction is all about encapsulating state. A very simple example of a leaky abstraction:

$currentTime = new DateTime();

$bankAccount1->setLastRefresh($currentTime);
$bankAccount2->setLastRefresh($currentTime);
$currentTime->setTimestamp($aTimestamp);

class BankAccount {
    // ...

    public function setLastRefresh(DateTime $lastRefresh)
    {
        // Stores the caller's mutable object itself, so later changes leak in.
        $this->lastRefresh = $lastRefresh;
    }
}

And the right way (not a leaky abstraction):

class BankAccount
{
    // ...

    public function setLastRefresh(DateTime $lastRefresh)
    {
        $this->lastRefresh = clone $lastRefresh;
    }
}
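The same aliasing problem can be reproduced in Python with any mutable object. In this sketch `Clock`, `LeakyAccount` and `SafeAccount` are hypothetical names, with `Clock` playing the role of the mutable DateTime:

```python
import copy


class Clock:
    """Hypothetical mutable timestamp, playing the role of a mutable DateTime."""

    def __init__(self, timestamp: int):
        self.timestamp = timestamp


class LeakyAccount:
    def set_last_refresh(self, clock: Clock) -> None:
        self.last_refresh = clock  # shares the caller's object


class SafeAccount:
    def set_last_refresh(self, clock: Clock) -> None:
        self.last_refresh = copy.copy(clock)  # keeps a private copy


now = Clock(1000)
leaky, safe = LeakyAccount(), SafeAccount()
leaky.set_last_refresh(now)
safe.set_last_refresh(now)
now.timestamp = 2000  # the caller mutates its own object afterwards

print(leaky.last_refresh.timestamp)  # 2000: the account changed behind its back
print(safe.last_refresh.timestamp)   # 1000
```

Copying on the way in is one option; accepting only immutable values is another that achieves the same isolation.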


Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Geonne | View Question on Stackoverflow |
| Solution 1 - Programming Languages | Mark E. Haase | View Answer on Stackoverflow |
| Solution 2 - Programming Languages | tvanfosson | View Answer on Stackoverflow |
| Solution 3 - Programming Languages | JaredPar | View Answer on Stackoverflow |
| Solution 4 - Programming Languages | Jeff Sternal | View Answer on Stackoverflow |
| Solution 5 - Programming Languages | Jon Hanna | View Answer on Stackoverflow |
| Solution 6 - Programming Languages | JUST MY correct OPINION | View Answer on Stackoverflow |
| Solution 7 - Programming Languages | BenKoshy | View Answer on Stackoverflow |
| Solution 8 - Programming Languages | hash1baby | View Answer on Stackoverflow |
| Solution 9 - Programming Languages | Johnny | View Answer on Stackoverflow |
| Solution 10 - Programming Languages | Niranjan R | View Answer on Stackoverflow |
| Solution 11 - Programming Languages | Alireza Rahmani khalili | View Answer on Stackoverflow |