Purity vs Referential transparency

Language AgnosticFunctional ProgrammingSide EffectsReferential TransparencyPurely Functional

Language Agnostic Problem Overview


The terms do appear to be defined differently, but I've always thought of one implying the other; I can't think of any case when an expression is referentially transparent but not pure, or vice-versa.

Wikipedia maintains separate articles for these concepts and says:

From Referential transparency:

> If all functions involved in the > expression are pure functions, then > the expression is referentially > transparent. Also, some impure > functions can be included in the > expression if their values are > discarded and their side effects are > insignificant. >

From Pure expressions:

> > Pure functions are required to > construct pure expressions. [...] Pure > expressions are often referred to as > being referentially transparent.

I find these statements confusing. If the side effects from a so-called "impure function" are insignificant enough to allow not performing them (i.e. replace a call to such a function with its value) without materially changing the program, it's the same as if it were pure in the first place, isn't it?

Is there a simpler way to understand the differences between a pure expression and a referentially transparent one, if any? If there is a difference, an example expression that clearly demonstrates it would be appreciated.

Language Agnostic Solutions


Solution 1 - Language Agnostic

If I gather in one place any three theorists of my acquaintance, at least two of them disagree on the meaning of the term "referential transparency." And when I was a young student, a mentor of mine gave me a paper explaining that even if you consider only the professional literature, the phrase "referentially transparent" is used to mean at least three different things. (Unfortunately that paper is somewhere in a box of reprints that have yet to be scanned. I searched Google Scholar for it but I had no success.)

I cannot inform you, but I can advise you to give up: Because even the tiny cadre of pointy-headed language theorists can't agree on what it means, the term "referentially transparent" is not useful. So don't use it.


P.S. On any topic to do with the semantics of programming languages, Wikipedia is unreliable. I have given up trying to fix it; the Wikipedian process seems to regard change and popular voting over stability and accuracy.

Solution 2 - Language Agnostic

All pure functions are necessarily referentially transparent. Since, by definition, they cannot access anything other than what they are passed, their result must be fully determined by their arguments.

However, it is possible to have referentially transparent functions which are not pure. I can write a function which is given an int i, then generates a random number r, subtracts r from itself and places it in s, then returns i - s. Clearly this function is impure, because it is generating random numbers. However, it is referentially transparent. In this case, the example is silly and contrived. However, in, e.g., Haskell, the id function is of type a - > a whereas my stupidId function would be of type a -> IO a indicating that it makes use of side effects. When a programmer can guarantee through means of an external proof that their function is actually referentially transparent, then they can use unsafePerformIO to strip the IO back away from the type.

Solution 3 - Language Agnostic

I'm somewhat unsure of the answer I give here, but surely somebody will point us in some direction. :-)

"Purity" is generally considered to mean "lack of side-effects". An expression is said to be pure if its evaluation lacks side-effects. What's a side-effect then? In a purely functional language, side-effect is anything that doesn't go by the simple beta-rule (the rule that to evaluate function application is the same as to substitute actual parameter for all free occurrences of the formal parameter).

For example, in a functional language with linear (or uniqueness, this distinction shouldn't bother at this moment) types some (controlled) mutation is allowed.

So I guess we have sorted out what "purity" and "side-effects" might be.

Referential transparency (according to the Wikipedia article you cited) means that variable can be replaced by the expression it denotes (abbreviates, stands for) without changing the meaning of the program at hand (btw, this is also a hard question to tackle, and I won't attempt to do so here). So, "purity" and "referential transparency" are indeed different things: "purity" is a property of some expression roughly means "doesn't produce side-effects when executed" whereas "referential transparency" is a property relating variable and expression that it stands for and means "variable can be replaced with what it denotes".

Hopefully this helps.

Solution 4 - Language Agnostic

These slides from one ACCU2015 talk have a great summary on the topic of referential transparency.

From one of the slides:

> A language is referentially transparent if (a) > every subexpression can be replaced by any other > that’s equal to it in value and (b) all occurrences of > an expression within a given context yield the > same value.

You can have, for instance, a function that logs its computation to the program standard output (so, it won't be a pure function), but you can replace calls for this function by a similar function that doesn't log its computation. Therefore, this function have the referential transparency property. But... the above definition is about languages, not expressions, as the slides emphasize.

> [...] it's the same as if it were pure in the first place, isn't it?

From the definitions we have, no, it is not.

> Is there a simpler way to understand the differences between a pure expression and a referentially transparent one, if any?

Try the slides I mentioned above.

Solution 5 - Language Agnostic

I'll quote John Mitchell http://books.google.co.in/books/about/Concepts_in_Programming_Languages.html?id=7Uh8XGfJbEIC"> Concept in programming language. He defines pure functional language has to pass declarative language test which is free from side-effects or lack of side effects.

> "Within the scope of specific deceleration of x1,...,xn , all occurrence of an expression e containing only variables x1,...,xn have the same value."

In linguistics a name or noun phrase is considered referentially transparent if it may be replaced with the another noun phrase with same referent without changing the meaning of the sentence it contains.

Which in 1st case holds but in 2nd case it gets too weird.

> Case 1: "I saw Walter get into his new car."

And if Walter own a Centro then we could replace that in the given sentence as:

> "I saw Walter get into his Centro"

Contrary to first :

>Case #2 : He was called William Rufus because of his read beard.

Rufus means somewhat red and reference was to William IV of England.

"He was called William IV because of his read beard." looks too awkward.

Traditional way to say is, a language is referentially transparent if we may replace one expression with another of equal value anywhere in the program without changing the meaning of the program.

So, referential transparency is a property of pure functional language. And if your program is free from side effects then this property will hold.

So give it up is awesome advice but get it on might also look good in this context.

Solution 6 - Language Agnostic

> The nice thing about standards is that there are so many of them to choose from. > >Andrew S. Tanenbaum.

...along with definitions of referential transparency:

  • from page 176 of Functional programming with Miranda by Ian Holyer:

    > ### 8.1 Values and Behaviours > > The most important property of the semantics of a pure functional language is that the declarative and operational views of the language coincide exactly, in the following way: > > | Every expression denotes a value, and there are values
    corresponding to all possible program behaviours. The
    behaviour produced by an expression in any context
    is completely determined by its value, and vice versa.| > |-| > > This principle, which is usually rather opaquely called referential transparency, can also be pictured in the following way: > > picture of referential transparency

  • and from Nondeterminism with Referential Transparency in Functional Programming Languages by F. Warren Burton: > [...] the property that an expression always has the same value in the same environment [...]

...for various other definitions, see Referential Transparency, Definiteness and Unfoldability by Harald Søndergaard and Peter Sestoft.

Instead, we'll begin with the concept of "purity". For the three of you who didn't know it already, the computer or device you're reading this on is a solid-state Turing machine, a model of computing intrinsically connected with effects. So every program, functional or otherwise, needs to use those effects To Get Things DoneTM.

What does this mean for purity? At the assembly-language level, which is the domain of the CPU, all programs are impure. If you're writing a program in assembly language, you're the one who is micro-managing the interplay between all those effects - and it's really tedious!

Most of the time, you're just instructing the CPU to move data around in the computer's memory, which only changes the contents of individual memory locations - nothing to see there! It's only when your instructions direct the CPU to e.g. write to video memory, that you observe a visible change (text appearing on the screen).

For our purposes here, we'll split effects into two coarse categories:

  • those involving I/O devices like screens, speakers, printers, VR-headsets, keyboards, mice, etc; commonly known as observable effects.
  • and the rest, which only ever change the contents of memory.

In this situation, purity just means the absence of those observable effects, the ones which cause a visible change to the environment of the running program, maybe even its host computer. It is definitely not the absence of all effects, otherwise we would have to replace our solid-state Turing machines!


Now for the question of 42 life, the Universe and everything what exactly is meant by the term "referential transparency" - instead of herding cats trying to bring theorists into agreement, let's just try to find the original meaning given to the term. Fortunately for us, the term frequently appears in the context of I/O in Haskell - we only need a relevant article...here's one: from the first page of Owen Stephen's Approaches to Functional I/O:

> Referential transparency refers to the ability to replace a sub-expression with one of equal value, without changing the value of the outer expression. Originating from Quine the term was introduced to Computer Science by Strachey.

Following the references:

  • From page 9 of 39 in Christopher Strachey's Fundamental Concepts in Programming Languages:

    > One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

  • From page 163 of 314 in Willard Van Ormond Quine's Word and Object:

    > [...] Quotation, which thus interrupts the referential force of a term, may be said to fail of referential transparency2. [...] I call a mode of confinement Φ referentially transparent if, whenever an occurrence of a singular term t is purely referential in a term or sentence ψ(t), it is purely referential also in the containing term or sentence Φ(ψ(t)).

    with the footnote:

    > 2 The term is from Whitehead and Russell, 2d ed., vol. 1, p. 665.

Following that reference:

  • From page 709 of 719 in Principa Mathematica by Alfred North Whitehead and Bertrand Russell:

    > When an assertion occurs, it is made by means of a particular fact, which is an instance of the proposition asserted. But this particular fact is, so to speak, "transparent"; nothing is said about it, bit by means of it something is said about something else. It is the "transparent" quality which belongs to propositions as they occur in truth-functions.

Let's try to bring all that together:

  1. Whitehead and Russell introduce the term "transparent";
  2. Quine then defines the qualified term "referential transparency";
  3. Strachey then adapts Quine's definition in defining the basics of programming languages.

So it's a choice between Quine's original or Strachey's adapted definition. You can try translating Quine's definition for yourself if you like - everyone who's ever contested the definition of "purely functional" might even enjoy the chance to debate something different like what "mode of containment" and "purely referential" really means...have fun! The rest of us will just accept that Strachey's definition is a little vague ("In essence [...]") and continue on:

> One useful property of expressions is referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression , the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.

(emphasis by me.)

Regarding that description ("that if we wish to find the value of [...]"), a similar, but more concise statement is given by Peter Landin in The Next 700 Programming Languages:

> the thing an expression denotes, i.e., its "value", depends only on the values of its sub-expressions, not on other properties of them.

Thus:

> One useful property of expressions is referential transparency. In essence this means the thing an expression denotes, i.e., its "value", depends only on the values of its sub-expressions, not on other properties of them.

Strachey provides some examples:

  • (page 12 of 39) > We tend to assume automatically that the symbol x in an expression such as 3_x_2 + 2_x_ + 17 stands for the same thing (or has the same value) on each occasion it occurs. This is the most important consequence of referential transparency and it is only in virtue of this property that we can use the where-clauses or λ-expressions described in the last section.

  • (and on page 16) > When the function is used (or called or applied) we write f[ε] where ε can be an expression. If we are using a referentially transparent language all we require to know about the expression ε in order to evaluate f[ε] is its value.

So referential transparency, by Strachey's original definition, implies purity - in the absence of an order of evaluation, observable and other effects are practically useless...

Solution 7 - Language Agnostic

Pure functions are those that return the same value on every call, and do not have side effects.

Referential transparency means that you can replace a bound variable with its value and still receive the same output.

Both pure and referentially transparent:

def f1(x):
    t1 = 3 * x
    t2 = 6
    return t1 + t2

Why is this pure?

Because it is a function of only the input x and has no side-effects.

Why is this referentially transparent?

You could replace t1 and t2 in f1 with their respective right hand sides in the return statement, as follows

def f2(x):
    return 3 * x + 6

and f2 will still always return the same result as f1 in every case.

Pure, but not referentially transparent:

Let's modify f1 as follows:

def f3(x):
    t1 = 3 * x
    t2 = 6
    x = 10
    return t1 + t2

Let us try the same trick again by replacing t1 and t2 with their right hand sides, and see if it is an equivalent definition of f3.

def f4(x):
    x = 10
    return 3 * x + 6

We can easily observe that f3 and f4 are not equivalent on replacing variables with their right hand sides / values. f3(1) would return 9 and f4(1) would return 36.

Referentially transparent, but not pure:

Simply modifying f1 to receive a non-local value of x, as follows:

def f5:
    global x
    t1 = 3 * x
    t2 = 6
    return t1 + t2

Performing the same replacement exercise from before shows that f5 is still referentially transparent. However, it is not pure because it is not a function of only the arguments passed to it.

Observing carefully, the reason we lose referential transparency moving from f3 to f4 is that x is modified. In the general case, making a variable final (or those familiar with Scala, using vals instead of vars) and using immutable objects can help keep a function referentially transparent. This makes them more like variables in the algebraic or mathematical sense, thus lending themselves better to formal verification.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAniView Question on Stackoverflow
Solution 1 - Language AgnosticNorman RamseyView Answer on Stackoverflow
Solution 2 - Language AgnosticsclvView Answer on Stackoverflow
Solution 3 - Language AgnosticArtyom ShalkhakovView Answer on Stackoverflow
Solution 4 - Language AgnosticvinipsmakerView Answer on Stackoverflow
Solution 5 - Language AgnosticPushpaView Answer on Stackoverflow
Solution 6 - Language AgnosticatraversView Answer on Stackoverflow
Solution 7 - Language AgnosticNikhil KumarView Answer on Stackoverflow