Difference between State, ST, IORef, and MVar

VariablesHaskellMonadsState MonadIoref

Variables Problem Overview


I am working through Write Yourself a Scheme in 48 Hours (I'm up to about 85hrs) and I've gotten to the part about Adding Variables and Assignments. There is a big conceptual jump in this chapter, and I wish it had been done in two steps with a good refactoring in between rather then jumping at straight to the final solution. Anyway…

I've gotten lost with a number of different classes that seem to serve the same purpose: State, ST, IORef, and MVar. The first three are mentioned in the text, while the last seems to be the favored answer to a lot of StackOverflow questions about the first three. They all seem to carry a state between consecutive invocations.

What are each of these and how do they differ from one another?


In particular these sentences don't make sense:

> Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables.

and

> The IORef module lets you use stateful variables within the IO monad.

All this makes the line type ENV = IORef [(String, IORef LispVal)] confusing - why the second IORef? What will break if I'll write type ENV = State [(String, LispVal)] instead?

Variables Solutions


Solution 1 - Variables

The State Monad : a model of mutable state

The State monad is a purely functional environment for programs with state, with a simple API:

  • get
  • put

Documentation in the mtl package.

The State monad is commonly used when needing state in a single thread of control. It doesn't actually use mutable state in its implementation. Instead, the program is parameterized by the state value (i.e. the state is an additional parameter to all computations). The state only appears to be mutated in a single thread (and cannot be shared between threads).

The ST monad and STRefs

The ST monad is the restricted cousin of the IO monad.

It allows arbitrary mutable state, implemented as actual mutable memory on the machine. The API is made safe in side-effect-free programs, as the rank-2 type parameter prevents values that depend on mutable state from escaping local scope.

It thus allows for controlled mutability in otherwise pure programs.

Commonly used for mutable arrays and other data structures that are mutated, then frozen. It is also very efficient, since the mutable state is "hardware accelerated".

Primary API:

  • Control.Monad.ST
  • runST -- start a new memory-effect computation.
  • And STRefs: pointers to (local) mutable cells.
  • ST-based arrays (such as vector) are also common.

Think of it as the less dangerous sibling of the IO monad. Or IO, where you can only read and write to memory.

IORef : STRefs in IO

These are STRefs (see above) in the IO monad. They don't have the same safety guarantees as STRefs about locality.

MVars : IORefs with locks

Like STRefs or IORefs, but with a lock attached, for safe concurrent access from multiple threads. IORefs and STRefs are only safe in a multi-threaded setting when using atomicModifyIORef (a compare-and-swap atomic operation). MVars are a more general mechanism for safely sharing mutable state.

Generally, in Haskell, use MVars or TVars (STM-based mutable cells), over STRef or IORef.

Solution 2 - Variables

Ok, I'll start with IORef. IORef provides a value which is mutable in the IO monad. It's just a reference to some data, and like any reference, there are functions which allow you to change the data it refers to. In Haskell, all of those functions operate in IO. You can think of it like a database, file, or other external data store - you can get and set the data in it, but doing so requires going through IO. The reason IO is necessary at all is because Haskell is pure; the compiler needs a way to know which data the reference points to at any given time (read sigfpe's "You could have invented monads" blogpost).

MVars are basically the same thing as an IORef, except for two very important differences. MVar is a concurrency primitive, so it's designed for access from multiple threads. The second difference is that an MVar is a box which can be full or empty. So where an IORef Int always has an Int (or is bottom), an MVar Int may have an Int or it may be empty. If a thread tries to read a value from an empty MVar, it will block until the MVar gets filled (by another thread). Basically an MVar a is equivalent to an IORef (Maybe a) with extra semantics that are useful for concurrency.

State is a monad which provides mutable state, not necessarily with IO. In fact, it's particularly useful for pure computations. If you have an algorithm that uses state but not IO, a State monad is often an elegant solution.

There is also a monad transformer version of State, StateT. This frequently gets used to hold program configuration data, or "game-world-state" types of state in applications.

ST is something slightly different. The main data structure in ST is the STRef, which is like an IORef but with a different monad. The ST monad uses type system trickery (the "state threads" the docs mention) to ensure that mutable data can't escape the monad; that is, when you run an ST computation you get a pure result. The reason ST is interesting is that it's a primitive monad like IO, allowing computations to perform low-level manipulations on bytearrays and pointers. This means that ST can provide a pure interface while using low-level operations on mutable data, meaning it's very fast. From the perspective of the program, it's as if the ST computation runs in a separate thread with thread-local storage.

Solution 3 - Variables

Others have done the core things, but to answer the direct question:

> All this makes the line type ENV = > IORef [(String, IORef LispVal)] > confusing. Why the second IORef? What > will break if I do type ENV = State > [(String, LispVal)] instead?

Lisp is a functional language with mutable state and lexical scope. Imagine you've closed over a mutable variable. Now you've got a reference to this variable hanging around inside some other function -- say (in haskell-style pseudocode) (printIt, setIt) = let x = 5 in (\ () -> print x, \y -> set x y). You now have two functions -- one prints x, and one sets its value. When you evaluate printIt, you want to lookup the name of x in the initial environment in which printIt was defined, but you want to lookup the value that name is bound to in the environment in which printIt is called (after setIt may have been called any number of times).

There are ways besids the two IORefs to do this, but you certainly need more than the latter type you've proposed, which doesn't allow you to alter the values that names are bound to in a lexically-scoped fashion. Google the "funargs problem" for a whole lot of interesting prehistory.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJohn F. MillerView Question on Stackoverflow
Solution 1 - VariablesDon StewartView Answer on Stackoverflow
Solution 2 - VariablesJohn LView Answer on Stackoverflow
Solution 3 - VariablessclvView Answer on Stackoverflow