Haskell record syntax

Layout Problem Overview

Haskell's record syntax is considered by many to be a wart on an otherwise elegant language, on account of its ugly syntax and namespace pollution. On the other hand, it's often more useful than the position-based alternative.

Instead of a declaration like this:

data Foo = Foo { 
  fooID :: Int, 
  fooName :: String 
} deriving (Show)

It seems to me that something along these lines would be more attractive:

data Foo = Foo id   :: Int
               name :: String
               deriving (Show)

I'm sure there must be a good reason I'm missing, but why was the C-like record syntax adopted over a cleaner layout-based approach?

Secondly, is there anything in the pipeline to solve the namespace problem, so we can write id foo instead of fooID foo in future versions of Haskell? (Apart from the long-winded type-class-based workarounds currently available.)
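For readers who haven't seen one, a type class workaround of the kind mentioned above might look roughly like this. It is only a minimal sketch: the class HasName, the types Person and Company, and the sample data are all made up for illustration and are not from any library.

```haskell
module Main where

-- Hypothetical class: one shared accessor name for every type that has a name.
class HasName a where
  name :: a -> String

-- The record fields still need distinct, prefixed names...
data Person = Person { personName :: String, personAge :: Int }
data Company = Company { companyName :: String, employees :: [Person] }

-- ...and each type needs its own boilerplate instance.
instance HasName Person where
  name = personName

instance HasName Company where
  name = companyName

-- With the instances in place, `name` works on both types.
allNames :: Company -> [String]
allNames c = name c : map name (employees c)

main :: IO ()
main = print (allNames (Company "Acme" [Person "Ann" 30, Person "Bob" 41]))
-- prints ["Acme","Ann","Bob"]
```

The cost is one class per shared field name and one instance per type, which is why it counts as long-winded.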

Layout Solutions

Solution 1 - Layout

Well if no one else is going to try, then I'll take another (slightly more carefully researched) stab at answering these questions.

tl;dr

Question 1: That's just the way the dice rolled. It was a circumstantial choice and it stuck.

Question 2: Yes (sorta). Several different parties have certainly been thinking about the issue.

Read on for a very longwinded explanation for each answer, based around links and quotes that I found to be relevant and interesting.

Why was the C-like record syntax adopted over a cleaner layout-based approach?

Microsoft researchers wrote a History of Haskell paper. Section 5.6 talks about records. I'll quote the first tiny bit, which is insightful:

> One of the most obvious omissions from early versions of Haskell was the absence of records, offering named fields. Given that records are extremely useful in practice, why were they omitted?

The Microsofties then answer their own question:

> The strongest reason seems to have been that there was no obvious “right” design.

You can read the paper yourself for the details, but they say Haskell eventually adopted record syntax due to "pressure for named fields in data structures".

> By the time the Haskell 1.3 design was under way, in 1993, the user pressure for named fields in data structures was strong, so the committee eventually adopted a minimalist design...

You ask why it is the way it is? Well, from what I understand, if the early Haskellers had their way, we might've never had record syntax in the first place. The idea was apparently pushed onto Haskell by people who were already used to C-like syntax, and were more interested in getting C-like things into Haskell than in doing things "the Haskell way". (Yes, I realize this is an extremely subjective interpretation. I could be dead wrong, but in the absence of better answers, this is the best conclusion I can draw.)

Is there anything in the pipeline to solve the namespace problem?

First of all, not everyone feels it is a problem. A few weeks ago, a Racket enthusiast explained to me (and others) that having different functions with the same name was a bad idea, because it complicates analysis of "what does the function named ___ do?" It is not, in fact, one function, but many. The idea can be extra troublesome for Haskell, since it complicates type inference.

On a slight tangent, the Microsofties have interesting things to say about Haskell's typeclasses:

> It was a happy coincidence of timing that Wadler and Blott happened to produce this key idea at just the moment when the language design was still in flux.

Don't forget that Haskell was young once. Some decisions were made simply because they were made.

Anyways, there are a few interesting ways that this "problem" could be dealt with:

Type Directed Name Resolution, a proposed modification to Haskell (mentioned in comments above). Just read that page to see that it touches a lot of areas of the language. All in all, it ain't a bad idea. A lot of thought has been put into it so that it won't clash with stuff. However, it will still require significantly more attention to get it into the now-(more-)mature Haskell language.

Another Microsoft paper, OO Haskell, specifically proposes an extension to the Haskell language to support "ad hoc overloading". It's rather complicated, so you'll just have to check out Section 4 for yourself. The gist of it is to automatically (?) infer "Has" types, and to add an additional step to type checking that they call "improvement", vaguely outlined in the selective quotes that follow:

> Given the class constraint Has_m (Int -> C -> r) there is only one instance for m that matches this constraint...Since there is exactly one choice, we should make it now, and that in turn fixes r to be Int. Hence we get the expected type for f: f :: C -> Int -> IO Int...[this] is simply a design choice, and one based on the idea that the class Has_m is closed

Apologies for the incoherent quoting; if that helps you at all, then great, otherwise just go read the paper. It's a complicated (but convincing) idea.

Chris Done has used Template Haskell to provide duck typing in Haskell in a vaguely similar manner to the OO Haskell paper (using "Has" types). A few interactive session samples from his site:

λ> flap ^. donald
*Flap flap flap*
λ> flap ^. chris
I'm flapping my arms!

fly :: (Has Flap duck) => duck -> IO ()
fly duck = do go; go; go where go = flap ^. duck

λ> fly donald
*Flap flap flap*
*Flap flap flap*
*Flap flap flap*

This requires a little boilerplate/unusual syntax, and I personally would prefer to stick to typeclasses. But kudos to Chris Done for freely publishing his down-to-earth work in the area.

Solution 2 - Layout

I just thought I'd add a link addressing the namespace issue. It seems that overloaded record fields for GHC are coming in GHC 7.10 (and are probably already in HEAD), using the OverloadedRecordFields extension.

This would allow for syntax such as

data Person = Person { id :: Int, name :: String }
data Company = Company { name :: String, employees :: [Person] }

companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)
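For comparison, here is a sketch of how a similar function can be written with pieces that, as far as I know, GHC already ships: DuplicateRecordFields to allow the repeated field name and getField from GHC.Records to pick the field by label. This is not the OverloadedRecordFields design described in the link, just an approximation of it. I have renamed the id field to personId (my choice, to stay clear of the Prelude's id), and the sample data is made up.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE TypeApplications #-}
module Main where

import GHC.Records (getField)

-- Both records declare a field called `name` in the same module.
data Person = Person { personId :: Int, name :: String }
data Company = Company { name :: String, employees :: [Person] }

-- getField @"name" resolves the field from the type of its argument,
-- so the same label works for Company and for Person.
companyNames :: Company -> [String]
companyNames c = getField @"name" c : map (getField @"name") (employees c)

main :: IO ()
main = print (companyNames (Company "Acme" [Person 1 "Ann", Person 2 "Bob"]))
-- prints ["Acme","Ann","Bob"]
```

It is noisier than the plain name c in the proposal, but it shows the same idea: the field is chosen by the record's type rather than by a globally unique selector name.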

Solution 3 - Layout

[edit] This answer is just some random thoughts of mine on the matter. I recommend my other answer over this one, because for that answer I took a lot more time to look up and reference other people's work.

Record syntax

Taking a few stabs in the dark: your "layout-based" proposed syntax looks a lot like non-record-syntax data declarations; that might cause confusion for parsing (?)

--record
data Foo = Foo {i :: Int, s :: String} deriving (Show)
--non-record
data Foo = Foo Int String deriving (Show)
--new-record
data Foo = Foo i :: Int, s :: String deriving (Show)

--record
data LotsaInts = LI {a,b,c,i,j,k :: Int}
--new-record
data LotsaInts = LI a,b,c,i,j,k :: Int

In the latter case, what exactly is :: Int applied to? The whole data declaration?

Declarations with the record syntax (currently) are similar to construction and update syntax. Layout-based syntax would not be clearer for these cases; how do you parse those extra = signs?

let f1 = Foo {s = "foo1", i = 1}
let f2 = f1 {s = "foo2"}

let f1 = Foo s = "foo1", i = 1
let f2 = f1 s = "foo2"

How do you know f1 s is a record update, as opposed to a function application?

Namespacing

What if you want to intermingle usage of your record-defined id with the Prelude's id? How do you specify which one you're using? Can you think of any better way than qualified imports and/or the hiding keyword?

import Prelude hiding (id)

data Foo = Foo {a,b,c,i,j,k :: Int, s :: String}
               deriving (Show)
               
id = i

ghci> :l data.hs
ghci> let foo = Foo 1 2 3 4 5 6 "foo"
ghci> id foo
4
ghci> Prelude.id foo
Foo {a = 1, b = 2, c = 3, i = 4, j = 5, k = 6, s = "foo"}

These aren't great answers, but they're the best I've got. I personally don't think record syntax is that ugly. I do feel there is room for improvement with the namespacing/modules stuff, but I have no idea how to make it better.

Solution 4 - Layout

As of June 2021, a fix for the namespace problem has been half-implemented by three opt-in language extensions and counting:

https://gitlab.haskell.org/ghc/ghc/-/wikis/records/overloaded-record-fields

Even with all three extensions enabled, basic stuff like

len2 :: Point -> Double
len2 p = (x p)^2 + (y p)^2  -- fails!

still won't work if, say, there's a Quaternion type with x and y fields as well. You would have to do this:

len2 :: Point -> Double
len2 p = (x (p :: Point))^2 + (y (p :: Point))^2

or this:

len2 :: Point -> Double
len2 (MkPoint {x = px, y = py}) = px^2 + py^2
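To make the pattern-matching workaround concrete, here is a small self-contained module. It is a sketch under my own assumptions: the field layout of Point and Quaternion, the constructor MkQuaternion, and the helper norm2 are invented for illustration, and only DuplicateRecordFields is switched on.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
module Main where

-- Two records sharing the field names x and y (hypothetical definitions).
data Point = MkPoint { x :: Double, y :: Double }
data Quaternion = MkQuaternion { w :: Double, x :: Double, y :: Double, z :: Double }

-- Matching on the constructor tells GHC which x and y are meant,
-- so no type annotations are needed inside the body.
len2 :: Point -> Double
len2 (MkPoint { x = px, y = py }) = px^2 + py^2

norm2 :: Quaternion -> Double
norm2 (MkQuaternion { w = qw, x = qx, y = qy, z = qz }) = qw^2 + qx^2 + qy^2 + qz^2

main :: IO ()
main = do
  print (len2 (MkPoint 3 4))             -- prints 25.0
  print (norm2 (MkQuaternion 1 2 3 4))   -- prints 30.0
```

The constructor in the pattern does the disambiguation here, which is why this form compiles where the bare x p does not.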

Even if the first example did work, it would still be opt-in, so odds are that it will be another two decades before the extension is widely adopted by the libraries that any real application must rely on.

It's ironic when a deal breaker like this is not an issue in a language like C.

One point of interest, though: Idris 2 has actually fixed this. It isn't really ready yet either, though.

Content Type	Original Author	Original Content on Stackoverflow
Question	Rob Agar	View Question on Stackoverflow
Solution 1 - Layout	Dan Burton	View Answer on Stackoverflow
Solution 2 - Layout	crockeea	View Answer on Stackoverflow
Solution 3 - Layout	Dan Burton	View Answer on Stackoverflow
Solution 4 - Layout	enigmaticPhysicist	View Answer on Stackoverflow