Why is GHC so large/big?

Haskell Problem Overview


Is there a simple answer to why GHC is so big?

  • OCaml: 2MB
  • Python: 15MB
  • SBCL: 9MB
  • OpenJRE: 26MB
  • GHC: 113MB

I'm not interested in evangelism about why I shouldn't care about the size if Haskell is the right tool; this is a technical question.

Haskell Solutions


Solution 1 - Haskell

It's a bit silly really. Every library that comes with GHC is provided in no less than 4 flavours:

  • static
  • dynamic
  • profiled
  • GHCi

The GHCi version is just the static version linked together in a single .o file. The other three versions all have their own set of interface files (.hi files) too. The profiled versions seem to be about twice the size of the unprofiled versions (which is a bit suspicious; I should look into why that is).

Remember that GHC itself is a library, so you're getting 4 copies of GHC. Not only that, but the GHC binary itself is statically linked, so that's 5 copies of GHC.

We recently made it so that GHCi could use the static .a files. That will allow us to get rid of one of these flavours. Longer term, we should dynamically link GHC, but that's a bigger change because that would entail making dynamic linking the default - unlike in C, with GHC you have to decide up front whether you're going to link dynamically or not. And we need more changes (e.g. to Cabal and the package system, amongst other things) before this is really practical.
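To see how the flavours described above divide up the disk space on a given installation, a rough tally by file name can help. The sketch below is only a heuristic: the suffix conventions (`_p.a` for profiled, `.a` for static, `.so`/`.dll`/`.dylib` for dynamic, a single `.o` for GHCi) are assumptions based on a GHC 6.12-era layout, and the function names and the example path are hypothetical.

```shell
#!/bin/sh
# Classify a library file by GHC flavour, based only on its file name.
# The suffix conventions are assumptions; check them against your own install.
flavour_of() {
    case "$1" in
        *_p.a)              echo profiled ;;  # must come before the plain *.a case
        *.a)                echo static ;;
        *.so|*.dll|*.dylib) echo dynamic ;;
        *.o)                echo ghci ;;
        *)                  echo other ;;     # interface files, docs, binaries, ...
    esac
}

# Print the total size in KiB per flavour for all files under a directory.
flavour_sizes() {
    find "$1" -type f | while IFS= read -r f; do
        # du -k prints "<KiB><TAB><path>"; keep only the number
        kb=$(du -k "$f" | cut -f1)
        printf '%s %s\n' "$(flavour_of "$f")" "$kb"
    done | awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' | sort
}

# Example (the path is hypothetical; adjust it to the local installation):
# flavour_sizes /usr/lib/ghc-6.12.1
```

Interface files end up under "other" here; splitting them out by extension would be a natural extension if the per-flavour interface overhead matters.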

Solution 2 - Haskell

We should probably compare apples to apples and oranges to oranges. The JRE is a runtime, not a development kit. We could compare three things: the source size of the development kit, the size of the compiled development kit, and the compiled size of a minimal runtime.

The OpenJDK 7 source bundle is 82 MB (download.java.net/openjdk/jdk7), while the GHC 7 source bundle is 23 MB (haskell.org/ghc/download_ghc_7_0_1). GHC is not big here. Runtime size: openjdk-6-jre-headless on Ubuntu is 77 MB uncompressed, while a Haskell hello world, statically linked with its runtime, is <1 MB. GHC is not big here either.

Where GHC is big is the size of the compiled development kit:

[Image: GHC disk usage]

GHC itself takes 270 MB, and with all the libraries and utilities that come with it, it takes over 500 MB. And yes, that is a lot, even counting the base libraries and a build tool/dependency manager. The Java development platform is smaller.

GHC:

$ aptitude show ghc6 | grep Size
Uncompressed Size: 388M

against OpenJDK with dependencies:

$ aptitude show openjdk-6-jdk openjdk-6-jre openjdk-6-jre-headless ant maven2 ivy | grep Size
Uncompressed Size: 34.9M
Uncompressed Size: 905k
Uncompressed Size: 77.3M
Uncompressed Size: 1,585k
Uncompressed Size: 3,736k
Uncompressed Size: 991k

But the Java toolchain still adds up to more than 100 MB, not 26 MB as you write.
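As a quick check of that total, the six sizes printed by aptitude can be added up. This is a small sketch that assumes aptitude's "M" means 1000 "k"; the function name `total_mb` is made up for illustration.

```shell
#!/bin/sh
# Sizes as printed by aptitude above: "M" and "k" suffixes,
# with commas as thousands separators.
sizes="34.9M 905k 77.3M 1,585k 3,736k 991k"

# Convert each size to megabytes (assuming 1M = 1000k) and add them up.
total_mb() {
    # $1 is left unquoted on purpose so each size lands on its own line
    printf '%s\n' $1 | tr -d ',' | awk '
        /M$/ { sub(/M$/, ""); sum += $0 }
        /k$/ { sub(/k$/, ""); sum += $0 / 1000 }
        END  { printf "%.1f\n", sum }'
}

total_mb "$sizes"   # prints 119.4
```

Under that assumption the OpenJDK packages come to roughly 119 MB, against 388 MB for ghc6.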

The heavyweight files in ghc6 and ghc6-prof are:

$ dpkg -L ghc6 | grep '\.a$' | xargs ls -1ks | sort -k 1 -n -r | head -3
57048 /usr/lib/ghc-6.12.1/ghc-6.12.1/libHSghc-6.12.1.a
22668 /usr/lib/ghc-6.12.1/Cabal-1.8.0.2/libHSCabal-1.8.0.2.a
21468 /usr/lib/ghc-6.12.1/base-4.2.0.0/libHSbase-4.2.0.0.a
$ dpkg -L ghc6-prof | grep '\.a$' | xargs ls -1ks | sort -k 1 -n -r | head -3
112596 /usr/lib/ghc-6.12.1/ghc-6.12.1/libHSghc-6.12.1_p.a
 33536 /usr/lib/ghc-6.12.1/Cabal-1.8.0.2/libHSCabal-1.8.0.2_p.a
 31724 /usr/lib/ghc-6.12.1/base-4.2.0.0/libHSbase-4.2.0.0_p.a

Note how big libHSghc-6.12.1_p.a is. So the answer seems to be static linking and profiling versions for every library out there.

Solution 3 - Haskell

My guess: lots and lots of static linking. Each library needs to statically link its dependencies, which in turn need to statically link theirs, and so forth. All of this is often compiled both with and without profiling, and even without profiling the binaries aren't stripped, so they hold lots of debugger information.

Solution 4 - Haskell

Because it bundles gcc and a bunch of libraries, all statically linked.

At least on Windows.

Solution 5 - Haskell

Here's the directory size breakdown on my box:

https://spreadsheets.google.com/ccc?key=0AveoXImmNnZ6dDlQeHY2MmxPcEYzYkpweEtDSS1fUlE&hl=en

It looks like the largest directory (123 MB) is the binaries for compiling the compiler itself. The documentation weighs in at an astounding 65 MB. Third place is Cabal at 41 MB.

The bin directory is 33 MB, and I think that only a subset of that is what's technically required to build Haskell applications.

Solution 6 - Haskell

The short answer is that all executables are statically linked, may have debug info in them, and libraries are included in multiple copies. This has already been said by other commenters.

Dynamic linking is possible and will reduce the size dramatically. Here is an example Hello.hs:

main :: IO ()
main = putStrLn "Hello world"

I built it with GHC 7.4.2 on Windows.

ghc --make -O2 gives a Hello.exe of 1105K.

Running strip on it leaves 630K.

ghc --make -O2 -dynamic gives 40K.

Stripping it leaves just 13K.

Its dependencies are 5 DLLs with a total size of 9.2 MB unstripped and 5.7 MB stripped.
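Since the DLLs are shared, the saving only shows up once several executables use them. The stripped figures above give a rough break-even point; the sketch below assumes 1 MB = 1024K and that every executable is about the size of Hello.exe, and the name `break_even` is made up for illustration.

```shell
#!/bin/sh
# Sizes in K, taken from the stripped measurements above.
static_kb=630                  # stripped, statically linked Hello.exe
dynamic_kb=13                  # stripped, dynamically linked Hello.exe
shared_kb=$((57 * 1024 / 10))  # 5.7 MB of stripped DLLs, assuming 1 MB = 1024K

# Smallest number of executables n for which
#   n * dynamic + shared  <  n * static
break_even() {
    saving=$(( $1 - $2 ))      # K saved per executable by linking dynamically
    echo $(( $3 / saving + 1 ))
}

break_even "$static_kb" "$dynamic_kb" "$shared_kb"   # prints 10
```

So under these assumptions, dynamic linking starts to save disk space from about the tenth executable that shares the same DLLs.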

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type           Original Author     Original Content on Stack Overflow
Question               Christopher Done    View Question on Stack Overflow
Solution 1 - Haskell   Simon Marlow        View Answer on Stack Overflow
Solution 2 - Haskell   sastanin            View Answer on Stack Overflow
Solution 3 - Haskell   sclv                View Answer on Stack Overflow
Solution 4 - Haskell   Marko               View Answer on Stack Overflow
Solution 5 - Haskell   Jacob               View Answer on Stack Overflow
Solution 6 - Haskell   nponeccop           View Answer on Stack Overflow