Why is the F# version of this program 6x faster than the Haskell one?

PerformanceHaskellF#

Performance Problem Overview


Haskell version(1.03s):

module Main where
  import qualified Data.Text as T
  import qualified Data.Text.IO as TIO
  import Control.Monad
  import Control.Applicative ((<$>))
  import Data.Vector.Unboxed (Vector,(!))
  import qualified Data.Vector.Unboxed as V

  solve :: Vector Int -> Int
  solve ar =
    V.foldl' go 0 ar' where
      ar' = V.zip ar (V.postscanr' max 0 ar)
      go sr (p,m) = sr + m - p

  main = do
    t <- fmap (read . T.unpack) TIO.getLine -- With Data.Text, the example finishes 15% faster.
    T.unlines . map (T.pack . show . solve . V.fromList . map (read . T.unpack) . T.words)
      <$> replicateM t (TIO.getLine >> TIO.getLine) >>= TIO.putStr

F# version(0.17s):

open System

let solve (ar : uint64[]) =
    let ar' = 
        let t = Array.scanBack max ar 0UL |> fun x -> Array.take (x.Length-1) x
        Array.zip ar t
    
    let go sr (p,m) = sr + m - p
    Array.fold go 0UL ar'

let getIntLine() =
    Console.In.ReadLine().Split [|' '|]
    |> Array.choose (fun x -> if x <> "" then uint64 x |> Some else None)    

let getInt() = getIntLine().[0]

let t = getInt()
for i=1 to int t do
    getInt() |> ignore
    let ar = getIntLine()
    printfn "%i" (solve ar)

The above two programs are the solutions for the Stock Maximize problem and times are for the first test case of the Run Code button.

For some reason the F# version is roughly 6x faster, but I am pretty sure that if I replaced the slow library functions with imperative loops that I could speed it up by at least 3 times and more likely 10x.

Could the Haskell version be similarly improved?

I am doing the above for learning purposes and in general I am finding it difficult to figure out how to write efficient Haskell code.

Performance Solutions


Solution 1 - Performance

If you switch to ByteString and stick with plain Haskell lists (instead of vectors) you will get a more efficient solution. You may also rewrite the solve function with a single left fold and bypass zip and right scan (1). Overall, on my machine, I get 20 times performance improvement compared to your Haskell solution (2).

Below Haskell code performs faster than the F# code:

import Data.List (unfoldr)
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

parse :: ByteString -> [Int]
parse = unfoldr $ C.readInt . C.dropWhile (== ' ')

solve :: [Int] -> Int
solve xs = foldl go (const 0) xs minBound
    where go f x s = if s < x then f x else s - x + f s

main = do
    [n] <- parse <$> B.getLine
    replicateM_ n $ B.getLine >> B.getLine >>= print . solve . parse

1. See edits for an earlier version of this answer which implements solve using zip and scanr.
2. HackerRank website shows even a larger performance improvement.

Solution 2 - Performance

If I wanted to do that quickly in F# I would avoid all of the higher-order functions inside solve and just write a C-style imperative loop:

let solve (ar : uint64[]) =
  let mutable sr, m = 0UL, 0UL
  for i in ar.Length-1 .. -1 .. 0 do
    let p = ar.[i]
    m <- max p m
    sr <- sr + m - p
  sr

According to my measurements, this is 11x faster than your F#.

Then the performance is limited by the IO layer (unicode parsing) and string splitting. This can be optimised by reading into a byte buffer and writing the lexer by hand:

let buf = Array.create 65536 0uy
let mutable idx = 0
let mutable length = 0

do
  use stream = System.Console.OpenStandardInput()
  let rec read m =
    let c =
      if idx < length then
        idx <- idx + 1
      else
        length <- stream.Read(buf, 0, buf.Length)
        idx <- 1
      buf.[idx-1]
    if length > 0 && '0'B <= c && c <= '9'B then
      read (10UL * m + uint64(c - '0'B))
    else
      m
  let read() = read 0UL
  for _ in 1UL .. read() do
    Array.init (read() |> int) (fun _ -> read())
    |> solve
    |> System.Console.WriteLine

Solution 3 - Performance

Just for the record, the F# version is also not optimal. I don't think it really matters at this point, but if people wanted to compare the performance, then it is worth noting that it can be made faster.

I have not tried very hard (you can certainly make it even faster by using restricted mutation, which would not be against the nature of F#), but simple change to use Seq instead of Array in the right places (to avoid allocating temporary arrays) makes the code about 2x to 3x faster:

let solve (ar : uint64[]) =
    let ar' = Seq.zip ar (Array.scanBack max ar 0UL)    
    let go sr (p,m) = sr + m - p
    Seq.fold go 0UL ar'

If you use Seq.zip, you can also drop the take call (because Seq.zip truncates the sequence automatically). Measured using #time using the following snippet:

let rnd = Random()
let inp = Array.init 100000 (fun _ -> uint64 (rnd.Next()))
for a in 0 .. 10 do ignore (solve inp) // Measure this line

I get around 150ms for the original code and something between 50-75ms using the new version.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMarko GrdinićView Question on Stackoverflow
Solution 1 - Performancebehzad.nouriView Answer on Stackoverflow
Solution 2 - PerformanceJ DView Answer on Stackoverflow
Solution 3 - PerformanceTomas PetricekView Answer on Stackoverflow