Better performance with a left associative `.|`? #509

kabuhr · 2024-05-03T16:06:02Z

In a recent Stack Overflow question, Kwobny asked why .| was infixr. The potential "performance issue" raised in that question wasn't very compelling (i.e., so what if ((...((c1 .| c2) .| c3) ... .| c999999) .| c1000000) | return () is faster than the alternative), but I ended up running a few quick Criterion benchmarks on simple, pure conduits:

rassoc1 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC

rassoc2 :: ConduitT () Void Identity Integer
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC
    where d `divides` n = n `mod` d == 0

versus their left-associative equivalents (see below) and was surprised to discover that the left-associative versions were orders of magnitude faster than the right-associative versions (like 50ms to 10ms for rassoc1 versus its left-associative equivalent, and 8ms to 100us for rassoc2).

I admittedly haven't spent a lot of time studying this, but I thought I'd raise an issue here in case there's a quick explanation. Was the decision to encourage right associative conduits made early on because they "looked right", or is there some specific advantage to right associative conduits that makes the default associativity of .| the right choice (pun!). Maybe left associative conduits don't actually achieve constant memory usage? Maybe there's some advantage in the ordering of monadic effects for right associative conduits? Or maybe the performance issue above is just a missing REWRITE rule, and right associative conduits achieve better performance in more realistic workflows. Any thoughts?

The benchmark:

import Criterion
import Criterion.Main
import Conduit

(|.) ::  Monad m => ConduitT a b m () -> ConduitT b c m r -> ConduitT a c m r
(|.) = (.|)
infixl 2 |.

d `divides` n = n `mod` d == 0

rassoc1, rassoc2 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC

lassoc1, lassoc2 :: ConduitT () Void Identity Integer
lassoc1 = yieldMany [0..1000000] |. mapC (*2) |. mapC (+4) |. sumC
lassoc2 = yieldMany [1..] |. filterC (50 `divides`) |. takeC 10000 |. mapC (*2) |. sumC

main = defaultMain
  [ bench "rassoc1" $ whnf runConduit rassoc1
  , bench "lassoc1" $ whnf runConduit lassoc1
  , bench "rassoc2" $ whnf runConduit rassoc2
  , bench "lassoc2" $ whnf runConduit lassoc2
  ]

{-
   $ ghc --version
   The Glorious Glasgow Haskell Compilation System, version 9.4.8
   $ ghc -O ConduitAssoc1.hs && ./ConduitAssoc1
   Loaded package environment from /u/buhr/.ghc/x86_64-linux-9.4.8/environments/default
   benchmarking rassoc1
   time                 49.88 ms   (49.60 ms .. 50.58 ms)
   ...
   benchmarking lassoc1
   time                 11.51 ms   (11.28 ms .. 11.76 ms)
   ...
   benchmarking rassoc2
   time                 8.887 ms   (8.881 ms .. 8.893 ms)
   ...
   benchmarking lassoc2
   time                 118.8 μs   (118.0 μs .. 120.4 μs)
   ...
-}

The text was updated successfully, but these errors were encountered:

snoyberg · 2024-05-05T11:50:51Z

I don't remember a specific motivation. I am surprised by this, since we used the codensity transform which should avoid the worst performance issues of left vs right associativity. The history of conduit though has lots of cases where new versions of GHC have changed behavior enough that performance degraded when upgrading, especially around rewrite rules.

kwobny · 2024-05-05T18:40:37Z

Hi, I am the original poster of the stack overflow question. Here is the link to the question, which explains why I suspected the performance issue.
https://stackoverflow.com/questions/78420681/in-the-haskell-conduit-library-why-is-the-operator-infixr
Furthermore, I believe the associativity advantage snoyberg refers to regarding the codensity transform applies to monadic composition only. Fusing of components is a different kind of composition for which I believe this advantage does not apply.

In addition, I'd like to admit I'm still kind of a beginner in haskell, so what I say might not be thorough or use the proper language. I also edited the question just now to more clearly explain my reasoning.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better performance with a left associative `.|`? #509

Better performance with a left associative `.|`? #509

kabuhr commented May 3, 2024 •

edited

Loading

snoyberg commented May 5, 2024

kwobny commented May 5, 2024

Better performance with a left associative .|? #509

Better performance with a left associative .|? #509

Comments

kabuhr commented May 3, 2024 • edited Loading

snoyberg commented May 5, 2024

kwobny commented May 5, 2024

Better performance with a left associative `.|`? #509

Better performance with a left associative `.|`? #509

kabuhr commented May 3, 2024 •

edited

Loading