You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In a recent Stack Overflow question, Kwobny asked why .| was infixr. The potential "performance issue" raised in that question wasn't very compelling (i.e., so what if ((...((c1 .| c2) .| c3) ... .| c999999) .| c1000000) | return () is faster than the alternative), but I ended up running a few quick Criterion benchmarks on simple, pure conduits:
versus their left-associative equivalents (see below) and was surprised to discover that the left-associative versions were orders of magnitude faster than the right-associative versions (like 50ms to 10ms for rassoc1 versus its left-associative equivalent, and 8ms to 100us for rassoc2).
I admittedly haven't spent a lot of time studying this, but I thought I'd raise an issue here in case there's a quick explanation. Was the decision to encourage right associative conduits made early on because they "looked right", or is there some specific advantage to right associative conduits that makes the default associativity of .| the right choice (pun!). Maybe left associative conduits don't actually achieve constant memory usage? Maybe there's some advantage in the ordering of monadic effects for right associative conduits? Or maybe the performance issue above is just a missing REWRITE rule, and right associative conduits achieve better performance in more realistic workflows. Any thoughts?
The benchmark:
import Criterion
import Criterion.Main
import Conduit
(|.) :: Monad m => ConduitT a b m () -> ConduitT b c m r -> ConduitT a c m r
(|.) = (.|)
infixl 2 |.
d `divides` n = n `mod` d == 0
rassoc1, rassoc2 :: ConduitT () Void Identity Integer
rassoc1 = yieldMany [0..1000000] .| mapC (*2) .| mapC (+4) .| sumC
rassoc2 = yieldMany [1..] .| filterC (50 `divides`) .| takeC 10000 .| mapC (*2) .| sumC
lassoc1, lassoc2 :: ConduitT () Void Identity Integer
lassoc1 = yieldMany [0..1000000] |. mapC (*2) |. mapC (+4) |. sumC
lassoc2 = yieldMany [1..] |. filterC (50 `divides`) |. takeC 10000 |. mapC (*2) |. sumC
main = defaultMain
[ bench "rassoc1" $ whnf runConduit rassoc1
, bench "lassoc1" $ whnf runConduit lassoc1
, bench "rassoc2" $ whnf runConduit rassoc2
, bench "lassoc2" $ whnf runConduit lassoc2
]
{-
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.4.8
$ ghc -O ConduitAssoc1.hs && ./ConduitAssoc1
Loaded package environment from /u/buhr/.ghc/x86_64-linux-9.4.8/environments/default
benchmarking rassoc1
time 49.88 ms (49.60 ms .. 50.58 ms)
...
benchmarking lassoc1
time 11.51 ms (11.28 ms .. 11.76 ms)
...
benchmarking rassoc2
time 8.887 ms (8.881 ms .. 8.893 ms)
...
benchmarking lassoc2
time 118.8 μs (118.0 μs .. 120.4 μs)
...
-}
The text was updated successfully, but these errors were encountered:
I don't remember a specific motivation. I am surprised by this, since we used the codensity transform which should avoid the worst performance issues of left vs right associativity. The history of conduit though has lots of cases where new versions of GHC have changed behavior enough that performance degraded when upgrading, especially around rewrite rules.
Hi, I am the original poster of the stack overflow question. Here is the link to the question, which explains why I suspected the performance issue. https://stackoverflow.com/questions/78420681/in-the-haskell-conduit-library-why-is-the-operator-infixr
Furthermore, I believe the associativity advantage snoyberg refers to regarding the codensity transform applies to monadic composition only. Fusing of components is a different kind of composition for which I believe this advantage does not apply.
In addition, I'd like to admit I'm still kind of a beginner in haskell, so what I say might not be thorough or use the proper language. I also edited the question just now to more clearly explain my reasoning.
In a recent Stack Overflow question, Kwobny asked why
.|
wasinfixr
. The potential "performance issue" raised in that question wasn't very compelling (i.e., so what if((...((c1 .| c2) .| c3) ... .| c999999) .| c1000000) | return ()
is faster than the alternative), but I ended up running a few quick Criterion benchmarks on simple, pure conduits:versus their left-associative equivalents (see below) and was surprised to discover that the left-associative versions were orders of magnitude faster than the right-associative versions (like 50ms to 10ms for
rassoc1
versus its left-associative equivalent, and 8ms to 100us forrassoc2
).I admittedly haven't spent a lot of time studying this, but I thought I'd raise an issue here in case there's a quick explanation. Was the decision to encourage right associative conduits made early on because they "looked right", or is there some specific advantage to right associative conduits that makes the default associativity of
.|
the right choice (pun!). Maybe left associative conduits don't actually achieve constant memory usage? Maybe there's some advantage in the ordering of monadic effects for right associative conduits? Or maybe the performance issue above is just a missingREWRITE
rule, and right associative conduits achieve better performance in more realistic workflows. Any thoughts?The benchmark:
The text was updated successfully, but these errors were encountered: