Releases: typelevel/cats-effect
v3.4.4
This is the thirty-fourth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
This release fixes a memory leak in `Deferred`. The memory leak in question is relatively small, but it can accumulate over a long period of time in certain common applications. Additionally, this leak slightly regresses GC performance for almost all Cats Effect applications. For this reason, it is highly recommended that users currently on version 3.4.3 upgrade to this release as soon as possible.
User-Facing Pull Requests
- #3336 – Update scala-js-macrotask-executor to 1.1.1 (@armanbilge)
- #3334 – Fix compile using `CallbackStack#clear` method (@armanbilge)
- #3324 – Specialize `CallbackStack` on JS (@armanbilge)
- #3333 – Avoid leaks in `IODeferred` (@durban)
- #3307 – Fix propagation of `ExitCase` in `Resource#{both,combineK}` (@armanbilge)
Thank you so very much!
v3.4.3
This is the thirty-third release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
Despite being a patch release, this update contains two notable feature additions: full tracing support for Scala Native applications (including enhanced exceptions!), and significantly improved performance for `Deferred` when `IO` is the base monad. Regarding the latter, since `Deferred` is at the core of most concurrent logic written against Cats Effect, this change is expected to result in noticeable performance improvements in most applications, though it is hard to predict exactly how pronounced the effect will be.
User-Facing Pull Requests
- #3284 – Added a specialized version of `Deferred` based on `IOFiber`'s machinery (@djspiewak)
- #3226 – Release loser eagerly in `Resource.race` (@armanbilge)
- #3315 – Use configurable `reportFailure` for `MainThread` (@armanbilge)
- #3305 – More detailed warning for starvation checker (@armanbilge)
- #3310 – `IOLocal` micro-optimizations (@armanbilge)
- #3195 – Tracing for Scala Native (@armanbilge)
- #3322 – Documentation fixes and improvements (@djspiewak)
Very special thanks to all of you!
v3.4.2
This is the thirty-second release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
User-Facing Pull Requests
- #3290 – Make `Deferred#complete` uncancelable (@durban)
- #3281 – Implement `Ref` without wrapping `AtomicReference` on JS/Native (@armanbilge)
- #3280 – Suspend `cell` read in `AtomicCell#evalModify` (@armanbilge)
- #3275 – Fix `Dispatcher` deprecation annotation `message` and `since` (@seigert)
- #3269, #3267, #3257 – Documentation fixes and improvements (@iRevive)
Thank you so much!
v3.4.1
This is the thirty-first release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. The primary purpose of this release is to address a minor link-time regression which manifested when extending `IOApp` with a `class` (not a `trait`) which was in turn extended by another class. In this scenario, the resulting main class would hang on exit if the intervening extension class had not been recompiled against Cats Effect 3.4.0. Note that this issue with separate compilation and `IOApp` does remain in a limited form: the `MainThread` executor is inaccessible when linked in this fashion. The solution is to ensure that all compilation units which extend `IOApp` (directly or indirectly) are compiled against Cats Effect 3.4.0 or later.
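For illustration, the problematic extension shape looks roughly like the following sketch (`BaseApp` and `Main` are hypothetical names; only the class-in-the-middle structure matters):

```scala
import cats.effect.{ExitCode, IO, IOApp}

// An intermediate *class* (not a trait) extending IOApp...
abstract class BaseApp extends IOApp {
  def run(args: List[String]): IO[ExitCode] =
    IO.println("hello").as(ExitCode.Success)
}

// ...which is in turn extended by the actual main class. If BaseApp had been
// compiled against CE < 3.4.0 while Main had not, the app could hang on exit.
object Main extends BaseApp
```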
User-Facing Pull Requests
- #3254 – Workaround for `IOApp` deadlock (@armanbilge)
- #3255, #3253 – Documentation fixes and improvements (@iRevive)
Thank you, everyone!
v3.4.0
This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
A Note on Release Cadence
While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.
For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.
Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but long and short is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.
Major Changes
As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.
High Performance Queue
One of the core concurrency utilities in Cats Effect is `Queue`. Despite its ubiquity in modern applications, the implementation of `Queue` has always been relatively naive, based entirely on immutable data structures, `Ref`, and `Deferred`. In particular, the core of the bounded `Queue` implementation since 3.0 looks like the following:
```scala
final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])
```
The `ScalaQueue` type refers to `scala.collection.immutable.Queue`, which is a relatively simple banker's queue implementation within the Scala standard library. All end-user operations (e.g. `take`) within this implementation rely on `Ref#modify` to update internal state, with `Deferred` functioning as a signalling mechanism when `take` or `offer` need to semantically block (because the queue is empty or full, respectively).
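To make the `Ref`/`Deferred` interplay concrete, here is a simplified, hypothetical sketch of how a `take` might be written against the `State` above. It is not the actual implementation, which must also handle cancelation and wake blocked offerers:

```scala
import cats.effect.kernel.{Concurrent, Ref}
import cats.syntax.all._

// Simplified sketch only: dequeue immediately when a value is available;
// otherwise register a Deferred, semantically block on it, then retry.
def take[F[_], A](state: Ref[F, State[F, A]])(implicit F: Concurrent[F]): F[A] =
  F.deferred[Unit].flatMap { taker =>
    state.modify {
      case State(queue, size, takers, offerers) if queue.nonEmpty =>
        val (a, rest) = queue.dequeue
        (State(rest, size - 1, takers, offerers), F.pure(a))

      case State(queue, size, takers, offerers) =>
        // queue is empty: enqueue ourselves as a taker and block on the Deferred
        (State(queue, size, takers.enqueue(taker), offerers), taker.get *> take(state))
    }.flatten
  }
```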
This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property, since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of `Ref` and `Deferred`, it is usable in any context which has a `Concurrent` constraint on `F[_]`, allowing for a significant amount of generality and abstraction within downstream frameworks.
Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of `Queue` within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the `Concurrent` abstraction.
Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been twofold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, how to achieve this kind of implementation without leaking into the abstraction and forcing an `Async` constraint in place of the `Concurrent` one.
The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive `Queue` implementation only requires `Concurrent`, and it would not make much sense to force an `Async` constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to `Async`, both to directly implement asynchronous suspension (rather than redirecting through `Deferred`) and to safely suspend the side effects required to manipulate mutable data structures.
This problem has been solved by inspecting the runtime type of the `Concurrent` instance behind the scenes. In particular, whenever you construct a `Queue.bounded`, the runtime type of that instance is checked to see if it is secretly an `Async`. If it is, the higher-performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic, behind-the-scenes performance improvement.
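The dispatch itself can be sketched roughly as follows (the branch method names `boundedForAsync` and `boundedForConcurrent` are illustrative only, not the actual private API):

```scala
import cats.effect.kernel.{Async, Concurrent}
import cats.effect.std.Queue

// Rough sketch of the runtime type check on the Concurrent instance
def bounded[F[_], A](capacity: Int)(implicit F: Concurrent[F]): F[Queue[F, A]] =
  F match {
    case async: Async[F @unchecked] =>
      boundedForAsync[F, A](capacity)(async) // optimized, mutable-array-backed
    case _ =>
      boundedForConcurrent[F, A](capacity) // naive Ref/Deferred-based
  }
```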
As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. In particular, we ported the `MpmcArrayQueue` implementation from Java to Scala, making slight adjustments along the way:

- The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
- Several minor optimizations have been elided, most notably those which rely on `sun.misc.Unsafe` for manipulation of directional memory fences
- Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for `null` values without introducing extra boxing
- Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer
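The last point can be illustrated with a small self-contained sketch of ring-buffer index mapping (illustrative only, not the actual queue internals):

```scala
// With a power-of-2 capacity, the wrap-around is a single bit mask:
def indexPow2(sequence: Long, capacity: Int): Int = {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of 2")
  (sequence & (capacity - 1)).toInt
}

// With an arbitrary capacity, wrapping requires modular arithmetic,
// which is slightly more expensive on every operation:
def indexAny(sequence: Long, capacity: Int): Int =
  (sequence % capacity).toInt
```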
All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.
This implementation is used to contain the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data. However, it is insufficient on its own to implement the Cats Effect `Queue`. In particular, when `offer` fails on `MpmcArrayQueue` (because the queue is full), it simply rejects the value. When `offer` fails on Cats Effect's `Queue`, the calling fiber is blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.
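The blocking `offer` semantic is easy to observe with the public API. In the following small program, the producer can only run ahead of the consumer by the queue's capacity:

```scala
import cats.effect.{IO, IOApp}
import cats.effect.std.Queue
import cats.syntax.all._

object BackpressureDemo extends IOApp.Simple {
  val run: IO[Unit] =
    Queue.bounded[IO, Int](capacity = 2).flatMap { q =>
      // offer blocks once two elements are buffered, until take makes room
      val producer =
        (1 to 5).toList.traverse_(n => q.offer(n) *> IO.println(s"offered $n"))
      val consumer =
        q.take.flatMap(n => IO.println(s"took $n")).replicateA_(5)
      (producer, consumer).parTupled.void
    }
}
```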
In order to achieve this semantic, we had to implement not only a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are awaiting a condition on the queue. We could have used `ConcurrentLinkedQueue` (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and I built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.
Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:
This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.
The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at the top, the queue data structure itself does not appear to be the performance bottleneck in any practical application, but every bit helps, and free performance is still free performance!
Hardened Queue Semantics
As a part of the rework of the core data structures, it was decided to make a very subtle change to the semantics of the `Queue` data structure while under heavy load, particularly in true multi-producer, multi-consumer (MPMC) scenarios. Under certain circumstances, the previous implementation of `Queue` could actually lose data. This manifested when one fiber enqueued a value while another fiber dequeued that value and was canceled during the dequeue. When this happened, it...
v3.4.0-RC2
This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
For a more comprehensive treatment of all changes between 3.3.x and 3.4.0, please see the RC1 release notes. The following notes only cover the changes between RC1 and RC2.
User-Facing Pull Requests
- #3197 – Avoid spurious wake-up in `Concurrent` queues at capacity (@djspiewak)
- #3193 – Fixed a few issues in the async queue (@djspiewak)
- #3194 – Reimplemented `Queue.synchronous` to resolve FIFO issues (@djspiewak)
- #3186 – Fix `NullPointerException` in `RingBuffer#toList` (@RafalSumislawski)
- #3183 – Fix deregistration of fiber from monitoring when `IOCont.Get` gets cancelled (@RafalSumislawski)
- #3182 – Fix issue with fibers not getting deregistered from monitoring... (@RafalSumislawski)
A very special thanks to all!
v3.4.0-RC1
This is the thirtieth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.4.x release. Note that source compatibility has been broken with 3.3.x in some minor areas. Since those changes require active choice on the part of users to decide the best adjusted usage for their specific scenario, we have chosen to not provide scalafixes which automatically patch the affected call sites.
With this release, we're taking the unusual step of going through a release candidate cycle prior to 3.4.0 final. This process is designed to make it easier for the downstream ecosystem to try the new release and identify subtle incompatibilities or real-world issues that are hard for us to entirely eliminate in-house. Binary and source compatibility are not guaranteed between release candidates, or between RCs and the final release, though major changes are very unlikely. If you represent a downstream framework or application, please do take the time to try out this release candidate and report any issues! We're particularly interested in feedback from applications which make heavy use of `Queue`.
A Note on Release Cadence
While Cats Effect minor releases are always guaranteed to be fully backwards compatible with prior releases, they are not forwards compatible with prior releases, and partially as a consequence of this, can (and often do) break source compatibility. In other words, sources which compiled and linked successfully against prior Cats Effect releases will continue to do so, but recompiling those same sources may fail against a subsequent minor release.
For this reason, we seek to balance the inconvenience this imposes on downstream users against the need to continually improve and advance the ecosystem. Our target cadence for minor releases is somewhere between once every three months and once every six months, with frequent patch releases shipping forwards compatible improvements and fixes in the interim.
Unfortunately, Cats Effect 3.3.0 was released over ten months ago, meaning that the 3.4.0 cycle has required considerably more time than usual to come to fruition. There are several reasons for this, but long and short is that this is expected to be an unusual occurrence. We currently expect to release Cats Effect 3.5.0 sometime in Spring 2023, in line with our target cadence.
Major Changes
As this has been a longer than usual development stretch (between 3.3.0 and 3.4.0), this release contains a large number of significant changes and improvements. Additionally, several improvements that we're very excited about didn't quite make the cutoff and have been pushed to 3.5.0. This section details some of the more impactful changes in this release.
High Performance Queue
One of the core concurrency utilities in Cats Effect is `Queue`. Despite its ubiquity in modern applications, the implementation of `Queue` has always been relatively naive, based entirely on immutable data structures, `Ref`, and `Deferred`. In particular, the core of the bounded `Queue` implementation since 3.0 looks like the following:
```scala
final class BoundedQueue[F[_]: Concurrent, A](capacity: Int, state: Ref[F, State[F, A]])

final case class State[F[_], A](
    queue: ScalaQueue[A],
    size: Int,
    takers: ScalaQueue[Deferred[F, Unit]],
    offerers: ScalaQueue[Deferred[F, Unit]])
```
The `ScalaQueue` type refers to `scala.collection.immutable.Queue`, which is a relatively simple banker's queue implementation within the Scala standard library. All end-user operations (e.g. `take`) within this implementation rely on `Ref#modify` to update internal state, with `Deferred` functioning as a signalling mechanism when `take` or `offer` need to semantically block (because the queue is empty or full, respectively).
This implementation has several advantages. Notably, it is quite simple and easy to reason about. This is actually an important property, since lock-free queues, particularly multi-producer multi-consumer queues, are extremely complex to implement correctly. Additionally, as it is built entirely in terms of `Ref` and `Deferred`, it is usable in any context which has a `Concurrent` constraint on `F[_]`, allowing for a significant amount of generality and abstraction within downstream frameworks.
Despite its simplicity, this implementation also does surprisingly well on performance metrics. Anecdotal use of `Queue` within extremely hot I/O processing loops shows that it is rarely, if ever, the bottleneck on performance. This is somewhat surprising precisely because it's implemented in terms of these purely functional abstractions, meaning that it is relatively representative of the kind of performance you can expect out of Cats Effect as an end user when writing complex concurrent logic in terms of the `Concurrent` abstraction.
Despite all this though, we always knew we could do better. Persistent, immutable data structures are not known for getting the absolute top end of performance out of the underlying hardware. Lock-free queues in particular have a very rich legacy of study and optimization, due to their central position in most practical applications, and it would be unquestionably beneficial to take advantage of this mountain of knowledge within Cats Effect. The problem has always been twofold: first, the monumental effort of implementing an optimized lock-free async queue essentially from scratch, and second, how to achieve this kind of implementation without leaking into the abstraction and forcing an `Async` constraint in place of the `Concurrent` one.
The constraint problem is particularly thorny, since numerous downstream frameworks have built around the fact that the naive `Queue` implementation only requires `Concurrent`, and it would not make much sense to force an `Async` constraint when no surface functionality is being changed or added (only performance improvements). However, any high-performance implementation would require access to `Async`, both to directly implement asynchronous suspension (rather than redirecting through `Deferred`) and to safely suspend the side effects required to manipulate mutable data structures.
This problem has been solved by inspecting the runtime type of the `Concurrent` instance behind the scenes. In particular, whenever you construct a `Queue.bounded`, the runtime type of that instance is checked to see if it is secretly an `Async`. If it is, the higher-performance implementation is transparently used instead of the naive one. In practice, this should apply at almost all possible call sites, meaning that the new implementation represents an entirely automatic, behind-the-scenes performance improvement.
As for the implementation, we chose to start from the foundation of the industry-standard JCTools Project. In particular, we ported the `MpmcArrayQueue` implementation from Java to Scala, making slight adjustments along the way:

- The pure Scala implementation can be cross-compiled to Scala.js (and Scala Native), avoiding the need for extra special casing
- Several minor optimizations have been elided, most notably those which rely on `sun.misc.Unsafe` for manipulation of directional memory fences
- Through the use of a statically allocated exception as a signalling mechanism, we were able to add support for `null` values without introducing extra boxing
- Sizes are not quantized to powers of 2. This imposes a small but measurable cost on all operations, which must use modular arithmetic rather than bit masking to map around the ring buffer
All credit goes to Nitsan Wakart (and other JCTools contributors) for this data structure.
This implementation is used to contain the fundamental data within the queue, and it handles an enormous number of very subtle corner cases involving numerous producers and consumers all racing against each other to read from and write to the same underlying data. However, it is insufficient on its own to implement the Cats Effect `Queue`. In particular, when `offer` fails on `MpmcArrayQueue` (because the queue is full), it simply rejects the value. When `offer` fails on Cats Effect's `Queue`, the calling fiber is blocked until space is available, encoding a form of backpressure that sits at the heart of many systems.
In order to achieve this semantic, we had to implement not only a fast bounded queue for the data, but also a fast unbounded queue to contain any suspended fibers which are awaiting a condition on the queue. We could have used `ConcurrentLinkedQueue` (from the Java standard library) for this, but we can do even better on performance with a bit of specialization. Additionally, due to cancelation, each listener needs to be able to efficiently remove itself from the queue, regardless of how far along it is in line. To resolve these issues, Viktor Klang and I built a more optimized implementation based on atomic pointer chaining. It's actually possible to improve on this implementation even further (among other things, by removing branching), which should arrive in a future release.
Congratulations on traversing this entire wall of text! Have a pretty performance chart as a reward:
This has been projected onto a linear relative scale. You can find the raw numbers here. In summary, the new queues are between 2x and 4x faster than the old ones.
The bottom line on all of this is that any application which relies on queues (which is to say, most applications) should see an automatic improvement in performance of some magnitude. As mentioned at...
v3.3.14
This is the twenty-ninth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
This release contains significant fixes for the `interruptibleMany` function, which could (under certain circumstances) result in a full runtime deadlock.
User-Facing Pull Requests
- #3081 – Improved granularity of `interruptible` loops (@durban)
- #3074 – Resolve race condition in `interruptibleMany` after interruption (@djspiewak)
- #3064 – Handle `Uncancelable` and `OnCancel` in `syncStep` interpreter (@armanbilge)
- #3069 – Documentation fixes and improvements (@TonioGela)
Special thanks to all of you!
v3.3.13
This is the twenty-eighth release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
User-Facing Pull Requests
- #3053 – Updated native image config for GraalVM 21.0 (@djspiewak)
- #3054 – Fix new blocking worker thread naming change for bincompat (@djspiewak)
- #3012 – Rename worker threads in blocking regions (@aeons)
- #3036 – Properly declare constants on Scala.js (@armanbilge)
- #3056, #2897, #3047 – Documentation fixes and improvements (@djspiewak, @Daenyth, @TimWSpence)
Thank you very much!
v3.3.12
This is the twenty-seventh release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.
User-Facing Pull Requests
- #2991 – `Resource#evalOn` should use provided EC for both acquire and release (@armanbilge)
- #2972 – Fix leaking array ref in `Random` (@catostrophe)
- #2963 – Override `racePair` in `_asyncForIO` (@durban)
- #2993, #2990, #2955 – Documentation fixes and improvements (@TimWSpence, @bplommer)
Thank you, all of you!