
Releases: uber/cadence

v0.5.5 Release

12 Mar 17:42
d181c67

This release includes one bug fix.

Cadence Visibility Bugfix For Cross DC

#1521 visibility RecordWorkflowExecutionClosed API should use query timestamp > RecordWorkflowExecutionStarted API query timestamp for the same workflow ID & run ID combination

The issue fixed above mainly impacts the passive cluster in a cross DC setup. Because the timestamp used for the Cassandra query was wrong, open visibility records of closed workflows were leaked.

v0.5.4 Release

08 Mar 18:58
68a2d45

This release includes some bug fixes and feature improvements.

New Features and Improvements

Cadence Cross DC Improvements

#1492 Handle missing start workflow replication events
#1496 Reduce unnecessary replication calls
#1502 Handle out of order workflow replication
#1512 Fix workflow reorder

Elastic Search For Visibility Store Improvements

#1493 Fix race condition in ES bulk processor

Documentation

#1495 docker/README.md: CLI: how to register new domain
#1505 Update README.md

Archival Support

This feature is still in development and not ready for production.
#1474 Support GetWorkflowExecutionHistory API from archival
#1489 Update archival config
#1515 In cluster paused state delete history until backfill is in place

SQL Improvement

This feature is still in development and not ready for production.
#1490 persistence: API changes to support tasklist cleanup

Matching Service Improvement

#1500 matching: tasklistmanager refactoring

Bugfixes

#1486 Workflow retry incorrectly set attempt on start event of new run
#1499 Fix CronSchedule for child workflow
#1514 Fix workflow reset input validation
#1509 Bugfix: there shall be no events in history after workflow finish event
#1501 Change visibility closed_time to use last event timestamp
#1516 Fix write complete event backward compatibility issue

MISC

#1506 Add context timeout logging in frontend RPC handler

v0.5.3 Release

23 Feb 02:32
d64ded3

This release includes some bug fixes and feature improvements.

New Features and Improvements

Elastic Search For Visibility Store

This feature is still in testing and not ready for production.
d64ded3 Add admin CLI tool to refill message to ElasticSearch (#1476)
cf04e3f Add CLI admin kafka parse for visibility data (#1462)
4ea4568 Add rate limit and metric to visibility on ElasticSearch (#1472)
2079852 Fix race for handling duplicate kafka message (#1475)

SQL Support

This feature is still in development and not ready for production.
6876051 sql: delete current_executions row as part of workflow deletion (#1461)
fc01b09 sql: add support for workflow retry params (#1460)
99ca975 sql: schema perf optimizations (#1446)

Archival Support

This feature is still in development and not ready for production.
3e9b12e Archival workflow implementation (#1441)

Other New Features

5bda240 Support read from closed execution v2 (#1470) (Issue #1024)
ae5bb31 config/rpc: support binding on user specified ip addr (#1471)

Bug Fixes

ae34391 History task processing will verify whether to process task only once (#1429)
38f9b01 Workflow continue as new replication should also take signal event into account (#1450)

Misc

8d6adbf Add missing schema to Docker image (#1473)
97b7ea7 Refactor duplicate code in CLI (#1468)
24b6a45 Use multi-stage build in dockerfile (#1447)

v0.5.2 Release

20 Feb 06:34

This release includes features such as Workflow Reset and Cron Schedules, which are ready for production. It also has schema changes, so make sure the cluster is upgraded to v0.13 of the Cassandra schema before deploying this release.

Schema Changes

Cassandra

V0.13

ced0e3b - Implement Workflow Reset on Active Side (#1352)
108981c - Workflow Reset XDC implementation (#1377)
5429aaf - Move history events out of mutableState to eventsCache (#1288)

New Features and Improvements

Workflow Reset

The workflow reset feature is now ready for production use. It relies on the new history event store, 'events_v2', so make sure eventsV2 is enabled for a domain before allowing reset for workflow executions within that domain.
ced0e3b - Implement Workflow Reset on Active Side (#1352)
108981c - Workflow Reset XDC implementation (#1377)
acab59e - Fix reset bugs: 1. transient decision 2. task version 3. replay signals 4. dedup reset request by requestID (#1411)
5320df3 - Bugfix: workflow reset, when terminating current workflow, should set task version (#1421)
0dad9eb - Bugfix: workflow reset persistenct layer nil pointer (#1430)

Cron Support

StartWorkflowExecution now also accepts a cron schedule as part of the workflow options. This makes it easy to write cron jobs that start a workflow on a cron schedule. The Cadence server now automatically starts a new run of the workflow execution based on the cron schedule passed in the original start API call.
fae4032 - Use UTC for cron schedule (#1378)
d7cf958 - Fix cron schedule (#1444)

Archival Support

This feature is still in development and not ready for production.
7a0afef - Move historyV2 delete into util helper and add comment (#1376)
90a79cd - Support xdc domain archival config (#1373)
0365256 - Add cron support to CLI (#1386)
8730ca6 - Handle more input for domain archival config crud operations (#1398)
3ba1bec - Add blobstore functionality, compress blobs, and construct well formed blob bodies (#1399)
9a14b57 - Schedule timer task to send signal to archival system workflow (#1416)
edb7d45 - Add metric and retryable blobstore clients (#1410)

Elastic Search For Visibility Store

This feature is still in development and not ready for production.
0fda8e2 - Handle TTL for doc in ElasticSearch (#1389)
d27f279 - Fix visibility producer init process (#1408)
295251d - Add esVisibiltyManager for list workflow APIs on ElasticSearch (#1407)

Stability Fixes

78c47f9 - Fix data race (#1371)
5429aaf - Move history events out of mutableState to eventsCache (#1288)
def120e - Handle race condition in replication Q (#1427)
7b76232 - Always send out sync shard (#1452)

Bug Fixes

74cd70f - Fix replicate WorkflowStart to include cron and retry (#1420)
954647e - Allow query task for non-active domains (#1424)
d5fb347 - Bugfix: logic getting events from events cache should deal with failure (#1431)

CLI

b1008ed - Admin show allows dumping history into file as regular show command (#1391)
0365256 - Add cron support to CLI (#1386)

Misc

2f4092c - Update contributing instruction for PR titles (#1393)
a3634d6 - Release 0.5.0 in docker compose and fix release instruction (#1396)
5e40d42 - Adding travis_wait command so builds don't timeout in Travis CI (#1400)
dc2d9bf - Add frontend DC redirection functionality and policy (#1409)
1568cd6 - Parallelize travis builds for Cadence (#1428)
b4766a9 - Remove feature flags (#1425)

Release bug fix patch 0.5.1

18 Jan 19:45

Please use this release instead of v0.5.0.
This version is mostly the same as v0.5.0 but includes multiple bug fixes. The most important one fixes a busy loop in the matching engine.

V0.5.0 Release - Please use 0.5.1

15 Jan 23:02

Schema Changes

Cassandra

V0.12

This schema change will support cron workflow, history events V2, signal count enforcement and archival.

New Features and Improvements

Cron workflow

Native cron workflows are now supported. Instead of using workflow.Sleep, you only need to specify the CronSchedule as part of the workflow config. This makes implementing the cron use case much easier and more reliable.

  • 32eeacc 2018-12-20 Add server side cron support (#1342) [GitHub]

New History Events Persistence (eventsV2)

We rewrote our persistence layer for history events. The new layer is more performant (35%) and enables more features (forking). We will introduce a new feature (reset) based on it in the next release.

  • c53c610 2019-01-04 Fix bug for eventsV2 (#1367) [GitHub]
  • 4676004 2018-12-17 Fix eventsV2 detecting corrupted history too aggressive (#1338) [GitHub]
  • a71807c 2018-12-07 Fix bufferedReplicationTask for continueAsNew with eventsV2 (#1320) [GitHub]
  • 7ff9844 2018-12-05 Rename useEventsV2 for consistency and backward compatibility (#1313) [GitHub]
  • d82d4bf 2018-12-03 EventsV2 fixes: continueAsNew/getWFHistory bug, and add local integration tests (#1298) [GitHub]
  • e218544 2018-11-29 Events v2 conflict resolver bug (#1294) [GitHub]
  • 1e64b68 2018-11-29 Fix a bug in eventsV2 (#1290) [GitHub]
  • 379e124 2018-11-28 Remove historyBranches Map for using eventsV2 (#1286) [GitHub]
  • dec70b4 2018-11-16 Add more EventsV2 tests and fix bugs (#1252) [GitHub]
  • e8c5b5e 2018-11-13 Support events v2 for workflow history (#1218) [GitHub]
  • 46568e3 2018-11-08 Rewrite eventsV2 to get rid of Paxos write (#1225) [GitHub]
  • f6d4148 2018-10-29 Fix race in historyV2 test (#1207) [GitHub]
  • ebc991d 2018-10-23 Add History V2: API and Cassandra implementation (#1156) [GitHub]

Auto Re-replication

This feature gives a cross-DC cluster the ability to recover by itself when the replication message bus (currently Kafka) loses data.

  • 3148978 2019-01-03 Handle edge case when doing history re-replication from standby side. (#1358) [GitHub]
  • 77f6559 2018-12-21 Serialize the request to fetch missing kafka message from remote & edge case handling (#1353) [GitHub]
  • f419d49 2018-12-20 New functionality for cross DC handling case if Kafka is not lossless [GitHub]

Limit enforcement (history size and signal count)

Configurable limits are introduced to protect our system from being abused.

  • 4c1f246 2019-01-02 size limit: max ID length (#1329) [GitHub]
  • 7fc51ca 2018-12-11 Fail inflight decision when buffered events exceeds limit (#1326) [GitHub]
  • b01495b 2018-12-11 SizeLimit: enforce hitory size/count limit (#1325) [GitHub]
  • c84ac99 2018-12-07 Size limit: blob size limit (#1309) [GitHub]
  • ff8281b 2018-12-06 Add page size limit for GetWorkflowExecutionHistory (#1315) [GitHub]
  • 3bbe39f 2018-11-28 Enforce limit on maximum number of signals per execution (#1270) [GitHub]

Listing workflow ordered by closed time

  • 96c83f7 2018-11-27 Add Disable List open/close workflow by secondary index filter (#1282) [GitHub]
  • 2cf59fb 2018-11-02 Change list workflow to order by close time (#1221) [GitHub]
  • 567dc71 2018-10-30 CLI add list workflow by closed status (#1210) [GitHub]

CLI

  • 39f0f40 2018-12-26 CLI: print treeID/branchID for eventsV2 branchToken in admin wf desc command (#1354) [GitHub]
  • 383c699 2018-12-21 Update cli to be able to set client factory from outside module (#1355) [GitHub]
  • 41824ce 2018-12-07 Refactor cli and add server side admin clients (#1321) [GitHub]
  • acda8ec 2018-12-07 Various fixes for admin commands (#1305) [GitHub]
  • 349f26c 2018-12-05 CLI improvement on ErrorAndExit (#1312) [GitHub]
  • 3d37cb9 2018-12-04 Change CLI to exit on any errors with exit code 1 (#1304) [GitHub]
  • 9ee1b2d 2018-11-29 Refactor client lib separating client retrieval logic (#1291) [GitHub]
  • 14579b4 2018-11-26 Various Admin Operation Commands (#1274) [GitHub]
  • ff8563f 2018-11-26 Support header mode on kafka admin tool [GitHub]
  • 889c3d2 2018-11-22 Fix markOffset for merge DLQ (#1273) [GitHub]
  • f7bca10 2018-11-21 DLQ merge tool (#1271) [GitHub]
  • d256ba1 2018-11-19 Rereplicate events from json file (#1265) [GitHub]
  • abca884 2018-11-19 Fix bug with kafka tool not processing last message (#1267) [GitHub]
  • bde9362 2018-11-08 Skip err mode (#1230) [GitHub]
  • 567dc71 2018-10-30 CLI add list workflow by closed status (#1210) [GitHub]
  • f956b1f 2018-10-25 Add admin kafka cli to decode messages (#1203) [Samar Abbas - Uber]

MySQL (in development)

  • 860d64a 2018-11-20 sql: createWFExecution - remove redundant write lock on shards (#1226) [GitHub]
  • e69b36d 2018-11-13 sql: support for getTimerIndexTasks pagination (#1224) [GitHub]
  • 73b753a 2018-10-31 Added sql visibility persistence (#1196) [GitHub]
  • f2ebf3c 2018-10-20 Merged tx_execute with runTransaction (#1181) [GitHub]

Archival (in development)

  • 4814b92 2019-01-02 Add archival config and support file based blobstore (#1362) [GitHub]
  • 0fe330b 2018-12-21 Enable archival config per domain (#1351) [GitHub]
  • c0c38f7 2018-12-18 Add blobstore interface and plumb into frontend (#1340) [GitHub]
  • a25afc9 2018-12-05 Add archival interface and plumb into sysworkflow (#1311) [GitHub]
  • f5aa78f 2018-11-27 Disable system workflow (#1284) [Quanzheng Long]
  • 9885e5f 2018-11-21 Add all plumbing for system workflows (#1259) [GitHub]
  • 948ee2a 2018-11-14 Do not start client worker as part of worker role (#1254) [GitHub]

Visibility with ElasticSearch (in development)

  • 6d858c3 2018-11-20 Add publish visibility for open workflow to Kafka (#1256) [GitHub]
  • 08f5147 2018-11-02 Add visibility sampling back, and add limit to List APIs (#1220) [GitHub]
  • abcbb38 2018-11-02 Add metrics and logs to visibility (#1217) [GitHub]
  • 720ddb5 2018-12-19 Add metrics for kafka producer client (#1345) [GitHub]

Misc

  • 8e51efe 2019-01-07 Bugfix: decision task processing in transfer active task processor (#1369) [Wenquan Xing]
  • 2e916c9 2019-01-04 Return run ID for describe workflow execution (#1370) [Wenquan Xing]
  • 0ff94d9 2019-01-02 Handle standby side workflow timeout & missing events (#1356) [GitHub]
  • 9e58131 2019-01-02 Move schema change in 0.13 to 0.12 (#1363) [GitHub]
  • e5f00c2 2018-12-28 Fix incorrect history length in visibility (#1361) [GitHub]
  • bff2bc2 2018-12-19 Bugfix: timer active / failover processor should notify active processor (#1347) [GitHub]
  • 75f771c 2018-12-18 Add test case verifying that workflow ID reuse policy is hornored. (#1343) [GitHub]
  • 893c0a7 2018-12-14 Bugfix: continue as new timer is not set (#1337) [GitHub]
  • 59d60b6 2018-12-14 Logging improvement for persistence errors (#1334) [GitHub]
  • 8f3f5c1 2018-12-14 Fix linter (#1332) [GitHub]
  • f02be5d 2018-12-13 Record failure reason for retry (#1330) [GitHub]
  • 5ec5787 2018-12-13 Bugfix: GetActivityTimerTaskIfNeeded did not set the LastTimeoutVisib… (#1327) [GitHub]
  • 7d85142 2018-12-11 Add new option WorkflowIdReusePolicy on creating new workflow in cli (#1328) [GitHub]
  • e4cfadf 2018-12-10 Garbage collection of signals if those events are stale (#1279) [GitHub]
  • fec345a 2018-12-07 Adding check on request id not set (#1319) [GitHub]
  • ac9b856 2018-12-06 Fix query task before first decision task completed (#1186) [GitHub]
  • ea7b70e 2018-12-05 fix nil pointer error during logging (#1310) [Quanzheng Long]
  • 7238eec 2018-12-05 Add logs for heartbeat timeout debugging (#1306) [Quanzheng Long]
  • 38e1a0b 2018-12-03 Optimize failover performance by batching domain failovers (#1280) [GitHub]
  • 5a8d1ed 2018-12-03 Bugfix: timer gate is not properly closed (#1299) [GitHub]
  • 5676197 2018-12-01 Fixed retry policy expiration interval validation (#1287) [GitHub]
  • af8dc18 2018-11-30 remove permission for domain failover (#1295) [GitHub]
  • 974e6cb 2018-11-30 Fix flaky TestVisibility (#1296) [GitHub]
  • 504555b 2018-11-28 Add cadence web to readme (#1289) [GitHub]
  • e5718b4 2018-11-26 Few bugfixes: (#1278) [GitHub]
  • 2885d0c 2018-11-19 Add attempt to PendingActivityInfo for DescribeWorkflow response (#1205) [GitHub]
  • 92dab95 2018-11-19 Use long timeout and short timeout on frontend (#1260) [GitHub]
  • ba14055 2018-11-16 Set context timeout for various history service api calls (#1239) [GitHub]
  • eecdbba 2018-11-16 Make client timeouts configurable and increase frontend timeout (#1258) [GitHub]
  • 272acd8 2018-11-14 Make worker config dynamic (#1253) [GitHub]
  • 5100907 2018-11-14 Bugfix: missing error check when allocating task IDs for timer tasks (#1249) [yiminc]
  • bcd373f 2018-11-14 Start cadence client worker upon worker role startup (#1244) [GitHub]
  • e8dbfab 2018-11-12 Include shard-id tag with failures from execution manager (#1235) [GitHub]
  • b074c80 2018-11-08 Make worker required service for cadence startup (#1228) [GitHub]
  • 9226ce1 2018-11-02 Add log for more visibility on timer tasks (#1215) [GitHub]
  • f12f908 2018-11-01 Fix heartbeat timeout after retry (#1219) [GitHub]
  • dee4c74 2018-10-30 Disallow activity completion from previous attempts (#1214) [GitHub]
  • 379dd11 2018-10-30 Move history-size to common metric (#1212) [GitHub]
  • ec36698 2018-10-29 Emit metric on history sizes by frontend (#1211) [GitHub]
  • c0cb4e5 2018-10-27 update docker for release v0.4.0 (#1209) [GitHub]
  • 0d10b48 2018-10-25 Bugfix: Do not clear heartbeat timestamp when retry (#1204) [GitHub]
  • 1086b6a 2018-10-22 Update generated code to use latest thriftrw and update lock to be in sync with toml (#1200) [GitHub]
  • f0c5eac 2018-10-21 fix mispell (#1197) [Maxim Fateev]

v0.4.0 Release

23 Oct 22:41
81c0089

Cadence Cross DC is now ready for production in this release. The release includes two versions of schema changes, so make sure the cluster is upgraded to v0.11 of the Cassandra schema before deploying it. MySQL support is still in development and not ready for production yet.

Schema Changes

Cassandra

V0.10

9f7df73 - Handle workflow history reset, without deletion (#997)
73b18a1 - Handle 2 edge case when start cross dc workflow (#1113)

V0.11

c183384 - Implement server side retry for workflow/child workflow
5c93511 - Emit metric for various mutable state stats (#1136)
e49263d - Event thrift encoding and refactoring dataInterfaces (#1145)
54eecab - Handle activity retry and activity hearbeat for cross DC (#1166)

MySQL

42df0fc - Moved cassandra schema to schema/cassandra (#1115)
ca90227 - SQL Persistence Implementation (#1121)
9e44e46 - mysql: cleanup history, shard, task and domain persistence (#1147)
a174a4d - Next batch of SQL persistence fixes (#1176)
4eb6818 - persistence: fixes for mysql (#1184)
81c0089 - mysql: bug fixes (#1195)

New Features and Improvements

Cadence Cross DC

The Cadence CrossDC feature is production ready with this release. It is an optional feature, and existing clusters can be upgraded to use it in production. It requires Kafka for shipping replication tasks and a new worker role in Cadence. The worker role hosts the replicator, which consumes replication tasks from Kafka and applies them to the local cluster. Please see services/worker for more details.

39b84a0 - Add new dynamic config (#1012)
6c68917 - Abstract Kafka interface (#1003)
9f7df73 - Handle workflow history reset, without deletion (#997)
a75e30d - Bugfix: decision attempt should be handled by history replication (#1005)
bf1e4e1 - Replication info should only contains last event version & ID of other cluster (#998)
c6ef084 - Bugfix: handle the buffered events during failover (#1001)
e8c1aee - Handle domain not active error on matching side (#1011)
bd79b91 - Should not apply / buffer stale replication event (#1020)
dd25233 - Push to matching engine when standby activity / decision task exceed time limit. (#1019)
e6c5a67 - Handle history replication race case (#1025)
ea816df - Upgrade uber-go/kafka-client to 0.2.x (#1028)
5a38f2f - Bugfix: current workflow execution deletion is missing replication state, causing cassandra fail to delete the entire record (#1030)
96e49a8 - Bugfix: when processing timer / transfer task and retry on DomainNotActiveError after expiration time, should complete task (#1039)
d8ec0e0 - Bugfix: failover timer ack manager should never do look ahead (#1036)
7d2ccdc - Bugfix: when encounter rate limit exceeded error when processing task, the task should be marked as completed (#1041)
71e126f - Fix leaking kafka consumer (#1062)
2cf0577 - Bug fix: add missing filed for create/update domain on replication (#1061)
0300251 - Bugfix: history replicator, when calling terminate workflow, should use standby logic rather than active logic, since active logic will do check on version and return domain not active (#1058)
da38e42 - Bugfix: reset workflow should also reset the current execution record (#1057)
1a49292 - Add integration tests for xdc (#1078)
e488d84 - Add more cases to xdc integration test (#1086)
faa8421 - Bugfix: continue as new timer should use start time set by the event time (#1079)
1bb2faa - Bugfix: should not flush buffer if the workflow has finished (#1092)
63b5612 - Bugfix: history replicator should use last write version, not start version to check for conflict resolution. (#1090)
6bc789d - Add handling of CurrentWorkflowConditionFailedError (#1101)
4fcf57c - Add new error: internal failure, which indicate code bug. (#1119)
df53bbc - Use thriftrw binary encoding for kafka message (#1120)
4fe90bb - Bugfix error is not handled for replication task ack (#1122)
43740b2 - For Cross DC, temporarily disable activity heartbeat timer task generation on standby (#1130)
b1e68cc - Bugfix: state builder should refresh activity / user timer if activit… (#1126)
ba97e71 - Drop sync shard time task if time diff > 10 min (#1132)
5955efa - Non global domain should use empty event version (#1142)
73b18a1 - Handle 2 edge case when start cross dc workflow (#1113)
505fa98 - Bugfix: domain replication should not replicate non global domain (#1165)
02f2001 - Change timer ack manager to have readlevel per cluster (#1075)
54eecab - Handle activity retry and activity hearbeat for cross DC (#1166)
a96a8e5 - Handle Transient Decision On Standby Cluster (#1183)

MySQL Support (In Development)

1a22c01 - Refactoring of persistence unit tests (#1111)
42df0fc - Moved cassandra schema to schema/cassandra (#1115)
ca90227 - SQL Persistence Implementation (#1121)
21d69b7 - Fixed SQL metadata persistence (#1133)
593ac83 - Renamed SQL implementation classes to reflect interface names (#1138)
6207cbe - Renamed testBase.WorkflowMgr to ExecutionManager (#1140)
9e44e46 - mysql: cleanup history, shard, task and domain persistence (#1147)
e16cb02 - persistence: config refactoring to support heterogeneous datastores (#1168)
86a6f71 - persistence-test: add DBPort option (#1173)
e70b067 - persistence-test: add schemaDir option (#1171)
a174a4d - Next batch of SQL persistence fixes (#1176)
4eb6818 - persistence: fixes for mysql (#1184)
9e9969b - mysql: convert sqlx batch deletes to normal deletes (#1193)
81c0089 - mysql: bug fixes (#1195)

Server Side Retries

c183384 - Implement server side retry for workflow/child workflow
f3a9acb - retry expiration for activity (#907)
dc1922d - Extend SCHEDULE_TO_START and SCHEDULE_TO_CLOSE to expiration on retry policy (#1089)
f805d2d - Retry with heartbeat progress (#1091)

Visibility Sampling

61068e4 - Add PriorityTokenBucket (#1048)
a5732f0 - Add visibility sampling (#1059)
a0cd3f5 - Adding support for long retention to sampled workflows (#1141)
61e6eaf - Add sampling for long rentention in visibility (#1158)

Thrift Encoding For History Events

e49263d - Event thrift encoding and refactoring dataInterfaces (#1145)
468b3e3 - Bug fix: nil checking for passing bufferedReplicationTask object (#1154)
6f17033 - Add dynamic config for encoding type (#1153)
a93e79e - Bug fix: nil pointer error for appendHistory (#1157)
1391eb7 - Fix request.NewBufferedEvents (#1160)
4a3c01f - Add unit tests for bufferedEventCount/Size and clear buffered events … (#1161)
496235b - Bugfix: shard context should get default encoding before acquire shard lock, or otherwise, deadlock will happen (#1163)

Stability Fixes

bb56513 - Optimze matching getTasks with frequent break (#1008)
98e4094 - Add service protection to signal input size (#1015)
6eea91f - Add service protection to history service (#1029)
bd37bac - Handle potential coroutine leak issue (#1033)
48b77aa - Use token aware round robin policy for cassandra (#1056)
59e184f - Do not acquire all shard when doing start up. (#1047)
44f5d46 - Fix leak task for timer and transfer queue processor (#1072)
ae390d9 - Enable range delete on timer / transfer tasks (#1043)
986e93c - Fix history corruption due to deletion of events (#1087)
ef4d157 - Fix executions table conditional update failure detection (#1095)
b03dc32 - Add shard protection in case task is pending for too long (#1131)
b91d4c0 - Increase the signal size limit to 2 MB (#1149)
41cf0cf - Refactor task processing logic (#1107)
3df20a9 - Out of order history events (#1191)

Bug Fixes

2d99fa3 - Add timeout timer for renewed decision task (#1023)
ba508e2 - Change timer processor logic to Fix timer skipped issue (#970)
dc689c4 - Skip staled batch of events (#1042)
0e276d5 - Bugfix: cancel timer may cause nil pointer error (#1045)
411ea26 - Bugfix: should notify new timer if current timer scan is a failure (#1050)
86b6003 - Bugfix: reset sticky tasklist should not update mutable state if workflow finished (#1096)
b7200f4 - Fix execution table update condition for current_run_id checking and fix error msg (#1098)
1499a13 - Bugfix for missing heartbeat timer (#1108)
430d371 - Should check whether workflow has closed before processing decision (#1109)
e539407 - Bugfix: when schedule a new decision, if there is any event flushed out of the buffer, the new decision cannot be a transient decision (#1135)
6a30c59 - Bugfix: when failing decision, should also reset timer builder (#1146)
83aef38 - Bugfix: fix racing condition when a shard is to be removed and re-initialized again (#1118)
825db3a - Fix nil pointer error during ConditionFailedError (#1152)
8a00010 - Use domain ID + workflow ID + run ID as history cache key instead of … (#1151)
8693242 - Fix race issue in signalwithstart (#1167)
ab3ec46 - Force a non transient decision on sticky timeout (#1175)
bf434b7 - Bugfix: should only create one session per execution store factory (#1182)
ea5a0dd - Check if domain exists before do domain registration, (#1188)
93d05b3 - Fix sticky query timeout (#1187)
36851f2 - Add WorkflowIDResuePolicy to SignalWithStart (#1185)
dce3ea7 - When workflow finish, clear stickness (#1194)

Operational Improvements

052e9ac - Fix some issues on metrics and logging (#1009)
427f151 - Add metrics recording signal size (#1034)
a2cb9e3 - Return fmt.Errorf for internal error instead of thrift error (#1035)
524a1a7 - define metrics scope for ResetStickyTaskList (#1037)
95c7ec9 - Add metrics for task process latency (#1044)
de42b32 - Add metrics for SyncMatch (#1049)
87b12c2 - Fix retry timer metric scope (#1067)
779f2b4 - Seperate metrics of client API from service API (#1088)
3f7a3b0 - Fix missing and inconsistent metrics (#1102)
58d5da4 - Optimize debug loging for dynamic config no value (#1104)
96d0f19 - Add context timeout flag to list...

v0.3.15 Release

27 Jul 18:43
122cad8

This release fixes some critical stability issues introduced in v0.3.14. You can skip v0.3.14 and directly upgrade to this release on top of v0.3.13, but make sure to upgrade schema as outlined in release notes for v0.3.14.

Stability Fixes

122cad8 - Bugfix: timer queue active processor missing return err (#1000)
9905bf7 - Add jitter to ack level update (#999)

CLI Improvements

2912570 - Fix CLI show history to better display long text (#994)

Schema Changes

Make sure to apply schema changes from v0.3.14 before upgrading to this release. All the schema changes required by new features are backwards compatible. Please make sure to deploy cadence schema 0.9 using cadence-cassandra-tool. The following changes require version 0.9 of the schema:
710b688 - Add customized data map to domain type (#863)
3a79fc0 - Apply 5 min delay for standby task (#920)

v0.3.14 Release

24 Jul 22:17
287a0da

Schema Changes

This release includes changes to Cadence core schema. All the schema changes required by new features are backwards compatible. Please make sure to deploy cadence schema 0.9 using cadence-cassandra-tool before deploying this release of Cadence server. The following changes require version 0.9 of the schema:

New Features and Improvements

Cross DC Support (In Development)

We have made a lot of progress on Cross DC replication support for Cadence workflows, but it is not yet ready for production release. The feature is code complete and currently in the testing phase. Please see XDC V1 release for a list of pending tasks we are working on before this feature is ready for production. Here is the list of changes that went into this release for XDC support.

  • 8f6952d - only global domain will use the v2 domain table (#831)
  • 0fecb80 - Support for retrying messages within replicator processor (#827)
  • b7d4dea - Multiple bugfixes (#823)
  • 1a167d7 - Ut state builder (#808)
  • 583fe1c - Handle more edge case when apply events in history replicator (#836)
  • 3d019ba - handling case when reset workflow history, the target workflow has already done continue as new (#853)
  • 524538d - some behavior change on worker (#847)
  • a8958f0 - compact duplicate code, add more logging (#846)
  • 278a773 - Add EnableGlobalDomain as dynamic config (#858)
  • 9f78eca - bugfix: non global domain workflow, when started, should not generate replication task (#862)
  • 710b688 - Add customized data map to domain type (#863)
  • 36c617d - retry timer task for 100 times (#872)
  • b501bf1 - Optimize replication task generation (#869)
  • 0e397c5 - optimize queue ack manager, in case the task IDs are not sequencial. (#875)
  • 5530848 - handle DomainNotActive error in timer / transfer queue (#878)
  • 5cdfeb2 - dynamic set the retry for retryable error on worker processor (#883)
  • 622d1a0 - bugfix: buffered replication task should be persisted (#884)
  • 515df70 - Make task processors reselient to LimitExceededError (#891)
  • ec2c6da - Bugfix: failover trigger should also notify activer timer / transfer … (#886)
  • 9ea16c0 - Add more metrics and log, remove some dead code (#893)
  • e5f07f2 - Make worker retry on 2 level, make history replicator allow option to force buffer events (#894)
  • 6edece4 - Update service transient retry error predicate (#904)
  • ce946a7 - Bugfix: worker should not retry on non err (#908)
  • 6fa8589 - Fix replicationTask for updating domainData (#923)
  • 3651ac8 - Replication worker gets stuck on busy retry (#938)
  • 77a9979 - include service tag on logs emitted by worker role (#939)
  • 3c9a409 - Handle service deployment causing buffer events not flushed (#940)
  • 5ab74ef - bugfix: workflow can be reset after finished, while close transfer task is already created, causing panic (#935)
  • fcc9b0a - Reset of mutable state should use event version (#937)
  • cb87bd7 - Simplify replication worker retry logic (#945)
  • 8156cde - Minimize failover impact (#948)
  • 31b98fa - Pass through request timeout to replicator code path (#961)
  • 23ba2b9 - Yarpc error code 13 should not be retriable (#958)
  • 302f239 - Few performance optimization. (#960)
  • a496241 - Add time sync functionality, to move the view of time of a remote cluster (#952)
  • ab7d664 - Bugfix: timer for standby activity is not generated (#968)
  • b21b809 - Adding more metrics, fix some bugs (#965)
  • 152d202 - Only generate replication task for updated which has history events (#973)
  • 3a79fc0 - Apply 5 min delay for standby task (#920)
  • ddd0307 - Clear buffers on resetting of mutable state (#978)
  • 0a3ff55 - Bugfix replication protocol (#979)
  • aa5e689 - Add metrics on shard info (#980)
  • 4769b55 - Bugfix: when buffering events, the last write version should remain the same (#982)
  • 992a81a - Bugfix: rpc call cancel function should not use defer in for loop (#986)
  • 09a1a9c - Bugfix: when failover happen and workflow has pending decision, new events on the active dide (aftter the failover) will be buffered. workflow execution context should first store the standby events (after the failover), then flush the active events. (#991)
  • 287a0da - Handle deletion of task during failover (#976)

Dynamic Config

All Cadence service config knobs now support dynamic updates from an external config store. Check out the dynamic config interface in package to bootstrap your own config store to allow dynamically updating service config.

  • 1733b61 - Change cadence config to dynamic (#851)
  • 49fd3c0 - Refactor and fix matching dynamic configs (#857)
  • 57e8196 - Add test to ensure dynamic config key has mapped value (#922)
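
As a rough illustration of plugging in your own config store, the sketch below defines a pared-down client interface with an in-memory implementation and a property function that re-reads the store on every call. The `Client` interface, `memoryClient`, `intProperty`, and the key `frontend.rps` are all hypothetical. The real interface in the Cadence code base may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical, pared-down config store interface.
type Client interface {
	GetIntValue(key string, defaultValue int) int
}

// memoryClient is an in-memory store whose values can change at runtime.
type memoryClient struct {
	mu     sync.RWMutex
	values map[string]int
}

func newMemoryClient() *memoryClient {
	return &memoryClient{values: make(map[string]int)}
}

// Set stores or replaces the value for key.
func (c *memoryClient) Set(key string, value int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] = value
}

// GetIntValue returns the stored value, or defaultValue when key is unset.
func (c *memoryClient) GetIntValue(key string, defaultValue int) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.values[key]; ok {
		return v
	}
	return defaultValue
}

// intProperty returns a function that consults the store on every call, so
// an updated value takes effect without restarting the service.
func intProperty(c Client, key string, defaultValue int) func() int {
	return func() int { return c.GetIntValue(key, defaultValue) }
}

func main() {
	store := newMemoryClient()
	rps := intProperty(store, "frontend.rps", 1200)
	fmt.Println(rps()) // 1200 (default, nothing stored yet)
	store.Set("frontend.rps", 300)
	fmt.Println(rps()) // 300 (picked up on the next read)
}
```

Handing services a function instead of a plain value is what lets a knob change while the process keeps running.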

Stability Fixes

  • 50b5d43 - add jitter to timer / transfer / replication queue sanity scan (#864)
  • 4dbdfd4 - Return ShardOwnershipLostError as catch all from persistence layer (#873)
  • 1844352 - Backoff on shard creation and adjust default number of persistence conns (#876)
  • da1ccd6 - Fix missing decision timeout for transient decisions (#889)
  • f11fff3 - Rate limit request on matching host (#899)
  • 308cfc0 - Deadlock in matching engine (#909)
  • ad7b522 - bugfix: rate limiter is not passed as pointer (#906)
  • 7eaf360 - Fix leaking goroutine in matching (#911)
  • a9a3727 - Fix check idle tasklist to include active task writers (#916)
  • 98ea187 - Use current time as reference when timing out activities (#931)
  • cbbff60 - Add dedupe logic for heartbeat timer creation (#962)
  • 37c98ee - Force close decision on limit exceeded during task processing (#971)

Bug Fixes

  • 85dbdf5 - Resolve #711: CLI panics on domain update when domain doesn't exist (#838)
  • 1276b22 - bugfix: domain cache previously has a hard expiration, which cannot be updated, this expiration, in combination with periodical refresh of domains in v2 table, can cause domain missing for a short period of time. (#855)
  • 11c8e38 - Fix bug in merging domain data (#918)
  • 2b92fd6 - Bugfix: deadlock when domain failover (#919)
  • fd7a1db - Batch of fixes (#927)
  • 1864329 - Fix lock issue for timer ack level (#942)
  • 19e4de2 - Add maximum timeout protection (#946)
  • 17023c7 - Fix flaky test for visibility (#977)
  • c4650a1 - Fix SignalWithStart open visibility not recorded (#974)
  • c19a5e6 - Whitelist context deadline exceeded error for retry (#981)
  • 9b832fa - Change protection of decision timeout to warn instead of reject (#993)

Operational Improvements

CLI Improvements

  • c52d560 - CLI: support domain data operation for xdc (#930)

Misc

v0.3.13 Release

11 Jun 19:50
8d47f5a
Pre-release

Change log:
8d47f5a V0.3.12 patch (#829)
476c9bb Prevent duplicate user timer creation (#832)
8f6952d only global domain will use the v2 domain table (#831)
8441b49 Admin CLI: add describeHistoryHost (#826)
dccf888 Add ActivityScheduleTimeout deduction logic (#822)
7154288 Fix CLI parseTime for listworkflow (#824)
34c6e50 Fix print event version in CLI (#821)
5f40ae1 CLI: add history event version and full detail options (#817)
0d5583f Use workflowID as partition key on replication message (#820)
b947875 bugfix: should use shard's domain notification version (#818)
9eaa3dc Conflict Resolver bugfix (#816)
54e34f4 Reliable Domain Change Notification (#777)
42f44b6 Fix bug in adiminCLI: convert domain name to donmainID using domainCache (#812)
6573843 Always enable sticky when worker ask new task from complete (#811)
166ef58 Ut conflict resolver (#806)
772d653 Implement describeMutableState (#805)
1eb53c3 fix typo in getMutableState (#801)
a7ebd05 Multiple bugfixes (#803)
dd00c27 Fix error during Ringpop refresh (#802)
edfa972 Mutiple Bugfixes (#794)
4ae0147 CrossDC bugfixes to replication task generation and conflict resolver (#799)
ad90819 Update and move dockerfile-cli, add to auto build (#793)
076fb3d Add Dockerfile for CLI (#730)
0aef123 Add tips/directions for prod setups with cadence-cassandra-tool (#788)