
Expose operations from TaskChampion #373

Open · djmitche opened this issue Aug 28, 2022 · 19 comments
Labels: topic:per-task-ops (Issues related to exposing per-task operations)

djmitche commented Aug 28, 2022

Background

Every change to a task is represented internally as an "Operation"
https://github.com/GothenburgBitFactory/taskwarrior/blob/41992d484909bd865bc252e99e588c5c3f37c71a/taskchampion/taskchampion/src/storage/op.rs#L10-L38
This is how we accomplish task sync -- different replicas share the set of operations they have applied, and resolve any conflicts in a reasonable, consistent way.

Note that a single command-line invocation may produce several operations. For example

task 123 start +inprogress prio:M

would create four Update operations: one to set the start property to the current time, one to add the tag_inprogress property, one to set the priority property to M and one to update the modified property to the current time.

There are also UndoPoint operations, which do not actually change anything but act as a signpost for how many operations to revert when running task undo -- that command basically reverts operations until it reaches an UndoPoint. The UndoPoint operations are not sent to the server.

Once operations are successfully sent to the server, they are no longer stored locally. This avoids the local task database growing forever.
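For orientation, the operation type linked above (named ReplicaOp in that code) has roughly the following shape. This is a paraphrase with simplified field types -- the real definition uses Uuid, TaskMap, and proper timestamps -- not the exact code:

// Paraphrase of the operation type described above; field types are
// simplified here, so see the linked op.rs for the real definition.
#[derive(Debug, Clone)]
enum Operation {
    /// Create a new, empty task with the given UUID.
    Create { uuid: String },
    /// Delete a task, retaining its content so the deletion can be undone.
    Delete { uuid: String, old_task: Vec<(String, String)> },
    /// Set (or, when value is None, remove) a single property of a task,
    /// retaining the old value so the update can be undone.
    Update {
        uuid: String,
        property: String,
        old_value: Option<String>,
        value: Option<String>,
        timestamp: String,
    },
    /// A marker recording how far `task undo` should revert; never sent to the server.
    UndoPoint,
}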

Currently, none of this is visible in TaskChampion's public API. This issue would change that.

Motivation

We'll need this for a few reasons:

The task undo command previously showed a "diff" of what it would change, and asked the user for permission. This is not possible with the current TaskChampion API (see GothenburgBitFactory/taskwarrior@4b814bc). But if there were a way to ask TaskChampion "hey, what are the operations back to the latest UndoPoint?", then it could generate a diff view from that.

The task info command has historically used the undo information to show the history for a particular task. This worked by scanning the entire undo history for records matching that task's uuid and formatting them nicely for display. Again, that's currently impossible (support was removed in GothenburgBitFactory/taskwarrior#3060). If there were a way to ask TaskChampion "hey, what operations have happened for this task?", then we could bring back that functionality.

Looking forward, one of the things we've considered in #372 is to allow a user of the Taskchampion API to provide a list of operations to apply, instead of calling methods like task.start(..) or task.set_description(..). A different possibility we've considered is that a user of the Taskchampion API could call methods like task.start(..) or task.set_description(..) and build up a list of operations, then "commit" those all at once. This would allow the task command to confirm changes with the user before committing them, for example. All of this paragraph is out of scope for this issue, but should be kept in mind when designing the API.

Details

Off the top of my head, I think this can be broken into two parts:

Operations for Undo

  • Making the ReplicaOp type public
  • Defining an API on the Replica type for fetching all operations back to the latest UndoPoint, and implementing that through the storage layer (a sketch of such an API follows this list).
  • Adding a C interface in taskchampion-lib
  • Adding a C++ interface
  • Writing some C++ support for printing a list of operations as a "diff"
  • Using this for confirmation of task undo
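
To make the Replica API item above concrete, here is a sketch of the "operations back to the latest UndoPoint" query, using the Operation paraphrase from the Background section and a plain in-memory list standing in for the storage layer. The field and method names are assumptions, not the real Replica type:

// Sketch only: `operations` and `get_undo_operations` are illustrative,
// not TaskChampion's actual Replica/storage API.
struct Replica {
    /// Local, not-yet-synced operations, oldest first.
    operations: Vec<Operation>,
}

impl Replica {
    /// Return the operations applied since the most recent UndoPoint --
    /// that is, the operations `task undo` would revert.
    fn get_undo_operations(&self) -> Vec<Operation> {
        let start = self
            .operations
            .iter()
            .rposition(|op| matches!(op, Operation::UndoPoint))
            .map(|i| i + 1)
            .unwrap_or(0);
        self.operations[start..].to_vec()
    }
}

In the real implementation this would go through the storage layer rather than an in-memory vector, but the "scan back to the last UndoPoint" shape would be the same.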

Operations for Info

All of the above, and:

  • Updating the Taskchampion storage backend to store operations for each task for as long as that task exists. This is different from just storing all operations in the order they occurred, because once a task is deleted we want to also delete its associated operations. This will require changing the database schema.
  • Making that information available from the Replica type (a rough sketch follows this list)
  • Adding a C interface in taskchampion-lib
  • Adding a C++ interface
  • Using this to show a task's history in task info when the journal.info config is enabled.
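
The per-task query mentioned in this list could take a similar shape to the undo sketch above. Again this is only an illustration on the in-memory stand-in, with get_task_operations as an assumed name; the real version would read from per-task storage that survives sync, as described above:

// Sketch only, building on the Replica stand-in above; not the real API.
impl Replica {
    /// Return every operation that touches the given task, oldest first.
    fn get_task_operations(&self, uuid: &str) -> Vec<Operation> {
        self.operations
            .iter()
            .filter(|op| match op {
                Operation::Create { uuid: u }
                | Operation::Delete { uuid: u, .. }
                | Operation::Update { uuid: u, .. } => u.as_str() == uuid,
                Operation::UndoPoint => false,
            })
            .cloned()
            .collect()
    }
}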
djmitche self-assigned this Aug 28, 2022
djmitche referenced this issue from commits in djmitche/taskwarrior and GothenburgBitFactory/taskwarrior (Sep 25, 2022 through Jun 11, 2023), with commit messages noting: "This support will require access to all of the operations ever performed on a task, which is not currently exposed by TaskChampion (but see #2928)" and "TaskChampion does not make the necessary information available to accomplish this, but see #2928."

djmitche commented:

@masaeedu is considering working on this, so I'll unassign myself.

djmitche removed their assignment Jun 19, 2023

djmitche commented Aug 5, 2024

Status update: #372 has handled making operations a first-class thing in TaskChampion (not just for undo). What remains for this issue is:

  • Store operations for all tasks, even after those operations are sync'd
  • Provide an API to access those operations

Then Taskwarrior can read those operations and construct a journal in the task info command.

djmitche removed this from the v0.7.0 milestone Aug 5, 2024
djmitche added the topic:per-task-ops label Aug 5, 2024
djmitche added this to the v0.8.0 milestone Aug 5, 2024

djmitche commented Aug 25, 2024

Progress / To-Do:

  • Add a test for upgrading from 0.7.0 DBs

  • Add an 'operations.uuid' column, generated and virtual

    This column is not stored separately, but allows indexing operations by
    uuid without further modification or redundant storage (a rough sketch
    follows at the end of this comment).

  • Add get_task_operations

    Thread this method through from Replica, via TaskDB, to Storage.

  • Replace set_operations with sync_complete (for sync) and remove_operation (for undo)

  • Add an operations.synced column, to track which operations have been sync'd to the server, with an index, and update storage methods accordingly.

  • Delete operations with no matching task when sync is complete

Think carefully about:

  • Once an operation has been sync'd, it can't be undone by just removing it; instead, it must be reversed (cf. Make commit_reversed_operations more capable. #427)
  • What happens when a task is deleted with un-sync'd operations? This will probably always happen (assuming the Delete operation is added before the deletion occurs). Perhaps sync needs to finish with a (slow) query for all un-affiliated sync'd operations and delete them?
  • Need to check the performance of all of this
  • Other things I haven't thought of yet
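
As a purely illustrative sketch of the generated, virtual operations.uuid column above -- assuming operations are stored as JSON in an operations.data column, which may not match the real TaskChampion schema -- the migration could look roughly like this with rusqlite:

// Illustrative only: the table name, column names, and JSON layout here are
// assumptions, not the actual TaskChampion schema.
use rusqlite::Connection;

fn add_operations_uuid_column(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        r#"
        -- VIRTUAL generated column: computed from the JSON payload on read,
        -- so nothing extra is stored for it.
        ALTER TABLE operations ADD COLUMN uuid TEXT
            GENERATED ALWAYS AS (coalesce(
                json_extract(data, '$.Create.uuid'),
                json_extract(data, '$.Update.uuid'),
                json_extract(data, '$.Delete.uuid'))) VIRTUAL;
        -- Index the generated column so operations can be fetched per task quickly.
        CREATE INDEX operations_by_uuid ON operations (uuid);
        "#,
    )
}

UndoPoint operations carry no uuid, so the generated column is simply NULL for them.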

djmitche commented:

OK, that work is done. However, I just realized that this will only store local operations, not those that arrived via sync. So, a little more work is required!

djmitche commented:

Also, UndoPoints are never deleted (since they do not correspond to a task). They should probably be deleted when the sync is complete.

djmitche commented:

It's not obvious how to represent operations performed on a task in the presence of sync. Recall this diagram from sync.rs:

//      base state-*
//                / \-server op
//               *   *
//  local       / \ /
//  ops        *   *
//            / \ /   new
//           *   *    local
//   local  / \ /     ops
//   state-*   *
//      new-\ /
// server op *-new local state

To faithfully reproduce the set of operations on a task, we need to choose a path through that diagram.

The operations considered in the sync process are SyncOp, which means that unlike Operation they do not have an old_value for Updates or old_task for Deletes. But, we need to have accurate old_.. information for displaying task info -- for example, status changing from pending to completed is a "Done" event. So, we need a path through the diagram which allows reconstruction of the Operation. In particular, this requires a task database in the appropriate state. In the diagram, the leftmost * is the current task database (local state). The new server op can be expanded to an Operation by fetching old_value or old_task from the local state before modifying it.

So, I think the task history will follow the left-most side of the diagram, meaning that local operations will always precede remote operations.

One consequence is that operations may not be in chronological order. Imagine I sync daily at midnight, and I make a change on my home machine at 9am, and a change on my work machine at 10am. When those sync together at midnight, the local change will appear first. So on the home machine I will see [9am change, 10am change], while on the work machine I will see [10am change, 9am change].

In fact, if those two changes interfere with one another (say, home sets the status to deleted and work to completed), the two machines will see different operations. The home machine will see [9am pending -> deleted, 10am deleted -> completed] while work will see [10am pending -> completed], with the 10am change having been preferred due to its later timestamp:

) if uuid1 == uuid2 && property1 == property2 => {
    // if the value is the same, there's no conflict
    if value1 == value2 {
        (None, None)
    } else if timestamp1 < timestamp2 {
        // prefer the later modification
        (None, Some(operation2))
    } else {
        // prefer the later modification or, if the modifications are the same,
        // just choose one of them
        (Some(operation1), None)
    }
}

I think this is OK, and I don't think OT can give us a better solution, other than not trying to (ab)use the ops from OT as history. That is, an alternative is to add a new AddTaskHistory(uuid, timestamp, change) operation which never conflicts with anything and which simply adds (timestamp, change) to the local task's history. Once all replicas are up-to-date, they would all have the same set of history elements. But, that's a lot of duplicated information to store and sync.

@ryneeverett what do you think?

ryneeverett commented:

It seems that there are two behaviors you're not completely satisfied with:

  1. non-chronologically ordered operations
  2. inconsistent operations between replicas

It seems to me that these issues could potentially be treated separately:

  1. If we're talking about this in the context of task info, why couldn't taskwarrior sort the operations by timestamp?
  2. I understand the AddTaskHistory solution you're proposing and the duplication cost seems unacceptable. Would it be possible to add the "skipped" server operations to the replica's operations database without applying them to the working set? And would those operations get picked up by get_task_operations? I don't think you've published the work you outlined above yet, so I haven't seen the implementation to know whether that makes any sense.

At the end of the day though, I agree that the behavior issues you're describing are OK if they're not reasonable to solve. I'm not sure anyone expects task info to be perfect and to a certain extent it should be unsurprising that an offline-first decentralized system has an inconsistent historical perspective.


djmitche commented Sep 1, 2024

Good summary, thanks! I'm not so worried about (1) - sorting would work, as long as it's understood that the old_value for an operation may not correspond to the value from a previous operation on the same task property.

Adding an operation to the operations table without applying it would violate the invariant that the replica state is the result of all operations. However, maybe we could add an operation in such a way that the invariant was always preserved?

Thinking that through..

  a
 / \
b   c
 \ /
  d

State a is the shared base state, with the task at status pending. a -> b is the home change, setting the status to deleted at 9am. a -> c is the office change, setting the status to completed at 10am. d is the shared final state, with task status completed.

On the home replica, when performing the sync, a -> b (deleted at 9am) already exists in the set of operations, and we are in state b, needing to get to state d, so the required operation b -> d is completed at 10am. That part works fine.

On the work replica, a -> c (completed at 10am) already exists in the set of operations, and we are in state c, needing to get to state d. Both of those states have the task status completed, so no change is required and the current implementation creates no change.

What if the transpose method could return two operations, to replace both the local and remote operation? Then on the home replica, it would return a -> b unchanged, deleted at 9am, and b -> d as before, completed at 10am. On the remote replica, though, it would return deleted at 9am followed by completed at 10am -- still producing state d, so the invariant is upheld.

That involves "editing" history a little bit, and I'm a little worried that it means the remote replica no longer went through state c, but maybe that's OK? I can try to implement this and see how it goes.

In the interim, though, I will start making PRs for the earlier part of this work.

ryneeverett commented, quoting the above:

What if the transpose method could return two operations, to replace both the local and remote operation? Then on the home replica, it would return a -> b unchanged, deleted at 9am, and b -> d as before, completed at 10am. On the remote replica, though, it would return deleted at 9am followed by completed at 10am -- still producing state d, so the invariant is upheld.

I'm assuming "the transpose method" refers to Syncop::transform. It isn't really apparent to me why updates of the same property are considered a "conflict" per se in the section you linked. Yes, we do need to pick a winner, but couldn't that mean swapping the order in which they're applied rather than omitting one? So on each replica it could look like:

  a
 / \
b<->c
 \ /
  d

I'm not sure if I'm restating what you're saying or missing the point.


djmitche commented Sep 1, 2024

Yes, that's the method I mean. It's a conflict in the sense that the two replicas have diverged, and the effects of one of the changes will be removed. The word is probably not too important.

What you're suggesting is what I want to do, but the tricky thing is that the task db is in either state b (home) or state c (office) when the transpose method runs, meaning that the a -> b or a -> c operations have already been applied. So, changing the definition of that operation seems like "cheating", but I think it doesn't matter as long as the state ends up at d on both replicas.


djmitche commented Sep 2, 2024

OK, after some scratch-pad diamond diagrams, I think I have a workable solution that is not much different from the current solution. The transform method will, when a conflict occurs, return the "winning" operation for both op1' and op2', but also return the "losing" operation and which side it came from, maintaining the invariant that applying the losing operation followed by the winning operation has the same effect as applying only the winning operation.

The sync process can then keep a list of these "losing" operations for each side (local and remote). Locally, it would append that to the set of operations in the DB before applying the transformed operations from the server. Those transformed operations from the server will be winning operations, so by the invariant they will overwrite the effects of the losing operations. The sync process would prepare a remote version containing the "losing" local operations followed by the transformed local operations. Again, the invariant ensures that the result is the same as if the losing operations were not included at all.

I won't draw out the whole diamond, but the following situation demonstrates the results:

  • Local operations: [modified=2 at 10am, status=C at 10am, status=P at noon]
  • Remote operations: [modified=1 at 9am, status=D at 9am, modified=3 at 11am]

In the existing implementation, the transformed remote operations would be [modified=3 at 11am] and the transformed local operations would be [status=C at 10am, status=P at noon].

In the new implementation:

  • Losing local operations: [modified=2 at 10am]
  • Transformed local operations: [modified=3 at 11am, status=C at 10am, status=P at noon]
  • Losing remote operations: [modified=1 at 9am, status=D at 9am, status=C at 10am]
  • Transformed remote operations: [modified=2 at 10am, status=P at noon, modified=3 at 11am]

Both locally and remote, we end up with modified=3 and status=P, so the replica invariant is upheld. The only downside is, operations are repeated. In fact, sometimes a local operation ends up in the losing remote operations list (here, status=C at 10am) when it wins and then loses a conflict. I think all of this is manageable -- the display in task info will need to remove duplicates when sorting by timestamp.

Thinking about compatibility with 0.7.0, the only remotely-visible change here is a bunch of "losing" operations prepended to each version, with the invariant that another operation in the same version makes those losing operations irrelevant. So, no problem for a 0.7.0 replica. And if a new implementation gets a version generated by a 0.7.0 replica, it will just be missing the "losing" operations, so the ultimate task state will be correct, just with some missing historical operations -- which is about the best we can ask for.

I was concerned earlier about getting the old_value and old_task properties right. These fields are only required for undo, not task history, and I think they can be safely left empty after a sync. Which is good, because there's no good way to set those properties for losing operations!
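
Restating that invariant as a property check may make it easier to test. Here is a generic sketch; the state type and apply function are placeholders, not TaskChampion types:

// Generic sketch of the losing/winning invariant: starting from any state,
// applying the losing operation and then the winning operation must leave
// the state exactly as if only the winning operation had been applied.
fn winning_invariant_holds<State, Op>(
    start: &State,
    losing: &Op,
    winning: &Op,
    apply: impl Fn(&mut State, &Op),
) -> bool
where
    State: Clone + PartialEq,
{
    let mut both = start.clone();
    apply(&mut both, losing);
    apply(&mut both, winning);

    let mut winner_only = start.clone();
    apply(&mut winner_only, winning);

    both == winner_only
}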


djmitche commented Sep 2, 2024

Thinking a little about the duplication of operations, mentioned in #449 -- that would only happen on conflicts, and those are unusual, so even with arbitrarily-large operation values, I don't think this would lead to significantly more space used within a replica.

Currently, if a local and remote Update operation conflict, and the remote operation "wins", then nothing is sent to the server. With the suggested change, the losing operation and the winning operation would both be sent (the losing operation so other replicas are aware of it, and the winning operation so the final task state is correct). So, there is a good bit more data sent to the server, but at least half of that (the "losing" operation) is precisely what's required to get the same task history everywhere.


djmitche commented Sep 8, 2024

I think I need to finish the above-discussed functionality before landing the remaining patches, just to keep main stable and well-behaved.


djmitche commented Sep 9, 2024

I've called this the "WINNING INVARIANT":

applying the losing operation followed by the winning operation has the same effect as applying only the winning operation

However, given a Create(uuid) and an Update(uuid, prop, value), neither "winner" would satisfy that invariant. Admittedly, an Update informally implies that the task had already been created when it was applied, but the operational transform needs to be able to deal with any operations in any state, so that informal implication feels like the kind of "cheat" that creates strange bugs years later.

I need to think about this some more. It's possible this whole idea doesn't work :(


djmitche commented Sep 9, 2024

Maybe the trick is to redefine how the operations apply so that every combination can satisfy the WINNING INVARIANT.

I think the critical change would be that an Update should create the task if it does not exist. Then Update can win over both Create and Delete.

         Create      Update       Delete
Create   Redundant   Update Wins  Delete Wins
Update               Latest Wins  Delete Wins
Delete                            Redundant

Here are the WINNING INVARIANT statements for each conflict:

  • Create vs. Update: Create followed by Update has same effect as Update ✔️ (task exists with updated value)
  • Create vs. Delete: Create followed by Delete has same effect as Delete ✔️ (task does not exist)
  • Update vs. Delete: Update followed by Delete has same effect as Delete ✔️ (task does not exist)

This sort of suggests that we never needed the Create operation to begin with, but I don't think that's worth revisiting at this time.
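
For concreteness, here is a minimal sketch of "an Update creates the task if it does not exist", over an assumed in-memory task map rather than TaskChampion's storage layer:

// Sketch with an assumed in-memory representation; not TaskChampion's storage code.
use std::collections::HashMap;

type TaskMap = HashMap<String, String>; // property -> value
type Tasks = HashMap<String, TaskMap>;  // uuid -> task

fn apply_update(tasks: &mut Tasks, uuid: &str, property: &str, value: Option<String>) {
    // The critical change: create the task if it does not exist, rather than
    // treating the Update as an error.
    let task = tasks.entry(uuid.to_string()).or_default();
    match value {
        Some(v) => {
            task.insert(property.to_string(), v);
        }
        None => {
            task.remove(property);
        }
    }
}

With that semantics, Update can win over Create (the task exists with the updated value) while Delete still wins over Update (the task does not exist), matching the table above.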

djmitche commented:

Preliminary work implementing this suggests it's going to work!
