chore(adr): add adr for reassign evolve #74

3 changes: 2 additions & 1 deletion .prettierrc
@@ -3,5 +3,6 @@
"semi": true,
"printWidth": 80,
"singleQuote": true,
- "trailingComma": "all"
+ "trailingComma": "all",
+ "proseWrap": "always"
}
157 changes: 157 additions & 0 deletions docs/madr/1-reassign-evolve.md
@@ -0,0 +1,157 @@
# Reassigning ANTs as Evolve/Upgrade Mechanism

- Status: Proposed
- Approvers: [Ariel], [Dylan], [Atticus], [Phil], [David]
- Date: [2025-01-30]
- Authors: [Atticus]

## Context and Problem Statement

Currently, we evolve an ANT with AOS `Eval`, which makes it hard to determine
which version of the code a process is running. A version number stored in the
process state tells us only the value of that variable. At any point the owner
of the ANT may `Eval` new code into the process, so its behavior may no longer
match what the assigned version number implies. To work around this, we
currently send dry-run interactions to the ANT's APIs and analyze the results
to check that they behave as expected. This is inefficient: dry-run calls to
CUs are expensive in networking for the client and in compute for the CU.

## Decision Drivers

- Improve the performance of ANT loading client side.
- Ensure the upgrade process is more robust by using immutable versioning, such
as module IDs.
- Reduce computational load on CU infra.
- Mitigate risks associated with reassigning ANT state and ArNS records.

## Considered Options

1. **Current Approach (Baseline)**
   - Multiple dry runs per ANT.
   - ETH compatibility checked via handler analysis.
   - Upgrade involves loading Lua code.
2. **Leverage Re-Assign Name and \_boot capabilities to upgrade ANTs**
   - Use the module ID to evaluate process version.
   - Upgrade via spawning a new ANT and reassigning ArNS records.

## Decision Outcome

### Chosen Option: **Leverage Re-Assign Name and \_boot capabilities to upgrade ANTs**

The ARIO process now supports
[`Reassign-Name`](https://github.com/ar-io/ar-io-ant-process/pull/26), and the
recent addition of [`_boot`](https://github.com/ar-io/ar-io-ant-process/pull/57)
lets us fork the process state into a brand-new ANT. We can therefore register
the new ANT as the ANT for the specified ArNS name in the registry, and rely
solely on the module ID of the process to identify its version and
capabilities.

This optimized approach reduces computational overhead and improves caching for
better performance. It also shifts to a more robust upgrade mechanism using
immutable module IDs rather than version detection heuristics. We can do this
now that we compile and publish our own WASM binaries, rather than using an AOS
binary and loading our code into it.
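
As a rough illustration of module-ID-based version detection, the sketch below
reads the module ID from a process's metadata tags and compares it with the
latest published module. It assumes the metadata exposes tags as name/value
pairs and that the module ID is carried in a `Module` tag. The `KNOWN_MODULES`
map, its placeholder IDs, and all function names are hypothetical.

```typescript
// Hypothetical shape of a tag attached to the process metadata.
interface Tag {
  name: string;
  value: string;
}

// Hypothetical map of published ANT module IDs to release labels.
// The IDs below are placeholders, not real module IDs.
const KNOWN_MODULES = new Map<string, string>([
  ['module-id-placeholder-a', 'v1'],
  ['module-id-placeholder-b', 'v2'],
]);

// Assumption: the module ID is carried in a tag named "Module".
function getModuleId(tags: Tag[]): string | undefined {
  return tags.find((tag) => tag.name === 'Module')?.value;
}

// Resolve the release label for a process, or undefined for unknown modules.
function getVersion(tags: Tag[]): string | undefined {
  const moduleId = getModuleId(tags);
  return moduleId === undefined ? undefined : KNOWN_MODULES.get(moduleId);
}

// The process is up to date only when its module ID equals the latest one.
function needsUpgrade(tags: Tag[], latestModuleId: string): boolean {
  return getModuleId(tags) !== latestModuleId;
}
```

Because the module ID is fixed when the process is spawned, this check needs no
dry runs: one metadata lookup is enough, and the result can be cached.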

```mermaid
---
config:
theme: dark
---
sequenceDiagram
autonumber

participant ARIOGateway
participant Owner
participant OldAnt
participant NewAnt
participant AntRegistry
participant ARIOProcess


Owner ->> ARIOGateway: Get Process Meta from GQL
ARIOGateway -->> Owner: Process Meta GQL Result
Owner->>OldAnt: Get state
OldAnt-->>Owner: Return state

rect rgb(150,50,50)
break Process up to date
Owner ->> ARIOProcess: No upgrade needed - Module ID from process meta is up to date
end

end
rect rgb(50,150,100)
activate ARIOProcess
Owner->>NewAnt: Spawn new ant with old state
rect rgb(50,50,50)
loop Polling
Owner->>AntRegistry: Check if new ant is registered
AntRegistry-->>Owner: Not registered yet (retry)
end
end
AntRegistry-->>Owner: Return ACL list including the New ANT ID
Owner->>OldAnt: Send reassign message
OldAnt->>ARIOProcess: Forward Reassign Message
ARIOProcess-->>NewAnt: Reassign-Notice
deactivate ARIOProcess
end
```
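
The flow in the diagram could be sketched client side roughly as follows. This
is an outline under stated assumptions, not the SDK's API: the `UpgradeClient`
interface and every method on it are hypothetical stand-ins for the GraphQL
lookup, the state read, the spawn, the ANT registry query, and the reassign
message. The retry count and delay are illustrative defaults.

```typescript
// Hypothetical client interface; none of these methods are real SDK calls.
interface UpgradeClient {
  getModuleId(antId: string): Promise<string>;
  getState(antId: string): Promise<unknown>;
  spawnAnt(moduleId: string, state: unknown): Promise<string>;
  getRegisteredAnts(owner: string): Promise<string[]>;
  reassignName(oldAntId: string, name: string, newAntId: string): Promise<void>;
}

interface UpgradeOptions {
  owner: string;
  name: string;
  oldAntId: string;
  latestModuleId: string;
  maxAttempts?: number; // illustrative default: 10
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip waiting
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns the ID of the ANT that holds the name once the flow completes.
async function upgradeAnt(
  client: UpgradeClient,
  opts: UpgradeOptions,
): Promise<string> {
  const { owner, name, oldAntId, latestModuleId } = opts;
  const maxAttempts = opts.maxAttempts ?? 10;
  const sleep = opts.sleep ?? defaultSleep;

  // Break early when the module ID from process metadata is up to date.
  const currentModuleId = await client.getModuleId(oldAntId);
  if (currentModuleId === latestModuleId) return oldAntId;

  // Spawn the new ANT on the latest module, seeded with the old state.
  const state = await client.getState(oldAntId);
  const newAntId = await client.spawnAnt(latestModuleId, state);

  // Poll the ANT registry until the new ANT shows up for this owner.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const registered = await client.getRegisteredAnts(owner);
    if (registered.includes(newAntId)) {
      // The old ANT forwards the reassignment to the ARIO process.
      await client.reassignName(oldAntId, name, newAntId);
      return newAntId;
    }
    await sleep(1000 * attempt); // simple linear backoff between polls
  }
  throw new Error(
    `ANT ${newAntId} was not registered after ${maxAttempts} attempts`,
  );
}
```

The sketch sends the reassign message only after the registry lists the new
ANT, so a failed spawn or bootstrap leaves the name pointing at the old ANT.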

### Positive Consequences

- **Performance Improvement:** Reducing dry runs significantly speeds up ANT
loading.
- **Immutable Module Versioning:** Eliminates issues with heuristic version
analysis.
  - Downstream clients can identify and validate the capabilities of a module
    (and, by extension, of the ANT itself) by maintaining a map of modules that
    are capable of executing target workflows.
- **Reduced CU Load:** Fewer computations per ANT.
- **Gateway ANT Module Whitelisting:** Gateways can decide which ANT Modules
they support.
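
The capability map and gateway whitelisting mentioned above might look
something like the sketch below. The workflow names, the placeholder module
IDs, and both function names are hypothetical. The point is only that, once
module IDs are immutable, capability checks reduce to a lookup.

```typescript
// Hypothetical workflow names; the real capability set is not defined here.
type Workflow = 'reassign-name' | 'eth-signing';

// Hypothetical capability map keyed by module ID (placeholder IDs).
const MODULE_CAPABILITIES = new Map<string, ReadonlySet<Workflow>>([
  ['module-id-placeholder-a', new Set<Workflow>(['reassign-name'])],
  [
    'module-id-placeholder-b',
    new Set<Workflow>(['reassign-name', 'eth-signing']),
  ],
]);

// A client can check a capability without any dry run against the ANT.
function supportsWorkflow(moduleId: string, workflow: Workflow): boolean {
  return MODULE_CAPABILITIES.get(moduleId)?.has(workflow) ?? false;
}

// A gateway can decide which ANT modules it is willing to serve.
function isModuleAllowed(
  moduleId: string,
  allowlist: ReadonlySet<string>,
): boolean {
  return allowlist.has(moduleId);
}
```

Unknown module IDs are treated as supporting nothing, which is the
conservative default for a process whose code cannot be identified.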

### Negative Consequences

- **Complicates Memory Usage on ANT Registry:** More ANTs means more state in
the ANT registry, which currently doesn't have a cleanup process.
- **Potential Integration Issues:** Existing integrations relying on fixed ANT
IDs may break.
- **State Limitations:** Large ANT states (e.g., >2000 undernames) may fail to
  bootstrap on spawn, so we need to understand the limits there and decide
  what state sizes we wish to support.

## Pros and Cons of the Options

### Current Approach

- `+` Already implemented and functional.
- `+` Ensures ETH compatibility via handler checks.
- `-` Multiple dry runs increase computational cost.
- `-` Upgrade process relies on mutable version checks.

### Optimized Approach (Chosen)

- `+` 3x faster ANT loading.
- `+` Reduces CU consumption.
- `+` Enables robust versioning.
- `-` Introduces new risks related to state migration and memory management.

## Links

- [Ariel]: https://github.com/arielmelendez
- [David]: https://github.com/djwhitt
- [Dylan]: https://github.com/dtfiedler
- [Atticus]: https://github.com/atticusofsparta
- [Phil]: https://github.com/vilenarios

## Notes

- Further analysis is required for large ANT states.
  - Records
  - Controllers
- Consider preemptive PRs for integration updates.
  - For example, permaweb deploy
- Implement observation tools to track stale ANTs in the ANT registry.
  - The fastest solution is probably adding a `prune` handler to the ANT
    registry that consumes process IDs and removes them, then using a cron
    job in GitHub Actions to pull the ArNS records and decide which ANT IDs
    to prune.
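
The selection step of such a cron job could be as small as a set difference.
The sketch below is one possible shape, assuming the job already has the list
of ANT IDs held by the registry and the current ArNS records. The `ArnsRecord`
type and `findPruneCandidates` are hypothetical names, and the `prune` handler
itself does not exist yet.

```typescript
// Hypothetical shape of an ArNS record; only the ANT process ID matters here.
interface ArnsRecord {
  name: string;
  processId: string;
}

// Returns registry ANT IDs that no ArNS record currently points at.
function findPruneCandidates(
  registeredAntIds: string[],
  records: ArnsRecord[],
): string[] {
  const inUse = new Set(records.map((record) => record.processId));
  return registeredAntIds.filter((antId) => !inUse.has(antId));
}
```

An ANT that was just spawned but not yet assigned to a name would also show up
as a candidate, so a real job would likely need a grace period or an extra
check before pruning.
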
97 changes: 97 additions & 0 deletions docs/madr/template.md
@@ -0,0 +1,97 @@
# [short title of solved problem and solution]

- Status: [proposed | rejected | accepted | deprecated | superseded by
[ADR-5](5-example.md)]
- Approvers: [list everyone involved in the decision]
- Date: [YYYY-MM-DD]
- Authors: [list of authors]

## Context and Problem Statement

[Describe the context and the problem that needs to be solved, capturing the
background and why this decision is necessary.]

## Decision Drivers

[Identify key factors that influence the decision, such as
requirements, constraints, and other considerations.]

- [driver 1]
- [driver 2]
- [driver 3]

## Considered Options

[List the options that were considered to address the problem, providing a brief
overview of each.]

- [option 1]
- [option 2]
- [option 3]

## Decision Outcome

[Describe the decision that was made, including why it was chosen over the
other options.]

### Positive Consequences

[Highlight the benefits and positive outcomes expected from this decision.]

- [consequence 1]
- [consequence 2]

### Negative Consequences

[Identify any drawbacks or negative outcomes that might result from this decision.]

- [consequence 1]
- [consequence 2]

## Pros and Cons of the Options

[Compare the pros and cons of each considered option.]

### [option 1]

- `+` [pro 1]
- `+` [pro 2]
- `-` [con 1]
- `-` [con 2]

### [option 2]

- `+` [pro 1]
- `+` [pro 2]
- `-` [con 1]
- `-` [con 2]

### [option 3]

- `+` [pro 1]
- `+` [pro 2]
- `-` [con 1]
- `-` [con 2]

## Links

[Include any relevant links to documents, discussions, or other resources that
provide additional context or background information.]

- [link 1](url)
- [link 2](url)

## Related Decisions

[List any related ADRs or decisions that are connected to this one.]

- [ADR-1](1-example.md) - [Title of ADR-1]
- [ADR-2](2-example.md) - [Title of ADR-2]

## Notes

[Include any additional notes or comments that are relevant to the decision.]

---

[ADR Template]: https://adr.github.io/