Language Support for Replicated Objects #21

Open
tmagrino opened this issue May 16, 2018 · 0 comments

I've found it useful, in both Fabric and FabIL, to use a pattern for creating replicated objects that allows read-mostly objects (or just individual fields or subsets of fields) to be replicated on stores closer to workers, improving read latency at the cost of coordinated writes. Ideally we could make this a language feature for which the compiler produces the necessary code. It could probably also provide the foundation for exploring extensions for merge-based objects like CRDTs.

Design Sketch

I'm envisioning this being exposed in the language as the ability to take a primary object o, a traditional object in Fabric, and request a replica object r based at store s using a syntax r = o.replicaAt(s). In this design sketch we will refer to the collection of object o and all existing replicas r as the replica group. For the purposes of type checking, all replicas r have the same type as the primary o.

Replicated fields

A replica object replicates the fields marked either as final or with a new replicated flag in the class definition, and otherwise acts as having the same type as o. A replica r provides the same methods as the primary o and holds copies (consistently replicated, as discussed below) of the final and replicated fields. For all other fields it uses forwarding accessor methods, which read the field from the primary object.

Unlike final fields, replicated fields support updates, performed through either replicas or primaries. These writes are performed as coordinated writes to all objects in the replica group.
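
To make the intended semantics concrete, here is a minimal sketch in plain Java (not Fabric or FabIL). Everything in it is an assumption for illustration: the Profile class, its fields, and the use of plain strings as store names are hypothetical, and real coordinated writes would run inside a Fabric transaction across stores rather than as a local loop.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the proposed semantics; stores are just names here. */
class Profile {
    final String id;              // final: copied into every replica
    private String displayName;   // would be marked `replicated` in the proposal
    private int loginCount;       // ordinary field: lives only on the primary

    private final Profile primary;             // null when this object is the primary
    private final Map<String, Profile> group;  // store name -> member, shared by the group

    Profile(String store, String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
        this.primary = null;
        this.group = new HashMap<>();
        this.group.put(store, this);
    }

    // Builds a replica holding copies of the final and replicated fields.
    private Profile(Profile primary) {
        this.id = primary.id;
        this.displayName = primary.displayName;
        this.primary = primary;
        this.group = primary.group;
    }

    /** Models r = o.replicaAt(s): at most one group member per store, created on demand. */
    Profile replicaAt(String store) {
        Profile root = (primary == null) ? this : primary;
        return group.computeIfAbsent(store, s -> new Profile(root));
    }

    /** Replicated field: served from the local copy. */
    String getDisplayName() {
        return displayName;
    }

    /** Coordinated write: every member of the replica group is updated. */
    void setDisplayName(String value) {
        for (Profile member : group.values()) {
            member.displayName = value;
        }
    }

    /** Forwarding accessor: a non-replicated field is read from the primary. */
    int getLoginCount() {
        return (primary == null) ? loginCount : primary.getLoginCount();
    }

    /** Writes to non-replicated fields are forwarded to the primary as well. */
    void recordLogin() {
        if (primary == null) {
            loginCount++;
        } else {
            primary.recordLogin();
        }
    }
}
```

With this model, a write through either the primary or a replica is visible to reads on every member of the group, while reads of loginCount on a replica always go back to the primary.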

References to replicated objects

References to replica objects are distinct from references to primary objects, so other objects can point directly to a nearby replica rather than risking a remote fetch from a faraway primary.

A possible exception to this behavior we might consider is the case of references stored in replicated fields. In these cases, it may be desirable for an assignment to such a field to implicitly create a replica of the referenced object. However, it's unclear whether this is beneficial if the referenced type has no replicated fields itself.

Compiler support

The compiler can support this feature at the FabIL level by generating, similar to Proxy types, a ReplicatedXXX type with the necessary final and replicated field copies and forwarding accessor definitions for the other fields. Once these are generated, both the original type and the generated ReplicatedXXX type would be given generated specialized set$x definitions for writes to replicated fields, an additional internal (replicated) field group for holding the map from stores to the objects making up the replica group, and replicaAt factory methods. I believe this would be performed early in the compiler passes, after type checking but before _Impl and _Proxy types are generated.

One possible optimization applies to classes with no replicated or final fields: their replicaAt factory methods can be defined as return this;, and the additional fields for tracking replica groups can be omitted.
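
As a rough illustration of the shape such generated code might take, here is a hedged sketch in plain Java. The names Config, ConfigLike, ReplicatedConfig, local$x, get$x, and get$note are hypothetical and are not taken from the Fabric compiler; only set$x, ReplicatedXXX, the group map, and replicaAt come from the description above. For simplicity the group map lives only on the primary here, whereas the proposal makes it a replicated field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common view so replicas can be used wherever the primary's type is expected.
interface ConfigLike {
    int get$x();
    void set$x(int v);        // specialized setter: coordinated write
    void local$x(int v);      // hypothetical helper: raw write to this copy only
    String get$note();
    ConfigLike replicaAt(String store);
}

// The original class after the (hypothetical) pass: it gains group, set$x, and replicaAt.
class Config implements ConfigLike {
    int x;                    // declared `replicated` in the source class
    String note;              // ordinary field, stays on the primary
    final Map<String, ConfigLike> group = new HashMap<>();  // generated: store -> member

    Config(String store, int x, String note) {
        this.x = x;
        this.note = note;
        group.put(store, this);
    }

    public int get$x() { return x; }
    public void local$x(int v) { x = v; }
    public void set$x(int v) {
        for (ConfigLike member : group.values()) {
            member.local$x(v);   // update every copy in the replica group
        }
    }
    public String get$note() { return note; }
    public ConfigLike replicaAt(String store) {
        return group.computeIfAbsent(store, s -> new ReplicatedConfig(this));
    }
}

// Generated counterpart: copies of replicated fields plus forwarding accessors.
class ReplicatedConfig implements ConfigLike {
    private int x;                  // local copy of the replicated field
    private final Config primary;

    ReplicatedConfig(Config primary) {
        this.primary = primary;
        this.x = primary.x;
    }

    public int get$x() { return x; }
    public void local$x(int v) { x = v; }
    public void set$x(int v) { primary.set$x(v); }            // same coordinated write
    public String get$note() { return primary.get$note(); }   // forwarding accessor
    public ConfigLike replicaAt(String store) { return primary.replicaAt(store); }
}

// A class with no replicated or final fields: replicaAt degenerates to `return this;`.
class Plain {
    int counter;
    Plain replicaAt(String store) { return this; }
}
```

The Plain class shows the optimization: since nothing would be copied, no ReplicatedPlain type or group map is needed in this sketch.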

Possible Future Extension to Support Merge Semantics?

I believe this design could be extended to support merge semantics, as in the case of CRDTs. I'm not 100% sure what the best design for this is, but here's an incomplete list of rough proposals to stimulate discussion:

  • Allow an override of the default coordinated write behavior for replica groups by providing a custom merge(O updated, Collection<O> rest) operator definition in the primary object's class O definition. This merge operation then acts as a replacement for the default boilerplate in the compiler.
    Open Questions: how do merge operations compose both in terms of inheritance and recursive calls to merge operations for field values?
    Benefits: simple and flexible.
    Drawbacks: anticipated clunkiness of merge definitions for more complicated data structures.

  • Allow an override of the default coordinated write behavior for each replicated field f with type T in the replica group by providing a custom merge.f(T newValue, T oldValue, Collection<T> groupValues) operator definition in the primary object's class O definition, where newValue is the value being written to the field, oldValue is the existing value for the field in the current object, and groupValues is the collection of existing values for the field in the rest of the replica group. This merge operation then acts as a replacement for the default boilerplate in the compiler for the write to that field.
    Open Questions: similar to the previous proposal, how do merge operations compose both in terms of inheritance and recursive calls to merge operations for field values?
    Benefits: simple, limited scope of update for merge logic.
    Drawbacks: a lot more code overhead and awkward/no support for coordinating updates of multiple related fields.

  • Allow for merges to be defined as lazy merged.r(Collection<O> replicaGroup) accessors, which perform necessary merging between the current replica group values before returning (possibly updating the replica group's values).
    Open Questions: Should we expose a method for customized writing back updates to the replica group?
    Benefits: allows per field merging while exposing all fields for updates which depend on related fields.
    Drawbacks: I'm concerned this will require replicated private flags for indicating when merges should act as more than no-ops, which is clunky.

  • Others?
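
To give the per-field proposal (the second bullet above) something concrete to react to, here is a minimal plain-Java sketch. It is only an assumption about how such an operator might behave: the HighWater class and its mark field are hypothetical, mergeMark stands in for the proposed merge.f(T newValue, T oldValue, Collection<T> groupValues) since Java cannot express that name, stores are omitted, and applying the merged result to the whole group is one possible choice rather than something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** A replicated "high-water mark" whose writes are merged by taking the maximum. */
class HighWater {
    private long mark;                    // would be marked `replicated`
    private final List<HighWater> group;  // all members of the replica group

    HighWater() {
        this.group = new ArrayList<>();
        this.group.add(this);
    }

    private HighWater(HighWater source) {
        this.mark = source.mark;
        this.group = source.group;
        this.group.add(this);
    }

    /** Stand-in for replicaAt(s); the store argument is omitted in this sketch. */
    HighWater replica() {
        return new HighWater(this);
    }

    long getMark() {
        return mark;
    }

    /** Stand-in for merge.mark(newValue, oldValue, groupValues): keep the largest value. */
    static long mergeMark(long newValue, long oldValue, Collection<Long> groupValues) {
        long merged = Math.max(newValue, oldValue);
        for (long v : groupValues) {
            merged = Math.max(merged, v);
        }
        return merged;
    }

    /** What a generated setter might do: call the merge, then apply the result. */
    void setMark(long newValue) {
        List<Long> rest = new ArrayList<>();
        for (HighWater member : group) {
            if (member != this) {
                rest.add(member.mark);
            }
        }
        long merged = mergeMark(newValue, mark, rest);
        for (HighWater member : group) {
            member.mark = merged;   // assumption: the merged value goes to the whole group
        }
    }
}
```

Because the merge here is a maximum, a later write of a smaller value through any member leaves the group unchanged, which is the kind of monotonic behavior merge-based objects typically rely on.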

Proposed Milestones

I believe this support can be rolled out in 2-3 phases:

  1. Implement replication support for final fields. This would require creating the necessary compiler passes for constructing replica class definitions, adding replica group maps to the object definitions, and providing replicaAt definitions. This style of immutable data replication has been proposed before and would act as a nice intermediate step while testing and improving on the implementation for supporting this feature.
  2. Add support for replicated fields. This would require introducing the specialized set$ operators in both primary and replica types in the compiler. This would complete the bulk of what's proposed here and be beneficial in a variety of existing and proposed applications.
  3. Extend support for merge semantics. This can act as an experimental opportunity for researching language support for this style of distributed programming.