Skip to content

Commit

Permalink
Merge remote-tracking branch 'refs/remotes/origin/Migration' into Mig…
Browse files Browse the repository at this point in the history
…ration
  • Loading branch information
sjcjoosten committed Jul 6, 2023
2 parents 27e1c6e + 292e3f6 commit 7eddcfb
Show file tree
Hide file tree
Showing 3 changed files with 1,207 additions and 1,264 deletions.
163 changes: 92 additions & 71 deletions 2022Migration/articleMigration.tex
Original file line number Diff line number Diff line change
Expand Up @@ -147,9 +147,12 @@ \subsection{Information Systems}

Information systems are typically used by actors (both users and computers) who are distributed and work with data all the time.
As a consequence, the data in a system changes continually.
In practice, actors ``talk to'' the system through an ingress mechanism, which connects each user to the right service(s).
The ingress function is provided by the deployment platform, so it is beyond the scope of this paper.
Also, every service runs independent of other services,
In practice, actors ``talk to'' the system through an ingress mechanism,
which routes the traffic between each user and the required application service(s).
The ingress function is used during migration to switch users from the old to new functionality.
In this research we assume that the deployment platform provides the ingress mechanism,
so developers merely have to configure it.
Also, we assume that every service runs independent of other services,
meaning that each service can be stopped, (re)started and substituted without disrupting the other services.
For practical purposes, this presumes a platform of managed services, such as Kubernetes,
even though the theory in this paper is not specific to any platform.
Expand Down Expand Up @@ -651,58 +654,89 @@ \subsection{Example}
% Requirement~\ref{eqn:satisfaction} is still satisfied because now there is a role to maintain rule {\tt EnrollRule}.
% The run-time engine will no longer enforce satisfaction of rule {\tt EnrollRule}, but leave it up to an administrator.

\subsection{Process of Data Migration}
\section{Process of Data Migration}
Sections~\ref{sct:Information Systems} and~\ref{sct:Example existing IS} introduce a single information system.
The transition from an existing system $\infsys$ to the desired one $\infsys'$ is the topic of this section.
Let us first sketch the steps to migrate system $\infsys$ to $\infsys'$.
\begin{figure}
\centering
\includegraphics[scale=0.09]{figures/Before migration.png}
\caption{Initial state of the migration}
\label{fig:initial}
\end{figure}
Figure~\ref{fig:initial} shows the initial state of the migration,
in which the existing information system $\infsys$ has data set $\dataset$ and schema $\schema$.
At the start of this migration, $\dataset$ satisfies all rules of $\schema$ (by requirement~\ref{eqn:satisfaction}).
\begin{enumerate}
\item Let the existing information system $\infsys$ have data set $\dataset$ and schema $\schema$.
A developer defines a new information system, $\infsys'$, which contains a data set of its own, $\dataset'$, a schema $\schema'$, and some functionality\footnote{In this paper, we only consider the functionality captured by rules.} as illustrated in figure~\ref{fig:post-migration}.
After the migration, the resulting system will be $\infsys'$, enhanced with the triples from $\dataset$ that satisfy $\schema'$,
and with additional changes made by the migration engineer (see step~\ref{item3}) and by users who have been using the system during the migration.
$\dataset'$ may contain some new data,
for instance, to initialize new features at the moment of transition (MoT).
If schemas $\schema$ equals $\schema'$, the data migration is unnecessary, so in the sequel we assume they are different.
\item The migration engineer uses a software generator to generate a migration script, $\migrsys$,
This is a system that includes both $\infsys$ and $\infsys'$.
$\migrsys$ is meant to migrate the data and define the semantics during the migration phase.
It contains enforcement rules that migrate existing data to the desired data set.
It also contains data structures and rules that are needed for the duration of the migration phase.
% The generated script must be free of type errors, to ensure that $\sat{u}{\dataset}$ exists for every rule $u$ and data set $\dataset$.
% The proof is available \href{https://www.isa-afp.org/}{here}
\item\label{item3}
In many cases, the schema $\schema_{\migrsys}$ will be enough to perform the migration.
However, a migration engineer may have specific requirements that call for changes in $\migrsys$.
For that purpose, the generator generates the migration script as an Ampersand script,
so the migration engineer can accommodate such requirements.
This can be necessary, for example, to resolve data pollution that violates a constraint that was not in schema $\schema_{\infsys}$.
She can either implement enforcement rules or use the business to resolve the issues.
She can test the resulting migration script, $\migrsys'$ separately before taking the desired system to production.
The migration engineer alters $\schema$ nor $\schema'$ while adapting $\migrsys'$ to the specific needs of the current migration.
\item Then, the migration engineer deploys system $\migrsys'$.
This event starts the copying of data from $\dataset$ to $\dataset'$, as specified in $\migrsys'$ (figure~\ref{fig:migration phase}).
During this phase, the ingress is still referring users to the existing system,
so they will hardly notice that the desired system is being fired up and loaded with data.
Users are still changing $\dataset$ (both inserts and deletes),
but these changes are being transferred to $\dataset'$ as well.
This step is complete when all data from $\dataset$ has been copied to $\dataset'$ and all invariants of the migration system are true.
\item At this moment, the ingress switches the incoming traffic from $\infsys$ to $\migrsys'$,
so $\migrsys'$ effectively takes over.
This event marks the MoT.
Since $\infsys$ may still have some violations due to last-minute edits,
it may take some (short but finite) time until the invariants of $\infsys$ are true.
After that, dataset $\dataset$ will no longer change.
\item From this moment, users experience the functionality of the migration system $\migrsys'$.
This system must ensure that all invariants of $\schema'$ are true before the migration is complete.
It no longer ensures the rules of $\schema$, which govern the existing system.
Since $\dataset$ is no longer changing, the migration engineer can now focus on the remaining violations of $\schema'$.
Users of the migration system will deal with process rules in the migration system that are still violated.
Once all invariants of $\migrsys'$ are true, the migration is complete.
This state marks the moment of cleanup (MoC)
\item After the MoC, the ingress switches the incoming traffic from $\migrsys'$ to $\infsys'$.
The migration system is no longer needed and can be taken out of production, together with the existing system $\infsys$.
The desired system remains in production and the migration is finished.
This establishes the post-migration situation (figure~\ref{fig:post-migration}).
\item The migration starts with a developer%
\footnote{The developer role may stand for a team of developers, a single developer, a migration engineer, or whoever defines the desired system.}
who specifies a new information system, $\infsys'$.
Let us call this moment the start of the migration (SoM).
$\infsys'$ contains a data set of its own, $\dataset'$, a schema $\schema'$, and some functionality%
\footnote{In this paper, we only consider functionality that is captured by rules.}.
Before replacing $\infsys$ by $\infsys'$, the data from $\dataset$ must be migrated to $\dataset'$
and all invariants of $\schema'$ must be satisfied.
We assume that schemas $\schema$ and $\schema'$ are different, or else the migration would be unnecessary.
\item A migration engineer%
\footnote{The migration engineer may also stand for a team, which may overlap with the developer role.}
uses the Ampersand compiler to generate a migration script from schemas $\schema$ and $\schema'$.
This script specifies a migration system, $\migrsys$, that includes $\infsys$, $\infsys'$ and a number of migration rules.
$\migrsys$ is meant to migrate the data and ensure that $\dataset'$ satisfies all invariants of $\schema'$
before taking $\infsys'$ to production.
\item\label{item3}
In specific cases, $\migrsys$ suffices to perform the migration automatically.
However, a migration engineer may have specific requirements that call for changes in $\migrsys$.
For that purpose, she may want to alter the migration script to accommodate such requirements, yielding another migration script $\migrsys'$.
This can be necessary, for example, to resolve data pollution that violates a constraint that was not in schema $\schema_{\infsys}$.
She can implement such adaptations as enforcement rules, which are executed automatically.
But she can also choose to specify process rules that require business users to resolve issues interactively.
Human involvement is often needed to implement a desired migration strategy.
In that case the migration system must ensure that there are finitely many steps to complete the migration.
The migration engineer must ensure that $\migrsys'$ can satisfy all invariants in $\schema'$, either by enforcement rules or by process rules.
She can test $\migrsys'$ separately before taking the desired system to production.
When this work is done, the migration engineer has a migration system $\migrsys'$ with schema $\schema'$ and an initial dataset $\dataset'$
that is ready to deploy.
\item\label{item4}
At this moment, let us call it the moment of deployment (MoD),
\begin{figure}[h]
\centering
\includegraphics[scale=0.09]{figures/4MoD.png}
\caption{During the migration}
\label{fig:4MoD}
\end{figure}
the migration engineer deploys system $\migrsys'$.
This event starts the copying of data from $\dataset$ to $\dataset'$, as specified in $\migrsys'$.
During this phase, the ingress still refers users to the existing system,
so they will hardly notice that the desired system is being fired up and loaded with data.
Users can still change data in $\dataset$, while $\infsys$ is keeping its invariants true.
Yet, the migration system also transfers these ``new'' changes to $\dataset'$ by means of its enforcement rules.
This step is complete when all data from $\dataset$ has been copied to $\dataset'$ and all invariants of the migration system are true.
\item After completing step~\ref{item4}, the ingress switches the incoming traffic from $\infsys$ to $\migrsys'$,
so $\migrsys'$ effectively takes over.
\begin{figure}[h]
\centering
\includegraphics[scale=0.09]{figures/5MoT.png}
\caption{Situation after MoT}
\label{fig:during}
\end{figure}
Simultaneously, all clients must switch to the functionality of $\migrsys'$,
so users will experience the new functionality.
This event marks the MoT.
Since $\infsys$ may still have some violations due to last-minute edits,
it may take some (short but finite) time until the invariants of $\infsys$ are true.
After that, the existing dataset $\dataset$ will no longer change.
\item Starting at MoT, $\migrsys'$ is working towards satisfying the migration rules.
It executes its enforce rules to migrate data.
It also executes enforce rules and process rules to ensure that all invariants of $\infsys'$ are true.
Since $\migrsys'$ also contains $\schema$,
it will also keep the invariants of $\infsys$ as the final transactions trickle into the new system.
However, the traffic into $\infsys$ has stopped, so the changes in $\dataset$ will quickly dry up.
Once all invariants of $\migrsys'$ are true, the migration is complete.
\item Let us designate the moment of completion as MoC.
After the MoC, the ingress can switch the incoming traffic from $\migrsys'$ to $\infsys'$ because all invariants of $\infsys'$ are true.
Simultaneously, all clients must switch to the functionality of $\infsys'$.
$\infsys$ and $\migrsys'$ are no longer needed and can be taken out of production.
$\infsys'$ remains in production and the migration is finished.
This establishes the post-migration situation (figure~\ref{fig:post-migration}).
\end{enumerate}

%SJC on Formalization: it's often better to put these inline where we first use them. Leaving this as a comment for easy copy-paste.
Expand Down Expand Up @@ -1022,8 +1056,8 @@ \subsection{Preparing for the Moment of Transition}
]
\end{verbatim}

Upon initialization, the migration system evaluates all enforce rules in an attempt to satisfy all invariants.
In this example, it leaves two violations behind that it cannot resolve by enforce rules.
Upon initialization, the migration system evaluates all enforcement rules to satisfy all invariants.
In this example, it leaves two violations behind that it cannot resolve by enforcement rules.
The pair {\tt ("Peter", "Management")} is the first one; it violates rule {\tt new.ExamRule2}.
This is because Peter has registered for an examination of the course Management,
which contains the module Finance for which Peter is not enrolled.
Expand Down Expand Up @@ -1171,11 +1205,11 @@ \section{Validation}
More concretely, this means the following:

When the default migration system $\migrsys$ is deployed, the system will always be in a state such that continually running any violated ENFORCE rules (in any order) leads to the MoT.
This means that $A\rightsquigarrow B$ is such that the dataset of $B$ is that of $A$ with an update applied according to one or more violated enforce rules. % TODO: define 'violated'
If there is no such $B$ because no enforce rules are violated, we are at the moment of transition.
This means that $A\rightsquigarrow B$ is such that the dataset of $B$ is that of $A$ with an update applied according to one or more violated enforcement rules. % TODO: define 'violated'
If there is no such $B$ because no enforcement rules are violated, we are at the moment of transition.
We say that the system $A$ is in a state such that the MoT can be reached if there are no infinite $\rightsquigarrow$-paths from $A$.
The constraint on $A\dashrightarrow B$ is that only the old system can change at this point.
Note that this does not prove that the moment of transition to never be reached, as more enforce rules may get violated at every $\dashrightarrow$ step.
Note that this does not prove that the moment of transition to never be reached, as more enforcement rules may get violated at every $\dashrightarrow$ step.
We expect this not to be a problem in practice since $\dashrightarrow$ transitions are triggered by users while $\rightsquigarrow$ is triggered automatically through ENFORCE rules: the ENFORCE rules should outperform the $\dashrightarrow$ actions by a big margin.

After the moment of transition, there is a finite upper-bound on the amount of work the migration engineer has to do:
Expand All @@ -1189,13 +1223,6 @@ \section{Software architecture}
The theory presented above roughly corresponds to the back-end of a migration system.
This section describes its software architecture.
For this research, we have used the Ampersand compiler to generate the system(s).
Figure~\ref{fig:initial} shows the starting point of the migration.
\begin{figure}
\centering
\includegraphics[scale=0.1]{figures/Before migration.png}
\caption{Initial state of the migration}
\label{fig:initial}
\end{figure}
For the purpose of data migration, the existing system must make its schema available through its API.
For this purpose, the back-end offers a service ``retrieve schema'', which provides the schema as Ampersand source-code.
As a result, the migration engineer can rely on the schema of any generated application to be accurate.
Expand All @@ -1205,13 +1232,7 @@ \section{Software architecture}
After the migration engineer has produced and tested a migration script, the Ampersand compiler produces a migration system.
It includes the schemas of both the existing system and the desired system, besides having a schema of its own.
This is implemented by Ampersand's INCLUDE mechanism, which boils down to taking the disjoint union of the schemas.
\begin{figure}
\centering
\includegraphics[scale=0.1]{figures/During migration.png}
\caption{During the migration}
\label{fig:during}
\end{figure}
Deployment of the migration system (figure~\ref{fig:during}) leaves the existing system intact.
Deployment of the migration system leaves the existing system intact.
It enhances the system with the data set needed by the migration system, which includes the data set needed by the desired system.
So, users will notice no difference in the existing system when the migration system is deployed.

Expand Down
Loading

0 comments on commit 7eddcfb

Please sign in to comment.