corrections
anisetti committed Jun 12, 2024
1 parent fc7a64d commit d302ead
Showing 4 changed files with 13 additions and 12 deletions.
2 changes: 1 addition & 1 deletion pipeline_instance.tex
@@ -28,7 +28,7 @@ \section{Pipeline Instance}\label{sec:instance}
\item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set $S'_{i}$ of compatible services, which instantiates the corresponding vertex $\vii{i}$$\in$$\Vp$. There are many ways of choosing $s'_i$; we present our approach, based on the maximization of data \quality \emph{\q}, in Section \ref{sec:heuristics} (a minimal sketch follows this list).
\end{enumerate}
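As a reading aid, a minimal editorial sketch of this selection step is given below; the quality() scoring function is a hypothetical stand-in for the metric of Section \ref{sec:heuristics}, and all names are illustrative.
\begin{verbatim}
# Editorial sketch: greedy selection of one candidate service per compatible
# set S'_i, maximizing a data-quality score q; quality() is a hypothetical
# stand-in for the metric defined later in the paper.
def select_instance(compatible_sets, quality):
    instance = []
    for candidates in compatible_sets:      # one set S'_i per vertex v'_i
        if not candidates:
            raise ValueError("no candidate service satisfies the policies")
        instance.append(max(candidates, key=quality))  # best-scoring s'_i
    return instance
\end{verbatim}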

When all vertices $\vi{i}$$\in$$V$ in $G^{\myLambda,\myGamma}$ have been visited, the \pipelineInstance G' is generated, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies in \P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied.
When all vertices $\vi{i}$$\in$$V$ in $G^{\myLambda,\myGamma}$ have been visited, the \pipelineInstance G' is generated, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies in \P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. If policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied; otherwise, a default transformation that removes all data is applied.
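The enforcement step just described can be summarized by the following editorial sketch; the evaluate()/transform() interfaces and the list-of-records view of the data are assumptions of the sketch, not part of the model.
\begin{verbatim}
# Editorial sketch of policy enforcement at pipeline-instance trigger time.
# Each policy in P_i is assumed to expose evaluate(ctx) and transform(data);
# data is assumed to be a list of records.
def enforce(policies, data, ctx):
    for p in policies:
        if p.evaluate(ctx):           # policy evaluation returns true
            return p.transform(data)  # apply data transformation T_P
    return []                         # default transformation: remove all data
\end{verbatim}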

\begin{figure}[ht!]
\centering
8 changes: 4 additions & 4 deletions pipeline_instance_example.tex
@@ -6,13 +6,13 @@

As presented in Table~\ref{tab:exisnt}(a), each vertex is labeled with policies (column \emph{candidate--$>$policy}), and then associated with different candidate services (column \emph{candidate}) and corresponding profile (column \emph{profile}). The filtering algorithm matches each candidate service profile with the policies in Table~\ref{tab:anonymization} annotating the corresponding vertex. It returns the set of services whose profile matches a policy (column \emph{filtering}):
\begin{enumerate*}[label=\textit{\roman*})]
\item vertex \vi{5}, the filtering algorithm produces the set $S'_1=\{s_{51},s_{52}\}$. Assuming that the dataset owner is ``CT'', the service profile of \s{51} matches \p{1} and the one of \s{52} matches \p{2}. For \s{53}, there is no policy match and, thus, it is discarded;
\item vertex \vi{6}, the filtering algorithm returns the set $S'_2=\{s_{62},s_{63}\}$. Assuming that the dataset region is ``CT'', the service profile of \s{62} matches \p{5} and the one of \s{63} matches \p{6}. For \s{61}, there is no policy match and, thus, it is discarded;
\item vertex \vi{5}, the filtering algorithm produces the set $S'_1=\{s_{51},s_{52}\}$. Assuming that the dataset owner is ``CT'', the service profile of \s_{51} matches \p{1} and the one of \s_{52} matches \p{2}. For \s_{53}, there is no policy match and, thus, it is discarded;
\item vertex \vi{6}, the filtering algorithm returns the set $S'_2=\{s_{62},s_{63}\}$. Assuming that the dataset region is ``CT'', the service profile of \s_{62} matches \p{5} and the one of \s_{63} matches \p{6}. For \s_{61}, there is no policy match and, thus, it is discarded;
\item vertex \vi{7}, the filtering algorithm returns the set $S'_3=\{s_{71},s_{72}\}$. Since policy \p{7} matches any subject, the filtering algorithm keeps all services.
\end{enumerate*}
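The filtering step illustrated by the three cases above can be sketched as follows; the attribute/value encoding of profiles and policies and the concrete values are illustrative assumptions, not the conditions of Table~\ref{tab:anonymization}.
\begin{verbatim}
# Editorial sketch: keep the candidates whose profile satisfies at least one
# of the policies annotating the vertex (attribute/value pairs are placeholders).
def filter_candidates(candidates, policies):
    kept = set()
    for sid, profile in candidates.items():
        if any(all(profile.get(a) == v for a, v in pol.items())
               for pol in policies):
            kept.add(sid)
    return kept

# Illustrative call mirroring vertex v5: s51 and s52 each match a policy,
# s53 matches none and is discarded.
S1 = filter_candidates(
    {"s51": {"owner": "CT"}, "s52": {"owner": "NY"}, "s53": {"owner": "XX"}},
    [{"owner": "CT"}, {"owner": "NY"}],   # stand-ins for p1 and p2
)   # -> {"s51", "s52"}
\end{verbatim}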

For each vertex \vii{i}, we select the matching service \sii{j} from $S'_i$ and incorporate it into a valid instance. For instance, we select \s{51} for \vi{5}; \s{62} for \vi{6}, and \s{71} for \vi{7}
as depicted in \cref{tab:instance_example_valid}(a) (column \emph{instance}). We note that, to move from a valid to an optimal instance, it is mandatory to evaluate candidate services based on specific quality metrics that reflect their impact on data quality, as discussed in the following of this paper.
For each vertex \vii{i}, we select the matching service \sii{j} from $S'_i$ and incorporate it into a valid instance. For instance, we select \s_{51} for \vi{5}; \s_{62} for \vi{6}, and \s_{71} for \vi{7}
as depicted in \cref{tab:instance_example_valid}(a) (column \emph{instance}). We note that, to move from a valid to an optimal instance, it is necessary to evaluate candidate services based on specific quality metrics that reflect their impact on data quality, as discussed in the remainder of this paper.
%In the next sections, we will introduce the metrics that we use to evaluate the quality of services and the results of the experiments conducted to evaluate the performance of our approach.

% \begin{table*}
12 changes: 6 additions & 6 deletions pipeline_template.tex
@@ -1,6 +1,6 @@
\section{Pipeline Template}\label{sec:template}
Our approach integrates data protection and data management into the service pipeline using annotations.
To this aim, we extend the service pipeline in \cref{def:pipeline} with: \emph{i)} data protection annotations that also express transformations on data, ensuring compliance with data protection requirements, \emph{ii)} functional annotations to express data manipulations carried out during services execution.
To this end, we extend the service pipeline in \cref{def:pipeline} with: \emph{i)} data protection annotations that also express transformations on data, ensuring compliance with data protection requirements; and \emph{ii)} functional annotations that express the data manipulations carried out during service execution.
These annotations enable the implementation of an advanced data lineage, tracking the entire data lifecycle by monitoring changes that result from functional service execution and data protection requirements.
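As a reading aid, the annotated pipeline can be pictured as a plain graph structure carrying the two annotation functions; the field names below are editorial assumptions, not definitions from the paper.
\begin{verbatim}
# Editorial sketch of a pipeline template: a DAG whose service vertices carry
# a data-protection annotation (policies P_i) and a functional annotation
# (functional description F_i). Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class TemplateVertex:
    name: str
    policies: list = field(default_factory=list)    # Lambda(v_i) -> P_i
    functional: dict = field(default_factory=dict)  # Gamma(v_i)  -> F_i

@dataclass
class PipelineTemplate:
    vertices: dict = field(default_factory=dict)    # name -> TemplateVertex
    edges: list = field(default_factory=list)       # (source, target); acyclic
\end{verbatim}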

In the following, we first introduce the annotated service pipeline, called pipeline template (Section \ref{sec:templatedefinition}). We then present both functional annotations (Section \ref{sec:funcannotation}) and data protection annotations (Section \ref{sec:nonfuncannotation}), providing an example of a pipeline template in the context of the reference scenario.
@@ -18,7 +18,7 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition}
\vspace{0.5em}

\begin{definition}[Pipeline Template] \label{def:template}
Given a service pipeline G(\V,\E), a pipeline template \tChartFunction is a direct acyclic graph extendend with two annotation functions:
Given a service pipeline G(\V,\E), a pipeline template \tChartFunction is a directed acyclic graph extended with two annotation functions:
\begin{enumerate}%[label=\textit{\roman*}]
\item \emph{Data Protection Annotation} \myLambda that assigns a label \myLambda(\vi{i}) to each vertex $\vi{i}\in\V_S$. Label \myLambda(\vi{i}) corresponds to a set \P{i} of policies $p_j$ to be satisfied by service $s_i$ represented by \vi{i};
\item \emph{Functional Annotation} \myGamma that assigns a label \myGamma(\vi{i}) to each vertex $\vi{i}\in\V_S$. Label \myGamma(\vi{i}) corresponds to the functional description $F_i$ of service $s_i$ represented by \vi{i}.
@@ -112,18 +112,18 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition}

\vspace{0.5em}

More in detail, \textit{subject subj} specifies a service $s_i$ issuing an access request to perform an action on an object. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (classifier$=$"SVM") specifies a service providing a SVM classifier. We note that \textit{subj} can also specify conditions on the service owner (\textit{e.g.}, owner\_location$=$"EU") and the service user (\textit{e.g.}, service\_user\_role$=$"DOC Director").
In more detail, \textit{subject subj} specifies a service $s_i$ issuing an access request to perform an action on an object. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (classifier$=$``SVM'') specifies a service providing an SVM classifier. We note that \textit{subj} can also specify conditions on the service owner (\textit{e.g.}, owner\_location$=$``EU'') and the service user (\textit{e.g.}, service\_user\_role$=$``DOC Director'').

%\item
\textit{Object obj} defines the data governed by the access policy. In this case, it is a set \{$pc_i$\} of \emph{Policy Conditions} on the object's attributes. %as defined in Definition \ref{def:policy_cond}.
%It can specify the \emph{type} of object, such as a file (e.g., a video, text file, image, etc.), a SQL or noSQL database, a table, a column, a row, or a cell of a table, or any other characteristics of the data.
For instance, \{(type$=$"dataset"), (region$=$CT)\} refers to an object of type dataset and whose region is Connecticut.
For instance, \{(type$=$``dataset''), (region$=$CT)\} refers to an object of type dataset whose region is Connecticut.

%\item
\textit{Action act} specifies the operations that can be performed within a big data environment, from traditional atomic operations on databases (e.g., CRUD operations) to coarser operations, such as an Apache Spark Directed Acyclic Graph (DAG), a Hadoop MapReduce job, an analytics function call, and an analytics pipeline.

%\item
\textit{Environment env} defines a set of conditions on contextual attributes, such as time of the day, location, IP address, risk level, weather condition, holiday/workday, emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$"night") refers to a policy that is applicable only at night.
\textit{Environment env} defines a set of conditions on contextual attributes, such as time of day, location, IP address, risk level, weather condition, holiday/workday, and emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$``night'') refers to a policy that is applicable only at night.
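Putting the four components together, an access policy can be pictured as the following record; the field names, the equality-only conditions, and the transformation hook are assumptions of this editorial sketch.
\begin{verbatim}
# Editorial sketch of an access policy <subj, obj, act, env, T_P>: subj, obj
# and env are sets of (attribute, value) policy conditions; act is the allowed
# operation; T_P is the transformation enforced before access is granted.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Policy:
    subject: dict = field(default_factory=dict)      # e.g. {"classifier": "SVM"}
    obj: dict = field(default_factory=dict)          # e.g. {"type": "dataset", "region": "CT"}
    action: str = "read"                             # e.g. a CRUD operation or an analytics pipeline
    environment: dict = field(default_factory=dict)  # e.g. {"time": "night"}
    transformation: Optional[Callable] = None        # T_P, applied before any access to data
\end{verbatim}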

%\item
\textit{Data Transformation \TP} defines a set of security and privacy-aware transformations on \textit{obj} that must be enforced before any access to data is given. Transformations focus on data protection, as well as on compliance with regulations and standards, in addition to simple format conversions. For instance, let us define three transformations that can be applied to the dataset in \cref{tab:dataset}, each performing a different level of anonymization:
@@ -153,7 +153,7 @@ \subsection{Functional Annotations}\label{sec:funcannotation}
\item an empty function \tf{\epsilon} that applies no transformation or processing on the data;
\item an additive function \tf{a} that expands the amount of data received, for example, by integrating data from other sources;
\item a transformation function \tf{t} that transforms some records in the dataset without altering the domain;
\item a transformation function \tf{d} (out of the scope of this work) that changes the domain of the data by applying, for instance, PCA or K-means.
\item a transformation function \tf{d} (outside the scope of this work) that changes the domain of the data by applying, for instance, PCA.
\end{enumerate*}

For simplicity, but without loss of generality, we assume that all candidate services meet functional annotation \F{} and that \TF{}=\tf{}. As a consequence, all candidate services apply the same transformation to the data during pipeline execution.
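A minimal sketch of the four function kinds listed above, assuming the dataset is represented as a list of records (an assumption of this sketch, not of the paper):
\begin{verbatim}
# Editorial sketch of the transformation functions behind functional annotations.
def t_empty(records):              # t_epsilon: no transformation
    return records

def t_additive(records, extra):    # t_a: integrate data from other sources
    return records + extra

def t_transform(records):          # t_t: transform records without changing the domain
    return [{**r, "name": "ANONYMIZED"} for r in records]   # illustrative field

def t_domain_change(records):      # t_d: change the data domain (e.g. PCA); out of scope
    raise NotImplementedError("domain-changing transformations are out of scope here")
\end{verbatim}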
3 changes: 2 additions & 1 deletion system_model.tex
@@ -70,7 +70,8 @@ \subsection{Reference Scenario}\label{sec:service_definition}

In this context, the user, a member of the Connecticut Department of Correction (DOC), is interested in comparing admission trends in Connecticut prisons with those in New York and New Hampshire. We assume that the three DOCs are partners and share data according to their privacy policies. The entire service execution must occur within the Connecticut Department of Correction. Moreover, if data transmission extends beyond Connecticut's borders, data protection measures must be implemented.

The user's objective aligns with a predefined service pipeline \st{template} that orchestrates the following sequence of operations:
The user's objective aligns with a predefined service pipeline %\st{template}
that orchestrates the following sequence of operations:
\begin{enumerate*}[label=(\roman*)]
\item \emph{Data fetching}, including the download of the dataset from other states;
\item \emph{Data preparation}, including data merging, cleaning, and anonymization;
