From d302eadc0df9e9760e1af46a6ca1232d82cf275b Mon Sep 17 00:00:00 2001 From: Marco Anisetti Date: Wed, 12 Jun 2024 15:05:24 +0200 Subject: [PATCH] correzioni --- pipeline_instance.tex | 2 +- pipeline_instance_example.tex | 8 ++++---- pipeline_template.tex | 12 ++++++------ system_model.tex | 3 ++- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/pipeline_instance.tex b/pipeline_instance.tex index e207212..da9e2cf 100644 --- a/pipeline_instance.tex +++ b/pipeline_instance.tex @@ -28,7 +28,7 @@ \section{Pipeline Instance}\label{sec:instance} \item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set $S'_{i}$ of compatible services, which instantiates the corresponding vertex $\vii{i}$$\in$$\Vp$. There are many ways of choosing $s'_i$, we present our approach based on the maximization of data \quality \emph{\q} in Section \ref{sec:heuristics}. \end{enumerate} -When all vertices $\vi{i}$$\in$$V$ in $G^{\myLambda,\myGamma}$ have been visited, the \pipelineInstance G' is generated, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies in \P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied. +When all vertices $\vi{i}$$\in$$V$ in $G^{\myLambda,\myGamma}$ have been visited, the \pipelineInstance G' is generated, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies in \P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered before any service is executed. 
If policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied; otherwise, a default transformation that removes all data is applied. \begin{figure}[ht!] \centering diff --git a/pipeline_instance_example.tex b/pipeline_instance_example.tex index a293693..d7dfcbd 100644 --- a/pipeline_instance_example.tex +++ b/pipeline_instance_example.tex @@ -6,13 +6,13 @@ As presented in Table~\ref{tab:exisnt}(a), each vertex is labeled with policies (column \emph{candidate--$>$policy}), and then associated with different candidate services (column \emph{candidate}) and corresponding profile (column \emph{profile}). The filtering algorithm matches each candidate service profile with the policies in Table~\ref{tab:anonymization} annotating the corresponding vertex. It returns the set of services whose profile matches a policy (column \emph{filtering}): \begin{enumerate*}[label=\textit{\roman*})] - \item vertex \vi{5}, the filtering algorithm produces the set $S'_1=\{s_{51},s_{52}\}$. Assuming that the dataset owner is ``CT'', the service profile of \s{51} matches \p{1} and the one of \s{52} matches \p{2}. For \s{53}, there is no policy match and, thus, it is discarded; - \item vertex \vi{6}, the filtering algorithm returns the set $S'_2=\{s_{62},s_{63}\}$. Assuming that the dataset region is ``CT'', the service profile of \s{62} matches \p{5} and the one of \s{63} matches \p{6}. For \s{61}, there is no policy match and, thus, it is discarded; + \item vertex \vi{5}, the filtering algorithm produces the set $S'_1=\{s_{51},s_{52}\}$. Assuming that the dataset owner is ``CT'', the service profile of \s_{51} matches \p{1} and the one of \s_{52} matches \p{2}. For \s_{53}, there is no policy match and, thus, it is discarded; + \item vertex \vi{6}, the filtering algorithm returns the set $S'_2=\{s_{62},s_{63}\}$. Assuming that the dataset region is ``CT'', the service profile of \s_{62} matches \p{5} and the one of \s_{63} matches \p{6}. 
For \s_{61}, there is no policy match and, thus, it is discarded; \item vertex \vi{7}, the filtering algorithm returns the set $S'_3=\{s_{71},s_{72}\}$. Since policy \p{7} matches with any subject, the filtering algorithm keeps all services. \end{enumerate*} -For each vertex \vii{i}, we select the matching service \sii{j} from $S'_i$ and incorporate it into a valid instance. For instance, we select \s{51} for \vi{5}; \s{62} for \vi{6}, and \s{71} for \vi{7} -as depicted in \cref{tab:instance_example_valid}(a) (column \emph{instance}). We note that, to move from a valid to an optimal instance, it is mandatory to evaluate candidate services based on specific quality metrics that reflect their impact on data quality, as discussed in the following of this paper. +For each vertex \vii{i}, we select the matching service \sii{j} from $S'_i$ and incorporate it into a valid instance. For instance, we select \s_{51} for \vi{5}; \s_{62} for \vi{6}, and \s_{71} for \vi{7} +as depicted in \cref{tab:instance_example_valid}(a) (column \emph{instance}). We note that to move from a valid to an optimal instance, it is mandatory to evaluate candidate services based on specific quality metrics that reflect their impact on data quality, as discussed in the following of this paper. %In the next sections, we will introduce the metrics that we use to evaluate the quality of services and the results of the experiments conducted to evaluate the performance of our approach. % \begin{table*} diff --git a/pipeline_template.tex b/pipeline_template.tex index 33294d8..453adc2 100644 --- a/pipeline_template.tex +++ b/pipeline_template.tex @@ -1,6 +1,6 @@ \section{Pipeline Template}\label{sec:template} Our approach integrates data protection and data management into the service pipeline using annotations. 
-To this aim, we extend the service pipeline in \cref{def:pipeline} with: \emph{i)} data protection annotations that also express transformations on data, ensuring compliance with data protection requirements, \emph{ii)} functional annotations to express data manipulations carried out during services execution. +To this aim, we extend the service pipeline in \cref{def:pipeline} with: \emph{i)} data protection annotations that also express transformations on data, ensuring compliance with data protection requirements, \emph{ii)} functional annotations to express data manipulations carried out during service execution. These annotations enable the implementation of an advanced data lineage, tracking the entire data lifecycle by monitoring changes that result from functional service execution and data protection requirements. In the following, we first introduce the annotated service pipeline, called pipeline template (Section \ref{sec:templatedefinition}). We then present both functional annotations (Section \ref{sec:funcannotation}) and data protection annotations (Section \ref{sec:nonfuncannotation}), providing an example of a pipeline template in the context of the reference scenario. @@ -18,7 +18,7 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition} \vspace{0.5em} \begin{definition}[Pipeline Template] \label{def:template} - Given a service pipeline G(\V,\E), a pipeline template \tChartFunction is a direct acyclic graph extendend with two annotation functions: + Given a service pipeline G(\V,\E), a pipeline template \tChartFunction is a directed acyclic graph extended with two annotation functions: \begin{enumerate}%[label=\textit{\roman*}] \item \emph{Data Protection Annotation} \myLambda that assigns a label \myLambda(\vi{i}) to each vertex $\vi{i}\in\V_S$. 
Label \myLambda(\vi{i}) corresponds to a set \P{i} of policies $p_j$ to be satisfied by service $s_i$ represented by \vi{i}; \item \emph{Functional Annotation} \myGamma that assigns a label \myGamma(\vi{i}) to each vertex $\vi{i}\in\V_S$. Label \myGamma(\vi{i}) corresponds to the functional description $F_i$ of service $s_i$ represented by \vi{i}. @@ -112,18 +112,18 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition} \vspace{0.5em} - More in detail, \textit{subject subj} specifies a service $s_i$ issuing an access request to perform an action on an object. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (classifier$=$"SVM") specifies a service providing a SVM classifier. We note that \textit{subj} can also specify conditions on the service owner (\textit{e.g.}, owner\_location$=$"EU") and the service user (\textit{e.g.}, service\_user\_role$=$"DOC Director"). + In more detail, \textit{subject subj} specifies a service $s_i$ issuing an access request to perform an action on an object. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (classifier$=$``SVM'') specifies a service providing an SVM classifier. We note that \textit{subj} can also specify conditions on the service owner (\textit{e.g.}, owner\_location$=$``EU'') and the service user (\textit{e.g.}, service\_user\_role$=$``DOC Director''). %\item \textit{Object obj} defines the data governed by the access policy. In this case, it is a set \{$pc_i$\} of \emph{Policy Conditions} on the object's attributes. %as defined in Definition \ref{def:policy_cond}. %It can specify the \emph{type} of object, such as a file (e.g., a video, text file, image, etc.), a SQL or noSQL database, a table, a column, a row, or a cell of a table, or any other characteristics of the data. 
- For instance, \{(type$=$"dataset"), (region$=$CT)\} refers to an object of type dataset and whose region is Connecticut. + For instance, \{(type$=$``dataset''), (region$=$``CT'')\} refers to an object of type dataset whose region is Connecticut. %\item \textit{Action act} specifies the operations that can be performed within a big data environment, from traditional atomic operations on databases (e.g., CRUD operations) to coarser operations, such as an Apache Spark Direct Acyclic Graph (DAG), Hadoop MapReduce, an analytics function call, and an analytics pipeline. %\item - \textit{Environment env} defines a set of conditions on contextual attributes, such as time of the day, location, IP address, risk level, weather condition, holiday/workday, emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$"night") refers to a policy that is applicable only at night. + \textit{Environment env} defines a set of conditions on contextual attributes, such as time of the day, location, IP address, risk level, weather condition, holiday/workday, and emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$``night'') refers to a policy that is applicable only at night. %\item \textit{Data Transformation \TP} defines a set of security and privacy-aware transformations on \textit{obj} that must be enforced before any access to data is given. Transformations focus on data protection, as well as on compliance to regulations and standards, in addition to simple format conversions. 
For instance, let us define three transformations that can be applied to the dataset in \cref{tab:dataset}, each performing different levels of anonymization: @@ -153,7 +153,7 @@ \subsection{Functional Annotations}\label{sec:funcannotation} \item an empty function \tf{\epsilon} that applies no transformation or processing on the data; \item an additive function \tf{a} that expands the amount of data received, for example, by integrating data from other sources; \item a transformation function \tf{t} that transforms some records in the dataset without altering the domain; - \item a transformation function \tf{d} (out of the scope of this work) that changes the domain of the data by applying, for instance, PCA or K-means. + \item a transformation function \tf{d} (out of the scope of this work) that changes the domain of the data by applying, for instance, PCA. \end{enumerate*} For simplicity but with no loss of generality, we assume that all candidate services meet functional annotation \F{} and that \TF{}=\tf{}. As a consequence, all candidate services apply the same transformation to the data during the pipeline execution. \ No newline at end of file diff --git a/system_model.tex b/system_model.tex index 6a3fbeb..dd1e563 100644 --- a/system_model.tex +++ b/system_model.tex @@ -70,7 +70,8 @@ \subsection{Reference Scenario}\label{sec:service_definition} In this context, the user, a member of the Connecticut Department of Correction (DOC), is interested to compare admission trends in Connecticut prisons with the ones in New York and New Hampshire. We assume that the three DOCs are partners and share data according to their privacy policies. The entire service execution must occur within the Connecticut Department of Correction. Moreover, if data transmission extends beyond Connecticut's borders, data protection measures must be implemented. 
-The user's objective aligns with a predefined service pipeline \st{template} that orchestrates the following sequence of operations: +The user's objective aligns with a predefined service pipeline %\st{template} +that orchestrates the following sequence of operations: \begin{enumerate*}[label=(\roman*)] \item \emph{Data fetching}, including the download of the dataset from other states; \item \emph{Data preparation}, including data merging, cleaning, and anonymization;