From 93971c5fac228ee4b4e2ab4a1c46aaa002127176 Mon Sep 17 00:00:00 2001 From: Antongiacomo Polimeno Date: Tue, 21 Nov 2023 13:33:16 +0100 Subject: [PATCH] minor --- pipeline_template.tex | 9 +++++++-- pipeline_template_example.tex | 15 ++++----------- system_model.tex | 9 ++++----- 3 files changed, 15 insertions(+), 18 deletions(-) diff --git a/pipeline_template.tex b/pipeline_template.tex index ca19e42..9789466 100644 --- a/pipeline_template.tex +++ b/pipeline_template.tex @@ -119,12 +119,17 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition} \item Environment \textit{env} defines a set of conditions on contextual attributes, such as time of the day, location, IP address, risk level, weather condition, holiday/workday, emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, $<$\emph{env},\{(time $=$ "night")\}$>$ refers to a policy that is applicable only at night. - \item Data Transformation \textit{\TP} defines a set of security and privacy-aware transformations on \textit{obj}, which must be enforced before any access to data. Transformations focus on data protection, as well as compliance to regulations and standards, in addition to simple format conversions. + \item Data Transformation \textit{\TP} defines a set of security and privacy-aware transformations on \textit{obj}, which must be enforced before any access to data. + Transformations focus on data protection, as well as compliance with regulations and standards, in addition to simple format conversions. + For instance, let us define three transformations that can be applied to the dataset in \cref{tab:dataset}: \begin{enumerate*}[label=\roman*)] + \item \emph{level0} (\tp{0}): no anonymization is carried out; + \item \emph{level1} (\tp{1}): the data is partially anonymized, with only the first name and last name being anonymized;
+ \item \emph{level2} (\tp{2}): the data is fully anonymized, with the first name, last name, identifier, and age being anonymized. + \end{enumerate*} \end{description} \end{definition} Access control policies $p_j$$\in$\P{i} annotating vertex \vi{i} in a pipeline template $G^{\myLambda,\myGamma}$ are used to filter out those candidate services $s$$\in$$S^c$ that do not match data protection requirements. Specifically, each policy $p_j$$\in$\P{i} is evaluated to verify whether a candidate service $s$$\in$$S^c$ for vertex \vi{i} is compatible with data protection requirements in \P{i} (\myLambda(\vi{i})). Policy evaluation matches the profile of candidate service $s$$\in$$S^c$ with the policy conditions in each $p_j$$\in$\P{i}. If the credentials and attributes in the candidate service profile fail to meet the policy conditions, meaning that no policies $p_j$ are evaluated to \emph{true}, the service is discarded; otherwise it is added to the set $S'$ of compatible services, which is used in Section~\ref{sec:instance} to generate the pipeline instance $G'$. No policy enforcement is done at this stage. - \subsection{Functional Annotations}\label{sec:funcannotation} A proper data management approach must track functional data manipulations across the entire pipeline execution, defining the functional requirements of each service operating on data. To this aim, each vertex \vi{i}$\in\V_S$ is annotated with a label \myGamma(\vi{i}), corresponding to the functional description $F_i$ of the service $s_i$ represented by \vi{i}. diff --git a/pipeline_template_example.tex b/pipeline_template_example.tex index e7f1fa7..3f6cc86 100644 --- a/pipeline_template_example.tex +++ b/pipeline_template_example.tex @@ -38,17 +38,10 @@ \subsection{Example}\label{sec:example} \end{table*} -We present an example of pipeline template focusing on policy annotations. 
The pipeline template consists of five stages, and each stage is annotated with a policy presented in \cref{tab:anonymization}. \hl{Connecticut Prison (CTP) is the service user executing the pipeline. New York Prison and New Hampshire Prison are two partner DOC.}\hl{MOVE TO THE SYSTEM MODEL? YES, BUT DATA OWNER DEPENDS ON THE DATASET, I PUT SERVICE USER} We recall that \cref{tab:dataset} shows a sample of our reference dataset. +We present an example of a pipeline template consisting of five stages, each annotated with a policy presented in \cref{tab:anonymization}. +We recall that \cref{tab:dataset} shows a sample of our reference dataset. -In the following we will make reference to three different type of anonymization:%\hl{IS IT CORRECT TO USE}\tf{i} -\hl{? SHALL WE MOVE IT EARLIER?} -\begin{enumerate*}[label=\roman*)] - \item \emph{level0} (\tp{0}): no anonymization is performed; - \item \emph{level1} (\tp{1}): the data is partially anonymized, only the first name and last name are anonymized; - \item \emph{level2} (\tp{2}): the data is fully anonymized: first name, last name, identifier and age are anonymized. -\end{enumerate*} - -Let us consider the pipeline template \tChartFunction in \cref{sec:example}, +Considering the case study in \cref{sec:systemmodel}, a possible pipeline template is shown in \cref{fig:service_composition_template}. % 1st NODE % The first stage consists of three parallel vertices (\vi{1}, \vi{2}, \vi{3}) and focuses on data collection. The policy annotation \p{0} is linked with an empty transformation. @@ -82,7 +75,7 @@ \subsection{Example}\label{sec:example} The fifth stage manages data storage. If the service is within the facility itself ($\langle service,region=FACILITY"\rangle$), \p{5} is satisfied, resulting in data anonymization level1 (\tp{1}). Otherwise, if the service is in a partner region ($\langle service,region={CT,NY,NH}"\rangle$), the data undergo anonymization level2 (\tp{2}). 
-The functional requirement specifies some\hl{?} data as input, and the output is the URI of the stored data. +The functional requirement specifies the data to be provided as input, and the output is the URI of the stored data. % 6th NODE % The sixth stage is responsible for data visualization. As stated in policy annotation \p{6}, if the user is a member of the facility itself, the data are anonymized at level0 (\tp{0}). diff --git a/system_model.tex b/system_model.tex index df1f30c..270b4a8 100644 --- a/system_model.tex +++ b/system_model.tex @@ -51,10 +51,7 @@ \subsection{Service Pipeline and Reference Scenario}\label{sec:service_definitio \item \emph{Data analysis}, including statistical measures like averages, medians, and clustering-based statistics; \item \emph{Machine learning task}, including training and inference; \item \emph{Data storage}, including the storage of the results in the corresponding states. - Specifically, one copy remains in Connecticut (where sensitive information in the source dataset is not protected), - while two additional copies are distributed to New York and New Hampshire (with sensitive information from the source dataset being safeguarded) - .\hl{LET US EXPLAIN THE PARENTHESIS WELL} - \item \emph{Data visualization}, including the visualization of the results.\hl{WEREN'T WE MAKING STORAGE AND VISUALIZATION ALTERNATIVES WITH AN END NODE?} + \item \emph{Data visualization}, including the visualization of the results. \end{enumerate*} @@ -168,7 +165,9 @@ \subsection{Service Pipeline and Reference Scenario}\label{sec:service_definitio The adopted dataset\footnote{https://data.ct.gov/Public-Safety/Accused-Pre-Trial-Inmates-in-Correctional-Faciliti/b674-jy6w} exhibits a straightforward row-and-column structure. Each row represents an inmate; each column includes the following attributes: date of download, a unique identifier, last entry date, race, gender, age of the individual, the bound value, offense, entry facility, and detainer. 
To serve the objectives of our study, we have extended this dataset by introducing randomly generated first and last names. -In \cref{tab:dataset}, a sample of the dataset is presented, showcasing a representative subset of the collected information. +In \cref{tab:dataset}, we present a representative sample of the dataset. +For demonstration purposes within our case study, we assume that Connecticut, New York, and New Hampshire collaborate as partners. +Therefore, data can be shared among them with relaxed privacy policies. % We download three datasets, no anonymization, merge node, I anonymize and clean everything,