diff --git a/pipeline_instance_example.tex b/pipeline_instance_example.tex
index 1f2eda7..220061f 100644
--- a/pipeline_instance_example.tex
+++ b/pipeline_instance_example.tex
@@ -13,30 +13,6 @@
 For each vertex \vii{i}, we select the matching service \sii{j} from $S'_i$ and incorporate it into a valid instance. For example, we select $\s{51}$ for \vi{5}, $\s{62}$ for \vi{6}, and $\s{71}$ for \vi{7}, as depicted in \cref{tab:instance_example_valid}(a) (column \emph{Instance}). We note that moving from a valid to an optimal instance requires evaluating candidate services against specific quality metrics that reflect their impact on data quality, as discussed in the remainder of this paper.
-%In the next sections, we will introduce the metrics that we use to evaluate the quality of services and the results of the experiments conducted to evaluate the performance of our approach.
-
-% \begin{table*}
-% \def\arraystretch{1.5}
-% \caption{Instance example}\label{tab:instance_example}
-
-% \centering
-% \begin{tabular}{l|l|c|c|c}
-
-% \textbf{Vertex$\rightarrow$Policy} & \textbf{Candidate} & \textbf{Profile} & \textbf{Filtering} & \textbf{Ranking} \\
-% \multirow{ 3}{*}{\vi{4} $\rightarrow$ \p{1},\p{2} } & $\s{41}$ & service\_owner = "CT" & \cmark & 1 \\
-% & $\s{42}$ & service\_owner = "NY" & \cmark & 2 \\
-% & $\s{43}$ & service\_owner = "CA" & \xmark & -- \\
-% \hline
-% \multirow{ 3}{*}{\vi{7} $\rightarrow$ \p{5},\p{6} } & $\s{71}$ & service\_region = "CA" & \xmark & -- \\
-% & $\s{72}$ & service\_region = "CT" & \cmark & 1 \\
-% & $\s{73}$ & service\_region = "NY" & \cmark & 2 \\
-% \hline
-% \multirow{ 3}{*}{\vi{8} $\rightarrow$ \p{7},\p{8} } & $\s{81}$ & visualization\_location = "CT\_FACILITY" & \cmark & 1 \\
-% & $\s{82}$ & visualization\_location = "CLOUD" & \cmark & 2 \\
-% \hline
-% \end{tabular}
-% \end{table*}
-
 \begin{table*}
 \def\arraystretch{1.5}
@@ -47,16 +23,16 @@
 \begin{tabular}{c|c|c|c|c}\label{tab:instance_example_valid}
 \textbf{Vertex$\rightarrow$Policy} & \textbf{Candidate} & \textbf{Profile} & \textbf{Filtering} & \textbf{Instance} \\\hline
- \multirow{ 3}{*}{\vi{5} $\rightarrow$ \p{1},\p{2} } & $\s{51}$ & service\_owner = "CT" & \cmark & \cmark \\
- & $\s{52}$ & service\_owner = "NY" & \cmark & \xmark \\
- & $\s{53}$ & service\_owner = "CA" & \xmark & \xmark \\
+ \multirow{ 3}{*}{\vi{5} $\rightarrow$ \p{1},\p{2} } & $\s{51}$ & service\_owner = ``CT'' & \cmark & \cmark \\
+ & $\s{52}$ & service\_owner = ``NY'' & \cmark & \xmark \\
+ & $\s{53}$ & service\_owner = ``CA'' & \xmark & \xmark \\
 \hline
- \multirow{ 3}{*}{\vi{6} $\rightarrow$ \p{3},\p{4} } & $\s{61}$ & service\_region = "CA" & \xmark & \xmark \\
- & $\s{62}$ & service\_region = "CT" & \cmark & \cmark \\
- & $\s{63}$ & service\_region = "NY" & \cmark & \xmark \\
+ \multirow{ 3}{*}{\vi{6} $\rightarrow$ \p{3},\p{4} } & $\s{61}$ & service\_region = ``CA'' & \xmark & \xmark \\
+ & $\s{62}$ & service\_region = ``CT'' & \cmark & \cmark \\
+ & $\s{63}$ & service\_region = ``NY'' & \cmark & \xmark \\
 \hline
- \multirow{ 3}{*}{\vi{7} $\rightarrow$ \p{5},\p{6} } & $\s{71}$ & visualization\_location = "CT\_FACILITY" & \cmark & \cmark \\
- & $\s{72}$ & visualization\_location = "CLOUD" & \cmark & \xmark \\
+ \multirow{ 3}{*}{\vi{7} $\rightarrow$ \p{5},\p{6} } & $\s{71}$ & visualization\_location = ``CT\_FACILITY'' & \cmark & \cmark \\
+ & $\s{72}$ & visualization\_location = ``CLOUD'' & \cmark & \xmark \\
 \end{tabular} &
diff --git a/pipeline_template.tex b/pipeline_template.tex
index 453adc2..7e24005 100644
--- a/pipeline_template.tex
+++ b/pipeline_template.tex
@@ -95,15 +95,15 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition}
 \subsection{Data Protection Annotation}\label{sec:nonfuncannotation}
 Data Protection Annotation \myLambda\ expresses data protection requirements in the form of access control policies.
 We consider an attribute-based access control model that offers flexible, fine-grained authorization and adapts its standard key components to address the unique characteristics of a big data environment. Access requirements are expressed in the form of policy conditions, defined as follows.
-
+
 \vspace{0.5em}
-
+
 \begin{definition}[Policy Condition]\label{def:policy_cond}
 A \emph{Policy Condition pc} is a Boolean expression of the form $($\emph{attr\_name} op \emph{attr\_value}$)$, with op$\in$\{$<$,$>$,$=$,$\neq$,$\leq$,$\geq$\}, \emph{attr\_name} an attribute label, and \emph{attr\_value} the corresponding attribute value.
 \end{definition}
-
+
 \vspace{0.5em}
-
+
 Built on policy conditions, an access control policy is then defined as follows.
 \vspace{0.5em}
 \begin{definition}[Policy]\label{def:policy_rule}
@@ -123,7 +123,7 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition}
 \textit{Action act} specifies the operations that can be performed within a big data environment, from traditional atomic operations on databases (e.g., CRUD operations) to coarser operations, such as an Apache Spark Directed Acyclic Graph (DAG), Hadoop MapReduce, an analytics function call, and an analytics pipeline.
 %\item
- \textit{Environment env} defines a set of conditions on contextual attributes, such as time of the day, location, IP address, risk level, weather condition, holiday/workday, and emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$"night") refers to a policy that is applicable only at night.
+ \textit{Environment env} defines a set of conditions on contextual attributes, such as time of day, location, IP address, risk level, weather condition, holiday/workday, and emergency. It is a set \{$pc_i$\} of \emph{Policy Conditions} as defined in Definition \ref{def:policy_cond}. For instance, (time$=$``night'') refers to a policy that is applicable only at night.
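The Policy Condition and Environment definitions above can be made concrete with a short executable sketch. This is an illustration only, not part of the paper or its diff: the triple encoding of conditions and the helper names `holds` and `env_satisfied` are assumptions introduced here.

```python
# Illustrative sketch (assumed encoding, not the paper's implementation):
# a Policy Condition is a triple (attr_name, op, attr_value) with
# op in {<, >, =, !=, <=, >=}; an Environment is a set {pc_i} of
# conditions that must all hold for the policy to be applicable.
import operator

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       "!=": operator.ne, "<=": operator.le, ">=": operator.ge}

def holds(pc, attrs):
    """True iff a single policy condition holds for the given attributes."""
    name, op, value = pc
    return name in attrs and OPS[op](attrs[name], value)

def env_satisfied(env, context):
    """All conditions in the environment must hold (conjunction)."""
    return all(holds(pc, context) for pc in env)

# Example from the text: a policy applicable only at night, (time = "night").
night_only = [("time", "=", "night")]
```

Under this reading, a request context satisfies `night_only` exactly when its `time` attribute equals `"night"`; a missing attribute makes the condition, and hence the environment, fail.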
 %\item
 \textit{Data Transformation \TP} defines a set of security and privacy-aware transformations on \textit{obj} that must be enforced before any access to data is granted. Transformations focus on data protection, as well as on compliance with regulations and standards, in addition to simple format conversions. For instance, let us define three transformations that can be applied to the dataset in \cref{tab:dataset}, each performing a different level of anonymization: