From 97bde09910d3329ef3c4a584c35a1df2e8c26d0f Mon Sep 17 00:00:00 2001 From: Claudio Ardagna Date: Wed, 28 Feb 2024 11:55:40 +0100 Subject: [PATCH] Sections 5, 6 intro, 6.2 - Claudio. --- macro.tex | 1 + metrics.tex | 17 ++- pipeline_instance.tex | 279 +++++++++++++++++++++--------------------- pipeline_template.tex | 2 +- 4 files changed, 150 insertions(+), 149 deletions(-) diff --git a/macro.tex b/macro.tex index a4477be..768cf32 100644 --- a/macro.tex +++ b/macro.tex @@ -50,6 +50,7 @@ % \newcommand{\TF}{\ensuremath{T_{\fChartFunction}}} \newcommand{\user}{user\,} \newcommand{\User}{User\,} +\newcommand{\profile}{\emph{prf}} \newcommand{\fChartFunction}{\ensuremath{\myLambda{}}} diff --git a/metrics.tex b/metrics.tex index 9c78212..88788a2 100644 --- a/metrics.tex +++ b/metrics.tex @@ -1,9 +1,8 @@ \section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics} - - % +% % %Obviously, choosing the best service for each vertex is not sufficient; it becomes a complex problem in which all possible combinations of the available services must be computed/evaluated, among which the best one is chosen. -The goal of this paper is to produce a pipeline instance with maximum quality, i.e., guaranteeing a high level of data protection but at the same time the minimum amount of information lost across the pipeline. To this aim, we first discuss the crucial role of well-defined metrics (\cref{sec:metrics}) to specify and measure data quality, and describe the ones that will be used in the paper. -Then, we prove that the problem of maximizing the pipeline instance quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexities associated with enumerating all possible combinations within a given set. 
The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed numerous nodes and candidate services. +Our goal is to generate a pipeline instance with maximum quality, which addresses data protection requirements with the minimum amount of information loss across the pipeline. To this aim, we first discuss the crucial role of well-defined metrics (\cref{sec:metrics}) to specify and measure data quality, and describe the ones used in the paper. +Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexities associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous nodes and candidate services. Our focus extends beyond identifying optimal combinations, encompassing an understanding of the quality changes introduced during the transformation processes. %Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset. @@ -61,21 +60,21 @@ \subsubsection{Weighted Jensen-Shannon Divergence} By incorporating weights into the JSD calculation, WJSD provides a more accurate measure of dissimilarity between X and Y, considering the importance of individual elements based on the assigned weights. This approach is particularly useful when elements in the datasets have varying levels of significance, enabling a more tailored analysis of dissimilarity. 
\subsection{NP-Hardness of the Max Quality Pipeline Instantiation Process}\label{sec:nphard} +\hl{what if we define it formally as the problem of finding a valid instance, according to the definition of instance, such that no other instance with a smaller loss exists?} - \begin{definition}[Max Quality Pipeline Instantiation Process]\label{def:MaXQualityInstance} +\begin{definition}[Max Quality Pipeline Instantiation Process]\label{def:MaXQualityInstance} Given \textit{dtloss}$_i$ the value of the quality metric computed after applying the transformation of the policy matching the service selected to instantiate vertex \vi{i}$\in$$\V_S$, the Max quality \problem is the case in which the \emph{pipeline instantiation} function returns a \pipelineInstance where the \textit{dtloss}$_i$ sum is maximized. \end{definition} -The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by theorem \ref{theorem:NP}. -However, while the overall problem is NP-hard, there is a component of the problem that is solvable in polynomial time: matching the profile of each service with the node policy. This can be done by iterating over each node and each service, checking if the service matches the node’s policy. This process would take $O(|N|*|S|)$ time. This is polynomial time complexity. +The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by Theorem~\ref{theorem:NP}. However, while the overall problem is NP-hard, one component of the problem is solvable in polynomial time: matching the profile of each service with the node policy. This can be done by iterating over each node and each service, checking if the service matches the node’s policy. This process takes $O(|N|\cdot|S|)$ time, which is polynomial. \begin{theorem}\label{theorem:NP} The Max Quality \problem is NP-Hard. 
\end{theorem} \emph{Proof: } -The proof is a reduction from the multiple-choice knapsack problem (MCKP), a classified NP-hard combinatorial optimization problem, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,…N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j \in N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without having the weight sum to exceed C. +The proof is a reduction from the multiple-choice knapsack problem (MCKP), a classical NP-hard combinatorial optimization problem, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in a knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $C$. -The MCKP can be reduced to the Max quality \problem in plynomial time, with $N_1,N_2,…N_t$ corresponding to $S^c_{1}, S^c_{1}, ..., S^c_{u},$, $t=u$ and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j \in N_i$ corresponds to \textit{dtloss}$_{ij}$ computed for each candidate service $s_j \in S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, C is always equal to the cardinality of $V_C$). +The MCKP can be reduced to the Max Quality \problem in polynomial time, with $N_1,N_2,\ldots,N_t$ corresponding to $S^c_{1}, S^c_{2}, \ldots, S^c_{u}$, $t$$=$$u$ and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j$$\in$$N_i$ corresponds to \textit{dtloss}$_{ij}$ computed for each candidate service $s_j$$\in$$S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, $C$ is always equal to the cardinality of $V_C$). 
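+
+For reference, the MCKP described above can be written as an integer program. This is a sketch in standard notation: the binary variables $x_{ij}$, with $x_{ij}$$=$$1$ if and only if item $j$ of class $N_i$ is packed, are introduced here for illustration only and are not part of the notation used elsewhere in the paper.
+\[
+\begin{array}{lll}
+\max & \sum_{i=1}^{t}\sum_{j\in N_i} p_{ij}\,x_{ij} & \\[2pt]
+\mbox{s.t.} & \sum_{i=1}^{t}\sum_{j\in N_i} w_{ij}\,x_{ij}\leq C, & \\[2pt]
+ & \sum_{j\in N_i} x_{ij}=1, & i=1,\ldots,t,\\[2pt]
+ & x_{ij}\in\{0,1\}, & i=1,\ldots,t,\ j\in N_i.
+\end{array}
+\]
+With $w_{ij}$$=$$1$ for all items, as in the mapping above, the weight sum equals $t$ whenever exactly one item per class is chosen.
+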
Since the reduction can be done in polynomial time, our problem is also NP-hard. (this is not sufficient: we must prove that a solution of one problem is also a solution of the other) diff --git a/pipeline_instance.tex b/pipeline_instance.tex index d9f35cb..b21165a 100644 --- a/pipeline_instance.tex +++ b/pipeline_instance.tex @@ -1,12 +1,13 @@ \section{Pipeline Instance}\label{sec:instance} -Given a set of candidate services, a \pipelineInstance $\iChartFunction$ instantiates a \pipelineTemplate \tChartFunction by selecting the services according to its data protection and functional annotations. It is formally defined as follows. +%Given a set of candidate services, a +A \pipelineInstance $\iChartFunction$ instantiates a \pipelineTemplate \tChartFunction by composing services in the instance according to data protection and functional annotations in the template. It is formally defined as follows. \begin{definition}[Pipeline Instance]\label{def:instance} Let \tChartFunction be a pipeline template, a pipeline Instance $\iChartFunction$ is an isomorphic directed acyclic graph where: \begin{enumerate*}[label=\textit{\roman*})] \item $v'_r$$=$$v_r$; \item for each vertex $\vi{}\in\V_{\timesOperator}\cup\V_{\plusOperator}$, there exists a corresponding vertex $\vii{}\in\Vp_{\timesOperator}\cup\Vp_{\plusOperator}$; - \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i}, there exists a corresponding vertex \vii{i}$\in$$\Vp_S$ instantiated with the service \sii{i} selected so that: + \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i}, there exists a corresponding vertex \vii{i}$\in$$\Vp_S$ instantiated with a service \sii{i}, such that: \end{enumerate*} \begin{enumerate}[label=\arabic*)] \item $s'_i$ satisfies data protection annotation \myLambda(\vi{i}) in \tChartFunction; @@ -14,21 +15,21 @@ \section{Pipeline Instance}\label{sec:instance} \end{enumerate} \end{definition} -Condition 1 requires that each selected service \sii{i} satisfies the policy 
requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate, whereas Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. Recall from Section~\ref{sec:funcannotation} that we assume that all candidate services meet the functional annotation, thus Condition 2 is satisfied for all candidate services. +Condition 1 requires that each selected service \sii{i} satisfies the policy requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate, whereas Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. - We then define a \emph{pipeline instantiation} function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, with a specific set of services $S^c_{i}$ for each vertex \vi{i}$\in$$\V_S$, and returns as output a \pipelineInstance \iChartFunction. +We then define a \emph{pipeline instantiation} function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, with a specific set of services $S^c_{i}$ for each vertex \vi{i}$\in$$\V_S$, and returns as output a \pipelineInstance \iChartFunction. Recall from Section~\ref{sec:funcannotation} that all candidate services meet the functional annotation in the template, meaning that Condition 2 in Definition~\ref{def:instance} is satisfied for all candidate services. - The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. - Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. 
- Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step approach is applied as follows. +The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. +Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. +Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step approach is applied as follows. - \begin{enumerate} +\begin{enumerate} - \item \textit{Filtering Algorithm} -- The filtering algorithm checks if the profile of each candidate service $\si{j} \in S^c_{i}$, given as a set of attributes in the form (\emph{name}, \emph{value}), satisfies the policies $p_k$$\in$\P{i} corresponding to \myLambda(\vi{i}). If \si{j}'s profile satisfies at least one policy, the service is considered compatible, otherwise it is discarded. The filtering algorithm finally returns a subset $S'_{i}\subseteq S^c_{i}$ of compatible services for each vertex \vi{i}$\in$$\V_S$. - \item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set of compatible services in $S'_{i}$ and instantiates the vertex $\vii{i}\in \Vp$ with it. There are many ways of choosing $s'_i$, we present our approach based on the minimization of quality loss in Section \ref{sec:metrics}. + \item \textit{Filtering Algorithm} -- The filtering algorithm checks if the profile \profile$_j$ of each candidate service $\si{j}$$\in$$S^c_{i}$ satisfies the policies $p_k$$\in$\P{i} corresponding to \myLambda(\vi{i}). If \profile$_j$ satisfies at least one policy, service $\si{j}$ is compatible, otherwise it is discarded. The filtering algorithm finally returns a subset $S'_{i}$$\subseteq$$S^c_{i}$ of compatible services for each vertex \vi{i}$\in$$\V_S$. 
+ \item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set $S'_{i}$ of compatible services and instantiates the corresponding vertex $\vii{i}$$\in$$\Vp$ with it. There are many ways of choosing $s'_i$; we present our approach, based on the minimization of quality loss, in Section~\ref{sec:metrics}. \end{enumerate} - When all vertices $\vi{i}\in V$ have been visited, the \pipelineInstance G' is finalized, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies $p_k$$\in$\P{i} according to \myLambda, since policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied. + When all vertices $\vi{i}$$\in$$V$ have been visited, the \pipelineInstance $G'$ is finalized, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies $p_k$$\in$\P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. If policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied; otherwise, a default transformation that removes all data is applied. \begin{figure}[ht!] \centering @@ -123,140 +124,140 @@ \subsection{Example}\label{sec:example} TBD (mettere un esempio in cui non sia s % Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. % Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows. -\subsection{Pipeline Instance - OLD PARAGRAPH}\label{sec:instance} -%The goal of our approach is to generate \pipelineInstance starting from the \pipelineTemplate in Section~\ref{sec:template}.
-A \pipelineInstance $\iChartFunction$ instantiates a \pipelineTemplate \tChartFunction by selecting the component services according to its data protection and functional annotations. We formally define \tChartFunction as follows. - - \begin{definition}[Pipeline Instance]\label{def:instance} - Let \tChartFunction be a pipeline template, a pipeline Instance $\iChartFunction$ is a directed acyclic graph where: - \begin{enumerate*}[label=\textit{\roman*})] - \item $s_r$$=$$s'_r$; - \item for each vertex $\vi{}\in\V_{\timesOperator}\cup\V_{\plusOperator}$ it exists a corresponding vertex $\vii{}\in\Vp_{\timesOperator}\cup\Vp_{\plusOperator}$; - \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i} it exists a corresponding \vii{i}$\in$$\Vp_S$ instantiated with a service instance \sii{i}; - \end{enumerate*} - and such that the following conditions hold: - \begin{enumerate}[label=\arabic*)] - \item $s'_i$ satisfies data protection annotation \myLambda(\vi{i}) in \tChartFunction; - \item $s'_i$ satisfies functional annotation \myGamma(\vi{i}) in \tChartFunction. - \end{enumerate} - \end{definition} - -Condition 1 states that each service \sii{i} must satisfy the policy requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. -Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. We recall that Condition 1 is satisfied for all candidate services (see Section~\ref{sec:funcannotation}) and therefore concentrate on Condition 2 in the following. - - We then define a \emph{pipeline instantiation} function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, one for each vertex \vi{i}$\in$$\V_S$, and returns as output a \pipelineInstance \iChartFunction in Definition~\ref{def:instance}. 
- %In \iChartFunction, every invocations \vii{i}$\in$$V'_S$ contains a service instance, and every branching $v\in\Vplus\bigcup\Vtimes$ in the template is maintained as is. - %\chia{ The objective of the Pipeline Instantiation Process is to return a Pipeline Instance that minimizes the quantity of information lost, and maximizes the level of data protection and data sharing, not for the selection of a single service, but in the overall. To this aim, the \pipelineTemplate is traversed with a breadth-first search algorithm and for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows.} - % - % %Obviously, choosing the best service for each vertex is not sufficient; it becomes a complex problem in which all possible combinations of the available services must be computed/evaluated, among which the best one is chosen. - % - The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. - Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. - Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows. - - \begin{itemize} - - \item \textit{Filtering Algorithm} -- As already discussed in Section~\ref{sec:templatedefinition}, the filtering algorithm retrieves a set of candidate services $S^c$ and matches them one-by-one against data protection requirements \myLambda(\vi{i}). In particular, the profile of each candidate service \si{j} is matched against policies $p_k$$\in$\P{i} corresponding to \myLambda(\vi{i}). Filtering algorithm returns as output the set of compatible services that match the policy. - - Formally, let us consider a set $S^c$ of candidate services \si{j}, each one having a profile as a set of attributes in the form (\emph{name}, \emph{value}). 
The filtering algorithm is executed for each \si{j}; it is successful if \si{j}'s profile satisfies at least one policy $p_k$$\in$\P{i}; otherwise, \si{j} is discarded and not considered for selection. The filtering algorithm finally returns a subset $S'\subseteq S^c$ of compatible services, among which the service instance is selected. - - \item \textit{Selection Algorithm} -- Upon retrieving a set $S'$ of compatible services \si{j}, a service $s'_i$$\in$$S'$ is then selected and integrated in $\vii{i}\in \Vp$. There are many ways of choosing $s'_i$, we present our approach based on quality loss in Section \ref{sec:metrics}. - %\item \textit{Selection Algorithm} -- Upon retrieving a set $S'$ of compatible services \si{j}, it produces a ranking of these services according to some metrics that evaluates the quality loss introduced by each service when integrated in the pipeline instance. More details about the metrics are provided in Section \ref{sec:metrics}. The best service $s'_i$ is then selected and integrated in $\vii{i}\in \Vp$. There are many ways of choosing relevant metrics, we present those used in this article in Section \ref{sec:metrics}. - \end{itemize} - - When all vertices $\vi{i}\in V$ have been visited, a \pipelineInstance G' is generated, where each \vii{i}$\in$\Vp contains a service instance $s'_i$. We note that each vertex \vii{i} is annotated with policies $p_k$$\in$\P{i} according to \myLambda. When pipeline instance is triggered, before any services can be executed, policies in \P{i} are evaluated and enforced. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied. - -\begin{figure}[ht!] 
- \centering - \newcommand{\function}{$\instanceChartAnnotation{}$} - \begin{tikzpicture}[scale=0.7] - % Nodes - \node[draw, circle] (sr) at (0,0) {$\vi{r}$}; - % \node[draw, circle] (node2) at (1,0) {$\s{1}$}; - \node[draw, circle, plus,minimum size=1.5em] (plus) at (1.5,0) {}; - \node[draw, circle] (s1) at (3,1.7) {$\sii{1}$}; - \node[draw, circle] (s2) at (3,-1.7) {$\sii{2}$}; - \node[draw, circle] (s3) at (3,0) {$\sii{3}$}; - - - \node[draw, circle] (s4) at (4.5,0) {$\sii{4}$}; - \node[draw, circle, cross,minimum size=1.5em] (cross) at (6,0) {}; - \node[draw, circle] (s5) at (7.5,1.2) {$\sii{5}$}; - \node[draw, circle] (s6) at (7.5,-1.2) {$\sii{6}$}; - - \node[draw, circle] (s7) at (9,0) {$\sii{7}$}; - \node[draw, circle] (s8) at (10.5,0) {$\sii{8}$}; - % Text on top - \node[above] at (sr.north) {\function{}}; - \node[above] at (s1.north) {\function{}}; - - \node[above] at (s2.north) {\function{}}; - \node[above] at (s3.north) {\function{}}; - \node[above] at (s4.north) {\function{}}; - \node[above] at (s5.north) {\function{}}; - \node[above] at (s6.north) {\function{}}; - \node[above] at (s7.north) {\function{}}; - \node[above] at (s8.north) {\function{}}; - % Connection - - % \draw[->] (node2) -- (node3); - \draw[->] (sr) -- (plus); - \draw[->] (plus) -- (s1); - \draw[->] (plus) -- (s2); - \draw[->] (plus) -- (s3); - - \draw[->] (s1) -- (s4); - \draw[->] (s2) -- (s4); - \draw[->] (s3) -- (s4); - % \draw[->] (node6) -- (node65); - % \draw[->] (node65) -- (node7);3 - \draw[->] (s4) -- (cross); - \draw[->] (cross) -- (s5); - \draw[->] (cross) -- (s6); - \draw[->] (s5) -- (s7); - \draw[->] (s6) -- (s7); - \draw[->] (s7) -- (s8); - - \end{tikzpicture} - \caption{Service composition instance} - \label{fig:service_composition_instance} -\end{figure} - - -% \subsection{Pipeline Instance Definition}\label{sec:instancedefinition} - % The goal of our approach is to generate an instance of the \pipelineTemplate starting from the \pipelineTemplate in 
Section~\ref{sec:template}. In the following, we first define the pipeline instance and the corresponding pipeline instantiation process (Section \ref{sec:instancedefinition}). We then prove that the pipeline instantiation process is NP-hard (Section \ref{sec:funcannotation}). - - % \subsection{Pipeline Instance Definition}\label{sec:instancedefinition} - % A \pipelineInstance $\iChartFunction$ is a ready-to-be-executed pipeline, which instantiates a \pipelineTemplate \tChartFunction \chia{ selecting the services} according to its data protection and functional annotations. We formally define \tChartFunction as follows. +% \subsection{Pipeline Instance - OLD PARAGRAPH}\label{sec:instance} +% %The goal of our approach is to generate \pipelineInstance starting from the \pipelineTemplate in Section~\ref{sec:template}. +% A \pipelineInstance $\iChartFunction$ instantiates a \pipelineTemplate \tChartFunction by selecting the component services according to its data protection and functional annotations. We formally define \tChartFunction as follows. + +% \begin{definition}[Pipeline Instance]\label{def:instance} +% Let \tChartFunction be a pipeline template, a pipeline Instance $\iChartFunction$ is a directed acyclic graph where: +% \begin{enumerate*}[label=\textit{\roman*})] +% \item $s_r$$=$$s'_r$; +% \item for each vertex $\vi{}\in\V_{\timesOperator}\cup\V_{\plusOperator}$ it exists a corresponding vertex $\vii{}\in\Vp_{\timesOperator}\cup\Vp_{\plusOperator}$; +% \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i} it exists a corresponding \vii{i}$\in$$\Vp_S$ instantiated with a service instance \sii{i}; +% \end{enumerate*} +% and such that the following conditions hold: +% \begin{enumerate}[label=\arabic*)] +% \item $s'_i$ satisfies data protection annotation \myLambda(\vi{i}) in \tChartFunction; +% \item $s'_i$ satisfies functional annotation \myGamma(\vi{i}) in \tChartFunction. 
+% \end{enumerate} +% \end{definition} + +% Condition 1 states that each service \sii{i} must satisfy the policy requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. +% Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. We recall that Condition 1 is satisfied for all candidate services (see Section~\ref{sec:funcannotation}) and therefore concentrate on Condition 2 in the following. + +% We then define a \emph{pipeline instantiation} function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, one for each vertex \vi{i}$\in$$\V_S$, and returns as output a \pipelineInstance \iChartFunction in Definition~\ref{def:instance}. +% %In \iChartFunction, every invocations \vii{i}$\in$$V'_S$ contains a service instance, and every branching $v\in\Vplus\bigcup\Vtimes$ in the template is maintained as is. +% %\chia{ The objective of the Pipeline Instantiation Process is to return a Pipeline Instance that minimizes the quantity of information lost, and maximizes the level of data protection and data sharing, not for the selection of a single service, but in the overall. To this aim, the \pipelineTemplate is traversed with a breadth-first search algorithm and for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows.} +% % +% % %Obviously, choosing the best service for each vertex is not sufficient; it becomes a complex problem in which all possible combinations of the available services must be computed/evaluated, among which the best one is chosen. +% % +% The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. 
+% Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. +% Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows. + +% \begin{itemize} + +% \item \textit{Filtering Algorithm} -- As already discussed in Section~\ref{sec:templatedefinition}, the filtering algorithm retrieves a set of candidate services $S^c$ and matches them one-by-one against data protection requirements \myLambda(\vi{i}). In particular, the profile of each candidate service \si{j} is matched against policies $p_k$$\in$\P{i} corresponding to \myLambda(\vi{i}). Filtering algorithm returns as output the set of compatible services that match the policy. + +% Formally, let us consider a set $S^c$ of candidate services \si{j}, each one having a profile as a set of attributes in the form (\emph{name}, \emph{value}). The filtering algorithm is executed for each \si{j}; it is successful if \si{j}'s profile satisfies at least one policy $p_k$$\in$\P{i}; otherwise, \si{j} is discarded and not considered for selection. The filtering algorithm finally returns a subset $S'\subseteq S^c$ of compatible services, among which the service instance is selected. + +% \item \textit{Selection Algorithm} -- Upon retrieving a set $S'$ of compatible services \si{j}, a service $s'_i$$\in$$S'$ is then selected and integrated in $\vii{i}\in \Vp$. There are many ways of choosing $s'_i$, we present our approach based on quality loss in Section \ref{sec:metrics}. +% %\item \textit{Selection Algorithm} -- Upon retrieving a set $S'$ of compatible services \si{j}, it produces a ranking of these services according to some metrics that evaluates the quality loss introduced by each service when integrated in the pipeline instance. More details about the metrics are provided in Section \ref{sec:metrics}. The best service $s'_i$ is then selected and integrated in $\vii{i}\in \Vp$. 
There are many ways of choosing relevant metrics, we present those used in this article in Section \ref{sec:metrics}. +% \end{itemize} + +% When all vertices $\vi{i}\in V$ have been visited, a \pipelineInstance G' is generated, where each \vii{i}$\in$\Vp contains a service instance $s'_i$. We note that each vertex \vii{i} is annotated with policies $p_k$$\in$\P{i} according to \myLambda. When pipeline instance is triggered, before any services can be executed, policies in \P{i} are evaluated and enforced. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied. + +% \begin{figure}[ht!] +% \centering +% \newcommand{\function}{$\instanceChartAnnotation{}$} +% \begin{tikzpicture}[scale=0.7] +% % Nodes +% \node[draw, circle] (sr) at (0,0) {$\vi{r}$}; +% % \node[draw, circle] (node2) at (1,0) {$\s{1}$}; +% \node[draw, circle, plus,minimum size=1.5em] (plus) at (1.5,0) {}; +% \node[draw, circle] (s1) at (3,1.7) {$\sii{1}$}; +% \node[draw, circle] (s2) at (3,-1.7) {$\sii{2}$}; +% \node[draw, circle] (s3) at (3,0) {$\sii{3}$}; + + +% \node[draw, circle] (s4) at (4.5,0) {$\sii{4}$}; +% \node[draw, circle, cross,minimum size=1.5em] (cross) at (6,0) {}; +% \node[draw, circle] (s5) at (7.5,1.2) {$\sii{5}$}; +% \node[draw, circle] (s6) at (7.5,-1.2) {$\sii{6}$}; + +% \node[draw, circle] (s7) at (9,0) {$\sii{7}$}; +% \node[draw, circle] (s8) at (10.5,0) {$\sii{8}$}; +% % Text on top +% \node[above] at (sr.north) {\function{}}; +% \node[above] at (s1.north) {\function{}}; + +% \node[above] at (s2.north) {\function{}}; +% \node[above] at (s3.north) {\function{}}; +% \node[above] at (s4.north) {\function{}}; +% \node[above] at (s5.north) {\function{}}; +% \node[above] at (s6.north) {\function{}}; +% \node[above] at (s7.north) {\function{}}; +% \node[above] at (s8.north) {\function{}}; +% % Connection + +% % \draw[->] (node2) -- (node3); +% \draw[->] (sr) -- (plus); +% \draw[->] 
(plus) -- (s1); +% \draw[->] (plus) -- (s2); +% \draw[->] (plus) -- (s3); + +% \draw[->] (s1) -- (s4); +% \draw[->] (s2) -- (s4); +% \draw[->] (s3) -- (s4); +% % \draw[->] (node6) -- (node65); +% % \draw[->] (node65) -- (node7);3 +% \draw[->] (s4) -- (cross); +% \draw[->] (cross) -- (s5); +% \draw[->] (cross) -- (s6); +% \draw[->] (s5) -- (s7); +% \draw[->] (s6) -- (s7); +% \draw[->] (s7) -- (s8); + +% \end{tikzpicture} +% \caption{Service composition instance} +% \label{fig:service_composition_instance} +% \end{figure} + + +% % \subsection{Pipeline Instance Definition}\label{sec:instancedefinition} +% % The goal of our approach is to generate an instance of the \pipelineTemplate starting from the \pipelineTemplate in Section~\ref{sec:template}. In the following, we first define the pipeline instance and the corresponding pipeline instantiation process (Section \ref{sec:instancedefinition}). We then prove that the pipeline instantiation process is NP-hard (Section \ref{sec:funcannotation}). + +% % \subsection{Pipeline Instance Definition}\label{sec:instancedefinition} +% % A \pipelineInstance $\iChartFunction$ is a ready-to-be-executed pipeline, which instantiates a \pipelineTemplate \tChartFunction \chia{ selecting the services} according to its data protection and functional annotations. We formally define \tChartFunction as follows. 
- % \begin{definition}[Pipeline Instance]\label{def:instance} - % Let \tChartFunction be a Pipeline Template, a Pipeline Instance $\iChartFunction$ is a directed acyclic graph where: - % \begin{enumerate*}[label=\textit{\roman*})] - % \item $s_r$$=$$s'_r$; - % \item for each vertex $\vi{}\in\V_{\timesOperator}\cup\V_{\plusOperator}$ it exists a corresponding vertex $\vii{}\in\Vp_{\timesOperator}\cup\Vp_{\plusOperator}$; - % \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i} it exists a corresponding \vii{i}$\in$$\Vp_S$ instantiated with a service instance \sii{i}; - % \end{enumerate*} - % and such that the following conditions hold: - % \begin{enumerate}[label=\arabic*)] - % \item $s'_i$ satisfies data protection annotation \myLambda(\vi{i}) in \tChartFunction; - % \item $s'_i$ satisfies functional annotation \myGamma(\vi{i}) in \tChartFunction. - % \end{enumerate} - % \end{definition} +% % \begin{definition}[Pipeline Instance]\label{def:instance} +% % Let \tChartFunction be a Pipeline Template, a Pipeline Instance $\iChartFunction$ is a directed acyclic graph where: +% % \begin{enumerate*}[label=\textit{\roman*})] +% % \item $s_r$$=$$s'_r$; +% % \item for each vertex $\vi{}\in\V_{\timesOperator}\cup\V_{\plusOperator}$ it exists a corresponding vertex $\vii{}\in\Vp_{\timesOperator}\cup\Vp_{\plusOperator}$; +% % \item for each $\vi{i}$$\in$$\V_S$ annotated with policy \P{i} it exists a corresponding \vii{i}$\in$$\Vp_S$ instantiated with a service instance \sii{i}; +% % \end{enumerate*} +% % and such that the following conditions hold: +% % \begin{enumerate}[label=\arabic*)] +% % \item $s'_i$ satisfies data protection annotation \myLambda(\vi{i}) in \tChartFunction; +% % \item $s'_i$ satisfies functional annotation \myGamma(\vi{i}) in \tChartFunction. +% % \end{enumerate} +% % \end{definition} - % Condition 1 states that each service \sii{i} must satisfy the policy requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. 
- % Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. +% % Condition 1 states that each service \sii{i} must satisfy the policy requirements \P{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. +% % Condition 2 is needed to preserve the process functionality, as it simply states that each service \sii{i} must satisfy the functional requirements \F{i} of the corresponding vertex \vi{i} in the \pipelineTemplate. - % We recall that Condition 2 is satisfied for all candidate services (see Section~\ref{sec:funcannotation}) and therefore concentrate on Condition 1 in the following. +% % We recall that Condition 2 is satisfied for all candidate services (see Section~\ref{sec:funcannotation}) and therefore concentrate on Condition 1 in the following. - % We then define a \emph{Pipeline Instantiation Process} as a function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, one for each vertex \vi{i}$\in$\V,\marginpar{ \chia{ $\V_S$?}} and returns as output a \pipelineInstance \iChartFunction in Definition~\ref{def:instance}. - % %In \iChartFunction, every invocations \vii{i}$\in$$V'_S$ contains a service instance, and every branching $v\in\Vplus\bigcup\Vtimes$ in the template is maintained as is. - % \chia{ The objective of the Pipeline Instantiation Process is to return a Pipeline Instance that minimizes the quantity of information lost, and maximizes the level of data protection and data sharing, not for the selection of a single service, but in the overall. 
To this aim, the \pipelineTemplate is traversed with a breadth-first search algorithm and for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows.} +% % We then define a \emph{Pipeline Instantiation Process} as a function that takes as input a \pipelineTemplate \tChartFunction and a set $S^c$ of candidate services, one for each vertex \vi{i}$\in$\V,\marginpar{ \chia{ $\V_S$?}} and returns as output a \pipelineInstance \iChartFunction in Definition~\ref{def:instance}. +% % %In \iChartFunction, every invocations \vii{i}$\in$$V'_S$ contains a service instance, and every branching $v\in\Vplus\bigcup\Vtimes$ in the template is maintained as is. +% % \chia{ The objective of the Pipeline Instantiation Process is to return a Pipeline Instance that minimizes the quantity of information lost, and maximizes the level of data protection and data sharing, not for the selection of a single service, but in the overall. To this aim, the \pipelineTemplate is traversed with a breadth-first search algorithm and for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows.} - % %Ovviamente non è sufficiente scegliere il best service per ogni vertice, ma diventa un problema complesso dove si devono calcolare/valutare tutte le possibili combinazioni dei servizi disponibili, tra le quali scegliere la migliore. +% % %Ovviamente non è sufficiente scegliere il best service per ogni vertice, ma diventa un problema complesso dove si devono calcolare/valutare tutte le possibili combinazioni dei servizi disponibili, tra le quali scegliere la migliore. - % The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. - % Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. 
- % Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows. +% % The \pipelineInstance is generated by traversing the \pipelineTemplate with a breadth-first search algorithm, starting from the root vertex \vi{r}. +% % Then, for each vertex $v\in\Vplus\bigcup\Vtimes$ in the pipeline template, the corresponding vertex $v'\in\Vpplus\bigcup\Vptimes$ is generated. +% % Finally, for each vertex \vi{i}$\in$$\V_S$, a two-step selection approach is applied as follows. diff --git a/pipeline_template.tex b/pipeline_template.tex index 8ea20bc..6820e21 100644 --- a/pipeline_template.tex +++ b/pipeline_template.tex @@ -117,7 +117,7 @@ \subsection{Pipeline Template Definition}\label{sec:templatedefinition} \end{description} \end{definition} - Access control policies $p_j$$\in$\P{i} annotating vertex \vi{i} in a pipeline template $G^{\myLambda,\myGamma}$ are used to filter out those candidate services $s$$\in$$S^c$ that do not match data protection requirements. Specifically, each policy $p_j$$\in$\P{i} is evaluated to verify whether a candidate service $s$$\in$$S^c$ for vertex \vi{i} is compatible with data protection requirements in \P{i} (\myLambda(\vi{i})). Policy evaluation matches the profile of candidate service $s$$\in$$S^c$ with the policy conditions in each $p_j$$\in$\P{i}. If the credentials and attributes in the candidate service profile fails to meet the policy conditions, meaning that no policies $p_j$ are evaluated to \emph{true}, the service is discarded; otherwise it is added to the set $S'$ of compatible service, which is used in Section~\ref{sec:instance} to generate the pipeline instance $G'$. No policy enforcement is done at this stage. + Access control policies $p_j$$\in$\P{i} annotating vertex \vi{i} in a pipeline template $G^{\myLambda,\myGamma}$ are used to filter out those candidate services $s$$\in$$S^c$ that do not match data protection requirements. 
Specifically, each policy $p_j$$\in$\P{i} is evaluated to verify whether a candidate service $s$$\in$$S^c$ for vertex \vi{i} is compatible with the data protection requirements in \P{i} (\myLambda(\vi{i})). Policy evaluation matches the profile \profile\ of candidate service $s$$\in$$S^c$ with the policy conditions in each $p_j$$\in$\P{i}. If the credentials and declarations in the candidate service profile, defined as a set of attributes in the form (\emph{name}, \emph{value}), fail to meet the policy conditions, meaning that no policy $p_j$ evaluates to \emph{true}, the service is discarded; otherwise it is added to the set $S'$ of compatible services, which is used in Section~\ref{sec:instance} to generate the pipeline instance $G'$. No policy enforcement is performed at this stage. \subsection{Functional Annotations}\label{sec:funcannotation} A proper data management approach must track functional data manipulations across the entire pipeline execution, defining the functional requirements of each service operating on data. To this aim, each vertex \vi{i}$\in\V_S$ is annotated with a label \myGamma(\vi{i}), corresponding to the functional description $F_i$ of the service $s_i$ represented by \vi{i}.