From e04edfe6e6f2da47d88745a91c1679ba22332550 Mon Sep 17 00:00:00 2001 From: Claudio Ardagna Date: Mon, 29 Apr 2024 16:55:45 +0200 Subject: [PATCH] Section 5 - Claudio --- metrics.tex | 109 +++++++++++++++++----------------- pipeline_instance.tex | 2 +- pipeline_instance_example.tex | 8 +-- pipeline_template_example.tex | 5 +- 4 files changed, 65 insertions(+), 59 deletions(-) diff --git a/metrics.tex b/metrics.tex index 0bed93a..34f2414 100644 --- a/metrics.tex +++ b/metrics.tex @@ -1,67 +1,77 @@ \section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics} % % %Ovviamente non è sufficiente scegliere il best service per ogni vertice, ma diventa un problema complesso dove si devono calcolare/valutare tutte le possibili combinazioni dei servizi disponibili, tra le quali scegliere la migliore. -Our goal is to generate a pipeline instance with maximum quality, which addresses data protection requirements with the minimum amount of information loss across the pipeline execution. To this aim, we first discuss the role of some metrics (\cref{subsec:metrics}) to specify and measure data quality, and describe the ones used in the paper. -Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexity associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous vertexes and candidate services. Our focus extends beyond identifying optimal combinations, encompassing an understanding of the quality changes introduced during the transformation processes. +Our goal is to generate a pipeline instance with maximum quality, addressing data protection requirements while minimizing information loss \textit{dloss} throughout the pipeline execution. To this aim, we first discuss the quality metrics used to measure and monitor data quality, which guide the generation of the pipeline instance. Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexity associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous vertices and candidate services. Our focus extends beyond identifying optimal combinations to encompass an understanding of the quality changes introduced during the transformation processes. %Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset. \subsection{Quality Metrics}\label{subsec:metrics} %Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. To this aim, we define two metrics evaluating the quality loss introduced by our policy-driven transformation in Section~\cite{ADD} on the input dataset \origdataset at each step of the data pipeline. 
Our metrics can be classified as \emph{quantitative} and \emph{qualitative}~\cite{ADD}, and compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements on \origdataset.
-Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. To this aim, quality metrics evaluate the quality loss introduced at each step of the data pipeline, and can be classified as \emph{quantitative} or \emph{qualitative}~\cite{ADD}.
-Quantitative metrics monitor the amount of data lost during data transformations as the quality difference between datasets \origdataset\ and \transdataset.
+Ensuring data quality is mandatory to implement data pipelines that provide accurate results and support decision-making along the whole pipeline execution. Quality metrics evaluate the information loss introduced at each step of the data pipeline, and can be classified as \emph{quantitative} or \emph{qualitative}~\cite{ADD}.
+Quantitative metrics monitor the amount of data lost during data transformations to model the quality difference between datasets \origdataset\ and \transdataset.
Qualitative metrics evaluate changes in the properties of datasets \origdataset\ and \transdataset. For instance, qualitative metrics can measure the changes in the statistical distribution of the two datasets.
-In this paper, we provide two metrics, one quantitative and one qualitative, that compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements (i.e., our policy-driven transformation in Section~\cite{ADD}) on \origdataset\ at each step of the data pipeline.
+In this paper, we use two metrics, one quantitative and one qualitative, to compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements (i.e., our policy-driven transformation in Section~\ref{sec:instance}) on \origdataset\ at each step of the data pipeline.
We note that a complete taxonomy of possible metrics is outside the scope of this paper and will be the target of our future work.
-\subsubsection{Quantitive metric}
-We propose a metric that enables the measurement of the similarity between two datasets, for this purpose, we use the Jaccard coefficient.
-The Jaccard coefficient is a quantitative metric that can be employed to assess the dissimilarity between the elements in two datasets.
-It is defined as follows:\[J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}\]
+\subsubsection{Quantitative Metric}
+%We propose a metric that measures the similarity between two datasets, for this purpose, we use the Jaccard coefficient.
+We propose a quantitative metric $M_J$ based on the Jaccard coefficient that assesses the similarity between two datasets. The Jaccard coefficient is defined as follows: \[J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}\]
where X and Y are two datasets of the same size.
-The coefficient is calculated by dividing the cardinality of the intersection of two sets by the cardinality of their union. It ranges from 0 to 1, with 0 indicating no similarity and 1 indicating complete similarity between the datasets.
+The coefficient is calculated by dividing the cardinality of the intersection of two datasets by the cardinality of their union. It ranges from 0 to 1, with 0 indicating no similarity and 1 indicating complete similarity between the datasets. It has several advantages.
Unlike other similarity measures, such as Euclidean distance, it is not affected by the magnitude of the values in the dataset. It is suitable for datasets with categorical variables or nominal data, where the values do not have a meaningful numerical interpretation.
-This metric has several advantages. Unlike other similarity measures, such as Euclidean distance, it is not affected by the magnitude of the values in the dataset. It is suitable for datasets with categorical variables or nominal data, where the values do not have a meaningful numerical interpretation.
+Metric $M_J$ extends the Jaccard coefficient with weights that model the importance of each element in the dataset as follows:\[M_J(X,Y) = \frac{\sum_{i=1}^{n}w_i|x_i \cap y_i|}{\sum_{i=1}^{n}w_i|x_i \cup y_i|}\]
+where $x_i$$\in$X ($y_i$$\in$Y, resp.) is the $i$-th feature of dataset X (Y, resp.), and $w_i$ is the weight modeling the importance of the $i$-th feature.
-The Jaccard coefficient can be extended with weights that model the importance of each element in the dataset.
-it is defined as follows:\[\text{Weighted }J(X,Y) = \frac{\sum_{i=1}^{n}w_i(x_i \cap y_i)}{\sum_{i=1}^{n}w_i(x_i \cup y_i)}\]
-where X and Y are two datasets of the same size.
-
-It is computed by dividing the cardinality of the intersection of two datasets by the cardinality of their union, weighted by the importance of each element in the datasets. Weights prioritize certain elements (e.g., a specific feature) in the datasets.
-The Weighted Jaccard coefficent can then account for element importance and provide a more accurate measure of similarity.
+It is computed by dividing the cardinality of the intersection of the two datasets by the cardinality of their union, weighted by the importance of each feature, thus providing a more accurate measure of similarity. %Weights prioritize certain elements (e.g., a specific feature) in the datasets.
+%The Weighted Jaccard coefficent can then account for element importance and provide a more accurate measure of similarity.
\subsubsection{Qualitative Metric}
-We propose a metric that enables the measurement of the distance of two distributions. The suggested metric is based on the well-known Jensen-Shannon Divergence, which is defined as follows:
-The Jensen-Shannon divergence (JSD) is a qualitative metric that can be used to measure the dissimilarity between the probability distributions of two datasets.
-It is a symmetrized version of the KL divergence~\cite{Fuglede} and is defined as:
+%We propose a metric that enables the measurement of the distance of two distributions.
+We propose a qualitative metric $M_{JSD}$ based on the Jensen-Shannon Divergence (JSD) that measures the dissimilarity (distance) between the probability distributions of two datasets.
+
+JSD is a symmetrized version of the KL divergence~\cite{Fuglede} and is applicable to a pair of statistical distributions only. It is defined as follows: \[JSD(X, Y) = \frac{1}{2} \left( KL(X || M) + KL(Y || M) \right)\]
%
-where X and Y are two distribution of the same size, and M$=$0.5*(X+Y) is the average distribution.
-JSD incorporates both the KL divergence from X to M and from Y to M. It provides a balanced measure of dissimilarity that is symmetric and accounts for the contribution from both datasets.
-%
-JSD can compare the dissimilarity of the two datasets, providing a symmetric and normalized measure that considers the overall data distribution.
-%
-However, the JSD is applicable solely to statistical distributions and not directly to datasets. Therefore, our metric is computed by applying the JSD to each column of the dataset. The results obtained are then aggregated using a weighted average. The weights are determined by the ratio of distinct elements to the total number of elements in the column, using the following formula:
-\[\text{Weighted JSD} = \sum_{i=1}^n w_i \cdot \text{JSD}_i\]
-where \(w_i = \frac{n_i}{N}\) represents the weight for the \(i\)-th column, with \(n_i\) being the number of distinct elements in that column and \(N\) the total number of elements in the dataset. Each \(\text{JSD}_i\) is the Jensen-Shannon Divergence computed for the \(i\)-th column.
+where X and Y are two distributions of the same size, and M$=$0.5(X+Y) is the average distribution.
+JSD incorporates both the KL divergence from X to M and from Y to M.
+
+To make JSD applicable to datasets, where each feature in the dataset has its own statistical distribution, metric $M_{JSD}$ applies JSD to each column of the dataset. The obtained results are then aggregated using a weighted average, thus enabling the prioritization of important features that can be lost during the policy-driven transformation in Section~\ref{sec:instance}, as follows: \[M_{JSD}(X,Y) = \sum_{i=1}^n w_i \cdot \text{JSD}(x_i,y_i)\]
+where \(w_i = \frac{n_i}{N}\) represents the weight for the \(i\)-th feature, with \(n_i\) being the number of distinct elements in the $i$-th feature and \(N\) the total number of elements in the dataset. Each \(\text{JSD}(x_i,y_i)\) accounts for the Jensen-Shannon Divergence computed for the \(i\)-th feature in datasets X and Y.
+
+$M_{JSD}$ provides a weighted, symmetric, and normalized measure of dissimilarity that accounts for the contribution of both datasets and of specific features, while considering the overall data distributions.
-\vspace{0.5em}
-We note that our metrics can be applied either to the entire dataset or to specific features only. The features can be assigned with equal or varying importance, providing a weighted version of the metrics, thus enabling the prioritization of important features that might be possibly lost during the policy-driven transformation in Section~\cite{ADD}. A complete taxonomy of possible metrics is however outside the scope of this paper and will be the target of our future work.
+\subsubsection{Information Loss}
+%We note that our metrics can be applied either to the entire dataset or to specific features only. The features can be assigned with equal or varying importance, providing a weighted version of the metrics, thus enabling the prioritization of important features that might be possibly lost during the policy-driven transformation in Section~\cite{ADD}.
-\subsection{NP-Hardness of the Max Quality Pipeline Instantiation Process}\label{sec:nphard}
-\hl{se lo definiamo in maniera formale come il problema di trovare un'istanza valida in accordo alla definizione di istanza tale che non ne esiste una con un loss piu' piccolo?}
+Metrics $M_J$ and $M_{JSD}$ contribute to the calculation of the information loss \textit{dloss} throughout the pipeline execution. It is calculated as the average (\emph{AVG}) of the information loss at each vertex \vi{i}$\in$$\V_S$ of the service pipeline $G(V,E)$, as formalized in the following definition.
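+For illustration only, the following simplified Python sketch shows how $M_J$, $M_{JSD}$, and their aggregation into \textit{dloss} can be computed; the dataset representation (one column of categorical values per feature) and the helper functions are assumptions made for clarity, not part of our framework implementation.
+\begin{lstlisting}[language=Python]
+# Illustrative sketch only: data structures and weights are assumptions.
+from collections import Counter
+import math
+
+def m_j(X, Y, w):
+    # Weighted Jaccard M_J: X[f], Y[f] are the sets of values of feature f, w[f] its weight.
+    num = sum(w[f] * len(X[f] & Y[f]) for f in w)
+    den = sum(w[f] * len(X[f] | Y[f]) for f in w)
+    return num / den if den else 1.0
+
+def distribution(column):
+    # Empirical distribution of a column (list of categorical values).
+    counts = Counter(column)
+    return {v: c / len(column) for v, c in counts.items()}
+
+def jsd(p, q):
+    # Jensen-Shannon divergence between two discrete distributions.
+    m = {v: 0.5 * (p.get(v, 0) + q.get(v, 0)) for v in set(p) | set(q)}
+    kl = lambda a: sum(a[v] * math.log2(a[v] / m[v]) for v in a if a[v] > 0)
+    return 0.5 * kl(p) + 0.5 * kl(q)
+
+def m_jsd(X_cols, Y_cols):
+    # M_JSD: per-feature JSD, weighted by w_i = n_i / N (distinct over total elements).
+    N = sum(len(col) for col in X_cols.values())
+    return sum((len(set(col)) / N) * jsd(distribution(col), distribution(Y_cols[f]))
+               for f, col in X_cols.items())
+
+def dloss(metric_values):
+    # Information loss across the pipeline: 1 - AVG of per-vertex metric values.
+    return 1 - sum(metric_values) / len(metric_values)
+\end{lstlisting}
+For instance, if the services selected for two vertices yield $M_J$ values of 0.9 and 0.7, then \textit{dloss}$=$1$-$0.8$=$0.2.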
-\begin{definition}[Max Quality Pipeline Instantiation Process]\label{def:MaXQualityInstance}
- Given \textit{dtloss}$_i$ the value of the quality metric computed after applying the transformation of the policy matching the service selected to instantiate vertex \vi{i}$\in$$\V_S$, the Max quality \problem is the case in which the \emph{pipeline instantiation} function returns a \pipelineInstance where the \textit{dtloss}$_i$ sum is maximized.
+\begin{definition}[\emph{dloss}]
+ Given a metric $M$$\in$$\{M_J,M_{JSD}\}$, information loss \textit{dloss} is calculated as 1$-$\emph{AVG}($M_{ij}$), with $M_{ij}$ the value of the quality metric computed at each vertex \vi{i}$\in$$\V_S$ of the service pipeline $G(V,E)$ according to service \si{j}.
\end{definition}
-The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by Theorem \ref{theorem:NP}. However, while the overall problem is NP-hard, there is a component of the problem that is solvable in polynomial time: matching the profile of each service with the node policy. This can be done by iterating over each node and each service, checking if the service matches the node’s policy. This process would take $O(|N|*|S|)$ time. This is polynomial time complexity.
+We note that \textit{dloss}$_{ij}$$=$1$-$$M_{ij}$ models the quality loss at vertex \vi{i}$\in$$\V_S$ of the service pipeline $G(V,E)$ for service \si{j}.
+%We also note that information loss \textit{dloss} is used to generate the Max-Quality pipeline instance in the remainder of this section.
+
+\subsection{NP-Hardness of the Max-Quality Pipeline Instantiation Problem}\label{sec:nphard}
+%\hl{se lo definiamo in maniera formale come il problema di trovare un'istanza valida in accordo alla definizione di istanza tale che non ne esiste una con un loss piu' piccolo?}
+The problem of computing a pipeline instance (Definition~\ref{def:instance}) with maximum quality (minimum information loss) can be formally defined as follows.
+
+\begin{definition}[Max-Quality Problem]\label{def:MaXQualityInstance}
+ Given a pipeline template $G^{\myLambda,\myGamma}$ and a set $S^c$ of candidate services, find a max-quality pipeline instance $G'$ such that:
+ \begin{itemize}
+ \item $G'$ satisfies conditions in Definition~\ref{def:instance},
+ \item $\nexists$ a pipeline instance $G''$ that satisfies conditions in Definition~\ref{def:instance} and such that information loss \textit{dloss}($G''$)$<$\textit{dloss}($G'$), where \textit{dloss}($\cdot$) is the information loss throughout the pipeline execution.
+ %computed after applying the transformation of the policy matching the service selected to instantiate vertex \vi{i}$\in$$\V_S$, .
+ \end{itemize}
+\end{definition}
+
+The Max-Quality \problem is a combinatorial selection problem and is NP-hard, as stated by Theorem \ref{theorem:NP}. However, while the overall problem is NP-hard, there is a component of the problem that is solvable in polynomial time: matching the profile of each service with the corresponding vertex policy. This can be done by iterating over each vertex and each service, checking if the service matches the vertex policy. This process takes $O(|N|\cdot|S|)$ time, which is polynomial.
\begin{theorem}\label{theorem:NP}
- The Max Quality \problem is NP-Hard.
+ The Max-Quality \problem is NP-Hard.
\end{theorem}
\emph{Proof: } The proof is a reduction from the multiple-choice knapsack problem (MCKP), a classic NP-hard combinatorial optimization problem, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $C$.
@@ -72,19 +82,11 @@ \subsection{NP-Hardness of the Max Quality Pipeline Instantiation Process}\label
\begin{example}[Max-Quality Pipeline Instance]
- consider a subset \{\vi{5}, \vi{6}, \vi{7}\} of the pipeline template \tChartFunction in \cref{sec:example_instace}.
- Each vertex is associated with three candidate services, each having a profile. The filtering algorithm matches each candidate service's profile with the policies annotating the corresponding vertex. It returns the set of services whose profile matches a policy.
-
- The comparison algorithm is then applied to the set of services $S'_*$ and it returns a ranking of the services.
- The ranking is based on the amount of data that is anonymized by the service.
- The ranking is listed in \cref{tab:instance_example_maxquality} (b) and it is based on the transformation function of the policies,
- assuming that a more restrictive transformation function anonymizes more data affecting negatively the position in the ranking.
+ Let us start from Example~\ref{ex:instance} and extend it with the comparison algorithm in Section~\ref{sec:instance} built on \emph{dloss}. The comparison algorithm is applied to the set of services $S'_*$ and returns three service rankings, one for each vertex \vi{5}, \vi{6}, \vi{7}, according to the amount of data anonymized.
+ The ranking is listed in \cref{tab:instance_example_maxquality}(b) and based on the transformation function in the policies. We assume that the more restrictive the transformation function (i.e., the more data it anonymizes), the lower the service's position in the ranking.
For example, \s{11} is ranked first because it anonymizes less data than \s{12} and \s{13}. The ranking of \s{22} and \s{23} is based on the same logic.
- Finally, the ranking of \s{31}, \s{32} is influenced by the environment state at the time of the ranking.
- For example, if the environment in which the visualization is performed is a CT facility, then \s{31} is ranked first and \s{32} second;
- thus because the facility is considered a less risky environment than the cloud.
-
+ Finally, the ranking of \s{31} and \s{32} is affected by the environment state at the time of the ranking. For example, if the environment where the visualization is performed is a CT facility, then \s{31} is ranked first and \s{32} second because the facility is considered less risky than the cloud.
\end{example}
% The metrics established will enable the quantification of data loss pre- and post-transformations.
@@ -99,9 +101,10 @@ \subsection{Heuristic}\label{subsec:heuristics}
%The computational challenge posed by the enumeration of all possible combinations within a given set is a well-established NP-hard problem.}
%The exhaustive exploration of such combinations swiftly becomes impractical in terms of computational time and resources, particularly when dealing with the analysis of complex pipelines.
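+Enumerating all possible pipeline instances quickly becomes impractical: with $|S^c_{i}|$ candidate services for each of the $n$ vertices in $\V_S$, up to $\prod_{i=1}^{n}|S^c_{i}|$ combinations must be evaluated (e.g., nearly $10^7$ combinations for a pipeline of 10 vertices with 5 candidate services each).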
%In response to this computational complexity, the incorporation of heuristic emerges as a strategy to try to efficiently address the problem.
-\hl{HO RIVISTO IL PARAGRAFO VELOCEMENTE GIUSTO PER DARE UN'INDICAZIONE. DOBBIAMO USARE LA FORMALIZZAZIONE E MAGARI FORMALIZZARE ANCHE LO PSEUDOCODICE.} We design and implement a heuristic algorithm for computing the pipeline instance maximizing data quality. Our heuristic is built on a \emph{sliding window} and aims to minimize information loss according to quality metrics. At each step, a set of vertexes in the pipeline template $\tChartFunction$ is selected according to a specific window size w=[i,j], where $i$ and $j$ are the starting and ending depth of window w. Service filtering and selection in Section~\ref{sec:instance} are then executed to minimize information loss in window w. The heuristic returns as output the list of services instantiating vertexes at depth $i$. A new window w=[i+1,j+1] is considered until $j$+1 is equal to the max depth of $\tChartFunction$, that is the window reaches the end of the template.
+%\hl{HO RIVISTO IL PARAGRAFO VELOCEMENTE GIUSTO PER DARE UN'INDICAZIONE. DOBBIAMO USARE LA FORMALIZZAZIONE E MAGARI FORMALIZZARE ANCHE LO PSEUDOCODICE.}
+We design and implement a heuristic algorithm for computing the pipeline instance maximizing data quality. Our heuristic is built on a \emph{sliding window} and aims to minimize information loss according to quality metrics. At each step, a set of vertices in the pipeline template $\tChartFunction$ is selected according to a specific window $w$$=$$[i,j]$, where $i$ and $j$ are the starting and ending depths of window $w$. Service filtering and selection in Section~\ref{sec:instance} are then executed to minimize information loss in window $w$. The heuristic returns as output the list of services instantiating vertices at depth $i$. A new window $w$$=$$[i+1,j+1]$ is considered until $j+1$ equals the maximum depth of $\tChartFunction$, that is, the window reaches the end of the template.
%For example, in our service selection problem where the quantity of information lost needs to be minimized, the sliding window algorithm can be used to select services composition that have the lowest information loss within a fixed-size window.
-This strategy ensures that only services with low information loss are selected at each step, minimizing the overall information loss. Pseudo-code for the sliding window algorithm is presented in Algorithm 1.
+This strategy ensures that, at each step, only services with low information loss are selected, thus minimizing the overall information loss. Pseudo-code for the sliding window algorithm is presented in Algorithm 1.
\lstset{ %
backgroundcolor=\color{white}, % choose the background color; you must add \usepackage{color} or \usepackage{xcolor}
@@ -152,11 +155,11 @@ \subsection{Heuristic}\label{subsec:heuristics}
return instance
\end{lstlisting}
-The pseudocode implemets function {\em SlidingWindowHeuristic}, which takes a sequence of vertexes and a window size as input and returns a set of selected vertexes as output. The function starts by initializing an empty set of selected vertexes (line 3). Then, for each node in the sequence (lines 4--12), the algorithm iterates over the vertexes in the window (lines 7--11) and selects the node with the lowest metric value (lines 9-11). The selected node is then added to the set of selected vertexes (line 12). Finally, the set of selected vertexes is returned as output (line 13).
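+For concreteness, the following minimal Python sketch implements the same sliding-window strategy; it is not a transcription of Algorithm 1, and it assumes a linear template (one vertex per depth) whose candidate services carry a precomputed information loss, omitting branching and policy evaluation.
+\begin{lstlisting}[language=Python]
+# Simplified sketch: template[d] lists (service, loss) candidates for the
+# vertex at depth d; real templates (branching, policies) are richer.
+from itertools import product
+
+def sliding_window_heuristic(template, window_size):
+    instance = []
+    max_depth = len(template)
+    for i in range(max_depth):
+        j = min(i + window_size, max_depth)      # window w = [i, j)
+        best, best_loss = None, float("inf")
+        for combo in product(*template[i:j]):    # all service combinations in w
+            loss = sum(l for _, l in combo) / len(combo)
+            if loss < best_loss:
+                best, best_loss = combo, loss
+        instance.append(best[0][0])              # commit only the service at depth i
+    return instance
+\end{lstlisting}
+With two candidate services per vertex and a window of size 2, for example, at most four combinations are evaluated at each step.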
+The pseudocode implements function {\em SlidingWindowHeuristic}, which takes a sequence of vertices and a window size as input and returns a set of selected vertices as output. The function starts by initializing an empty set of selected vertices (line 3). Then, for each vertex in the sequence (lines 4--12), the algorithm iterates over the vertices in the window (lines 7--11) and selects the vertex with the lowest metric value (lines 9--11). The selected vertex is then added to the set of selected vertices (line 12). Finally, the set of selected vertices is returned as output (line 13).
-We note that a window of size 1 corresponds to the \emph{greedy} approach, while a window of size N, where N represents the total number of vertexes, corresponds to the \emph{exhaustive} method.
+We note that a window of size 1 corresponds to the \emph{greedy} approach, while a window of size $N$, where $N$ represents the total number of vertices, corresponds to the \emph{exhaustive} method.
-The utilization of heuristic in service selection can be enhanced through the incorporation of techniques derived from other algorithms, such as Ant Colony Optimization or Tabu Search.
-By integrating these approaches, it becomes feasible to achieve a more effective and efficient selection of services, with a specific focus on eliminating paths that have previously been deemed unfavorable.
+The heuristic for service selection can be further enhanced by integrating other optimization algorithms, such as Ant Colony Optimization or Tabu Search, making the selection of services more effective and efficient, for instance, by pruning paths previously deemed unfavorable.
%\AG{It is imperative to bear in mind that the merging operations subsequent to the selection process and the joining operations subsequent to the branching process are executed with distinct objectives. In the former case, the primary aim is to optimize quality, whereas in the latter, the foremost objective is to minimize it.}
diff --git a/pipeline_instance.tex b/pipeline_instance.tex index 40aef1f..13a7370 100644 --- a/pipeline_instance.tex +++ b/pipeline_instance.tex @@ -26,7 +26,7 @@ \section{Pipeline Instance}\label{sec:instance}
\begin{enumerate}
\item \textit{Filtering Algorithm} -- The filtering algorithm checks whether profile \profile$_j$ of each candidate service $\si{j}$$\in$$S^c_{i}$ satisfies at least one policy in \P{i}. If yes, service $\si{j}$ is compatible, otherwise it is discarded. The filtering algorithm finally returns a subset $S'_{i}$$\subseteq$$S^c_{i}$ of compatible services for each vertex \vi{i}$\in$$\V_S$.
- \item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set $S'_{i}$ of compatible services and instantiates the corresponding vertex $\vii{i}$$\in$$\Vp$ with it. There are many ways of choosing $s'_i$, we present our approach based on the minimization of quality loss in Section \ref{sec:heuristics}.
+ \item \textit{Selection Algorithm} -- The selection algorithm selects one service $s'_i$ for each set $S'_{i}$ of compatible services and instantiates the corresponding vertex $\vii{i}$$\in$$\Vp$ with it. There are many ways of choosing $s'_i$; we present our approach based on the minimization of information loss \emph{dloss} in Section~\ref{sec:heuristics}.
\end{enumerate} When all vertices $\vi{i}$$\in$$V$ in $G^{\myLambda,\myGamma}$ have been visited, the \pipelineInstance G' is finalized, with a service instance $s'_i$ for each \vii{i}$\in$\Vp. Vertex \vii{i} is still annotated with policies in \P{i} according to \myLambda, because policies in \P{i} are evaluated and enforced only when the pipeline instance is triggered, before any service is executed. In case policy evaluation returns \emph{true}, data transformation \TP$\in$\P{i} is applied, otherwise a default transformation that removes all data is applied. diff --git a/pipeline_instance_example.tex b/pipeline_instance_example.tex index beb4f18..19971f9 100644 --- a/pipeline_instance_example.tex +++ b/pipeline_instance_example.tex @@ -1,8 +1,8 @@ -\subsection{Example}\label{sec:example_instace} +%\subsection{Example}\label{sec:example_instace} -%\begin{example}[\bf \pipelineInstance]\label{ex:instance} +\begin{example}[\bf \pipelineInstance]\label{ex:instance} - Let us consider a subset \{\vi{5}, \vi{6}, \vi{7}\} of the pipeline template $G^{\myLambda,\myGamma}$ in \cref{sec:example_instace}. + Let us consider a subset \{\vi{5}, \vi{6}, \vi{7}\} of the pipeline template $G^{\myLambda,\myGamma}$ in Example~\ref{ex:template}. As presented in Table~\ref{tab:exisnt}(a), each vertex is labeled with policies (column \emph{candidate--$>$policy}) and then associated with different candidate services (column \emph{candidate}) and corresponding profile (column \emph{profile}). The filtering algorithm matches each candidate service profile with the policies in Table~\ref{tab:anonymization} annotating the corresponding vertex. It returns the set of services whose profile matches a policy (column \emph{filtering}): \begin{enumerate*}[label=\textit{\roman*})] @@ -83,5 +83,5 @@ \subsection{Example}\label{sec:example_instace} -%\end{example} +\end{example} diff --git a/pipeline_template_example.tex b/pipeline_template_example.tex index d4bdee3..41de5be 100644 --- a/pipeline_template_example.tex +++ b/pipeline_template_example.tex @@ -1,4 +1,4 @@ -\subsection{Example}\label{sec:example_template} +%\subsection{Example}\label{sec:example_template} \begin{table*}[ht!] \def\arraystretch{1.5} @@ -34,6 +34,8 @@ \subsection{Example}\label{sec:example_template} \end{tabular} } \end{table*} + +\begin{example}[\bf \pipelineTemplate]\label{ex:template} Let us consider our reference scenario in \cref{sec:systemmodel}. \cref{fig:service_composition_template} presents an example of pipeline template consisting of five stages, each one annotated with a policy in \cref{tab:anonymization}. % We recall that \cref{tab:dataset} shows a sample of our reference dataset. @@ -74,3 +76,4 @@ \subsection{Example}\label{sec:example_template} Functional requirement \F{8} prescribes a dataset as input and data visualization interface (possibly in the form of JSON file) as output. %In summary, this tion has delineated a comprehensive pipeline template. This illustrative pipeline serves as a blueprint, highlighting the role of policy implementation in safeguarding data protection across diverse operational stages. +\end{example} \ No newline at end of file