wip
antongiacomo committed Mar 27, 2024
1 parent 6277695 commit 36605b4
Showing 2 changed files with 27 additions and 32 deletions.
25 changes: 10 additions & 15 deletions experiment.tex
@@ -1,10 +1,12 @@
\section{Experiments}\label{sec:experiment}

We experimentally evaluated the performance and quality of our methodology,
and corresponding heuristic implementation in \cref{subsec:heuristics},
and compared them against the exhaustive approach in Section~\ref{TOADD}.
In the following,
\cref{subsec:experiments_infrastructure} presents the simulator and testing infrastructure adopted in our experiments, as well as the complete experimental settings; \cref{subsec:experiments_performance} analyses the performance of our solution in terms of execution time; \cref{subsec:experiments_quality} presents the quality of our heuristic algorithm in terms of the metrics in \cref{subsec:metrics}.

\subsection{Testing Infrastructure and Experimental Settings}\label{subsec:experiments_infrastructure}
Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, comparison, and composition.
Upon setting the sliding window size, the simulator selects a subset of nodes along with their corresponding candidate services.
It then generates all possible service combinations for the chosen nodes.
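The combination enumeration described above can be sketched as follows. This is an illustrative Python sketch, not the simulator's actual Swift API; the function name and data layout are assumptions.

```python
from itertools import product

# Hypothetical sketch of the simulator's combination step: given the
# candidate services of each node inside the current window, enumerate
# every possible assignment of one service per node.
def generate_combinations(candidates_per_node):
    """candidates_per_node: list of lists, one list of candidate
    service IDs per selected node."""
    return [list(combo) for combo in product(*candidates_per_node)]

# Three nodes with 2, 3, and 2 candidates yield 2*3*2 = 12 combinations.
combos = generate_combinations([["s1", "s2"], ["s3", "s4", "s5"], ["s6", "s7"]])
```

The Cartesian product makes explicit why the exhaustive approach grows exponentially with the window size.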
@@ -83,7 +85,7 @@ \subsection{Testing Infrastructure and Experimental Settings}
\label{fig:service_composition_instance}
\end{figure}

\subsection{Performance}\label{subsec:experiments_performance}
% \subsection{performance}
% \begin{itemize}
% \item Sliding window from 1 to N = number of nodes
@@ -111,25 +113,18 @@ \subsection{Performance}
offering a clear visual confirmation of the heuristic's efficiency in decreasing computational time.


\subsection{Quality}\label{subsec:experiments_quality}
We finally evaluated the quality of our heuristic by comparing, where possible,
its results with the optimal solution retrieved by the exhaustive approach.
The latter executes with a window size equal to the number of nodes and returns
the best among all possible solutions.

The number of nodes was varied from 3 to 7, and the number of services per node from 2 to 6.
The window size ranged from 1 (greedy) to the number of nodes (exhaustive).
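The experimental grid just described can be sketched as follows; the driver function is a hypothetical stand-in for the simulator invocation, not part of the paper.

```python
from itertools import product

# Hypothetical enumeration of the experimental grid described above:
# nodes from 3 to 7, services per node from 2 to 6, window size from 1
# (greedy) up to the number of nodes (exhaustive).
def experiment_grid():
    configs = []
    for nodes, services in product(range(3, 8), range(2, 7)):
        for window in range(1, nodes + 1):
            configs.append((nodes, services, window))
    return configs

# 5 service counts per node count, `nodes` window sizes each:
# 5 * (3 + 4 + 5 + 6 + 7) = 125 configurations in total.
grid = experiment_grid()
```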
% \hl{WE STILL NEED TO EXPLAIN WHAT WE VARIED IN THE EXPERIMENTS AND HOW: WINDOW SIZE, NODES, ETC.
% ARE THOSE 5 THE ONLY IMAGES WE HAVE? WE COULD ALSO INVERT THE AXES AND ADD DIFFERENT VIEWS}

\cref{fig:quality_window} presents our results.
In the figure, each chart represents a configuration with a specific number of nodes, ranging from 3 to 7.
On the x-axis of each chart, the number of services is plotted, which ranges from 2 to 6.
The y-axis represents the metric value, which varies across the charts.
Each chart shows different window sizes, labeled as W Size 1, W Size 2, and so on, corresponding to various metric values.
34 changes: 17 additions & 17 deletions metrics.tex
@@ -6,7 +6,7 @@ \section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics}

%Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset.

\subsection{Quality Metrics}\label{subsec:metrics}
%Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. To this aim, we define two metrics evaluating the quality loss introduced by our policy-driven transformation in Section~\cite{ADD} on the input dataset \origdataset at each step of the data pipeline. Our metrics can be classified as \emph{quantitative} and \emph{qualitative}~\cite{ADD}, and compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements on \origdataset.
Ensuring data quality is mandatory for implementing data pipelines that provide accurate results and support decision-making throughout the whole pipeline execution. To this aim, quality metrics evaluate the quality loss introduced at each step of the data pipeline, and can be classified as \emph{quantitative} or \emph{qualitative}~\cite{ADD}.
Quantitative metrics monitor the amount of data lost during data transformations as the quality difference between datasets \origdataset\ and \transdataset.
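As a concrete illustration, a quantitative loss metric of this kind can be sketched as the fraction of original entries no longer present after transformation. This is our assumption for illustration, not the paper's exact \textit{dtloss} definition.

```python
# Illustrative quantitative loss metric (an assumption, not the paper's
# exact definition): the fraction of distinct entries of the original
# dataset that no longer appear in the transformed dataset.
def data_loss(original, transformed):
    orig = set(original)
    kept = orig & set(transformed)
    return 1.0 - len(kept) / len(orig) if orig else 0.0

# Removing 1 of 4 distinct entries gives a loss of 0.25.
loss = data_loss(["a", "b", "c", "d"], ["a", "b", "c"])
```

A loss of 0 means the transformation preserved all entries; a loss of 1 means nothing survived.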
@@ -59,27 +59,27 @@ \subsection{NP-Hardness of the Max Quality Pipeline Instantiation Process}\label{
\emph{Proof: }
The proof is a reduction from the multiple-choice knapsack problem (MCKP), a well-known NP-hard combinatorial optimization problem that generalizes the simple knapsack problem (KP) \cite{}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $C$.

The MCKP can be reduced to the Max quality \problem in polynomial time, with $N_1,N_2,\ldots,N_t$ corresponding to $S^c_{1}, S^c_{2}, \ldots, S^c_{u}$, $t$$=$$u$ and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j$$\in$$N_i$ corresponds to \textit{dtloss}$_{ij}$ computed for each candidate service $s_j$$\in$$S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, $C$ is always equal to the cardinality of $V_C$).

Since the reduction can be done in polynomial time, our problem is also NP-hard. (This is not sufficient; we must also prove that a solution to one problem is a solution to the other.)
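A toy instance of the reduced problem can be sketched as follows. With unit weights and capacity equal to the number of classes, any one-item-per-class choice is feasible, so maximizing the profit sum amounts to picking the best item in each class; names and profit values below are illustrative.

```python
# Toy instance of the reduction: classes N_i play the role of the
# candidate sets S^c_i, profits play the role of (illustrative) dtloss
# values, and all weights are 1, so capacity C equals the number of
# classes and every one-per-class choice fits in the knapsack.
def solve_unit_weight_mckp(classes):
    """classes: list of lists of (item, profit) pairs.
    Returns the chosen items and the total profit."""
    chosen, total = [], 0.0
    for items in classes:
        item, profit = max(items, key=lambda pair: pair[1])
        chosen.append(item)
        total += profit
    return chosen, total

classes = [[("s11", 0.9), ("s12", 0.4)], [("s21", 0.7), ("s22", 0.8)]]
chosen, total = solve_unit_weight_mckp(classes)
```

Note that the general MCKP (with arbitrary weights) remains NP-hard; the unit-weight case shown here is only the special form produced by the reduction.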


\begin{example}[Max-Quality Pipeline Instance]
\end{example}

% The metrics established will enable the quantification of data loss pre- and post-transformations.
% In the event of multiple service interactions, each with its respective transformation,
% efforts will be made to minimize the loss of information while upholding privacy and security standards.
% Due to the exponential increase in complexity as the number of services and transformations grow,
% identifying the optimal path is inherently an NP-hard problem.
% As such, we propose some heuristics to approximate the optimal path as closely as possible.
%To evaluate their efficacy, the heuristically generated paths will be compared against the optimal solution.

\subsection{Heuristic}\label{subsec:heuristics}
%The computational challenge posed by the enumeration of all possible combinations within a given set is a well-established NP-hard problem.
%The exhaustive exploration of such combinations swiftly becomes impractical in terms of computational time and resources, particularly when dealing with the analysis of complex pipelines.
%In response to this computational complexity, the incorporation of heuristics emerges as a strategy to efficiently address the problem.
\hl{I REVISED THIS PARAGRAPH QUICKLY, JUST TO GIVE AN INDICATION. WE SHOULD USE THE FORMALIZATION AND PERHAPS ALSO FORMALIZE THE PSEUDOCODE.} We design and implement a heuristic algorithm for computing the pipeline instance that maximizes data quality. Our heuristic is built on a \emph{sliding window} and aims to minimize information loss according to the quality metrics. At each step, a set of nodes in the pipeline template $\tChartFunction$ is selected according to a specific window $w$$=$$[i,j]$, where $i$ and $j$ are the starting and ending depths of window $w$. Service filtering and selection in Section~\ref{sec:instance} are then executed to minimize information loss in window $w$. The heuristic returns as output the list of services instantiating the nodes at depth $i$. A new window $w$$=$$[i+1,j+1]$ is then considered, until $j$$+$$1$ equals the maximum depth of $\tChartFunction$, that is, until the window reaches the end of the template.
%For example, in our service selection problem where the quantity of information lost needs to be minimized, the sliding window algorithm can be used to select services composition that have the lowest information loss within a fixed-size window.
This strategy ensures that only services with low information loss are selected at each step, minimizing the overall information loss. Pseudo-code for the sliding window algorithm is presented in Algorithm 1.
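The sliding-window strategy can be sketched as follows, under simplifying assumptions: the template is modeled as a list of depths (one candidate list per depth, one node per depth), and the loss of a combination is given by a caller-supplied function; names are illustrative, not the paper's formalization.

```python
from itertools import product

# Sketch of the sliding-window heuristic: window w = [i, i+window_size-1]
# slides one depth at a time; the best combination inside the window is
# computed exhaustively, but only the service at depth i is committed.
def sliding_window_instance(template, window_size, loss):
    """template: list (one entry per depth) of candidate-service lists;
    loss: maps a tuple of services (one per depth in the window) to an
    information-loss value, lower is better.
    Returns one chosen service per depth."""
    instance = []
    for i in range(len(template)):
        window = template[i:i + window_size]  # shrinks near the end
        best = min(product(*window), key=loss)
        instance.append(best[0])  # commit only the service at depth i
    return instance
```

With window_size equal to 1 the heuristic degenerates to greedy selection, and with window_size equal to the template depth it coincides with the exhaustive approach, matching the two extremes used in the experiments.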

