From 4d8605b408ec4496f32a5239423cd5cbd13c410f Mon Sep 17 00:00:00 2001
From: Antongiacomo Polimeno
Date: Tue, 7 May 2024 02:03:31 +0200
Subject: [PATCH] Updated tables

---
 experiment.tex | 2 +-
 macro.tex      | 4 +++-
 metrics.tex    | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/experiment.tex b/experiment.tex
index 6cd09ca..0d39de6 100644
--- a/experiment.tex
+++ b/experiment.tex
@@ -14,7 +14,7 @@ \subsection{Testing Infrastructure and Experimental Settings}\label{subsec:exper
 The simulator then starts the instantiation process as shown in Figure~\ref{fig:execution_example}. At each step $i$, it selects the subset \{\vi{i},$\ldots$,$v_{\windowsize+i-1}$\} of vertices with their corresponding candidate services, and generates all possible service combinations. For each combination, the simulator calculates a given metric $M$ and selects the service that instantiates \vi{i} from the optimal combination according to $M$. The window is then shifted by one step (i.e., $i$=$i$+1) and the instantiation process restarts. When the sliding window reaches the end of the pipeline template, that is, $v_{\windowsize+i-1}$$=$$\vi{l}$, the simulator computes the optimal service combination and instantiates the remaining vertices with the corresponding services.
-We note that a hash function randomly simulates the natural interdependence between services, modeling a data removal on one service that might impact another one. \hl{LA SPECIFICHIAMO UN PO' MEGLIO?} %By assigning weights to the services using this function, the system aims to reflect the interconnected dynamics among the services.
+It is reasonable to assume that, within a service pipeline, any data modification made at an earlier stage could affect the performance of the services at subsequent steps, making the services interdependent.
 Consider, for example, the removal of data from the ``name'' feature; a service that relies on that column will be affected more by its removal than a service that does not use it. During its execution, the simulator employs a specific combination of services as the seed to assign weights to the services. This reflects how changes in one service might influence others, as previously described. By assigning weights to the services using this approach, the system aims to reflect the interconnected dynamics among the services.
 %The simulator is used to assess the performance and quality of our sliding window heuristic in Section \ref{sec:heuristics} for the generation of the best pipeline instance (Section \ref{sec:instance}).
 % Performance measures the heuristics execution time in different settings, while quality compares the results provided by our heuristics in terms of selected services with the optimal solution retrieved using the exhaustive approach.
diff --git a/macro.tex b/macro.tex
index 96ccb2a..9cbd271 100644
--- a/macro.tex
+++ b/macro.tex
@@ -63,7 +63,9 @@
 \newcommand{\pipeline}{Pipeline\xspace}
 \newcommand{\pipelineTemplate}{Pipeline Template\xspace}
 \newcommand{\pipelineInstance}{Pipeline Instance\xspace}
-
+\newcommand{\quality}{quality\xspace}
+\newcommand{\Quality}{Quality\xspace}
+\newcommand{\q}{$q$\xspace}
 \newcommand{\pone}{$(service\_owner=dataset\_owner)$}
 \newcommand{\ptwo}{$(service\_owner=partner(dataset\_owner))$}
 \newcommand{\pthree}{$\langle service\_owner \neq dataset\_owner ~AND~ owner \neq partner(dataset\_owner)\rangle$}
diff --git a/metrics.tex b/metrics.tex
index 812e349..8df0499 100644
--- a/metrics.tex
+++ b/metrics.tex
@@ -1,7 +1,7 @@
 \section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics}
 %
 %
 %Obviously, it is not sufficient to choose the best service for each vertex; it becomes a complex problem in which all possible combinations of the available services must be computed and evaluated, among which the best one is chosen.
-Our goal is to generate a pipeline instance with maximum quality, addressing data protection requirements while minimizing information loss \textit{dloss} throughout the pipeline execution. To this aim, we first discuss the quality metrics used to measure and monitor data quality, which guide the generation of the pipeline instance. Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexity associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous vertices and candidate services. Our focus extends beyond identifying optimal combinations to encompass an understanding of the quality changes introduced during the transformation processes.
+Our goal is to generate a pipeline instance with maximum quality, addressing data protection requirements while maximizing \textit{\quality (\q)} throughout the pipeline execution. To this aim, we first discuss the quality metrics used to measure and monitor data quality, which guide the generation of the pipeline instance. Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexity associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly for more complex pipelines composed of numerous vertices and candidate services. Our focus extends beyond identifying optimal combinations to understanding the quality changes introduced during the transformation processes.
 %Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset.
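As an editorial aside, the sliding-window instantiation process described in the experiment.tex hunk can be sketched in code. This is a minimal illustration, not the authors' simulator: the function and variable names (`sliding_window_instantiate`, `quality`) are hypothetical, and a hash of the chosen service tuple stands in for the metric $M$, mirroring the paper's idea that interdependence makes the score depend on the whole combination rather than on each service in isolation.

```python
"""Hedged sketch of the sliding-window heuristic: at each step i, enumerate
all candidate-service combinations inside the window, score them with a
metric M, commit only the service chosen for v_i, then slide the window."""
from itertools import product


def quality(combination):
    """Hypothetical stand-in for the metric M: hashing the whole tuple makes
    the score depend on the full combination, modeling interdependence."""
    return hash(combination) % 1000


def sliding_window_instantiate(candidates, window_size):
    """candidates: one list of candidate service names per vertex.
    Returns one selected service per vertex."""
    n = len(candidates)
    instance = []
    i = 0
    # Slide until the window includes the last vertex v_l
    # (i.e., while window_size + i - 1 < l).
    while i + window_size < n:
        window = candidates[i:i + window_size]
        best = max(product(*window), key=quality)  # exhaustive inside window
        instance.append(best[0])  # commit only the service for v_i
        i += 1                    # shift the window by one step
    # Final window: instantiate all remaining vertices at once.
    final = max(product(*candidates[i:]), key=quality)
    instance.extend(final)
    return instance


# Toy pipeline: four vertices with their candidate services.
services = [["s1a", "s1b"], ["s2a", "s2b"], ["s3a"], ["s4a", "s4b"]]
print(sliding_window_instantiate(services, window_size=2))
```

Note the design point the section relies on: the window trades optimality for cost, since each step evaluates at most $|S|^{\windowsize}$ combinations instead of the $|S|^{l}$ required by the exhaustive approach.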