Sections 5, 6 intro, 6.2 - Claudio.
cardagna committed Feb 28, 2024
1 parent 420fc91 commit 97bde09
Showing 4 changed files with 150 additions and 149 deletions.
1 change: 1 addition & 0 deletions macro.tex
@@ -50,6 +50,7 @@
% \newcommand{\TF}{\ensuremath{T_{\fChartFunction}}}
\newcommand{\user}{user\,}
\newcommand{\User}{User\,}
\newcommand{\profile}{\emph{prf}}

\newcommand{\fChartFunction}{\ensuremath{\myLambda{}}}

17 changes: 8 additions & 9 deletions metrics.tex
@@ -1,9 +1,8 @@
\section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics}

%
%
% %Obviously, it is not sufficient to choose the best service for each vertex; this becomes a complex problem in which all possible combinations of the available services must be computed/evaluated, and the best one chosen among them.
The goal of this paper is to produce a pipeline instance with maximum quality, i.e., guaranteeing a high level of data protection but at the same time the minimum amount of information lost across the pipeline. To this aim, we first discuss the crucial role of well-defined metrics (\cref{sec:metrics}) to specify and measure data quality, and describe the ones that will be used in the paper.
Then, we prove that the problem of maximizing the pipeline instance quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexities associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed numerous nodes and candidate services.
Our goal is to generate a pipeline instance with maximum quality, which addresses data protection requirements with the minimum amount of information loss across the pipeline. To this aim, we first discuss the crucial role of well-defined metrics (\cref{sec:metrics}) to specify and measure data quality, and describe the ones used in the paper.
Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexities associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous nodes and candidate services.
Our focus extends beyond identifying optimal combinations, encompassing an understanding of the quality changes introduced during the transformation processes.

%Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset.
@@ -61,21 +60,21 @@ \subsubsection{Weighted Jensen-Shannon Divergence}
By incorporating weights into the JSD calculation, WJSD provides a more accurate measure of dissimilarity between X and Y, considering the importance of individual elements based on the assigned weights. This approach is particularly useful when elements in the datasets have varying levels of significance, enabling a more tailored analysis of dissimilarity.

\subsection{NP-Hardness of the Max Quality Pipeline Instantiation Process}\label{sec:nphard}
\hl{what if we define it formally as the problem of finding a valid instance, according to the definition of instance, such that no other instance with a smaller loss exists?}

\begin{definition}[Max Quality Pipeline Instantiation Process]\label{def:MaXQualityInstance}
Given \textit{dtloss}$_i$, the value of the quality metric computed after applying the transformation of the policy matching the service selected to instantiate vertex \vi{i}$\in$$\V_S$, the Max Quality \problem is the case in which the \emph{pipeline instantiation} function returns a \pipelineInstance where the sum of the \textit{dtloss}$_i$ values is maximized.
\end{definition}
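
% Illustrative sketch only: the notation \textit{dtloss}$_i(s_i)$ and the index $u$ over vertices are assumptions introduced here, not definitions from the paper.
As a sketch of how Definition \ref{def:MaXQualityInstance} might be written as an optimization objective, assume that each vertex \vi{i} has a set $S^c_{i}$ of candidate services, that there are $u$ vertices in total, and that \textit{dtloss}$_i(s_i)$ denotes the metric value obtained when service $s_i$ instantiates \vi{i}. Under these assumptions, the objective could read:
\[
\max_{s_1 \in S^c_{1}, \ldots, s_u \in S^c_{u}} \; \sum_{i=1}^{u} \textit{dtloss}_i(s_i)
\]
Exactly one service is selected per vertex, so the search space has $\prod_{i=1}^{u} |S^c_{i}|$ candidate instances.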

The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by theorem \ref{theorem:NP}.
However, while the overall problem is NP-hard, there is a component of the problem that is solvable in polynomial time: matching the profile of each service with the node policy. This can be done by iterating over each node and each service, checking if the service matches the node’s policy. This process would take $O(|N|*|S|)$ time. This is polynomial time complexity.
The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by Theorem \ref{theorem:NP}. However, while the overall problem is NP-hard, one component of it is solvable in polynomial time: matching the profile of each service with the node policy. This can be done by iterating over each node and each service, checking whether the service matches the node's policy. This process takes $O(|N| \cdot |S|)$ time, where $|N|$ is the number of nodes and $|S|$ the number of services, and is therefore polynomial.

\begin{theorem}\label{theorem:NP}
The Max Quality \problem is NP-hard.
\end{theorem}
\emph{Proof: }
The proof is a reduction from the multiple-choice knapsack problem (MCKP), a classified NP-hard combinatorial optimization problem, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j \in N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without having the weight sum to exceed C.
The proof is a reduction from the multiple-choice knapsack problem (MCKP), a combinatorial optimization problem known to be NP-hard, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in a knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $C$.
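
% Illustrative restatement only: the binary variables $x_{ij}$ are notation introduced here and are not used elsewhere in the paper.
For reference, the MCKP described above is commonly stated as the following integer program, where the binary variable $x_{ij}$ (introduced here only for illustration) equals 1 if and only if item $j$ of class $N_i$ is selected:
\[
\begin{array}{lll}
\mbox{maximize} & \sum_{i=1}^{t} \sum_{j \in N_i} p_{ij} x_{ij} & \\
\mbox{subject to} & \sum_{i=1}^{t} \sum_{j \in N_i} w_{ij} x_{ij} \leq C, & \\
 & \sum_{j \in N_i} x_{ij} = 1, & i = 1, \ldots, t, \\
 & x_{ij} \in \{0,1\}, & i = 1, \ldots, t,\; j \in N_i.
\end{array}
\]
The second constraint encodes the multiple-choice requirement that exactly one item is taken from each class.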

The MCKP can be reduced to the Max quality \problem in plynomial time, with $N_1,N_2,N_t$ corresponding to $S^c_{1}, S^c_{1}, ..., S^c_{u},$, $t=u$ and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j \in N_i$ corresponds to \textit{dtloss}$_{ij}$ computed for each candidate service $s_j \in S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, C is always equal to the cardinality of $V_C$).
The MCKP can be reduced to the Max Quality \problem in polynomial time, with $N_1,N_2,\ldots,N_t$ corresponding to $S^c_{1}, S^c_{2}, \ldots, S^c_{u}$, $t$$=$$u$ and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j$$\in$$N_i$ corresponds to \textit{dtloss}$_{ij}$ computed for each candidate service $s_j$$\in$$S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, $C$ is always equal to the cardinality of $V_C$).

Since the reduction can be done in polynomial time, our problem is also NP-hard. (This is not sufficient: we must also prove that a solution to one problem is a solution to the other.)
