From 5ab27f92b2e9db476a8dc6960b2f948ce190429f Mon Sep 17 00:00:00 2001 From: Claudio Ardagna Date: Mon, 13 May 2024 10:23:35 +0200 Subject: [PATCH] Sezione 6 - Claudio --- experiment.tex | 2 +- metrics.tex | 58 +++++++++++++++++++++++++++----------------------- 2 files changed, 32 insertions(+), 28 deletions(-) diff --git a/experiment.tex b/experiment.tex index ab2fa5c..1165e3f 100644 --- a/experiment.tex +++ b/experiment.tex @@ -5,7 +5,7 @@ \section{Experiments}\label{sec:experiment} \cref{subsec:experiments_performance} analyses the performance of our solution in terms of execution time; \cref{subsec:experiments_quality} discusses the quality of the best pipeline instance generated by our solution according to the metrics $M_J$ and $M_{JSD}$ in \cref{subsec:metrics}. \subsection{Testing Infrastructure and Experimental Settings}\label{subsec:experiments_infrastructure} -Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, comparison, and composition. The simulator first defines the pipeline template as a sequence of vertices, with $l$ the length of the pipeline template, and defines the size \windowsize\ of the sliding window, such that \windowsize$\leq$$l$. We recall that alternative vertices are modeled in different pipeline templates, while parallel vertices are not considered in our experiments since they only add a fixed execution time that is negligible and do not affect the performance and quality of our solution. Each vertex is associated with a (set of) policy that applies a filtering transformation that remove a given percentage of data. +Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, selection, and composition. The simulator first defines the pipeline template as a sequence of vertices, with $l$ the length of the pipeline template, and defines the size \windowsize\ of the sliding window, such that \windowsize$\leq$$l$. 
We recall that alternative vertices are modeled in different pipeline templates, while parallel vertices are not considered in our experiments since they only add a fixed execution time that is negligible and do not affect the performance and quality of our solution. Each vertex is associated with a (set of) policies that apply a filtering transformation removing a given percentage of data. % Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, comparison, and composition. The simulator first defines the pipeline template as a sequence of vertices in the range 3$-$7 (the length $l$ of the pipeline template) and defines the size \windowsize\ of the sliding window, such that \windowsize$<$$l$. We recall that alternative vertices are modeled in different pipeline templates, while parallel vertices are not considered since they only add a fixed execution time that is negligible and do not affect the performance and quality of our approach. Each vertex is associated with a (set of) policy that applies a filtering transformation that either remove a percentage of data in $[0.5,0.8]$ (\average) or in $[0.20,1]$ (\wide). % % \begin{enumerate*}[label=\textit{\roman*})] % % \item \average: data removal percentage within $[0.5,0.8]$. diff --git a/metrics.tex b/metrics.tex index fe3287f..a411c70 100644 --- a/metrics.tex +++ b/metrics.tex @@ -1,34 +1,34 @@ \section{Maximizing the Pipeline Instance Quality}\label{sec:heuristics} % % %Obviously, choosing the best service for each vertex is not sufficient; it becomes a complex problem in which all possible combinations of the available services must be computed/evaluated, and the best one chosen. -Our goal is to generate a pipeline instance with maximum quality, addressing data protection requirements while maximizing \textit{\quality (\q)} throughout the pipeline execution. 
To this aim, we first discuss the quality metrics used to measure and monitor data quality, which guide the generation of the pipeline instance. Then, we prove that the problem of generating a pipeline instance with maximum quality is NP-hard (\cref{sec:nphard}). Finally, we present a parametric heuristic (\cref{subsec:heuristics}) tailored to address the computational complexity associated with enumerating all possible combinations within a given set. The primary aim of the heuristic is to approximate the optimal path for service interactions and transformations, particularly within the landscape of more complex pipelines composed of numerous vertices and candidate services. Our focus extends beyond identifying optimal combinations to encompass an understanding of the quality changes introduced during the transformation processes. +Our goal is to generate a pipeline instance with maximum quality \q, addressing data protection requirements throughout the pipeline execution. To this aim, we first discuss the quality metrics used to measure and monitor data quality (information loss), which guide the generation of the pipeline instance with maximum \q. Then, we prove that the problem of generating a pipeline instance with maximum \q\ is NP-hard (\cref{sec:nphard}). Finally, we introduce a parametric heuristic (\cref{subsec:heuristics}) designed to tackle the computational complexity of enumerating all possible service combinations. The heuristic approximates the optimal selection of services and transformations, especially for complex pipelines composed of numerous vertices and candidate services. Our focus extends beyond identifying optimal combinations to understanding the quality changes introduced during the transformation processes. 
%Inspired by existing literature, these metrics, categorized as quantitative and statistical, play a pivotal role in quantifying the impact of policy-driven transformations on the original dataset. \subsection{Quality Metrics}\label{subsec:metrics} %Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. To this aim, we define two metrics evaluating the quality loss introduced by our policy-driven transformation in Section~\cite{ADD} on the input dataset \origdataset at each step of the data pipeline. Our metrics can be classified as \emph{quantitative} and \emph{qualitative}~\cite{ADD}, and compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements on \origdataset. -Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. Quality metrics evaluate the information loss introduced at each step of the data pipeline, and can be classified as \emph{quantitative} or \emph{qualitative}~\cite{ADD}. +Ensuring data quality is mandatory to implement data pipelines that provide accurate results and decision-making along the whole pipeline execution. Quality metrics measure the data quality preserved at each step of the data pipeline, and can be classified as \emph{quantitative} or \emph{qualitative}~\cite{ADD}\hl{CITE}. Quantitative metrics monitor the amount of data lost during data transformations to model the quality difference between datasets \origdataset\ and \transdataset. Qualitative metrics evaluate changes in the properties of datasets \origdataset\ and \transdataset. For instance, qualitative metrics can measure the changes in the statistical distribution of the two datasets. 
-In this paper, we use two metrics, one quantitative and one qualitative, to compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements (i.e., our policy-driven transformation in \cref{sec:instance}) on \origdataset\ at each step of the data pipeline. We note that a complete taxonomy of possible metrics is outside the scope of this paper and will be the target of our future work. +In this paper, we use two metrics, one quantitative and one qualitative, to compare the input dataset \origdataset\ and dataset \transdataset\ generated by enforcing data protection requirements (i.e., our policy-driven transformation in \cref{sec:instance}) on \origdataset\ at each step of the pipeline. We note that a complete taxonomy of possible metrics is outside the scope of this paper and will be the target of our future work. \subsubsection{Quantitative metric} %We propose a metric that measures the similarity between two datasets, for this purpose, we use the Jaccard coefficient. We propose a quantitative metric $M_J$ based on the Jaccard coefficient that assesses the similarity between two datasets. The Jaccard coefficient is defined as follows \cite{RAHMAN20102707}: \[J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}\] where X and Y are two datasets of the same size. -The coefficient is calculated by dividing the cardinality of the intersection of two datasets by the cardinality of their union. It ranges from 0 to 1, with 0 indicating no similarity and 1 indicating complete similarity between the datasets. It has several advantages. Unlike other similarity measures, such as Euclidean distance, it is not affected by the magnitude of the values in the dataset. It is suitable for datasets with categorical variables or nominal data, where the values do not have a meaningful numerical interpretation. +The coefficient is calculated by dividing the cardinality of the intersection of two datasets by the cardinality of their union. 
It ranges from 0 to 1, with 0 indicating no similarity (minimum quality) and 1 indicating complete similarity (maximum quality) between the datasets. It has several advantages. Unlike other similarity measures, such as Euclidean distance, it is not affected by the magnitude of the values in the dataset. It is suitable for datasets with categorical variables or nominal data, where the values do not have a meaningful numerical interpretation. Metric $M_J$ extends the Jaccard coefficient with weights that model the importance of each element in the dataset as follows:\[M_J(X,Y) = \frac{\sum_{i=1}^{n}w_i|x_i \cap y_i|}{\sum_{i=1}^{n}w_i|x_i \cup y_i|}\] where $x_i$$\in$X ($y_i$$\in$Y, resp.) is the $i$-th feature of dataset X (Y, resp.), and $w_i$ the weight modeling the importance of the $i$-th feature. -It is computed by dividing the cardinality of the intersection of two datasets by the cardinality of their union, weighted by the importance of each feature in the datasets providing a more accurate measure of similarity. %Weights prioritize certain elements (e.g., a specific feature) in the datasets. +It is computed by dividing the cardinality of the intersection of two datasets by the cardinality of their union, weighted by the importance of each feature in the datasets. It provides a more accurate measure of similarity. %Weights prioritize certain elements (e.g., a specific feature) in the datasets. %The Weighted Jaccard coefficient can then account for element importance and provide a more accurate measure of similarity. \subsubsection{Qualitative Metric} %We propose a metric that enables the measurement of the distance of two distributions. -We propose a qualitative metric $M_{JDS}$ based on the Jensen-Shannon Divergence (JSD) that measures the dissimilarity (distance) between the probability distributions of two datasets. 
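The weighted metric $M_J$ can be sketched in a few lines of Python (an illustrative re-implementation, not part of the paper's Swift simulator; representing each feature $x_i$, $y_i$ as the set of values surviving in that column, and the toy data, are our assumptions):

```python
def weighted_jaccard(X, Y, weights):
    """M_J: weighted Jaccard similarity between two datasets.

    X, Y: one set of values per feature (column).
    weights: importance w_i of each feature.
    """
    num = sum(w * len(x & y) for x, y, w in zip(X, Y, weights))
    den = sum(w * len(x | y) for x, y, w in zip(X, Y, weights))
    # two empty datasets are trivially identical
    return num / den if den else 1.0

# toy datasets with two features; the second feature is twice as important
X = [{1, 2, 3}, {"a", "b"}]
Y = [{2, 3, 4}, {"a", "b"}]
print(weighted_jaccard(X, Y, [1.0, 2.0]))  # 0.75
```

With all weights equal to 1, the sketch reduces to the plain Jaccard coefficient $J(X,Y)$.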
+We propose a qualitative metric $M_{JSD}$ based on the Jensen-Shannon Divergence (JSD) that assesses the similarity between the probability distributions of two datasets. JSD is a symmetrized version of the KL divergence~\cite{Fuglede} and is applicable to a pair of statistical distributions only. It is defined as follows: \[JSD(X, Y) = \frac{1}{2} \left( KL(X || M) @@ -38,57 +38,57 @@ \subsubsection{Qualitative Metric} JSD incorporates both the KL divergence from X to M and from Y to M. To make JSD applicable to datasets, where each feature in the dataset has its own statistical distribution, metric $M_{JSD}$ applies JSD to each column of the dataset. The obtained results are then aggregated using a weighted average, thus enabling the prioritization of important features that can be lost during the policy-driven transformation in \cref{sec:heuristics}, as follows: \[M_{JSD} = 1 - \sum_{i=1}^n w_i \cdot \text{JSD}(x_i,y_i)\] -where \(w_i = \frac{n_i}{N}\) represents the weight for the \(i\)-th column, with \(n_i\) being the number of distinct elements in the $i$-th feature and \(N\) the total number of elements in the dataset. Each \(\text{JSD}(x_i,y_i)\) accounts for the Jensen-Shannon Divergence computed for the \(i\)-th feature in datasets X and Y. -Must be noted that the one minus has been added to the formula to transfrom the metric into a similarity metric, where 1 indicates complete similarity and 0 indicates no similarity. +%where \(w_i = \frac{n_i}{N}\) represents the weight for the \(i\)-th column, with \(n_i\) being the number of distinct elements in the $i$-th feature and \(N\) the total number of elements in the dataset. +where $\sum_{i=1}^n w_i$$=$1 and each \(\text{JSD}(x_i,y_i)\) accounts for the Jensen-Shannon Divergence computed for the \(i\)-th feature in datasets X and Y. It ranges from 0 to 1, with 0 indicating no similarity (minimum quality) and 1 indicating complete similarity (maximum quality) between the datasets. 
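The JSD-based qualitative metric above can likewise be sketched in Python (illustrative only; we assume base-2 logarithms, so that each JSD term lies in $[0,1]$, and per-feature weights already summing to 1 — neither choice is fixed by the text):

```python
from math import log2

def kl(p, q):
    # Kullback-Leibler divergence (base 2); terms with p_i = 0 contribute 0
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetrized KL against the mixture M
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def m_jsd(cols_x, cols_y, weights):
    # 1 - sum_i w_i * JSD(x_i, y_i): 1 = identical distributions, 0 = disjoint
    return 1 - sum(w * jsd(x, y) for x, y, w in zip(cols_x, cols_y, weights))

# one feature, identical distributions -> maximum quality
print(m_jsd([[0.5, 0.5]], [[0.5, 0.5]], [1.0]))  # 1.0
# one feature, disjoint distributions -> minimum quality
print(m_jsd([[1.0, 0.0]], [[0.0, 1.0]], [1.0]))  # 0.0
```

The base-2 convention keeps the aggregated metric in $[0,1]$ without further normalization, matching the range stated above.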
+%Must be noted that the one minus has been added to the formula to transfrom the metric into a similarity metric, where 1 indicates complete similarity and 0 indicates no similarity. $M_{JSD}$ provides a weighted measure of similarity, which is symmetric and accounts for the contribution from both datasets and specific features. It compares the two datasets through a normalized measure that considers the overall data distributions. -\subsubsection{\Quality (\q) Definition} +\subsubsection{Pipeline Quality (\q)} %We note that our metrics can be applied either to the entire dataset or to specific features only. The features can be assigned with equal or varying importance, providing a weighted version of the metrics, thus enabling the prioritization of important features that might be possibly lost during the policy-driven transformation in Section~\cite{ADD}. -Metrics $M_J$ and $M_{JDS}$ contribute to the calculation of the information \quality \textit{\q} throughout the pipeline execution as follows. %Information loss is calculated as the average \emph{AVG} of data at each vertex \vi{i}$\in$$\V_S$ of the service pipeline $G(V,E)$ as follows. +Metrics $M_J$ and $M_{JSD}$ contribute to the calculation of the pipeline quality \q\ as follows. %Information loss is calculated as the average \emph{AVG} of data at each vertex \vi{i}$\in$$\V_S$ of the service pipeline $G(V,E)$ as follows. -\begin{definition}[\emph{\quality}] - Given a metrics M$\in$$\{M_J,M_{JDS}$\} modeling the data quality, \quality is calculated as \emph{AVG}($M_{ij}$), with $M_{ij}$ the value of the quality metric retrieved at each vertex \vii{i}$\in$$\V'_S$ of the pipeline instance $G'$ according to service \sii{j}. 
+\begin{definition}[\emph{\quality}]\label{def:quality} + Given a metric $M$$\in$$\{M_J,M_{JSD}\}$ modeling the data quality, pipeline quality \q$=$\emph{AVG}($M_{ij}$), with $M_{ij}$ the value of the quality metric retrieved at each vertex \vii{i}$\in$$\V'_S$ of the pipeline instance $G'$ according to service \sii{j}. \end{definition} We note that \emph{AVG}($M_{ij}$) models the average data quality preserved within the pipeline instance $G'$. -We also note that $q_{ij}$$=$$M_i$ models the \quality at vertex \vii{i}$\in$$\V'_S$ of $G'$ for service \sii{j}. +We also note that $\q_{ij}$$=$$M_{ij}$ models the \quality at vertex \vii{i}$\in$$\V'_S$ of $G'$ for \sii{j}. %We also note that information loss \textit{dloss} is used to generate the Max-Quality pipeline instance in the remaining of this section. \subsection{NP-Hardness of the Max-Quality Pipeline Instantiation Problem}\label{sec:nphard} %\hl{what if we define it formally as the problem of finding a valid instance, according to the definition of instance, such that no instance with a smaller loss exists?} -The problem of computing a pipeline instance (\cref{def:instance}) with maximum quality (minimum information loss) can be formally defined as follows. +The problem of computing a pipeline instance (\cref{def:instance}) with maximum quality \q\ can be formally defined as follows. \begin{definition}[Max-Quality Problem]\label{def:MaXQualityInstance} Given a pipeline template $G^{\myLambda,\myGamma}$ and a set $S^c$ of candidate services, find a max-quality pipeline instance $G'$ such that: \begin{itemize} \item $G'$ satisfies conditions in \cref{def:instance}, - \item $\nexists$ a pipeline instance $G''$ that satisfies conditions in \cref{def:instance} and such that \quality \textit{\q}($G''$)$>$\textit{\q}($G'$), where \textit{\q}($\cdot$) is the \quality throughout the pipeline execution. 
+ \item $\nexists$ a pipeline instance $G''$ that satisfies conditions in \cref{def:instance} and such that quality \q($G''$)$>$\q($G'$), where \q($\cdot$) is the pipeline quality in \cref{def:quality}. %computed after applying the transformation of the policy matching the service selected to instantiate vertex \vi{i}$\in$$\V_S$, . \end{itemize} \end{definition} -The Max Quality \problem is a combinatorial selection problem and is NP-hard, as stated by Theorem \cref{theorem:NP}. However, while the overall problem is NP-hard, there is a component of the problem that is solvable in polynomial time: matching the profile of each service with the corresponding vertex policy. This can be done by iterating over each vertex and each service, checking if the service matches the vertex policy. This process would take $O(|N|*|S|)$ time. This is polynomial time complexity. +The Max-Quality \problem is a combinatorial selection problem and is NP-hard, as stated by \cref{theorem:NP}. However, while the overall problem is NP-hard, one component of the problem is solvable in polynomial time: matching the profile of each service with the corresponding vertex policy. This can be done by iterating over each vertex and each service, checking whether the service matches the vertex policy, which takes polynomial time $O(|N|\cdot|S|)$. \begin{theorem}\label{theorem:NP} The Max-Quality \problem is NP-Hard. \end{theorem} \emph{Proof: } -The proof is a reduction from the multiple-choice knapsack problem (MCKP), a classified NP-hard combinatorial optimization problem, which is a generalization of the simple knapsack problem (KP) \cite{}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. 
Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without having the weight sum to exceed C. +The proof is a reduction from the multiple-choice knapsack problem (MCKP), a well-known NP-hard combinatorial optimization problem that generalizes the simple knapsack problem (KP) \cite{}\hl{CITE}. In the MCKP problem, there are $t$ mutually disjoint classes $N_1,N_2,\ldots,N_t$ of items to pack in some knapsack of capacity $C$, class $N_i$ having size $n_i$. Each item $j$$\in$$N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$; the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $C$. The MCKP can be reduced to the Max-Quality \problem in polynomial time, with $N_1,N_2,\ldots,N_t$ corresponding to $S^c_{1}, S^c_{2}, \ldots, S^c_{u}$, $t$$=$$u$, and $n_i$ the size of $S^c_{i}$. The profit $p_{ij}$ of item $j$$\in$$N_i$ corresponds to \textit{\q}$_{ij}$ computed for each candidate service $s_j$$\in$$S^c_{i}$, while $w_{ij}$ is uniformly 1 (thus, $C$ is always equal to the cardinality of $V_C$). -Since the reduction can be done in polynomial time, our problem is also NP-hard. (non è sufficiente, bisogna provare che la soluzione di uno e' anche soluzione dell'altro) +Since the reduction can be done in polynomial time, our problem is also NP-hard. \hl{CHIARA (this is not sufficient; we must also prove that a solution of one problem is a solution of the other).} \begin{example}[Max-Quality Pipeline Instance] - Let us start from \cref{ex:instance} and extend it with the comparison algorithm in \cref{sec:instance} built on \quality. The comparison algorithm is applied to the set of services $S'_*$ and returns three service rankings one for each vertex \vi{4}, \vi{5}, \vi{6} according to the amount of data anonymized. 
- The ranking is listed in \cref{tab:instance_example_maxquality}(b) and based on the transformation function in the policies. We assume that the more restrictive the transformation function (i.e., it anonymizes more data), the lower is the service position in the ranking. - For example, \s{11} is ranked first because it anonymizes less data than \s{12} and \s{13}. - The ranking of \s{22} and \s{23} is based on the same logic. - Finally, the ranking of \s{31} and \s{32} is affected by the environment state at the time of the ranking. For example, if the environment where the visualization is performed is a CT facility, then \s{31} is ranked first and \s{32} second because the facility is considered less risky than the cloud. + We extend \cref{ex:instance} with the selection algorithm in \cref{sec:instance} built on pipeline quality \q. The selection algorithm is applied to the set $S'_*$ of compatible services and returns three service rankings, one for each vertex \vi{4}, \vi{5}, \vi{6}, according to quality metric $M_J$ measuring the amount of preserved data after anonymization. The ranking is presented in \cref{tab:instance_example_maxquality}(b), according to the transformation function in the corresponding policies. + We assume that the more restrictive the transformation function (i.e., it anonymizes more data), the lower the service's position in the ranking. + For example, \s{11} is ranked first because it anonymizes less data than \s{12} and \s{13}, that is, $Q_{11}$$>$$Q_{12}$ and $Q_{11}$$>$$Q_{13}$. The same applies to the ranking of \s{22} and \s{23}. + The ranking of \s{31} and \s{32} is affected by the environment state at the time of the ranking. For example, if the environment where the visualization is performed is a CT facility, then \s{31} is ranked first and \s{32} second because the facility is considered less risky than the cloud, and $Q_{31}$$>$$Q_{32}$. 
\end{example} % The metrics established will enable the quantification of data loss pre- and post-transformations. @@ -104,9 +104,13 @@ \subsection{Heuristic}\label{subsec:heuristics} %The exhaustive exploration of such combinations swiftly becomes impractical in terms of computational time and resources, particularly when dealing with the analysis of complex pipelines. %In response to this computational complexity, the incorporation of heuristic emerges as a strategy to try to efficiently address the problem. %\hl{I REVISED THE PARAGRAPH QUICKLY JUST TO GIVE AN INDICATION. WE MUST USE THE FORMALIZATION AND PERHAPS ALSO FORMALIZE THE PSEUDOCODE.} -We design and implement a heuristic algorithm for computing the pipeline instance maximizing data quality. Our heuristic is built on a \emph{sliding window} and aims to maximize information \quality \emph{\q} according to quality metrics. At each step, a set of vertices in the pipeline template $\tChartFunction$ is selected according to a specific window size \windowsize, that select a subset of the pipeline template starting at depth $i$ and ending at depth \windowsize+i-1. Service filtering and selection in \cref{sec:instance} are then executed to maximiza \emph{\quality} in window $w$. The heuristic returns as output the list of services instantiating all vertices at depth $i$. The sliding window $w$ is then shifted by 1 (i.e., $i$=$i$+1) and the filtering and selection process executed until \windowsize+i-1 is equal to length $l$ (max depth) of $\tChartFunction$, that is, the sliding window reaches the end of the template. +We design and implement a heuristic algorithm built on a \emph{sliding window} for computing the pipeline instance maximizing quality \q. +%Our heuristic is built on a \emph{sliding window} and aims to maximize information \quality \emph{\q} according to quality metrics. 
+%At each step, a set of vertices in the pipeline template $\tChartFunction$ is selected according to a window of size \windowsize, which select a subset of the pipeline template starting at depth $i$ and ending at depth \windowsize+i-1. +At each step, a window of size \windowsize\ selects a subset of vertices in the pipeline template $\tChartFunction$ starting at depth $i$ and ending at depth \windowsize+i-1. +Service filtering and selection in \cref{sec:instance} are then executed to maximize quality $Q_w$ in window $w$. The heuristic returns as output the list of services instantiating all vertices at depth $i$. The sliding window $w$ is then shifted by 1 (i.e., $i$=$i$+1), and the filtering and selection process is executed again until \windowsize+i-1 equals the length $l$ (maximum depth) of $\tChartFunction$, that is, until the sliding window reaches the end of the template. %For example, in our service selection problem where the quantity of information lost needs to be minimized, the sliding window algorithm can be used to select services composition that have the lowest information loss within a fixed-size window. -This strategy ensures that only services with low information loss are selected at each step, maximizing the information \quality \emph{\q}. The pseudocode of the heuristic algorithm is presented in \cref{lst:slidingwindowfirstservice}. +This strategy ensures that only services with low information loss are selected at each step, maximizing the pipeline quality \q. The pseudocode of the heuristic algorithm is presented in \cref{lst:slidingwindowfirstservice}.
\definecolor{commentsColor}{rgb}{0.497495, 0.497587, 0.497464} \definecolor{keywordsColor}{rgb}{0.000000, 0.000000, 0.0} \definecolor{stringColor}{rgb}{0.558215, 0.000000, 0.135316} @@ -142,7 +146,7 @@ \subsection{Heuristic}\label{subsec:heuristics} \begin{lstlisting}[frame=bt, escapechar=\%,mathescape, caption={Sliding Window Heuristic with Selection of First Service from Optimal Combination},label={lst:slidingwindowfirstservice}] var $\text{G'}$ = [] //pipeline instance - $M$ = 0; //DATA QUALITY + $M$ = 0; //Quality metric function SlidingWindowHeuristic(G^{%\myLambda%,%\myGamma%}, %\windowsize%){ for i = 1 to l - %\windowsize% + 1 { @@ -179,7 +183,7 @@ \subsection{Heuristic}\label{subsec:heuristics} \end{lstlisting} - The function SlidingWindowHeuristic processes a list of vertices, each associated with a list of services, to identify optimal service combinations using a sliding window approach, given the constraints set by parameters verticesList and w (window size). + \hl{UNCLEAR PARAGRAPH} The function SlidingWindowHeuristic processes a list of vertices, each associated with a list of services, to identify optimal service combinations using a sliding window approach, given the pipeline template $G^{\myLambda,\myGamma}$ and the window size \windowsize\ as parameters. Initially, the function establishes $\text{G'}$ to store the optimal services or combinations identified during the process (line 2). It iterates from the start to the feasible end of the vertex list to ensure each possible window of services is evaluated (line 3). For each window, the function initializes minMetric to infinity and an empty list minMetricCombination to store the best service combination found within that specific window (lines 5-6).
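As a companion to the listing, the sliding-window strategy can be sketched in Python for a linear template (a simplified, hypothetical re-implementation; the names `sliding_window_heuristic`, `candidates`, and `quality`, as well as the toy data, are ours, and `quality` stands for any per-vertex metric such as the quantitative or qualitative ones above). Each window enumerates the service combinations of \windowsize\ consecutive vertices, keeps the combination with the highest average quality, commits only the service of the window's first vertex, and slides by one:

```python
from itertools import product

def sliding_window_heuristic(candidates, quality, w):
    """Sliding-window instantiation of a linear pipeline template.

    candidates: per-vertex lists of candidate services (hypothetical names).
    quality:    quality(vertex, service) -> metric value in [0, 1].
    w:          window size, w <= number of vertices.
    """
    l = len(candidates)
    instance = []
    for i in range(l - w + 1):
        # enumerate all service combinations inside the current window only
        best, best_q = None, float("-inf")
        for combo in product(*candidates[i:i + w]):
            q = sum(quality(i + k, s) for k, s in enumerate(combo)) / w
            if q > best_q:
                best, best_q = combo, q
        if i < l - w:
            instance.append(best[0])  # commit the first vertex, then slide
        else:
            instance.extend(best)     # last window: commit all its vertices
    return instance

# toy template with three vertices and two candidate services each
candidates = [["s11", "s12"], ["s21", "s22"], ["s31", "s32"]]
toy_quality = {(0, "s11"): 0.9, (0, "s12"): 0.5,
               (1, "s21"): 0.4, (1, "s22"): 0.8,
               (2, "s31"): 0.7, (2, "s32"): 0.6}
instance = sliding_window_heuristic(candidates,
                                    lambda v, s: toy_quality[(v, s)], w=2)
print(instance)  # ['s11', 's22', 's31']
```

With $w$ equal to the template length, the sketch degenerates into the exhaustive search whose cost motivates the heuristic; smaller windows trade optimality for a polynomial number of enumerated combinations per step.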