diff --git a/experiment.tex b/experiment.tex index d50aef9..206481c 100644 --- a/experiment.tex +++ b/experiment.tex @@ -1,18 +1,14 @@ \section{Experiments}\label{sec:experiment} - -We experimentally evaluated the performance and quality of our methodology, -and corresponding heuristic implementation in \cref{subsec:heuristics}, -and compare them against the exhaustive approach in Section~\ref{TOADD}. -In the following, +We experimentally evaluated the performance and quality of our methodology (\cref{subsec:heuristics}), +and compared it against the exhaustive approach in Section~\ref{TOADD}. In the following, \cref{subsec:experiments_infrastructure} presents the simulator and testing infrastructure adopted in our experiments, as well as the complete experimental settings; \cref{subsec:experiments_performance} analyses the performance of our solution in terms of execution time; \cref{subsec:experiments_quality} presents the quality of our heuristic algorithm in terms of the metrics in \cref{subsec:metrics}. \subsection{Testing Infrastructure and Experimental Settings}\label{subsec:experiments_infrastructure} -Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, comparison, and composition. -Upon setting the sliding window size, the simulator selects a subset of nodes along with their corresponding candidate services. -It then generates all possible service combinations for the chosen nodes. -For each combination, the simulator calculates a metric, selecting the first service from the optimal combination before shifting the sliding window. -When the end of the node list is reached, or when the window size equals the node count, the simulator computes the optimal service combination for the remaining nodes. -To ensure that each service is interdependent within a combination, a hash function is employed. This function generates weights that services use to simulate data removal due to an anonymization process. 
+Our testing infrastructure is a Swift-based simulator of a service-based ecosystem, including service execution, comparison, and composition. The simulator first defines the pipeline template as a sequence of nodes whose length is in the range \hl{x-y}. We recall that alternative nodes are modeled in different pipeline templates, while parallel nodes only add a fixed, negligible execution time and do not affect the quality of our approach. Each node is associated with one or more policies, with transformations varying in three classes: \hl{a,b,c}. A set of functionally-equivalent candidate services is randomly generated, each service having a profile...\hl{to conclude}. +Upon setting up the pipeline template, the sliding window size is configured and our methodology for pipeline instance generation starts. %The simulator selects a subset of nodes along with their corresponding candidate services. +The simulator calculates all possible pipeline instances, that is, it instantiates all nodes with a service according to the selected window size. For each node, the simulator calculates a quality metric, selects the first service from the optimal combination in the sliding window, and then shifts the window by one node. +When the end of the node list is reached, or when the window size equals the node count, the simulator computes the optimal service combination for the remaining nodes and the pipeline instance is generated. +\hl{NOT VERY CLEAR TO ME: To ensure that each service is interdependent within a combination, a hash function is employed. This function generates weights that services use to simulate transformations (data removal) mandated by the specified policies.} The simulator is used to assess the performance and quality of our sliding window heuristic in Section~\ref{sec:heuristics} for the generation of the best pipeline instance (Section~\ref{sec:instance}). 
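The sliding-window selection described above can be sketched as follows. This is a minimal illustration in Python rather than the Swift simulator itself; the names `sliding_window_selection` and `removed_fraction`, the representation of a service as a single removal ratio, and the choice to score each window together with the already selected prefix are all assumptions made for the example, not details taken from the paper.

```python
from itertools import product


def removed_fraction(services):
    """Hypothetical quality metric: overall fraction of data removed.

    Each service is modeled (as an assumption) by the fraction of data it
    removes, in [0, 1]. Lower values mean better data preservation.
    """
    kept = 1.0
    for ratio in services:
        kept *= (1.0 - ratio)
    return 1.0 - kept


def sliding_window_selection(candidates, window_size, metric=removed_fraction):
    """Select one service per node with a sliding window of `window_size` nodes.

    `candidates[i]` lists the candidate services of node i. For each window,
    all combinations are enumerated, the first service of the best (lowest
    metric) combination is fixed, and the window shifts by one node. The last
    window is solved exhaustively for all remaining nodes.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    chosen = []
    start = 0
    n = len(candidates)

    # Slide while more nodes remain than fit in a single window.
    while n - start > window_size:
        window = candidates[start:start + window_size]
        best = min(product(*window), key=lambda combo: metric(chosen + list(combo)))
        chosen.append(best[0])  # keep only the first service of the best combination
        start += 1

    # Final window: take the whole optimal combination for the remaining nodes.
    remaining = candidates[start:]
    best = min(product(*remaining), key=lambda combo: metric(chosen + list(combo)))
    chosen.extend(best)
    return chosen
```

With `window_size` equal to the number of nodes, the loop body never runs and the function reduces to the exhaustive search over all service combinations, which matches the baseline used for comparison in the quality experiments.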
% Performance measures the heuristics execution time in different settings, while quality compares the results provided by our heuristics in terms of selected services with the optimal solution retrieved using the exhaustive approach. %We note that the exhaustive approach generates the best pipeline instance by executing all possible combinations of candidate services. @@ -114,29 +110,24 @@ \subsection{Perfomance}\label{subsec:experiments_performance} \subsection{Quality}\label{subsec:experiments_quality} -We finally evaluated the quality of our heuristic comparing, where possible, -its results with the optimal solution retrieved by executing the exhaustive approach. -The latter executes with window size equals to the number of services per node and provides the best, -among all possible, solution. +We finally evaluated the quality of our heuristic by comparing, where possible, its results with the optimal solution retrieved by executing the exhaustive approach. The latter executes with window size equal to the number of nodes and provides the best among all possible solutions. + +We recall that we considered three different settings (confident, diffident, and average), varying the policy transformations, that is, the amount of data removal at each node. Setting confident assigns to each policy a transformation that changes the amount of data removal in the interval [x,y] (Jaccard coefficient) or decreases the probability distribution dissimilarity in the interval [x,y] (Jensen-Shannon Divergence). Setting diffident assigns to each policy a transformation that changes the amount of data removal in the interval [x,y] (Jaccard coefficient) or decreases the probability distribution dissimilarity in the interval [x,y] (Jensen-Shannon Divergence). 
Setting average assigns to each policy a transformation that changes the amount of data removal in the interval [x,y] (Jaccard coefficient) or decreases the probability distribution dissimilarity in the interval [x,y] (Jensen-Shannon Divergence). % \hl{WE MUST EXPLAIN WHAT WE VARIED IN THE EXPERIMENTS AND HOW: WINDOW SIZE, NODES, ETC. % ARE THOSE 5 THE ONLY IMAGES WE HAVE? WE CAN ALSO SWAP THE AXES AND ADD DIFFERENT VIEWS} -\cref{fig:quality_window} presents our results -In the figure each chart represents a configuration with a specific number of nodes, ranging from 3 to 7. -On the x-axis of each chart, the number of services is plotted, which ranges from 2 to 6. -The y-axis represents the metric value, which varies across the charts. -Each chart shows different window sizes, labeled as W Size 1, W Size 2, and so on, corresponding to various metric values. -As the number of nodes increases in each subsequent chart, the relationship between the window size and metric value is depicted, -showing how metric values tend to decrease (better data preservation) as the window size increases across different node configurations. -This suggests that the heuristic performs better when it has a broader perspective of the data it is analyzing. -The trend is consistent across various numbers of nodes, from three to seven, indicating that the heuristic's enhanced -performance with larger window sizes is not confined to a specific setup but rather a general characteristic of its behavior. +\cref{fig:quality_window} presents our results for setting \hl{confident} and the Jaccard coefficient metric. \cref{fig:quality_window}(a)--(e) \hl{add the letters and make the y-axis uniform} present the retrieved quality when varying the number of nodes from 3 to 7, respectively. Each figure in \cref{fig:quality_window}(a)--(e) varies the number of candidate services at each node in the range [2, 6] and the window size $W$ in the range [1, $|$nodes$|$]. 
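For reference, the two measures named in the settings above can be sketched with their textbook definitions. This is only an illustration in Python: how the paper applies them to the data before and after a transformation is defined in its metrics subsection and is not reproduced here, and the function names and input representations (collections of items, discrete probability vectors) are assumptions.

```python
import math


def jaccard_coefficient(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| of two collections.

    Returns 1.0 for two empty collections (a common convention, assumed here).
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    `p` and `q` are equal-length sequences of probabilities summing to 1.
    Base-2 logarithms are used, so the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical inputs give a Jaccard coefficient of 1 and a divergence of 0, while disjoint sets give a coefficient of 0 and distributions with disjoint support give a divergence of 1.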
+\hl{let us add the most significant numbers (y-axis).} +From the results, some clear trends emerge. Across all node configurations, the metric values tend to decrease (better data quality) as the window size increases. +This suggests that the heuristic performs better when it has a broader perspective of the data and services. The trend is consistent across all node cardinalities (from three to seven), indicating that the heuristic's enhanced performance with larger window sizes is not confined to a specific setup but is rather a general characteristic of its behavior. Finally, the data suggest that while larger window sizes generally lead to better performance, -there might exist a point where the balance between window size and performance is optimized. +there might exist a point where the balance between window size and performance is optimized. \hl{For instance, ...} Beyond this point, the incremental gains in metric values may not justify the additional computational resources or the complexity introduced by larger windows. +\hl{REPEAT FOR ALL SETTINGS} + \begin{figure} \includegraphics[width=0.95\columnwidth]{graphs/exhaustive_performance.eps}