From 4a99e884c45cef0e02f2c8026e78ada5263a20c1 Mon Sep 17 00:00:00 2001
From: Antongiacomo Polimeno
Date: Wed, 12 Jun 2024 17:11:29 +0200
Subject: [PATCH] fixed asbtract

---
 main.tex | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/main.tex b/main.tex
index f708ba1..a659588 100644
--- a/main.tex
+++ b/main.tex
@@ -55,9 +55,9 @@
 
 \maketitle
 
-\abstract{
-Today, the increasing ability of collecting and managing huge volume of data, coupled with a paradigm shift in service delivery models, has significantly enhanced scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are today treated as digital products, which are managed and analyzed by multiple services orchestrated in data pipelines. This scenario calls for innovative solutions to data pipeline management that primarily seek to balance data quality and data protection. Departing from the state of the art that traditionally optimizes data protection and data quality as independent factors, we propose a framework that enhances service selection and composition in distributed data pipelines to the aim of maximizing data quality, while providing a minimum level of data protection. Our approach first retrieves a set of candidate services compatible with data protection requirements in the form of access control policies; it then selects the subset of compatible services, to be integrated within the data pipeline, which maximizes the overall data quality. Being our approach NP-hard, a sliding-window heuristic is defined and experimentally evaluated in terms of performance and quality with respect to the exhaustive approach. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
-}
+\begin{abstract}
+~Today, the increasing ability of collecting and managing huge volume of data, coupled with a paradigm shift in service delivery models, has significantly enhanced scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are today treated as digital products, which are managed and analyzed by multiple services orchestrated in data pipelines. This scenario calls for innovative solutions to data pipeline management that primarily seek to balance data quality and data protection. Departing from the state of the art that traditionally optimizes data protection and data quality as independent factors, we propose a framework that enhances service selection and composition in distributed data pipelines to the aim of maximizing data quality, while providing a minimum level of data protection. Our approach first retrieves a set of candidate services compatible with data protection requirements in the form of access control policies; it then selects the subset of compatible services, to be integrated within the data pipeline, which maximizes the overall data quality. Being our approach NP-hard, a sliding-window heuristic is defined and experimentally evaluated in terms of performance and quality with respect to the exhaustive approach. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
+\end{abstract}
 
 \tikzset{
 do path picture/.style={%