fixed asbtract
antongiacomo committed Jun 12, 2024
1 parent 5feffa1 commit 4a99e88
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions main.tex
@@ -55,9 +55,9 @@

\maketitle

-\abstract{
-Today, the increasing ability to collect and manage huge volumes of data, coupled with a paradigm shift in service delivery models, has significantly enhanced scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are now treated as digital products, which are managed and analyzed by multiple services orchestrated in data pipelines. This scenario calls for innovative solutions to data pipeline management that primarily seek to balance data quality and data protection. Departing from the state of the art, which traditionally optimizes data protection and data quality as independent factors, we propose a framework that enhances service selection and composition in distributed data pipelines with the aim of maximizing data quality while providing a minimum level of data protection. Our approach first retrieves a set of candidate services compatible with data protection requirements expressed as access control policies; it then selects the subset of compatible services, to be integrated within the data pipeline, that maximizes the overall data quality. Since the selection problem is NP-hard, a sliding-window heuristic is defined and experimentally evaluated in terms of performance and quality with respect to the exhaustive approach. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
-}
+\begin{abstract}
+~Today, the increasing ability to collect and manage huge volumes of data, coupled with a paradigm shift in service delivery models, has significantly enhanced scalability and efficiency in data analytics, particularly in multi-tenant environments. Data are now treated as digital products, which are managed and analyzed by multiple services orchestrated in data pipelines. This scenario calls for innovative solutions to data pipeline management that primarily seek to balance data quality and data protection. Departing from the state of the art, which traditionally optimizes data protection and data quality as independent factors, we propose a framework that enhances service selection and composition in distributed data pipelines with the aim of maximizing data quality while providing a minimum level of data protection. Our approach first retrieves a set of candidate services compatible with data protection requirements expressed as access control policies; it then selects the subset of compatible services, to be integrated within the data pipeline, that maximizes the overall data quality. Since the selection problem is NP-hard, a sliding-window heuristic is defined and experimentally evaluated in terms of performance and quality with respect to the exhaustive approach. Our results demonstrate a significant reduction in computational overhead, while maintaining high data quality.
+\end{abstract}

\tikzset{
do path picture/.style={%
