Commit

Claudio - Conclusions
cardagna committed Oct 31, 2024
1 parent 8216843 commit 662d11d
Showing 2 changed files with 3 additions and 5 deletions.
2 changes: 1 addition & 1 deletion major review/declarations.tex
@@ -8,7 +8,7 @@ \subsection{Availability of data and materials}
\subsection{Competing interests}
The authors declare that they have no competing interests.
\subsection{Funding}
Research supported, in parts, by \emph{i)} project ``BA-PHERD - Big Data Analytics Pipeline for the Identification of Heterogeneous Extracellular non-coding RNAs as Disease Biomarkers'', funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.1: “Fondo Bando PRIN 2022” (CUP G53D23002910006), \emph{ii)} project MUSA - Multilayered Urban Sustainability Action - project, funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.5: Strengthening of research structures and creation of R\&D ``innovation ecosystems'', set up of ``territorial leaders in R\&D'' (CUP G43C22001370007, Code ECS00000037), \emph{iii)} project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NextGenerationEU, \emph{iv)} Università degli Studi di Milano under the program ``Piano di Sostegno alla Ricerca''. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.
Research supported, in parts, by \emph{i)} project ``BA-PHERD - Big Data Analytics Pipeline for the Identification of Heterogeneous Extracellular non-coding RNAs as Disease Biomarkers'', funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.1: “Fondo Bando PRIN 2022” (CUP G53D23002910006), \emph{ii)} project MUSA - Multilayered Urban Sustainability Action - project, funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.5: Strengthening of research structures and creation of R\&D ``innovation ecosystems'', set up of ``territorial leaders in R\&D'' (CUP G43C22001370007, Code ECS00000037), \emph{iii)} Università degli Studi di Milano under the program ``Piano di Sostegno alla Ricerca''. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.
\subsection{Authors' contributions}
Marco Anisetti (M.A.) and Claudio A. Ardagna (C.A.A.) jointly conceived the original idea and provided guidance on the research direction. C.A.A., Chiara Braghin (C.B.), and Antongiacomo Polimeno (A.P.) developed the theoretical framework. A.P. was also responsible for conducting the experiments and drafting all the manuscript, under the supervision of M.A., C.A.A., C.B. All authors discussed the results, contributed to revisions of the manuscript, and approved the final version for publication.

6 changes: 2 additions & 4 deletions major review/main.tex
@@ -93,12 +93,10 @@


\section{Conclusions and Future Work}\label{sec:conclusions}
In the realm of distributed data service pipelines, managing pipelines while ensuring both data quality and data protection presents numerous challenges. This paper proposed a framework specifically designed to address this dual concern. Our data governance model employs policies and continuous monitoring to address data security and privacy challenges, while preserving data quality, in service pipeline generatiaon. The key point of the framework is in its ability to annotate each element of the pipeline with specific data protection requirements and functional specifications, then driving service pipeline construction. This method enhances compliance with regulatory standards and improves data quality by preserving maximum information across pipeline execution. Experimental results confirmed the effectiveness of our sliding window heuristic in addressing the computationally complex NP-hard service selection problem at the basis of service pipeline construction. Making use of a realistic dataset, our experiments evaluated the framework's ability to sustain high data quality while ensuring robust data protection, which is essential for pipelines where both data utility and privacy must coexist.
%To fully understand the impact of dataset selection on the retrieved quality and to ensure heuristic robustness across various scenarios, further investigation is planned for our future work. Future work will then %validate the findings of this paper and
%explore deeper insights into the applicability of our heuristics across different scenarios.
In the realm of distributed data service pipelines, managing pipelines while ensuring both data quality and data protection presents numerous challenges. This paper proposed a framework specifically designed to address this dual concern. Our data governance model employs policies and continuous monitoring to address data security and privacy challenges, while maximizing data quality, in service pipeline generation. The key point of the framework is in its ability to annotate each element of the pipeline with specific data protection requirements and functional specifications, then driving service pipeline construction. This method enhances compliance with regulatory standards and improves data quality by preserving maximum information across pipeline execution. Experimental results confirmed the effectiveness of our sliding window heuristic in addressing the computationally complex NP-hard service selection problem at the basis of service pipeline construction. Making use of a realistic dataset, our experiments evaluated the framework's ability to sustain high data quality while ensuring robust data protection, which is essential for pipelines where both data utility and privacy must coexist.

{\color{OurColor}
The paper leaves space for future work. First, we will extend our methodology with a taxonomy of possible quality dimensions and metrics supporting the definition of a multidimensional data quality that considers multiple dimensions such as, for instance, completeness, timeliness, and accuracy. Multiple dimensions and metrics will be adopted and weighted according to user priorities or task-specific requirements to better address the inherent multidimensional nature of data quality. This extension will enable more sophisticated monitoring and optimization mechanisms throughout the entire data lifecycle. Second, we will evaluate the impact of different datasets and larger sets of services and configurations on our methodology. The primary objective is to identify generalizable patterns and recurring schemes that transcend specific experimental settings, thereby enhancing the broader applicability of our findings. Third, we will evaluate our methodology in different real-world production scenarios with the scope of evaluating its practical usability and utility, bridging the gap between theoretical and practical efficiency. Finally, we will extend our methodology to consider service quality assessment as a means to complement data quality evaluation with traditional service quality metrics, enabling the development of hybrid scenarios. Such scenarios would facilitate the selection of services that optimize quality while maintaining specific non-functional requirements (e.g., execution time, resource consumption).
The paper leaves space for future work. First, we will extend our methodology with a taxonomy of possible quality dimensions and metrics supporting the definition of a multidimensional data quality. Multiple dimensions and metrics will be adopted and weighted according to user priorities or task-specific requirements to better address the inherent multidimensional nature of data quality. This extension will enable more sophisticated monitoring and optimization mechanisms throughout the entire pipeline lifecycle. Second, we will evaluate the impact of different datasets and larger sets of services and configurations on our methodology. The primary objective is to identify generalizable patterns and recurring schemes that transcend specific experimental settings, thereby enhancing the broader applicability of our findings. Third, we will evaluate our methodology in different real-world production scenarios with the scope of evaluating its practical usability and utility, bridging the gap between theoretical and practical efficiency. Finally, we will extend our methodology to consider service quality assessment as a means to complement data quality evaluation with traditional service quality metrics, enabling the development of hybrid scenarios. Such scenarios would facilitate the selection of services that optimize quality while maintaining specific non-functional requirements (e.g., execution time, resource consumption).
}

\input{declarations}