DBTalks

As part of the course M149: Database Systems, a series of invited talks is being held (virtually) in April and May at the Department of Informatics of the National and Kapodistrian University of Athens (NKUA).
The talks are scheduled for selected Wednesdays, 18:30 – 20:00.
The full program is available here: http://web.imsi.athenarc.gr/~georgia/courses.html

The series starts on Wednesday, April 7. Connection details are at the end of this message.

Title: Towards Automated Validation and Inspection of Machine Learning Pipelines

Abstract: Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policy makers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. In this lecture, I will introduce some of the practical problems in this area and give an overview of two recent approaches to tackling such issues. Deequ is a library for automating the verification of data quality at scale. It provides a declarative API, which combines common quality constraints with user-defined validation code, and thereby enables ‘unit tests’ for data. Deequ efficiently executes the resulting constraint validation workload by translating it to aggregation queries on Apache Spark, and also supports the incremental validation of data quality on growing datasets. mlinspect is a library that enables the lightweight lineage-based inspection of ML preprocessing pipelines. The key idea is to extract a directed acyclic graph representation of the dataflow from ML preprocessing pipelines in Python, and to use this representation to automatically instrument the code with predefined inspections based on a lightweight annotation propagation approach. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines and does not require manual code instrumentation.
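To make the idea of ‘unit tests’ for data concrete, here is a minimal plain-Python sketch of declarative constraints that are evaluated from aggregates computed in one pass over the data. It is not Deequ's API and does not use Spark; the names `Constraint`, `compute_metrics`, and `verify`, as well as the sample `price` column, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[float]]


@dataclass(frozen=True)
class Constraint:
    """A declarative check: a named predicate over one aggregate metric."""
    name: str
    metric: str  # which aggregate the predicate reads
    predicate: Callable[[float], bool]


def compute_metrics(rows: Iterable[Row], column: str) -> Dict[str, Optional[float]]:
    """Compute all aggregates for `column` in a single pass over the rows."""
    size = 0
    non_null = 0
    minimum: Optional[float] = None
    for row in rows:
        size += 1
        value = row.get(column)
        if value is not None:
            non_null += 1
            minimum = value if minimum is None else min(minimum, value)
    # Completeness is undefined for an empty dataset, so report None.
    completeness = non_null / size if size else None
    return {"size": float(size), "completeness": completeness, "min": minimum}


def verify(rows: Iterable[Row], column: str,
           constraints: List[Constraint]) -> Dict[str, bool]:
    """Evaluate every constraint against the shared aggregates."""
    metrics = compute_metrics(rows, column)
    results: Dict[str, bool] = {}
    for constraint in constraints:
        value = metrics[constraint.metric]
        # A metric that could not be computed counts as a failed check.
        results[constraint.name] = value is not None and constraint.predicate(value)
    return results


# Hypothetical sample data: one missing value and one negative price.
rows = [{"price": 10.0}, {"price": None}, {"price": -1.0}]
constraints = [
    Constraint("has_rows", "size", lambda n: n >= 1),
    Constraint("price_is_complete", "completeness", lambda c: c == 1.0),
    Constraint("price_is_non_negative", "min", lambda m: m >= 0),
]
results = verify(rows, "price", constraints)
print(results)
```

On this sample, only `has_rows` passes: the missing value fails the completeness check and the negative price fails the minimum check. Because all constraints read from the same small set of aggregates, the data is scanned once regardless of how many checks are declared, which is the general reason such workloads map well onto aggregation queries.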

Bio: Sebastian Schelter is an Assistant Professor with the University of Amsterdam, conducting research at the intersection of data management and machine learning. He manages the AI for Retail Lab Amsterdam, and has a joint appointment as Research Fellow at Ahold Delhaize, an international retailer based in the Netherlands. His work covers many aspects, such as automating data quality validation, optimizing programs that combine operations from linear and relational algebra, and tracking the lineage of machine learning pipelines. In the past, he has been a Faculty Fellow with the Center for Data Science at New York University and a Senior Applied Scientist at Amazon Research, after obtaining his Ph.D. at the database group of TU Berlin with Volker Markl. He is active in open source as an elected member of the Apache Software Foundation, and has extensive experience in building real-world systems from his time at Amazon, Twitter, IBM Research, and Zalando.

Join Zoom Meeting
https://zoom.us/j/94455168723?pwd=eXVsQzhMQk5YaVhTa1N4TUp2QWxwdz09
Meeting ID: 944 5516 8723
Passcode: kDr89Y
