Theodore Dalamagas - Data Science and Information Technologies

Theodore Dalamagas’ Topics

1. TITLE: Models and Algorithms for Personalized Ranking of PubMed Articles

ABSTRACT (up to 100 words): PubMed is a free repository of biomedical and life sciences journal literature at the U.S. National Institutes of Health’s National Library of Medicine, comprising more than 29 million articles. The dissertation will design and develop (a) models to capture the scientific impact of PubMed articles, (b) models to capture user retrieval needs, i.e., research topics of interest, and (c) algorithms to effectively identify valuable articles and rank them according to user needs. The dissertation will build on the PageRank model, Google’s well-known technology that ranks pages based on their centrality in the underlying link (reference) network.
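As an illustration of the PageRank idea mentioned in the abstract, the following is a minimal sketch of power-iteration PageRank on a toy citation network. It is not the method the dissertation will develop. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-article graph are all assumptions made for this example.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Score articles by centrality in a citation network.

    `citations` maps each article id to the list of articles it cites.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every article that appears as a citing or cited node.
    nodes = set(citations)
    for targets in citations.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform score distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every article receives the same baseline ("teleport") share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = citations.get(node, [])
            if targets:
                # Split this article's score among the articles it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An article citing nothing spreads its score uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy network: A and B cite C, C cites D, D cites nothing.
toy_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(toy_citations)
for article in sorted(scores, key=scores.get, reverse=True):
    print(article, round(scores[article], 3))
```

In this toy network, C scores higher than A and B because it is cited by both. A personalized variant would typically replace the uniform baseline share with one biased toward a user's topics of interest.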

2. TITLE: Data Integration, Mining, and Visualization for Variant Interpretation to Support Precision Medicine

ABSTRACT (up to 100 words): In Precision Medicine, a common task for many types of analysis is the investigation of specific variants in a subject’s genome. Life scientists may therefore need to examine several kinds of relevant data, e.g., the coding effect of a variant for different transcripts, its genomic location, information about its pathogenicity, etc. This wide range of information is scattered across many heterogeneous resources. To make matters worse, in some cases, different resources may report variants based on different reference genome versions. As a result, the task of variant interpretation for precision medicine applications can be tedious and time-consuming. The dissertation will design and develop (a) data integration methods to collect, integrate and fuse rich data about variants from multiple heterogeneous sources (e.g., PubMed, ClinVar, etc.), (b) advanced data mining techniques to enable automatic variant classification (e.g., pathogenic vs. non-pathogenic) and the extraction of variant-related meta-information, and (c) intuitive visualizations to facilitate data exploration.

3. TITLE: Data Integration and Information Retrieval Methods for Structural Biology Repositories

ABSTRACT (up to 100 words): Due to the rapidly increasing number of scientific articles in life sciences, finding valuable information in the structural biology literature has become tedious and time-consuming. Traditional literature retrieval systems (like Google Scholar or even PubMed) do little to help structural biologists identify useful domain knowledge, since they provide only generic information for each publication. On the other hand, many curated databases contain domain-specific knowledge. However, their information is scattered and heterogeneous. This dissertation will design and develop (a) data integration methods to collect, integrate and fuse rich data about proteins, macromolecular structures, enzymes and their relationships from multiple heterogeneous resources, and (b) advanced information retrieval techniques to identify useful knowledge for structural biologists. The work is part of the activities carried out in INSPIRED, the National Research Infrastructure on Integrated Biology, Drug Screening Efforts and Drug Target Functional Characterization, coordinated by the National Hellenic Research Foundation (NHRF). It will be based on and will extend the BIP! Finder (https://bip.imsi.athenarc.gr/) literature retrieval system, developed by the ATHENA Research Center, which is also a partner of the infrastructure.

NAME & POSITION OF THE SUPERVISOR: Theodore Dalamagas (Research Director, ATHENA RC), Thanasis Vergoulis (Scientific Associate, ATHENA RC).

LAB/GROUP, DEPARTMENT, INSTITUTION where the thesis will be executed: ATHENA Research Center, Information Management Systems Institute.