On Wednesday 25 February 2026, at 12 pm, Mr Sotirios Dimitriadis, student of the postgraduate program “Data Science and Information Technologies”, will present his MSc thesis titled: “Predicting protein-membrane interfaces of peripheral membrane proteins using machine learning”
Title: “Predicting protein-membrane interfaces of peripheral membrane proteins using machine learning”
Sotirios Dimitriadis, DSIT MSc
Registration No.: 7115152300006
Location
Seminar Room ΙΣ2, Biomedical Research Foundation Academy of Athens, 4, Soranou Ephessiou, 115 27 Athens
Abstract
Peripheral membrane proteins are essential components of various biological processes, including cell differentiation, proliferation, and intercellular communication. They associate transiently with the lipid bilayer and are regulated by diverse mechanisms depending on their specific functions. Precise binding is required for cellular homeostasis, as aberrant protein-membrane attachment contributes to the development of many disease pathologies. The study of these pathologies is often impeded by the inherent difficulty of characterizing the protein-membrane interface through experimental techniques. For this reason, the specific membrane-binding domains of many peripheral membrane proteins remain unknown. This limitation has created a demand for robust computational approaches for the prediction of protein-membrane interfaces. Current computational approaches, however, are often restricted by limited accuracy or excessive processing time. These constraints, when coupled with the scarcity of high-fidelity structural data, increase the development costs of small molecules intended to target the protein-membrane interface. To identify protein-membrane interfaces, we trained an ensemble machine learning model that predicts membrane-interacting amino acids. The training of this model relied upon the consolidation of two distinct datasets. We curated the first dataset, compiling and manually updating high-quality experimental amino acid annotations through a comprehensive review of the recent literature. This information was supplemented by a second, published dataset that incorporates interfacial binding site data from nine protein superfamilies, providing a broader representation of membrane-binding interactions.
For the proteins in the consolidated dataset, we extracted physicochemical amino acid descriptors, geometrical features, and deep learning-based protein language model embeddings to capture diverse sequence and structural signals for the training of the model. To address the class imbalance and the scarcity of experimental annotations, we also augmented the protein-membrane interaction regions using a simple expansion strategy. We then trained supervised binary classifiers to predict the membrane-interacting amino acids. Model evaluation revealed that the highest predictive performance was achieved by a model trained on non-expanded experimental amino acid annotations using physicochemical and geometric features, yielding a Matthews Correlation Coefficient of 0.558 and a macro F1 score of 0.776. The second-best model was trained on the consolidated dataset with expanded annotations, using all available features. This work combines experimentally curated annotations with expanded interface regions and evaluates the trade-off between physicochemical descriptors and protein language model embeddings.
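For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how the Matthews Correlation Coefficient and the macro F1 score can be computed for a binary, per-residue labeling (1 = membrane-interacting, 0 = not). The function names and the toy labels are illustrative assumptions and are not taken from the thesis; the actual evaluation code may differ.

```python
from math import sqrt

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (classes 0 and 1)."""
    return (f1_for_class(y_true, y_pred, 1) + f1_for_class(y_true, y_pred, 0)) / 2

# Toy example (hypothetical labels): 8 residues, 2 of them interfacial.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
mcc_value = mcc(y_true, y_pred)
f1_macro_value = f1_macro(y_true, y_pred)
print(f"MCC = {mcc_value:.3f}, macro F1 = {f1_macro_value:.3f}")
```

Both metrics are commonly preferred over plain accuracy when classes are imbalanced, as is typical for interface residues, because they account for performance on the minority class.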
Examiners
Dr. Zoe Cournia
Dr. Theodore Dalamagas
Dr. Konstantinos Vougas