
The course covers the fundamental concepts and current developments in modern research data management (RDM), with a strong emphasis on the FAIR principles (Findable, Accessible, Interoperable, Reusable) and their role in scientific workflows. Students will explore how well-organized, richly annotated data enables effective data sharing, reproducibility, and the application of artificial intelligence and machine learning (AI/ML) methods. The course examines the structure and purpose of Data Management Plans (DMPs) and discusses their importance in ensuring that research datasets can be reused for computational analysis and automated processing. A key focus is placed on metadata standards, ontologies, and controlled vocabularies, which are essential for both human understanding and machine-actionable representation of research data.

Students will analyze major research infrastructures and repositories, including Zenodo, the Open Science Framework (OSF), the European Open Science Cloud (EOSC), and several domain-specific platforms that serve as sources of high-quality data for AI/ML research. The seminar also provides an overview of the consortia of the German National Research Data Infrastructure (NFDI) and their approaches to RDM. Case studies from materials science, particularly NOMAD and MatInf, illustrate domain-specific challenges and demonstrate how curated data supports data-driven discovery and materials informatics.

Throughout the semester, students will prepare literature reviews on selected topics, present their findings, and discuss how RDM practices influence the usability of research data for AI-based methods. The seminar encourages critical reflection on the current state of RDM, highlighting both the opportunities and the limitations of integrating AI/ML into research workflows. It concludes with a collective discussion of future directions, including automated metadata generation, intelligent data infrastructures, and the broader impact of AI on scientific data management.
- Course instructor: Thorsten Berger
- Course instructor: Victor Dudarev
- Course instructor: Yorick Sens