Knowledge Discovery, Graph Neural Networks, and Language Models

Content

The lecture provides a comprehensive overview of approaches in machine learning and data mining for knowledge extraction. It draws on multiple fields, including machine learning, natural language processing, and knowledge representation. The main focus is on discovering patterns and regularities in large data sets, particularly unstructured text from news articles, publications, and social media; this process is known as knowledge discovery. The lecture covers specific techniques, methods, and challenges, as well as current and future research topics in the field.
One part of the lecture is dedicated to large language models (LLMs), such as ChatGPT, exploring their underlying principles, training methods, and applications. In addition, the lecture covers graph representation learning, which involves extracting meaningful representations from graph data. It introduces the mathematical foundations of graph and geometric deep learning and highlights recent applications in areas such as explainable recommender systems.
Moreover, the lecture addresses the integration of knowledge graphs with large language models, an approach known as neurosymbolic AI. This integration aims to combine structured and unstructured data to enhance knowledge extraction and representation.
The content of the lecture encompasses the entire machine learning and data mining process. It covers supervised and unsupervised learning techniques as well as empirical evaluation. Various learning methods are explored, ranging from classical approaches such as decision trees, support vector machines, and neural networks to more recent advances such as graph neural networks.

Learning objectives:

Students

  • know the fundamentals of Machine Learning, Data Mining, and Knowledge Discovery.
  • are able to design, train, and evaluate adaptive systems.
  • are able to conduct Knowledge Discovery projects with regard to algorithms, representations, and applications.

Workload:

  • The total workload for this course is approximately 135 hours
  • Contact time: 45 hours
  • Preparation and follow-up: 60 hours
  • Exam and exam preparation: 30 hours
Language of instruction: English
Bibliography
  • T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (http://www-stat.stanford.edu/~tibs/ElemStatLearn/)
  • T. Mitchell. Machine Learning. 1997
  • M. Berthold, D. Hand (eds.). Intelligent Data Analysis: An Introduction. 2003
  • P. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining. Addison Wesley, 2005