Large Language Models (LLMs) like ChatGPT show impressive performance on a variety of tasks such as conversation, question answering, and summarization. However, the outputs of these models cannot always be trusted. For instance, past studies have shown that model outputs can exhibit stereotypes or biases against certain social groups. The models can also hallucinate facts about the real world. On top of these issues, the models are often unable to explain why they generated a certain output or what the underlying reasoning was. In this seminar, we aim to develop an in-depth understanding of some of the trustworthiness issues surrounding LLMs and potential approaches to addressing them. We will read papers from AI and NLP conferences such as ACL, ICML, NeurIPS, and ICLR.




Semester: SoSe 2024