
Enhancing Safety and Ethical Alignment in Large Language Models by Dr. Rima Hazra
May 28 @ 12:00 pm - 1:00 pm
Title: Enhancing Safety and Ethical Alignment in Large Language Models
Speaker: Dr. Rima Hazra
Abstract: In this talk, we explore cutting-edge strategies for enhancing the safety and ethical alignment of large language models (LLMs). The research spans various approaches, including red teaming and jailbreaking techniques that assess and improve model robustness and ethical integrity. We examine how instruction-centric responses generated by LLMs can increase the likelihood of unethical output, highlighting the vulnerabilities of these AI systems. Through frameworks such as ‘Safety Arithmetic’ and ‘SafeInfer,’ we demonstrate methods to mitigate risks by manipulating model parameters and decoding-time behaviors to foster safer interactions. The talk also emphasizes the importance of safety alignment strategies and the challenges posed by integrating new knowledge through model edits, which can paradoxically destabilize ethical guidelines. This comprehensive examination not only sheds light on the current vulnerabilities of LLMs but also presents a pathway toward more reliable and ethically aligned AI implementations.
Bio: Dr. Rima Hazra is a senior postdoc at Eindhoven University of Technology (TU/e), Netherlands. Earlier, she was a Postdoctoral Researcher at the Singapore University of Technology and Design, working in the areas of AI safety alignment, natural language processing, and LLM reasoning. She earned her Ph.D. from the Indian Institute of Technology, Kharagpur, where she worked on information retrieval, NLP, and graph learning. Dr. Hazra has published several papers in prestigious CORE A* and A conferences such as AAAI, ACL, EMNLP, NAACL, ECIR, ECML PKDD, and JCDL. She has also received the prestigious Microsoft Academic Partnership Grant (MAPG) and the PaliGemma Academic Program award from Google for her work in AI safety alignment.