
Enhancing Safety and Ethical Alignment in Large Language Models by Rima Hazra
May 28 @ 12:00 pm - 1:00 pm
Speaker: Dr. Rima Hazra
Abstract: In this talk, we explore cutting-edge strategies for enhancing the safety and ethical alignment of large language models (LLMs). The research spans several approaches, including red teaming and jailbreaking techniques that assess and improve model robustness and ethical integrity. We examine how instruction-centric responses generated by LLMs can increase the likelihood of unethical output, highlighting the vulnerabilities of these AI systems. Through frameworks such as ‘Safety Arithmetic’ and ‘SafeInfer,’ we demonstrate methods that mitigate risks by adjusting model parameters and decoding-time behavior to foster safer interactions. The discussion also emphasizes the importance of safety alignment strategies and the challenges posed by integrating new knowledge through model edits, which can paradoxically undermine a model’s ethical alignment. This examination not only sheds light on the current vulnerabilities of LLMs but also points a way toward more reliable and ethically aligned AI implementations.
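To give a rough sense of what decoding-time safety steering can look like in general, the sketch below biases next-token logits away from tokens flagged as unsafe before the decoder picks a token. This is an illustrative assumption only, not the speaker’s SafeInfer or Safety Arithmetic method; the vocabulary, scores, and penalty value are invented for demonstration.

```python
# Minimal sketch of decoding-time safety steering (illustrative only; not the
# SafeInfer or Safety Arithmetic method discussed in the talk). The token
# names, logit values, and penalty are hypothetical.

import math

def softmax(logits):
    """Convert a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def safe_decode_step(logits, unsafe_tokens, penalty=5.0):
    """Down-weight tokens flagged as unsafe, then pick the next token greedily."""
    adjusted = {
        tok: (score - penalty if tok in unsafe_tokens else score)
        for tok, score in logits.items()
    }
    probs = softmax(adjusted)
    # Greedy choice for simplicity; a real decoder would typically sample.
    return max(probs, key=probs.get)

# Hypothetical next-token logits from some language model.
logits = {"help": 2.1, "harm": 2.4, "ignore": 0.3}
print(safe_decode_step(logits, unsafe_tokens={"harm"}))  # -> "help"
```

In this toy version the steering is a fixed penalty on a blocklist; approaches of the kind described in the abstract instead operate on model parameters or on the decoding distribution itself, guided by safety-aligned signals rather than a hand-written list.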