Dr. Xintao Wu gave an invited talk on Feb 21 in the UTD Ethical AI and HPC Seminar Series, which features distinguished researchers sharing cutting-edge work in fields spanning philosophy, ethical AI, and high-performance computing (HPC). The seminar series engages faculty and students across the Computer Science Department at UT Dallas, the Electrical and Computer Engineering Department at Rutgers, and the Philosophy Department and AI Center at UC Davis.

Title: Towards Efficient Machine Unlearning in Large Language Models

Abstract

The widespread popularity of Large Language Models (LLMs), driven in part by their emerging in-context learning ability, has highlighted the importance of ethical and safety considerations for deployment. In this talk, I will first offer my reflections on over one and a half years of exploration and research as an LLM novice and a researcher of trustworthy machine learning. I will start by sharing my experience of exploring LLMs for a few research tasks (e.g., hateful meme detection and mitigation, and biomedical image analysis), and then give an overview of my preliminary research on trustworthy LLMs. We will then focus on one research work: machine unlearning for LLMs, motivated by data protection guidelines. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we present a comparatively lightweight alternative based on soft prompting. With losses designed to enforce forgetting as well as utility preservation, our framework, Soft Prompting for Unlearning (SPUL), learns prompt tokens that are prepended to a query to induce unlearning of specific training examples at inference time without updating the LLM's parameters. We will present empirical results on the trade-off between utility and forgetting, as well as on efficiency, for text classification and question-answering tasks. Finally, I will conclude the talk with some future research directions.
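To make the idea in the abstract concrete, here is a minimal sketch of soft-prompt unlearning: only a small block of prompt embeddings is trained, prepended to each query, while the LLM itself stays frozen. This is not the SPUL implementation; the backbone model ("gpt2"), the gradient-ascent forgetting term, the trade-off weight, and the example forget/retain texts are all illustrative assumptions based only on the abstract.

```python
# Sketch of soft-prompt unlearning (assumptions noted in the lead-in; not the SPUL code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                      # placeholder backbone; SPUL targets larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)              # LLM parameters are never updated

embed = model.get_input_embeddings()
n_prompt, dim = 20, embed.embedding_dim
soft_prompt = torch.nn.Parameter(0.01 * torch.randn(n_prompt, dim))  # only trainable tensor
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def lm_loss(texts):
    """Causal-LM loss with the learned soft prompt prepended to the token embeddings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = embed(batch["input_ids"])                          # (B, T, D)
    B = tok_emb.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(B, -1, -1)          # (B, P, D)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
    attn = torch.cat(
        [torch.ones(B, n_prompt, dtype=batch["attention_mask"].dtype),
         batch["attention_mask"]], dim=1)
    # Labels: ignore the prompt positions (-100), predict the original tokens.
    labels = torch.cat(
        [torch.full((B, n_prompt), -100, dtype=torch.long), batch["input_ids"]], dim=1)
    return model(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels).loss

forget_texts = ["example to be forgotten"]        # hypothetical forget-set samples
retain_texts = ["example whose utility to keep"]  # hypothetical retain-set samples
alpha = 0.5                                       # assumed forgetting/utility trade-off weight

for step in range(100):
    optimizer.zero_grad()
    # Forgetting term pushes the model away from the forget set (gradient ascent);
    # utility term keeps it fitting the retain set. Gradients reach only the soft prompt.
    loss = -lm_loss(forget_texts) + alpha * lm_loss(retain_texts)
    loss.backward()
    optimizer.step()
```

At inference, the learned `soft_prompt` embeddings are simply prepended to each incoming query, which is what makes the approach lightweight compared with fine-tuning the full model.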