
Global Search and Discovery with Differential Policy Optimization

Start: 2024-12-19T12:00:00+05:30
End: 2024-12-19T13:00:00+05:30
Location: Bharti 501

December 19, 2024 @ 12:00 pm - 1:00 pm

Chandrajit Bajaj, UT Austin

Reinforcement learning (RL) with continuous state and action spaces is arguably one of the most challenging problems in machine learning. Most current learning methods rely on integral identities, such as value (Q) functions, to derive an optimal strategy for the learning agent. In this talk, we present the dual form of the original RL formulation and use it to propose the first differential RL framework that can handle settings with limited training samples and short episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise, stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable to the best current theoretical derivation. This pointwise estimate ensures that the learned policy matches the optimal path uniformly across steps. We then apply DPO to a class of practical RL problems with continuous state and action spaces, such as shape and material optimization and the discovery of new molecules with targeted dynamics.

This is joint work with Garvit Bansal and Minh Nguyen.

Details

Date: December 19, 2024
Time: 12:00 pm - 1:00 pm
Event Category: Seminars

Venue

Bharti 501
IIT Campus, Hauz Khas
New Delhi