Global Search and Discovery with Differential Policy Optimization
Bharti 501, IIT Campus, Hauz Khas, New Delhi
Chandrajit Bajaj, UT Austin

Reinforcement learning (RL) with continuous state and action spaces is arguably one of the most challenging problems within the field of machine learning. Most current learning methods focus on integral identities such as value (Q) functions to derive an optimal strategy for the learning agent. In this talk we present the dual…