State Key Laboratory of ASIC and System
Seminar Information

Optimising Reinforcement Learning by Customised Pearlmutter Propagation

Speaker: Professor Wayne Luk (Imperial College London, UK)
Time: 26 October 2017, 9:30-11:30 a.m.
Venue: Room 369, Microelectronics Building, Zhangjiang Campus

Abstract
Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with an environment by making sequential decisions. The agent receives rewards from the environment and uses them to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results on various RL benchmarks but is computationally expensive. This talk proposes Customised Pearlmutter Propagation (CPP), a novel hardware architecture that accelerates TRPO and targets Field Programmable Gate Array (FPGA) technology. We use the Pearlmutter algorithm to address the key computational bottleneck of TRPO in a hardware-efficient manner, avoiding symbolic differentiation with change of variables. Experimental evaluation using robotic locomotion benchmarks demonstrates that the proposed CPP architecture, implemented on a Stratix-V FPGA, can achieve a speed-up of up to 20 times over the Keras deep learning library with a Theano backend running with 6 threads on a Core i7-5930K CPU.
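
For readers unfamiliar with the Pearlmutter algorithm mentioned in the abstract, the following is a minimal software sketch of its core idea: a Hessian-vector product can be obtained exactly by differentiating the gradient along a direction v, without ever forming the Hessian. This is not the CPP hardware architecture from the talk. The test function, the `Dual` class and all other names are illustrative assumptions.

```python
# Minimal sketch of the Pearlmutter trick:
#   H(x) v = d/d(eps) grad f(x + eps * v), evaluated at eps = 0.
# The function f and every name below are illustrative assumptions,
# not taken from the talk.


class Dual:
    """Number val + tan * eps with eps**2 = 0; tan carries the directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def grad_f(x):
    """Hand-written gradient of f(x0, x1) = x0**3 * x1 + x1**2."""
    x0, x1 = x
    return [3 * x0 * x0 * x1, x0 * x0 * x0 + 2 * x1]


def hessian_vector_product(grad, x, v):
    """Push the direction v through the gradient computation in a single pass."""
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return [g.tan for g in grad(duals)]


# Hessian at (1, 2) is [[12, 3], [3, 2]], so H v for v = (1, -1) is [9, 1].
hv = hessian_vector_product(grad_f, [1.0, 2.0], [1.0, -1.0])
print(hv)  # [9.0, 1.0]
```

The cost is one extra pass through the gradient computation per product, which is why this kind of operation suits iterative solvers that only ever need the matrix applied to a vector.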

Biography
Wayne Luk is Professor of Computer Engineering at Imperial College London. His research covers the development of hardware and software capabilities to address demanding applications, such as genomic data processing and climate modelling. He is a Fellow of the Royal Academy of Engineering, the IEEE and the BCS. He has received the Research Excellence Award from Imperial College London and 15 awards for his publications from various international conferences.

Contact: Wang Lingli (王伶俐)
Copyright © 2003-2018 School of Microelectronics, Fudan University