Publication
Title
Quantization-aware Policy Distillation (QPD)
Abstract
Recent advances have made Deep Reinforcement Learning (DRL) increasingly powerful, but the resulting models remain computationally complex and are therefore difficult to deploy on edge devices. Compression methods such as quantization and distillation can make DRL models more suitable for low-power edge devices by reducing the required numerical precision and the number of operations, respectively. However, training in low precision is notoriously unstable, and this instability is amplified by the loss of representational power that comes with limiting the number of trainable parameters. We propose Quantization-aware Policy Distillation (QPD), which overcomes this instability by providing a smoother transition from high- to low-precision network parameters. We also define a new distillation loss specifically designed for compressing actor-critic networks, yielding higher accuracy after compression. Our experiments show that these combined methods can compress a policy network to 0.5% of its original size without any loss in performance.
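The abstract combines two ideas: distilling a teacher policy into a smaller student, and training that student with quantized (low-precision) weights. The sketch below is a minimal illustration of that combination, not the paper's implementation: `fake_quantize` and `distill_kl` are hypothetical helper names, the "policy" is a single linear layer, and the KL-based distillation loss is the standard choice for policy distillation rather than the new actor-critic loss the paper proposes.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Uniform symmetric fake quantization: round weights to a low-precision
    # grid, then map back to floats, so the forward pass sees quantized
    # values. (In a full framework, gradients would flow through this step
    # via a straight-through estimator.)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits):
    # KL(teacher || student) over action distributions: the usual
    # policy-distillation objective the student minimizes.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Tiny linear "policy": logits = obs @ W, 4 observations -> 3 actions.
rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(4, 3))
W_student = W_teacher.copy()  # student initialized from the teacher
obs = rng.normal(size=(1, 4))

# Full-precision student matches the teacher exactly (loss is 0);
# a 4-bit fake-quantized student incurs a non-negative distillation loss
# that training would then drive back down.
loss_fp = distill_kl(obs @ W_teacher, obs @ W_student)
loss_q4 = distill_kl(obs @ W_teacher, obs @ fake_quantize(W_student, num_bits=4))
```

The "smoother transition from high to low precision" mentioned in the abstract would correspond to gradually tightening `num_bits` (or the quantization range) during distillation instead of quantizing in one step.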
Language
English
Source (book)
Deep Reinforcement Learning Workshop, NeurIPS 2022, 9 December, 2022
Publication
2022
Volume/pages
p. 1-15
Full text (open access)
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Creation 14.12.2023
Last edited 15.12.2023