Deep reinforcement learning (DRL) is increasingly being used in large-scale production systems such as those of Netflix and Facebook. However, these data-driven systems can exhibit undesirable behaviors when the production environment changes, a phenomenon known as environmental drift. Continual Learning (CL) is a self-healing approach that allows a DRL agent to adapt to such shifts. However, significant and frequent drifts can cause the production environment to deviate drastically from the one in which the agent was originally trained. Recent studies have found that these environmental drifts can lead to long and unsuccessful healing cycles in CL, owing to issues such as catastrophic forgetting, warm-starting failure, and slow convergence.
To address these challenges, we propose Dr. DRL, an effective self-healing approach for DRL systems. Dr. DRL integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues: it deliberately erases the DRL system's minor behaviors, thereby prioritizing the adaptation of its key problem-solving skills. We evaluate Dr. DRL with well-established DRL algorithms on a variety of drifted environments and compare it against vanilla CL.
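To make the core idea concrete, the sketch below shows one plausible way to realize intentional forgetting before a CL fine-tuning phase: weights with the smallest magnitudes are treated as carriers of minor behaviors and are re-initialized, while the remaining weights, assumed to encode the key problem-solving skills, are kept as a warm start. The function name `intentional_forgetting`, the `forget_ratio` parameter, and the magnitude-based selection criterion are illustrative assumptions, not Dr. DRL's exact mechanism.

```python
import torch
import torch.nn as nn

def intentional_forgetting(policy: nn.Module, forget_ratio: float = 0.1) -> nn.Module:
    """Hypothetical sketch: re-initialize the lowest-magnitude weights.

    Small-magnitude weights stand in for 'minor behaviors'; re-initializing
    them lets the subsequent CL fine-tuning re-learn those behaviors in the
    drifted environment, while larger weights (the 'key skills') are preserved.
    """
    with torch.no_grad():
        for layer in policy.modules():
            if isinstance(layer, nn.Linear):
                w = layer.weight
                # Magnitude threshold below which a weight counts as "minor".
                threshold = torch.quantile(w.abs().flatten(), forget_ratio)
                mask = w.abs() <= threshold
                # Fresh values for the forgotten weights only.
                reinit = torch.empty_like(w)
                nn.init.xavier_uniform_(reinit)
                w[mask] = reinit[mask]
    return policy

# Usage: forget minor behaviors, then resume CL fine-tuning as usual.
# policy = intentional_forgetting(policy, forget_ratio=0.1)
```

Under these assumptions, fine-tuning re-learns the erased weights in the drifted environment while warm-starting from the preserved major skills.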
Our results show that Dr. DRL reduces healing time and the number of fine-tuning episodes by, on average, 18.74% and 17.72%, respectively. Furthermore, Dr. DRL enables agents to adapt to 19.63% of the drifted environments left unsolved by vanilla CL. Additionally, in drifted environments that both approaches solve, Dr. DRL preserves the obtained rewards and can even increase them by up to 45%.
In conclusion, Dr. DRL is a promising approach for improving the adaptability and performance of DRL systems in the face of environmental drifts. Its intentional forgetting mechanism enables efficient self-healing, yielding faster adaptation and higher rewards in drifted environments.