Deep Reinforcement Learning for Intelligent Reflecting Surface-assisted D2D Communications




Intelligent reflecting surface (IRS), D2D communications, deep reinforcement learning


In this paper, we propose a deep reinforcement learning (DRL) approach to maximise the network sum-rate in device-to-device (D2D) communications supported by an intelligent reflecting surface (IRS). The IRS is deployed to mitigate interference and enhance the signal between the D2D transmitter and the associated D2D receiver. Our objective is to jointly optimise the transmit power at the D2D transmitter and the phase shift matrix at the IRS. We formulate the problem as a Markov decision process and then apply proximal policy optimisation (PPO) to solve it. Simulation results show that the proposed approach performs well in terms of both achievable rate and processing time.
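To make the objective concrete, the following is a minimal sketch of how a sum-rate reward and the PPO clipped surrogate could be computed. The abstract does not specify the system model, so everything here is an assumption for illustration: the narrowband channel model (a direct link plus one reflected path per IRS element), the function names `effective_channel`, `sum_rate` and `ppo_clipped_objective`, and the clipping parameter `epsilon=0.2` are hypothetical and are not taken from the paper.

```python
import cmath
import math


def effective_channel(direct, tx_to_irs, irs_to_rx, phases):
    """Direct link plus the IRS-reflected paths (assumed model, one path per element)."""
    reflected = sum(
        f * cmath.exp(1j * theta) * g
        for f, theta, g in zip(irs_to_rx, phases, tx_to_irs)
    )
    return direct + reflected


def sum_rate(powers, phases, h_direct, g_tx_irs, f_irs_rx, noise_power):
    """Sum of log2(1 + SINR) over all D2D pairs, in bit/s/Hz.

    powers[i]      : transmit power of D2D transmitter i
    phases[m]      : phase shift (radians) of IRS element m
    h_direct[i][j] : direct channel from transmitter j to receiver i
    g_tx_irs[j][m] : channel from transmitter j to IRS element m
    f_irs_rx[i][m] : channel from IRS element m to receiver i
    """
    n_pairs = len(powers)
    total = 0.0
    for i in range(n_pairs):
        # Channel gain seen at receiver i from every transmitter j.
        gains = [
            abs(effective_channel(h_direct[i][j], g_tx_irs[j], f_irs_rx[i], phases)) ** 2
            for j in range(n_pairs)
        ]
        signal = powers[i] * gains[i]
        interference = sum(powers[j] * gains[j] for j in range(n_pairs) if j != i)
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)
```

In an MDP of this kind, one natural choice is for the action to consist of the transmit powers and IRS phases and for the reward to be the value returned by `sum_rate`; whether the paper uses exactly this state, action and reward design is not stated in the abstract.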








How to Cite

Nguyen, K. K., Masaracchia, A., & Yin, C. (2023). Deep Reinforcement Learning for Intelligent Reflecting Surface-assisted D2D Communications. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 10(1), e1.