Reinforcement Learning-enhanced Policy-aware Modeling of Smart Grid Efficiency under Carbon Constraints: Integration of SBM DEA and Dynamic Policy Response Simulation
DOI:
https://doi.org/10.4108/ew.12473
Keywords:
artificial intelligence, reinforcement learning, smart grid, carbon reduction efficiency, SBM-DEA, dynamic policy response
Abstract
Artificial intelligence-based sensing, forecasting, and decision optimization are being rapidly integrated into smart grid operations. Although reinforcement learning has created new opportunities for dispatch optimization and low-carbon transition under carbon constraints, most existing studies focus primarily on short-term economic or operational objectives and rarely incorporate system-level carbon reduction efficiency benchmarks into the learning process. To address this gap, this study proposes an integrated SBM-DEA and reinforcement learning framework for policy-aware smart-grid dispatch, in which carbon reduction efficiency scores are transformed from static evaluation results into dynamic learning signals for dispatch optimization. Using panel data from 30 provinces in China over the period 2011–2022, this study develops an indicator system covering capital input, labor input, electricity service output, and electricity-related carbon dioxide emissions. An SBM-DEA model with undesirable outputs is employed to measure the carbon reduction efficiency of smart grids. The estimated efficiency scores are then embedded into both the state representation and reward function of a reinforcement learning framework, where the agent learns dispatch policies that balance economic performance, carbon constraints, and efficiency improvement. A dynamic policy response simulation environment is further constructed, incorporating a hybrid energy storage system comprising battery storage and pumped hydro storage. The results show that the carbon reduction efficiency of smart grids in China exhibits stage-specific fluctuations, with annual average values ranging from 0.505 to 0.568 and pronounced interprovincial disparities. In the simulation experiments, the reinforcement learning agent trained with efficiency-based penalties achieves 7.3% lower operational costs and 8.5% higher average efficiency compared to an economic-only agent. 
The trained policies also exhibit clear policy-responsive behavior: when carbon prices rise, hybrid storage utilization increases and coal-fired generation declines. The main innovation of this study is that it integrates historical efficiency benchmarking with reinforcement learning-based dispatch optimization, providing a policy-aware and efficiency-guided decision-support framework for carbon-constrained smart grids.
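As an illustration of how an efficiency score could enter both the state and the reward, the following is a minimal sketch and not the authors' implementation. The abstract does not give the functional form of the reward, so the linear penalty, the weights, and every name here (`StepOutcome`, `build_state`, `shaped_reward`, `efficiency_target`) are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Result of one dispatch interval (all fields are illustrative)."""
    operating_cost: float  # generation and storage cost for the interval
    emissions_t: float     # CO2 emitted during the interval, in tonnes
    efficiency: float      # SBM-DEA-style efficiency score in (0, 1]


def build_state(load, renewable, battery_soc, hydro_soc, carbon_price, efficiency):
    """Assumed state layout: grid conditions, storage levels, policy signal,
    and the efficiency score appended as the last feature."""
    return (float(load), float(renewable), float(battery_soc),
            float(hydro_soc), float(carbon_price), float(efficiency))


def shaped_reward(outcome, carbon_price, efficiency_target=1.0,
                  cost_weight=1.0, efficiency_weight=1.0):
    """Negative cost plus an efficiency-based penalty (assumed linear form).

    The agent is charged for operating cost and priced emissions, and is
    additionally penalized for the shortfall of the efficiency score
    below the target.
    """
    carbon_cost = carbon_price * outcome.emissions_t
    efficiency_gap = max(0.0, efficiency_target - outcome.efficiency)
    economic_term = cost_weight * (outcome.operating_cost + carbon_cost)
    return -(economic_term + efficiency_weight * efficiency_gap)


if __name__ == "__main__":
    outcome = StepOutcome(operating_cost=100.0, emissions_t=2.0, efficiency=0.5)
    print(shaped_reward(outcome, carbon_price=10.0, efficiency_weight=50.0))
```

Setting `efficiency_weight` to zero in this sketch reduces it to an economic-only reward, which loosely mirrors the baseline agent mentioned in the abstract. Raising `carbon_price` lowers the reward of emission-heavy actions, which is one simple way a policy could become responsive to carbon pricing.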
License
Copyright (c) 2026 Gemei Shi

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 4.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.