Designing Automation for Pickup and Delivery Tasks in Modern Warehouses Using Multi-Agent Path Finding (MAPF) and Multi-Agent Reinforcement Learning (MARL) Based Approaches
DOI: https://doi.org/10.4108/airo.3449
Keywords: multi-agent pickup and delivery problem, multi-agent reinforcement learning, MARL, multi-agent path finding
Abstract
A warehouse pickup-and-delivery problem is solved here using a multi-agent path finding (MAPF) approach, and the same problem is used to showcase the capabilities of multi-agent reinforcement learning (MARL). A warehouse pickup-and-delivery task requires an agent to pick up a requested item and deliver it to the intended location within the warehouse. The problem is addressed in two settings: single-shot and lifelong. In the single-shot setting, the delivery is the final goal, so an agent stops once it reaches the delivery location; in the lifelong setting, an agent delivers the item it has picked up and then picks up a new item, repeating until all requests are satisfied. The MAPF approach constructs collision-free paths to the delivery locations, whereas MARL learns the agents' decision-making tactics (policies), which are then used to choose each agent's path based on the environment state and the agent's position. The results show that lifelong conflict-based search (CBS) is the better option when the number of agents is small, since re-planning then takes less time overall; when the number of agents is large, however, re-planning can take very long to produce conflict-free paths from source to goal nodes. In that case, shared experience actor-critic (SEAC), a MARL-based approach, can be the more efficient choice, as it maps the current environment state to the most suitable action at each time step t. In this study the agents are homogeneous: each can pick up and deliver any type of requested item. The same pickup-and-delivery problem can also be addressed with heterogeneous agents that differ in their capabilities and in the types of items they can handle.
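To make the MAPF side of the comparison concrete, below is a minimal, self-contained sketch of conflict-based search for the single-shot setting on a 4-connected grid. It is an illustration, not the implementation evaluated in the paper: it handles vertex conflicts only (edge/swap conflicts are ignored), it uses a fixed planning horizon, and all names (`low_level`, `first_conflict`, `cbs`) are our own.

```python
# Illustrative sketch of conflict-based search (CBS), vertex conflicts only.
# Not the paper's implementation; function names are hypothetical.
import heapq
from itertools import count

# 4-connected moves plus waiting in place.
MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def low_level(grid, start, goal, constraints, horizon=64):
    """Space-time A* for one agent. `constraints` is a set of (cell, t)
    pairs the agent's path must avoid; grid cells with value 1 are walls."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan distance
    tie = count()  # unique tiebreaker so the heap never compares paths
    open_q = [(h(start), next(tie), start, 0, [start])]
    closed = set()
    # Latest time the goal cell itself is constrained; the agent may only
    # stop there after that time has passed.
    latest = max((t for c, t in constraints if c == goal), default=-1)
    while open_q:
        _, _, cell, t, path = heapq.heappop(open_q)
        if cell == goal and t > latest:
            return path
        if (cell, t) in closed or t >= horizon:
            continue
        closed.add((cell, t))
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1 or (nxt, t + 1) in constraints:
                continue
            heapq.heappush(open_q, (t + 1 + h(nxt), next(tie), nxt, t + 1, path + [nxt]))
    return None

def first_conflict(paths):
    """Earliest vertex conflict as (agent_i, agent_j, cell, t), or None.
    Agents are assumed to wait at their goal after arriving."""
    for t in range(max(len(p) for p in paths)):
        occupied = {}
        for i, p in enumerate(paths):
            cell = p[min(t, len(p) - 1)]
            if cell in occupied:
                return occupied[cell], i, cell, t
            occupied[cell] = i
    return None

def cbs(grid, starts, goals):
    """High-level CBS: best-first search over a constraint tree,
    branching on the first vertex conflict found."""
    n = len(starts)
    cons = [set() for _ in range(n)]
    paths = [low_level(grid, starts[i], goals[i], cons[i]) for i in range(n)]
    if any(p is None for p in paths):
        return None
    tie = count()
    open_q = [(sum(map(len, paths)), next(tie), cons, paths)]
    while open_q:
        _, _, cons, paths = heapq.heappop(open_q)
        conflict = first_conflict(paths)
        if conflict is None:
            return paths  # collision-free plan for every agent
        i, j, cell, t = conflict
        for agent in (i, j):  # one child node per conflicting agent
            child = [set(c) for c in cons]
            child[agent].add((cell, t))
            new_path = low_level(grid, starts[agent], goals[agent], child[agent])
            if new_path is None:
                continue
            new_paths = list(paths)
            new_paths[agent] = new_path
            heapq.heappush(open_q, (sum(map(len, new_paths)), next(tie), child, new_paths))
    return None

# Demo: two agents whose shortest paths cross at the centre cell (1, 1);
# CBS resolves the conflict by making one of them wait or detour.
grid = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
for path in cbs(grid, starts=[(0, 1), (1, 0)], goals=[(2, 1), (1, 2)]):
    print(path)
```

In the lifelong setting, `cbs` would be re-invoked each time an agent completes a delivery and is assigned a new pickup; that repeated re-planning is exactly the cost that grows with the number of agents and motivates the SEAC alternative discussed above.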
References
Frans A. Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. "Optimal and Approximate Q-Value Functions for Decentralized POMDPs". In: Journal of Artificial Intelligence Research 32 (2008), pp. 289–353.
Sinno Jialin Pan and Qiang Yang. "A Survey on Transfer Learning". In: IEEE Transactions on Knowledge and Data Engineering 22.10 (2010), pp. 1345–1359.
Stefan Schaal. "Learning from Demonstration". In: Advances in Neural Information Processing Systems. 1997, pp. 1040–1046.
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu. "Asynchronous Methods for Deep Reinforcement Learning". In: International Conference on Machine Learning. Vol. 4. 2016, pp. 2850–2869.
Tonghan Wang, Heng Dong, Victor Lesser, and Chongjie Zhang. "ROMA: Multi-Agent Reinforcement Learning with Emergent Roles". In: International Conference on Machine Learning. 2020.
Justin Kottinger, Shaull Almagor, and Morteza Lahijanian. "Conflict-Based Search for Explainable Multi-Agent Path Finding". In: Proceedings of the International Conference on Automated Planning and Scheduling 32 (2022), pp. 692–700. doi: 10.1609/icaps.v32i1.19859.
Filippos Christianos, Lukas Schäfer, and Stefano V. Albrecht. "Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning". In: Advances in Neural Information Processing Systems (NeurIPS). 2020.
Mehul Damani, Zhiyao Luo, Emerson Wenzel, and Guillaume Sartoretti. "PRIMAL2: Pathfinding via Reinforcement and Imitation Multi-Agent Learning - Lifelong". In: IEEE Robotics and Automation Letters 6.2 (2021), pp. 2666–2673.
Hang Ma, Daniel Harabor, Peter J. Stuckey, Jiaoyang Li, and Sven Koenig. "Searching with Consistent Prioritization for Multi-Agent Path Finding". In: Proceedings of the AAAI Conference on Artificial Intelligence. 2019, pp. 7643–7650.
Filippos Christianos, Georgios Papoudakis, Arrasy Rahman, and Stefano V. Albrecht. "Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing". In: Proceedings of the International Conference on Machine Learning (ICML). 2021.
Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, and Stefano V. Albrecht. "Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks". arXiv preprint arXiv:2006.07869 (2021).
Oren Salzman and Roni Stern. "Research Challenges and Opportunities in Multi-Agent Path Finding and Multi-Agent Pickup and Delivery Problems". In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 2020.
Guillaume Sartoretti, Justin Kerr, Yunfei Shi, Glenn Wagner, T. K. Satish Kumar, Sven Koenig, and Howie Choset. "PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning". In: IEEE Robotics and Automation Letters 4.3 (2019), pp. 2378–2385.
Guni Sharon, Roni Stern, Ariel Felner, and Nathan Sturtevant. "Conflict-Based Search for Optimal Multi-Agent Path Finding". In: Artificial Intelligence 219 (2015), pp. 40–66.
Guni Sharon, Roni Stern, Ariel Felner, and Nathan Sturtevant. "Conflict-Based Search for Optimal Multi-Agent Path Finding". In: Proceedings of the AAAI Conference on Artificial Intelligence. 2012, pp. 563–569.
Justin Kottinger. Explanation-Guided Conflict-Based Search for Explainable MAPF. 2021. https://github.com/ariasystems-group/Explanation-Guided-CBS.
Michael L. Littman. "Markov Games as a Framework for Multi-Agent Reinforcement Learning". In: Proceedings of the Eleventh International Conference on Machine Learning. 1994.
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments". In: Advances in Neural Information Processing Systems. Vol. 30. 2017, pp. 6379–6390.
Qian Wan et al. "Lifelong Multi-Agent Path Finding in a Dynamic Environment". In: 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). 2018. doi: 10.1109/icarcv.2018.8581181.
Ryan Luna and Kostas E. Bekris. "Push and Swap: Fast Cooperative Path-Finding with Completeness Guarantees". In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI). 2011, pp. 294–300.
Changyun Wei, Koen V. Hindriks, and Catholijn M. Jonker. "Altruistic Coordination for Multi-Robot Cooperative Pathfinding". In: Applied Intelligence 44.2 (2016), pp. 269–281.
Roni Stern et al. "Multi-Agent Pathfinding: Definitions, Variants, and Benchmarks". In: Proceedings of the International Symposium on Combinatorial Search 10.1 (2021), pp. 151–158. doi: 10.1609/socs.v10i1.18510.
License
Copyright (c) 2023 Shambhavi Mishra, Rajendra Kumar Dwivedi
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.