P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge

Authors

P. P. Ray and M. P. Pradhan

DOI:

https://doi.org/10.4108/airo.9292

Keywords:

Peer-to-peer, Edge computing, Quantized LLMs, Resource-constrained edge, Decentralized generative AI, Web frameworks

Abstract

In this research, we present P2PLLMEdge, a pioneering peer-to-peer framework designed to enable localized Large Language Models (LLMs) to operate efficiently in resource-constrained edge environments, exemplified by devices such as the Raspberry Pi 4B and CPU-only laptops. The framework addresses critical challenges, including limited computational capacity, network overhead, and scalability, by leveraging lightweight RESTful communication protocols, model-specific quantization, and decentralized task distribution. Key results demonstrate that P2PLLMEdge achieves substantial performance improvements. On average, Peer 2 (CPU-only laptop) achieves a 44.7% reduction in total duration (t_peer2,total = 15.87 × 10⁹ ns) compared to Peer 1 (Raspberry Pi 4B, t_peer1,total = 28.18 × 10⁹ ns). The framework processes tokens at a rate of 21.77 tokens/second on advanced LLMs such as Granite3.1-moe:1b, significantly outperforming the baseline. Peer 1, employing quantized LLMs such as smollm2:360m-instruct-q8_0, reduces prompt-evaluation duration by 23.2% (t_peer1,prompt_eval = 0.76 × 10⁹ ns) compared to larger models such as qwen2.5:0.5b-instruct (t_peer1,prompt_eval = 0.99 × 10⁹ ns). Peer 2 demonstrates superior summarization capabilities, with evaluation durations (t_peer2,eval) reduced by 72.8% (t_peer2,eval = 5.15 × 10⁹ ns) for explanation-type prompts relative to Peer 1 (t_peer1,eval = 18.93 × 10⁹ ns). The framework also achieves significant network efficiency, reducing inter-peer communication durations by up to 44.9% (t_peer2,network = 25.83 × 10⁹ ns vs. t_peer1,network = 46.92 × 10⁹ ns). Peer-to-peer synergy ensures seamless task execution: Peer 1 generates text and offloads computationally intensive summarization tasks to Peer 2, achieving a balance between performance and resource utilization. The novelty of P2PLLMEdge lies in its ability to seamlessly integrate lightweight LLMs with decentralized edge devices, achieving advanced natural language processing functionalities entirely on edge devices traditionally deemed unsuitable for such tasks. This framework provides an adaptable and cost-effective approach for deploying quantized-LLM-driven applications. Future directions include scaling the framework to multi-peer environments, optimizing task-scheduling algorithms, and exploring integration with heterogeneous LLM-enabled systems. The code is available at https://github.com/ParthaPRay/peer_to_peer_local_llm_interaction.
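The generate-then-offload flow described in the abstract can be sketched as follows. This is an illustrative sketch, not the repository's actual code: the Peer 2 address and its `/summarize` route are hypothetical placeholders, while the `/api/generate` endpoint and the `eval_count`/`eval_duration` response fields are part of the documented Ollama REST API.

```python
import json
import urllib.request

# Assumed endpoints: the default local Ollama server on Peer 1 and a
# hypothetical FastAPI summarization route exposed by Peer 2.
OLLAMA_URL = "http://localhost:11434/api/generate"
PEER2_URL = "http://192.168.0.20:8000/summarize"


def generate(prompt: str, model: str = "smollm2:360m-instruct-q8_0") -> dict:
    """Non-streaming text generation against the local Ollama REST API (Peer 1)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def offload_summary(text: str) -> str:
    """Offload the computationally heavier summarization step to Peer 2."""
    payload = json.dumps({"text": text}).encode()
    req = urllib.request.Request(PEER2_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["summary"]


def tokens_per_second(metrics: dict) -> float:
    """Ollama reports eval_count (tokens) and eval_duration (nanoseconds)."""
    return metrics["eval_count"] / (metrics["eval_duration"] / 1e9)


# Usage on Peer 1 (requires both servers to be running):
#   out = generate("Explain edge computing.")
#   rate = tokens_per_second(out)
#   summary = offload_summary(out["response"])
```

The duration metrics reported in the abstract (prompt_eval_duration, eval_duration, total_duration) come directly from this Ollama response payload, which is why they are expressed in nanoseconds.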


References

[1] Khalfi, M.F. and Tabbiche, M.N., 2025. GPThingSim: A IoT Simulator Based GPT Models Over an Edge-Cloud Environments. International Journal of Networked and Distributed Computing, 13(1), pp.1-20.

[2] Tharayil, S.M., Krishnapriya, M.A. and Alomari, N.K., 2025. How Multimodal AI and IoT Are Shaping the Future of Intelligence. In Internet of Things and Big Data Analytics for a Green Environment (pp. 138-167). Chapman and Hall/CRC.

[3] Chelliah, A.M.R., Colby, R., Nagasubramanian, G. and Ranganath, S., 2025. 3.2 Edge AI. Model Optimization Methods for Efficient and Edge AI.

[4] Nimmagadda, Y., 2025. Model Optimization Techniques for Edge Devices. Model Optimization Methods for Efficient and Edge AI: Federated Learning Architectures, Frameworks and Applications, pp.57-85.

[5] Martin-Salinas, I., Badia, J.M., Valls, O., Leon, G., del Amor, R., Belloch, J.A., Amor-Martin, A. and Naranjo, V., 2025. Evaluating and accelerating vision transformers on GPU-based embedded edge AI systems. The Journal of Supercomputing, 81(1), p.349.

[6] Yu, D., Zhou, X., Noorian, A. and Hazratifard, M., 2025. An AI-driven social media recommender system leveraging smartphone and IoT data. The Journal of Supercomputing, 81(1), pp.1-32.

[7] Zhang, M., Shen, X., Cao, J., Cui, Z. and Jiang, S., 2024. Edgeshard: Efficient llm inference via collaborative edge computing. IEEE Internet of Things Journal.

[8] Kok, I., Demirci, O. and Ozdemir, S., 2024. When IoT Meet LLMs: Applications and Challenges. arXiv preprint arXiv:2411.17722.

[9] Kalita, A., 2024. Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks. arXiv preprint arXiv:2407.20970.

[10] Qu, G., Chen, Q., Wei, W., Lin, Z., Chen, X. and Huang, K., 2024. Mobile edge intelligence for large language models: A contemporary survey. arXiv preprint arXiv:2407.18921.

[11] An, T., Zhou, Y., Zou, H. and Yang, J., 2024. Iotllm: Enhancing real-world iot task reasoning with large language models. arXiv preprint arXiv:2410.02429.

[12] Hu, Y., Ye, D., Kang, J., Wu, M. and Yu, R., 2024. A Cloud-Edge Collaborative Architecture for Multimodal LLMs-Based Advanced Driver Assistance Systems in IoT Networks. IEEE Internet of Things Journal.

[13] Xiao, B., Kantarci, B., Kang, J., Niyato, D. and Guizani, M., 2024. Efficient Prompting for LLM-based Generative Internet of Things. arXiv preprint arXiv:2406.10382.

[14] Raspberry Pi 4B, 2025. Raspberry Pi 4 Model B Technical Overview. Available at: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/ [Accessed 4 Jan. 2025].

[15] Friha, O., Ferrag, M.A., Kantarci, B., Cakmak, B., Ozgun, A. and Ghoualmi-Zine, N., 2024. Llm-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness. IEEE Open Journal of the Communications Society.

[16] Gao, Y., Song, Z. and Yin, J., 2023. Gradientcoin: A peer-to-peer decentralized large language models. arXiv preprint arXiv:2308.10502.

[17] Karanjai, R. and Shi, W., 2024, May. Trusted LLM Inference on the Edge with Smart Contracts. In 2024 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) (pp. 1-7). IEEE.

[18] He, Y., Fang, J., Yu, F.R. and Leung, V.C., 2024. Large language models (LLMs) inference offloading and resource allocation in cloud-edge computing: An active inference approach. IEEE Transactions on Mobile Computing.

[19] Olshansky, D., Colmeiro, R.R. and Li, B., 2024. Decentralized AI: Permissionless LLM Inference on POKT Network. arXiv preprint arXiv:2405.20450.

[20] Hasan, S.M., Alotaibi, A.M., Talukder, S. and Shahid, A.R., 2024. Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach. arXiv preprint arXiv:2405.08755.

[21] Chen, H., Deng, W., Yang, S., Xu, J., Jiang, Z., Ngai, E.C., Liu, J. and Liu, X., 2024. Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges. arXiv preprint arXiv:2410.18125.

[22] Ale, L., Zhang, N., King, S.A. and Chen, D., 2024. Empowering generative AI through mobile edge computing. Nature Reviews Electrical Engineering, pp.1-9.

[23] Bhardwaj, S., Singh, P. and Pandit, M.K., 2024, March. A survey on the integration and optimization of large language models in edge computing environments. In 2024 16th International Conference on Computer and Automation Engineering (ICCAE) (pp. 168-172). IEEE.

[24] Soltoggio, A., Ben-Iwhiwhu, E., Braverman, V., Eaton, E., Epstein, B., Ge, Y., Halperin, L., How, J., Itti, L., Jacobs, M.A. and Kantharaju, P., 2024. A collective AI via lifelong learning and sharing at the edge. Nature Machine Intelligence, 6(3), pp.251-264.

[25] Ollama, 2025. Ollama: Large Language Model Framework. Available at: https://ollama.com/ [Accessed 4 Jan. 2025].

[26] Ollama API, 2025. Ollama API Documentation. Available at: https://github.com/ollama/ollama/blob/main/docs/api.md [Accessed 4 Jan. 2025].

[27] Qwen2.5:0.5b-instruct, 2025. Qwen2.5:0.5b-instruct Language Model. Available at: https://ollama.com/library/qwen2.5:0.5b-instruct [Accessed 4 Jan. 2025].

[28] Smollm2:360m, 2025. Smollm2:360m-instruct-q8_0 Language Model. Available at: https://ollama.com/library/smollm2:360m-instruct-q8_0 [Accessed 4 Jan. 2025].

[29] Granite3.1, 2025. Granite3.1 Language Models by IBM. Available at: https://github.com/ibm-granite/granite-3.1-language-models [Accessed 4 Jan. 2025].

[30] Llama3.2, 2025. Llama3.2 Language Model. Available at: https://ollama.com/library/llama3.2 [Accessed 4 Jan. 2025].

[31] Qwen2.5:1.5b, 2025. Qwen2.5:1.5b Language Model. Available at: https://ollama.com/library/qwen2.5:1.5b [Accessed 4 Jan. 2025].

[32] Smollm2:1.7b, 2025. Smollm2:1.7b Language Model. Available at: https://ollama.com/library/smollm2 [Accessed 4 Jan. 2025].

[33] Flask, 2025. Flask Web Framework. Available at: https://flask.palletsprojects.com/ [Accessed 4 Jan. 2025].

[34] FastAPI, 2025. FastAPI Framework Documentation. Available at: https://fastapi.tiangolo.com/ [Accessed 4 Jan. 2025].

[35] Requests, 2025. Requests Library for Python. Available at: https://pypi.org/project/requests/ [Accessed 4 Jan. 2025].

[36] Pydantic, 2025. Pydantic for Data Validation and Parsing. Available at: https://pypi.org/project/pydantic/ [Accessed 4 Jan. 2025].

[37] Luo, Z., Yan, H. and Pan, X., 2023. Optimizing Transformer Models for Resource-Constrained Environments: A Study on Model Compression Techniques. Journal of Computational Methods in Engineering Applications, pp.1-12.

[38] Liu, H.I., Galindo, M., Xie, H., Wong, L.K., Shuai, H.H., Li, Y.H. and Cheng, W.H., 2024. Lightweight deep learning for resource constrained environments: A survey. ACM Computing Surveys, 56(10), pp.1-42.

[39] Girija, S.S., Kapoor, S., Arora, L., Pradhan, D., Raj, A. and Shetgaonkar, A., 2025. Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques. arXiv preprint arXiv:2505.02309.

[40] Careem, R., Johar, G. and Khatibi, A., 2024. Deep neural networks optimization for resource-constrained environments: techniques and models. Indonesian Journal of Electrical Engineering and Computer Science, 33(3), pp.1843-1854.

[41] Waheed, Z., Khalid, S., Riaz, S.M., Khawaja, S.G. and Tariq, R., 2022. Resource-Restricted Environments Based Memory-Efficient Compressed Convolutional Neural Network Model for Image-Level Object Classification. IEEE Access, 11, pp.1386-1406.

[42] Shabir, M.Y., Torta, G. and Damiani, F., 2024, July. Edge AI on constrained IoT devices: Quantization strategies for model optimization. In Intelligent Systems Conference (pp. 556-574). Cham: Springer Nature Switzerland.

Published

08-07-2025

How to Cite

[1]
P. P. Ray and M. P. Pradhan, “P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge”, EAI Endorsed Trans AI Robotics, vol. 4, Jul. 2025.