EAI Endorsed Transactions on Industrial Networks and Intelligent Systems

Artificial Intelligence-Driven Early Prediction of Student Dropout and Academic Outcomes in Higher Education: A Comparative Study of Advanced Machine Learning Approaches

Nghia Trong Vo — 2026-03-03

Student dropout in higher education remains a critical challenge with significant academic, social, and economic implications. Early identification of students at risk of dropout enables institutions to design timely and targeted interventions that support academic success and improve retention rates. This study proposes a machine learning (ML)–driven framework for the early prediction of student dropout and academic outcomes in higher education using a comprehensive, real-world dataset collected from a higher education institution. The prediction task is formulated as a multiclass classification problem with three outcomes: dropout, enrolled, and graduate. To evaluate the effectiveness of different modeling approaches, we conduct a comparative analysis of widely used ML algorithms, including Logistic Regression, Naïve Bayes, k-Nearest Neighbors, Support Vector Machine, Decision Trees, Random Forest (RF), AdaBoost, XGBoost, LightGBM, and CatBoost. Results indicate that ensemble models achieve the best performance. RF attains the highest test accuracy (0.7797) and ROC-AUC (OvR) (0.8919), while LightGBM yields the best Macro-F1 (0.7082). Feature importance analysis shows that early academic progress indicators (approved units and semester grades) are the strongest predictors, followed by selected administrative/contextual factors such as tuition-fee status and course. Overall, this study provides empirical evidence supporting the use of ML techniques as effective decision-support tools for higher education institutions. The proposed framework offers actionable insights for administrators and policymakers seeking to develop data-driven strategies aimed at reducing dropout rates, improving academic success, and promoting equitable access to educational opportunities.
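The gap between accuracy and Macro-F1 reported above is typical when one of the three classes is harder to predict than the others. The following is a minimal sketch, not the study's code, of how Macro-F1 averages per-class F1 scores with equal weight. The function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): the "enrolled" student is misclassified as "graduate".
labels = ["dropout", "enrolled", "graduate"]
y_true = ["dropout", "enrolled", "graduate", "graduate"]
y_pred = ["dropout", "graduate", "graduate", "graduate"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels)
```

In this toy case accuracy is 0.75 while Macro-F1 is about 0.6, because the missed "enrolled" class contributes an F1 of zero to the average.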

HyperDyG: Hypergraph-Driven Dynamic Fusion for Semi-Supervised Multimodal Emotion Recognition

Nhut Minh Nguyen — 2026-02-19

Speech emotion recognition (SER) is important in healthcare, education, human–computer interaction, and customer service. Multimodal emotion recognition (MER) integrates audio and textual modalities to achieve a comprehensive understanding of human affect, but still suffers from limited labeled data and complex cross-modal relations. To address these challenges, we propose HyperDyG, a dynamic hypergraph-driven MER framework. HyperDyG combines dynamic hypergraph learning (DHL), a cross-modal transformer (CMT), and an adaptive gated multimodal unit (GMU) for robust multimodal fusion. It is further enhanced with a semi-supervised learning strategy that incorporates weak–strong augmentation, confidence-filtered pseudo-labeling, and consistency regularization to effectively exploit large-scale unlabeled data. HyperDyG achieves state-of-the-art (SOTA) performance on the benchmark emotion dataset and maintains stable accuracy across varying unlabeled ratios. These findings highlight the effectiveness and scalability of the proposed architecture in real-world low-label MER scenarios.
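Confidence-filtered pseudo-labeling, as named in the abstract above, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the HyperDyG implementation: the function name `select_pseudo_labels`, the 0.95 threshold, and the toy probabilities are all hypothetical. Predictions on the weakly augmented view supply the label, and only confident samples are kept as targets for the strongly augmented view.

```python
def select_pseudo_labels(weak_probs, threshold=0.95):
    """Return (sample_index, predicted_class) for confident predictions only.

    weak_probs: class-probability lists predicted on weakly augmented inputs.
    threshold:  assumed confidence cut-off (hypothetical value).
    """
    selected = []
    for idx, probs in enumerate(weak_probs):
        confidence = max(probs)
        if confidence >= threshold:
            # The arg-max class becomes the pseudo-label for this sample
            selected.append((idx, probs.index(confidence)))
    return selected


# Toy predictions for three unlabeled utterances over three emotion classes.
weak_probs = [
    [0.97, 0.02, 0.01],  # confident -> kept
    [0.50, 0.30, 0.20],  # uncertain -> discarded
    [0.01, 0.01, 0.98],  # confident -> kept
]
pseudo = select_pseudo_labels(weak_probs)
```

A consistency loss would then compare the model's output on the strongly augmented version of each kept sample against its pseudo-label.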

Deep Reinforcement Learning Approaches Against Jammers with Unequal Sweeping Probability Attacks

Lan Nguyen — 2025-11-04

This paper investigates deep reinforcement learning (DRL) approaches designed to counter jammers that maximize disruption by employing unequal sweeping probabilities. We first propose a model and defense action based on a Markov Decision Process (MDP) under non-uniform attacks. A key drawback of the standard MDP model, however, is its assumption that the defending agent can acquire sufficient information about the jamming patterns to determine the transition probability matrix. In a dynamic environment, the attacker’s patterns and models are often unknown or difficult to obtain. To overcome this limitation, RL techniques such as Q-learning, deep Q-network (DQN), and double deep Q-network (DDQN) have been considered effective defense strategies that operate without an explicit jamming model. With Q-learning, defense strategies can still be computationally expensive and require a long time to learn the optimal policy. This limitation arises because a large state space or a substantial number of actions causes the Q-table to grow exponentially. Leveraging the flexibility, adaptability, and scalability of RL, we propose a DQN framework designed to handle large-scale action spaces across expanded channels and jammers. Furthermore, to overcome the inherent overestimation bias present in Q-learning and DQN algorithms, we investigate a DDQN framework. Assuming the estimation error of the action value in DQN follows a zero-mean Gaussian distribution, we then analytically derive the expected loss. Numerical examples are finally presented to characterize the performance of the proposed algorithms and the superiority of DDQN over DQN and Q-learning approaches.
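The overestimation bias mentioned above comes from using the same value estimates both to pick and to score the next action. The sketch below contrasts the standard DQN and Double DQN bootstrap targets; it is a generic illustration of the two well-known update rules, not the paper's code, and the function names and toy Q-values are assumptions.

```python
def dqn_target(reward, gamma, q_target_next):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, gamma, q_online_next, q_target_next):
    """Double DQN: the online network selects, the target network evaluates."""
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Toy Q-values for three candidate channels in the next state (hypothetical).
q_online_next = [1.0, 2.0, 1.5]   # online network prefers action 1
q_target_next = [3.0, 1.0, 2.0]   # target network has a noisy high value at 0

y_dqn = dqn_target(1.0, 0.5, q_target_next)
y_ddqn = ddqn_target(1.0, 0.5, q_online_next, q_target_next)
```

Here the DQN target chases the single largest (possibly noisy) target-network value, while the Double DQN target evaluates only the action the online network would actually choose, which gives a smaller target in this example.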

Few-Shot Classification Of Brain Cancer Images Using Meta-Learning Algorithms

Tuyet-Nhi Thi Nguyen — 2025-11-06

Deep learning models typically aim for good performance on large datasets. However, when the model lacks sufficient data, achieving high accuracy on unfamiliar classes becomes a challenge. In fact, real-world datasets often introduce new classes, and some types of data, such as medical images, are difficult to collect or simulate. Meta-learning, or "learning-to-learn", is a subset of machine learning that can tackle these problems. In this paper, a few-shot classification model is proposed to classify three types of brain cancer: Glioma brain cancer, Meningioma brain cancer, and brain Tumor cancer. To achieve this, we employ an episodic meta-training paradigm that integrates the model-agnostic meta-learning (MAML) framework with a prototypical network (ProtoNet) to train the model. In detail, ProtoNet focuses on learning a metric space by computing distances to class prototypes, while MAML concentrates on finding the optimal initialization parameters so that the model can learn quickly from a few labeled samples. In addition, we compute and report the average accuracy for the baseline and our methods to assess the quality of the prediction confidence. Simulation results indicate that our proposed approach substantially surpasses the baseline ResNet18 model, raising average accuracy from 46.33% to 92.08% across different few-shot settings. These findings highlight the potential of combining metric-based and optimization-based meta-learning techniques to improve diagnostic support in healthcare applications.
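The prototype-distance idea behind ProtoNet can be illustrated with a short sketch. It assumes that embeddings have already been produced by some encoder; the two-dimensional vectors, class names, and function names below are hypothetical and are not taken from the paper.

```python
import math


def class_prototypes(support_set):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, embeddings in support_set.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(vec[i] for vec in embeddings) / len(embeddings)
            for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# A 2-way, 2-shot episode with made-up 2-D embeddings.
support_set = {
    "glioma": [[0.0, 0.0], [0.2, 0.0]],
    "meningioma": [[1.0, 1.0], [1.0, 0.8]],
}
prototypes = class_prototypes(support_set)
prediction = classify([0.2, 0.1], prototypes)
```

In a full ProtoNet, the negative distances would be passed through a softmax to train the encoder; the sketch shows only the inference-time nearest-prototype rule.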

The Efficiency Cost-Sensitive Loss of Transformer based on Mamba Mechanism for Aircraft Detection in Satellite Imagery

Manh-Tuan Do — 2025-09-22

Detecting aircraft in satellite images poses considerable challenges due to complex backgrounds and variable conditions influenced by sensor geometry and atmospheric factors. Despite rapid advancements in deep learning algorithms, their main focus has been on ground-based imagery. This study offers a thorough evaluation and comparison of advanced object detection algorithms specifically designed for aircraft detection in satellite imagery. By leveraging the extensive HRPlanesV2 dataset and a rigorous validation process on the GDIT dataset, we trained a cutting-edge object detection model, YOLO-Mamba, published in June 2024. Additionally, we introduce YOLO-Mamba-TransGhost, which integrates a novel Transformer module SC3T and Ghost Convolution into the YOLO model’s backbone architecture. Furthermore, substituting the WIoU loss function with CIoU in YOLO-Mamba results in significant improvements in accuracy and small object detection. Experimental results on the GDIT dataset indicate that YOLO-Mamba-TransGhost improves mAP@.5 by approximately 2% compared to the original YOLO-Mamba. Similarly, tests on the HRPlanesV2 dataset reveal a notable reduction in model complexity and an accuracy of 98.7%, which is achieved by leveraging a cost-sensitive loss function that dynamically focuses training on higher-quality samples, improving convergence and accuracy. Therefore, the proposed YOLO-Mamba-TransGhost model demonstrates superior accuracy and reduced complexity in aircraft detection from satellite imagery, highlighting its potential for practical applications in aerospace monitoring, disaster management, and surveillance systems.

Optimizing Energy Harvesting Efficiency for IRS-aided TS-SWIPT System with Continuous and Discrete Phase Shifts

Tuan Pham-Viet — 2025-09-02

In recent years, intelligent reflecting surfaces (IRS) have emerged as a groundbreaking technology for enhancing spectral and energy efficiency in wireless communication, offering a cost-effective and energy-efficient solution. This study explores a simultaneous wireless information and power transfer (SWIPT) network employing time-switching (TS) receivers, where a base station (BS) transmits both data and energy signals to users with the assistance of an IRS. By appropriately tuning the phase shifts of IRS elements, transmission performance is optimized in terms of both energy harvesting and data efficiency. The primary objective is to maximize energy harvesting efficiency, defined as the ratio of total harvested energy at the users to the transmission power of the BS, while ensuring the required information rate, adhering to power constraints, and considering practical limitations on phase shifts. To tackle this challenge, an iterative algorithm incorporating non-convex approximations is developed to jointly optimize information beamformers, energy covariance matrix, and TS factors. Finally, numerical simulations validate the convergence and effectiveness of the proposed methodology.

Optimizing UAV Trajectories in Optical IRS-Aided Hybrid FSO/RF Aerial Access Networks Using DRL Technique

Cuong NGUYEN — 2025-11-06

This paper investigates hybrid free-space optics (FSO)/radio frequency aerial access networks (AANs) using a high-altitude platform (HAP) and multiple UAVs to dynamically serve terrestrial users under varying environmental conditions, such as atmospheric turbulence and cloud-induced attenuation. The optical intelligent reflecting surfaces (OIRS), mounted on the HAP, enhance the FSO signal distribution to multiple UAVs by enabling precise beam manipulation, improving link reliability, and increasing network scalability. A deep reinforcement learning (DRL)-based approach is developed to optimize UAV placement and user association in real time, maximizing end-to-end throughput while adhering to backhaul capacity constraints. The study takes into account FSO channel impairments, including path loss, turbulence-induced fading, and pointing misalignment, modeled using log-normal distributions. Numerical results demonstrate that the dynamic deployment of multi-UAV configuration, trained under realistic cloudy conditions, significantly outperforms single-UAV and static deployment strategies, achieving higher data rates and stable user connectivity. This work highlights the potential of deploying OIRS-assisted AANs supporting multiple UAVs to realize robust and high-performance 6G networks.

A Multimodal Swarm Learning Approach for DDoS Detection in Internet of Things Infrastructure

Thuat Nguyen-Khanh — 2025-12-29

The Internet of Things (IoT) has emerged as a foundational platform for driving intelligent solutions, playing a central role in the Fourth Industrial Revolution. Its potential lies in enabling seamless connectivity and real-time data exchange among diverse devices and systems, thereby powering advanced applications such as intelligent transportation, smart healthcare, precision agriculture, and automated manufacturing. These solutions promise to improve efficiency, optimize resource utilization, and enhance decision-making across various sectors. However, this potential is challenged by some issues, including security vulnerabilities, privacy concerns, and significant heterogeneity arising from the vast diversity of devices, communication protocols, and data formats. In this paper, we develop a multimodal deep learning solution to detect DDoS attacks on IoT infrastructure based on two data types: packet-based data and flow-based data. Firstly, the datasets containing packets labeled as benign or attack are processed into two branches: packet-based and flow-based features. Then, each branch is trained using two independent CNN models. Finally, the feature information extracted from both modalities is fused and fed into a concatenation-based classifier for DDoS attack detection. Experimental results on Edge-IIoTset and CiCIoMT2024 datasets indicate that the multimodal deep learning model within a decentralized machine learning architecture achieves performance comparable to centralized machine learning. In addition, our proposal is also robust to non-independent and identically distributed (non-IID) data in decentralized machine learning architecture.

FedNDA: Enhancing Federated Learning with Noisy Client Detection and Robust Aggregation

Tuan Dung Kieu — 2025-07-03

Federated Learning is a novel decentralized methodology that enables multiple clients to collaboratively train a global model while preserving the privacy of their local data. Although federated learning enhances data privacy, it faces challenges related to data quality and client behavior. A fundamental issue is the presence of noisy labels in certain clients, which damages the global model's performance. To address this problem, this paper introduces a Federated learning framework with Noisy client Detection and robust Aggregation, FedNDA. In the first stage, FedNDA detects noisy clients by analyzing the distribution of their local losses. A noisy client exhibits a loss distribution distinct from that of clean clients. To handle the class imbalance issue in local data, we utilize per-class losses instead of the total loss. We then assign each client a noisiness score, calculated as the Earth Mover’s Distance between the per-class loss distribution of the client and the average distribution of all clean clients. This noisiness metric is more sensitive for detecting noisy clients than conventional metrics such as Euclidean distance or L1 norm. The noisiness score is subsequently transferred to and used in the server-side aggregation function to prioritize clean clients while reducing the influence of noisy clients. Experimental results demonstrate that FedNDA outperforms FedAvg and FedNoRo by 4.68% and 3.6% on the CIFAR-10 dataset, and by 10.65% and 0.48% on the ICH dataset, respectively, in a high-noise setting.
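To make the noisiness score concrete, here is a minimal sketch of a one-dimensional Earth Mover's Distance between two sets of per-class losses. The abstract does not specify how the distributions are represented, so this sketch assumes each client's per-class losses are treated as an equal-size empirical sample on the loss axis; the function names and the loss values are hypothetical.

```python
def emd_1d(sample_a, sample_b):
    """1-D Earth Mover's Distance between two equal-size empirical samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches sorted values, so the distance is the mean absolute difference
    of the sorted samples.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal-size samples")
    matched = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in matched) / len(sample_a)


def noisiness_score(client_class_losses, clean_mean_class_losses):
    """Assumed scoring rule: EMD to the clean clients' average per-class loss."""
    return emd_1d(client_class_losses, clean_mean_class_losses)


# Made-up per-class losses for a three-class task.
clean_mean = [0.20, 0.30, 0.25]
clean_client = [0.25, 0.30, 0.20]
noisy_client = [1.20, 0.30, 0.90]

clean_score = noisiness_score(clean_client, clean_mean)
noisy_score = noisiness_score(noisy_client, clean_mean)
```

The clean client scores 0 because its losses match the reference up to ordering, while the noisy client scores about 0.55; a server could then down-weight clients with larger scores during aggregation.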

Return Loss Optimization in Rectangular Microstrip Patch Antennas Using Response Surface Methodology (RSM) for 5G Applications

Thi Bich Ngoc Tran — 2025-06-12

In recent decades, wireless communication has advanced significantly. People increasingly rely on the Internet of Things, cloud computing, and big data analytics. These services require higher data rates, faster transmission and reception times, greater coverage, and increased throughput. 5G technology supports all of these features. Antennas, essential components of modern wireless devices, must be designed to meet the growing demand for fast and intelligent products. This study aims to optimize the dimensions and characteristics of a rectangular patch antenna. To examine the impact of independent variables (such as patch length, patch width, inset slot length, and inset slot width) on the response variables (return loss and resonant frequency), Response Surface Methodology (RSM) combined with Central Composite Design (CCD) was applied. The findings of the RSM analysis indicated that the experimental data were best represented by a quadratic polynomial model, with regression coefficients exceeding 0.970 for all responses. The optimized parameters identified are as follows: a patch length of 4.7 mm, a patch width of 4.7 mm, an inset slot length of 0.8 mm, and an inset slot width of 1.0 mm. The antenna designed using these optimized parameters achieved a target return loss of -45.865 dB at a frequency of 28.122 GHz. Finally, the results were validated using CST Studio Suite, which demonstrated good agreement with the experimental data.
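For reference, the quadratic polynomial model used in RSM has the standard second-order form shown below. The symbols are assumptions introduced here for illustration: $x_1,\dots,x_4$ stand for the four design factors (patch length, patch width, inset slot length, inset slot width), $y$ for a response such as return loss, and the $\beta$ terms for the fitted regression coefficients.

```latex
y = \beta_0
  + \sum_{i=1}^{4} \beta_i x_i
  + \sum_{i=1}^{4} \beta_{ii} x_i^{2}
  + \sum_{1 \le i < j \le 4} \beta_{ij} x_i x_j
  + \varepsilon
```

The linear terms capture main effects, the squared terms capture curvature, and the cross terms capture pairwise interactions between the antenna dimensions.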

COSMN: Clustering-Based Optimization for 360-Degree Live Streaming over Mobile Networks

Hung Nguyen Viet — 2025-11-11

The rapid growth of 360-degree video streaming has transformed how users experience immersive content, especially on mobile devices. However, delivering high-quality 360-degree video streams to mobile devices is challenging due to their constrained computational resources, limited bandwidth, and the need for real-time processing. The paper introduces COSMN (Clustering-Based Optimization for 360-Degree Video Streaming over Mobile Networks), an innovative framework to tackle these challenges. COSMN leverages a clustering-based optimization approach to dynamically adapt video streaming to the viewer’s region of interest (ROI), minimizing resource consumption while maintaining high-quality visuals for the most relevant portions of the video. The framework operates by dividing the 360-degree video into multiple tiles and clustering these tiles based on user viewing patterns. By predicting user behavior with clustering algorithms, COSMN efficiently prioritizes bandwidth and processing power for the tiles within the viewer’s ROI. The system also integrates adaptive bitrate streaming techniques to ensure seamless playback under varying network conditions. Experimental results demonstrate that COSMN significantly reduces bandwidth usage and computational load on mobile devices while providing a smooth and immersive viewing experience. Compared to traditional 360-degree online streaming methods, COSMN achieves superior performance in terms of latency, video quality, and resource efficiency. This work paves the way for scalable, 360-degree online streaming solutions on mobile platforms, making immersive video experiences more accessible and practical for everyday users.

A novel approach for graph-based real-time anomaly detection from dynamic network data listened by Wireshark

Muhammed Onur Kaya — 2025-01-07

This paper presents a novel approach for real-time anomaly detection and visualization of dynamic network data using Wireshark, the world's most widely used network analysis tool. As the complexity and volume of network data continue to grow, effective anomaly detection has become essential for maintaining network performance and enhancing security. Our method leverages Wireshark’s robust data collection and analysis capabilities to identify anomalies swiftly and accurately. In addition to detection, we introduce innovative visualization techniques that facilitate the intuitive representation of detected anomalies, allowing network administrators to comprehend network conditions and make informed decisions quickly. The results of our study demonstrate significant improvements in both the efficacy of anomaly detection and the practical applicability of visualization tools in real-time scenarios. This research contributes valuable insights into network security and management, highlighting the importance of integrating advanced analytical methods with effective visualization strategies to enhance the overall management of dynamic networks.

QoE-Energy Consumption Optimization for End-User Devices in Adaptive Bitrate Video Streaming Using the Lagrange Multiplier Method

Tien Vu Huu — 2025-04-14

The reduction of greenhouse gas emissions in the Internet and ICT sectors has become a critical challenge. According to recent research, the key contributors to greenhouse gas emissions on the Internet include high-energy-consumption components such as data centers, transmission network devices, and end-user devices. Among Internet services, video streaming is one of those with the highest traffic volume and number of users. Consequently, developing energy-efficient solutions for video streaming networks, particularly for end-user devices, is an urgent research priority. Reducing energy consumption in end-user devices in a video streaming system often requires compromises in parameters that impact the quality of user experience (QoE). Therefore, achieving an optimal trade-off between minimizing energy consumption and maintaining an acceptable QoE is a key objective. In this study, a cost function that integrates QoE and energy consumption is developed using the Lagrange multiplier method. Based on this function, an adaptive bitrate algorithm is proposed to select optimal video segments for video players, ensuring maximum QoE while minimizing energy consumption. The performance of the proposed method is evaluated using various types of video samples under varying network bandwidth conditions. Experimental results show that the proposed method reduces energy consumption of end-user devices by up to 6.7% and enhances QoE by 20% compared to previous methods.
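A Lagrangian cost of the kind described above can be sketched as a per-segment selection rule. The abstract does not give the actual cost function, so everything here is an assumption for illustration: the cost form `energy - lam * qoe`, the function name `pick_representation`, and the bitrate, QoE, and energy numbers are all hypothetical.

```python
def pick_representation(candidates, bandwidth_kbps, lam):
    """Choose the representation minimizing energy - lam * qoe.

    candidates: list of (bitrate_kbps, qoe_score, energy_joules) tuples.
    lam:        Lagrange multiplier weighting QoE against energy.
    Only bitrates that fit the available bandwidth are considered; if none
    fit, the lowest bitrate is returned as a fallback.
    """
    feasible = [c for c in candidates if c[0] <= bandwidth_kbps]
    if not feasible:
        return min(candidates, key=lambda c: c[0])
    return min(feasible, key=lambda c: c[2] - lam * c[1])


# Hypothetical bitrate ladder: (bitrate in kbps, QoE score, energy in joules).
ladder = [(1000, 3.0, 1.0), (2500, 4.0, 1.5), (5000, 4.5, 2.5)]

energy_first = pick_representation(ladder, bandwidth_kbps=6000, lam=0.1)
balanced = pick_representation(ladder, bandwidth_kbps=6000, lam=1.0)
quality_first = pick_representation(ladder, bandwidth_kbps=6000, lam=3.0)
```

With these made-up numbers, a small multiplier selects the 1000 kbps rung, a moderate one selects 2500 kbps, and a large one selects 5000 kbps, which shows how the multiplier moves the operating point along the QoE and energy trade-off.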

Security-Reliability Analysis of NOMA-Assisted Hybrid Satellite-Terrestrial Relay Multi-Cast Transmission Networks Using Fountain Codes and Partial Relay Selection with Presence of Multiple Eavesdroppers

Toan Van Nguyen — 2025-04-11

This article proposes a hybrid satellite-terrestrial relaying network (HSTRN) that integrates physical-layer security (PLS), Fountain codes (FCs), non-orthogonal multiple access (NOMA), and partial relay selection (PRS) to enhance system performance in terms of reliability, data rate, and security. In the proposed system, a satellite uses NOMA to simultaneously transmit Fountain packets to two clusters of terrestrial users. Data transmission is assisted by one of the terrestrial relay stations, selected by the PRS algorithm. We derive exact expressions for outage probability (OP) and system outage probability (SOP) at the legitimate users, as well as intercept probability (IP) and system intercept probability (SIP) at eavesdroppers. Monte Carlo simulations are performed to validate the accuracy of the analytical results, illustrate performance trends, and analyze the impact of key parameters on the considered performance metrics.

Secure UAV-assisted Mobile Edge Computing for IoT with Backscatter Communication in the Presence of a Moving Eavesdropper

Van Long Nguyen — 2025-10-23

The perception layer security (PLS) is crucial for ensuring that the data collected by Internet of Things (IoT) devices is accurate, reliable, and protected against various security threats. It helps maintain the overall integrity of the IoT ecosystem and builds trust in its applications. Our work explores the integration of network and PLS in a UAV-enabled mobile edge computing (MEC) system for IoT. This system supports multiple users with a combined non-orthogonal and time-division multiple access (NOTDMA) scheme and is based on backscatter communication (BC). In this system, the UAV-mounted server functions as a hybrid access point (HAP) and hovers over a cluster of energy-constrained IoT devices to transmit RF energy and assist them in performing tasks by employing BC. The IoT devices apply the combined NOTDMA scheme to offload their tasks to the HAP. A mobile passive eavesdropper attempts to intercept information from IoT devices without actively launching any attacks. A partial offloading scheme with various encryption algorithms is proposed to improve the system’s secrecy, which adapts to the users’ non-linear harvested energy levels. In addition, considering the network and physical security, we derive an approximate expression for the secrecy successful computation probability (SSCP). This expression incorporates factors such as harvested energy, local computing and encryption latency, edge offloading latency, processing, decryption, and the associated secrecy costs. The optimization problem for maximizing SSCP is formulated and solved using an Immune algorithm to find the optimal set of device parameters and UAV altitude. Key parameters affecting secrecy and latency performance are analyzed to better understand the system’s behavior. Numerical simulations are provided to validate the accuracy of our analysis.

Integrated Cloud-Twin Synchronization for Supply Chain 5.0

Divya Sasi Latha — 2025-03-12

The digital twin is emerging as a means of improving real-world performance from virtual spaces, especially in relation to Supply Chain 5.0 in Industry 5.0. Integrating cloud computing with digital twin technologies provides secure data storage, trusted tracking, and high reliability, and offers an architectural basis for integrating sustainable supply-chain enterprises. In this work, we introduce a high-level architecture of a cloud-based digital twin model for Supply Chain 5.0, created to align the supply chain system through real-time observation as well as real-time decision-making and control. This study introduces a cloud-based twin optimization model for Supply Chain 5.0, validated through genetic algorithm (GA) simulations. The model determines optimal weights to balance objectives, achieving an optimal objective function value that reflects trade-offs among operational efficiency, cost, and sustainability. A convergence plot illustrates the model’s iterative solution improvements, demonstrating its dynamic adaptability. Lastly, the proposed model defines and tests a supply chain performance analysis through dynamic simulations.

Single-level Discrete Two Dimensional Wavelet Transform Based Multiscale Deep Learning Framework for Two-Wheeler Helmet Detection

Amrutha Annadurai — 2025-03-11

INTRODUCTION: A robust method is proposed in this paper to detect helmet usage in two-wheeler riders to enhance road safety.

OBJECTIVES: This involves a custom-made dataset of 1000 images captured under diverse real-world scenarios, including variations in helmet size, colour, and lighting conditions. The dataset has two classes, namely with helmet and without helmet.

METHODS: The proposed helmet classification approach uses a Multi-Scale Deep Convolutional Neural Network (CNN) framework cascaded with a Long Short-Term Memory (LSTM) network. Initially, the Multi-Scale Deep CNN extracts modes by applying the Single-level Discrete 2D Wavelet Transform (dwt2) to decompose the original images. In particular, four different modes are used for segmenting a single image, namely approximation, horizontal detail, vertical detail, and diagonal detail. After the segmented images are fed into the Multi-Scale Deep CNN model, it is cascaded with an LSTM network.
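The four sub-bands named in the methods can be illustrated with a single-level 2D transform written out by hand. This sketch assumes the Haar wavelet, which the abstract does not specify, and the function name `haar_dwt2` is hypothetical; sign and naming conventions for the detail bands vary between libraries.

```python
def haar_dwt2(image):
    """Single-level 2-D Haar transform of an even-sized grayscale image.

    Returns four half-resolution sub-bands: approximation (cA), horizontal
    detail (cH), vertical detail (cV), and diagonal detail (cD).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch assumes even image dimensions")
    cA, cH, cV, cD = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]          # top-left, top-right
            bl, br = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((tl + tr + bl + br) / 2)  # local average (scaled)
            h_row.append((tl + tr - bl - br) / 2)  # top vs bottom: horizontal edges
            v_row.append((tl - tr + bl - br) / 2)  # left vs right: vertical edges
            d_row.append((tl - tr - bl + br) / 2)  # diagonal structure
        cA.append(a_row)
        cH.append(h_row)
        cV.append(v_row)
        cD.append(d_row)
    return cA, cH, cV, cD


# A 2x2 patch containing a horizontal edge (bright top row, dark bottom row).
cA, cH, cV, cD = haar_dwt2([[1, 1], [0, 0]])
```

For the horizontal-edge patch, only the approximation and the horizontal detail are non-zero, which is why feeding the four sub-bands to separate CNN branches exposes edge information at different orientations.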

RESULTS: The proposed model achieved accuracies of 99.20% and 95.99% using the 5-Fold Cross-Validation (CV) and Hold-out CV methods, respectively.

CONCLUSION: This result was better than those of the CNN-LSTM, dwt2-LSTM, and a tailor-made CNN model.

Enhancing AI-Inspired Analog Circuit Design: Optimizing Component Sizes with the Firefly Algorithm and Binary Firefly Algorithm

Trang Hoang — 2025-01-08

This paper explores the use of the Firefly Algorithm (FA) and its binary variant (BFA) in optimizing analog circuit component sizing, specifically as a case study for a two-stage operational amplifier (op-amp) designed with a 65nm CMOS process. Recognizing the limitations of traditional optimization approaches in handling complex analog design requirements, this study implements both FA and BFA to enhance convergence speed and accuracy within multi-dimensional search spaces. The Python-Spectre framework in this paper facilitates automatic, iterative simulation and data collection, driving the optimization process. Through extensive benchmarking, the BFA outperformed traditional FA, balancing exploration and exploitation while achieving superior design outcomes across key parameters such as voltage gain, phase margin, and unity-gain bandwidth. Comparative analysis with existing optimization methods, including Particle Swarm Optimization (PSO) and Genetic Algorithm (GA), underscores the efficiency and accuracy of BFA in optimizing circuit metrics, particularly in power-constrained environments. This study demonstrates the potential of swarm intelligence in advancing automatic analog design and establishes a foundation for future enhancements in analog circuit automation.

Coverage Probability of EH-enabled LoRa networks - A Deep Learning Approach

Thi-Tuyet-Hai Nguyen — 2024-12-05

The performance of energy harvesting (EH)-enabled long-range (LoRa) networks is analyzed in this work. Specifically, we employ deep learning (DL) to estimate the coverage probability (Pcov) of the considered networks. Our study incorporates a general fading distribution, specifically the Nakagami-m distribution, and utilizes tools from stochastic geometry (SG) to model the spatial distributions of all nodes and end-devices (EDs) with EH capability. The DL approach is employed to overcome the limitations of model-based methods that can only evaluate the Pcov under simplified network conditions. Therefore, we propose a deep neural network (DNN) that estimates the Pcov with high accuracy compared to the ground truth values. Additionally, we demonstrate that DL significantly outperforms the Monte Carlo simulation approach in terms of resource consumption, including time and memory.

Joint Adaptive Modulation and Power Control Scheme for Energy Efficient FSO-based Non-Terrestrial Networks

Thang V. Nguyen — 2024-12-03

Free-space optics (FSO)-based non-terrestrial networks (NTN) have garnered significant attention as a potential technology for forthcoming 6G wireless communications due to their exceptional data rate and extensive global coverage capability. Nevertheless, atmospheric attenuation, cloud attenuation, geometric loss, and atmospheric turbulence present numerous difficulties in developing these networks. To cope with these difficulties, we propose to apply a joint adaptive modulation and power control (JAMPC) scheme to FSO-based NTN. Our proposed JAMPC algorithm aims to enhance energy efficiency while guaranteeing the targeted outage probability, bit-error rate, and the required data rate. We develop mathematical models and derive closed-form expressions to implement the proposed algorithm and solve the optimization problem. The numerical results confirm that the JAMPC scheme helps NTN provide better energy efficiency and the ability to adapt to various channel conditions.

Predicting the Severity of COVID-19 Pneumonia from Chest X-Ray Images: A Convolutional Neural Network Approach

Thien B. Nguyen-Tat — 2024-11-25

This study addresses significant limitations of previous works based on the Brixia and COVIDGR datasets, which primarily provided qualitative lung injury scores and focused mainly on detecting mild and moderate cases. To bridge these critical gaps, we developed a unified and comprehensive analytical framework that accurately assesses COVID-19-induced lung injuries across four levels: Normal, Mild, Moderate, and Severe. This approach’s core is a meticulously curated, balanced dataset comprising 9,294 high-quality chest X-ray images. Notably, this dataset has been made widely available to the research community, fostering collaborative efforts and enhancing the precision of lung injury classification at all severity levels. To validate the framework’s effectiveness, we conducted an in-depth evaluation using advanced deep learning models, including VGG16, RegNet, DenseNet, MobileNet, EfficientNet, and Vision Transformer (ViT), on this dataset. The top-performing model was further enhanced by optimizing additional fully connected layers and adjusting weights, achieving an outstanding sensitivity of 94.38%. These results affirm the accuracy and reliability of the proposed solution and demonstrate its potential for broad application in clinical practice. Our study represents a significant step forward in developing AI-powered diagnostic tools, contributing to the timely and precise diagnosis of COVID-19 cases. Furthermore, our dataset and methodological framework hold the potential to serve as a foundation for future research, paving the way for advancements in the detection and classification of respiratory diseases with higher accuracy and efficiency.

An Efficient Method for BLE Indoor Localization Using Signal Fingerprint

Trong-Thanh Han — 2024-11-22

The rise of Bluetooth Low Energy (BLE) technology has opened new possibilities for indoor localization systems. However, extracting fingerprint features from the Received Signal Strength Indicator (RSSI) of BLE signals often encounters challenges due to significant errors and fluctuations. This research proposes an approach that integrates signal filtering and deep learning techniques to improve accuracy and stability. A Kalman filter is employed to smooth the RSSI values, while Autoencoder and Convolutional Autoencoder models are utilized to extract distinctive fingerprint features. The system compares random test points with a reference database using normalized cross-correlation. Performance is assessed based on metrics such as the number of reference points with the highest cross-correlation (), average localization error, and other statistical indicators. Experimental results show that the combination of the Kalman filter with the Convolutional Autoencoder model achieves an average error of 0.98 meters with . These findings indicate that this approach effectively reduces signal noise and enhances localization accuracy in indoor environments.
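The two signal-processing steps named in this abstract, Kalman smoothing of RSSI and matching by normalized cross-correlation, can be illustrated with a minimal sketch. It assumes a scalar constant-signal Kalman model and zero-mean normalized cross-correlation; the function names and the noise variances `process_var` and `meas_var` are hypothetical choices for illustration, not values from the paper, and the autoencoder feature extraction is omitted.

```python
import math


def kalman_smooth(rssi, process_var=1e-2, meas_var=4.0):
    """Smooth RSSI readings (dBm) with a scalar Kalman filter.

    Assumed model: the true signal is roughly constant between samples
    (x_k = x_{k-1} + w) and each reading is the signal plus noise (z_k = x_k + v).
    """
    estimate = rssi[0]   # start from the first reading
    error_var = 1.0      # assumed initial uncertainty of the estimate
    smoothed = [estimate]
    for z in rssi[1:]:
        # Predict: uncertainty grows by the process noise.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


raw = [-70.0, -60.0, -80.0, -65.0, -75.0]
smooth = kalman_smooth(raw)
```

In a fingerprinting pipeline of this kind, a test fingerprint would be compared against each reference fingerprint and the reference points with the highest correlation retained.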

Drug classification system based on drug composition and usage instructions

Hoang-Dieu Vu — 2024-11-07

This study presents a natural language processing (NLP) approach to classify drugs based on compositional and usage descriptions. NLP techniques including text preprocessing, word embedding, and deep learning models were applied to a Vietnamese drug dataset. Traditional machine learning models such as Support Vector Machines (SVM) and deep models including Bidirectional Long Short-Term Memory (BiLSTM) and PhoBERT were evaluated. Because the data we collected contain limited information, data augmentation techniques were applied to increase the variation of the dataset. Results show PhoBERT achieving 95% accuracy, highlighting the benefits of transferring knowledge from large language models. Errors primarily occurred between similar drug categories, suggesting taxonomy refinement could improve performance. In summary, an automated drug classification framework was developed leveraging state-of-the-art NLP, validating the feasibility of analyzing drug data at scale and aiding therapeutic understanding. This supports NLP’s potential in pharmacovigilance applications.

Transformer Based Ship Detector: An Improvement on Feature Map and Tiny Training Set

Duc-Dat Ngo — 2024-11-06

The exponential growth of commodity exchange has raised the need for maritime border security in recent years. One of the most critical tasks for naval border security is ship detection inside and outside the territorial sea. Conventionally, the task requires a substantial human workload. Fortunately, with the rapid growth of digital cameras and deep-learning techniques, computer programs can handle object detection tasks well enough to replace human labor. Therefore, this paper studies how to apply recent state-of-the-art deep-learning networks to the ship detection task. We found that, with a suitable number of object queries, the Deformable-DETR method improves performance compared to the state-of-the-art ship detector. Moreover, comprehensive experiments on datasets of different scales show that the technique can significantly improve the results when training samples are limited. Finally, the feature maps produced by the method focus well on key objects in the image.

A Secure Cooperative Image Super-Resolution Transmission with Decode-and-Forward Relaying over Rayleigh Fading Channels

Hien-Thuan Duong — 2024-09-02

In addition to being susceptible to performance degradation from hardware malfunctions and environmental influences, wireless image transmission risks exposing information to eavesdroppers. This paper examines image communications within wireless relay networks (WRNs) and proposes a secure cooperative relaying (SCR) protocol over Rayleigh fading channels. In this protocol, a source node (referred to as Alice) transmits superior-resolution (SR) images to a destination node (referred to as Bob) with the assistance of a mediating node (referred to as Relay) operating in decode-and-forward mode, all while contending with the presence of an eavesdropper (referred to as Eve). To conserve transmission bandwidth, Alice first reduces the size of the original SR images before transmitting them to Relay and Bob. Subsequently, both Alice and Relay apply random linear network coding (RLNC) to the downscaled poor-resolution (PR) images to obscure the original images from Eve, thereby bolstering the security of the image communications. Simulation results demonstrate that the proposed SCR protocol surpasses both secure relaying transmission without a direct link and secure direct transmission without relaying links.

Additionally, increasing the scaling factor saves transmission bandwidth at the cost of only a slight reduction in image quality. Furthermore, the results highlight the SCR protocol’s superior effectiveness at Bob’s end compared to Eve’s, which is due to Eve’s lack of access to the RLNC coefficient matrices and reference images utilised by Alice and Relay in the RLNC process. Finally, the evaluation of reference images, relay allocations and diversity reception over Rayleigh fading channels confirms the effectiveness of the SCR protocol for secure image communications in WRNs.

Emotional Inference from Speech Signals Informed by Multiple Stream DNNs Based Non-Local Attention Mechanism

Manh-Hung Ha — 2024-08-02

It is difficult to determine whether a person is depressed because the symptoms of depression are often not apparent. However, the voice can be one of the ways in which we can recognize signs of depression. Understanding human emotions in natural language plays a crucial role in intelligent and sophisticated applications. This study proposes a deep learning architecture to recognize the emotions of the speaker from audio signals, which can help diagnose patients who are depressed or prone to depression, so that treatment and prevention can be started as soon as possible. Specifically, Mel-frequency cepstral coefficients (MFCC) and the Short-Time Fourier Transform (STFT) are adopted to extract features from the audio signal. The multiple streams of the proposed DNN model, including a CNN-LSTM based on an attention mechanism, are discussed in this research. Leveraging a pretrained model, the proposed approach achieves an accuracy of 93.2% on the EmoDB dataset. Further optimization remains a potential avenue for future development. It is hoped that this research will contribute to potential applications in the fields of medical treatment and personal well-being.

Bi-objective model for community detection in weighted complex networks

Gilberto Sinuhe Torres-Cockrell — 2024-08-02

In this study, we introduce an innovative approach that utilizes complex networks and the k_core method to address community detection in weighted networks. Our proposed bi-objective model aims to simultaneously discover non-overlapping communities while ensuring that the degree of similarity remains below a critical threshold to prevent network degradation. We leverage the k_core structure to detect tightly interconnected node groups, a concept particularly valuable in edge-weighted networks where different edge weights indicate the strength or importance of node relationships. Beyond maximizing the count of k_core communities, our model seeks a homogeneous weight distribution across edges within these communities, promoting stronger cohesion. To tackle this challenge, we implement two multi-target algorithms: Non-dominated Sorting Genetic Algorithm II (NSGAII) and a Multi-Objective Simulated Annealing (MOSA) algorithm. Both algorithms efficiently identify non-overlapping communities with a specified degree 'k'. The results of our experiments reveal a trade-off between maximizing the number of k_core communities and enhancing the homogeneity of these communities in terms of their minimum weighted interconnections. Notably, the MOSA algorithm outperforms NSGAII in both small and large instances, demonstrating its effectiveness in achieving this balance. This approach sheds light on effective strategies for resolving conflicting goals in community detection within weighted networks.
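For readers unfamiliar with the k_core structure this abstract relies on, the following minimal sketch computes the k-core of an undirected graph by repeatedly peeling off nodes whose degree falls below k. It covers only the standard unweighted k-core definition; the bi-objective model, the weight-homogeneity objective, and the NSGAII and MOSA searches are not reproduced, and the function name `k_core` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    `adjacency` maps each node to an iterable of its neighbours (assumed symmetric).
    Nodes with fewer than k remaining neighbours are removed until none are left.
    """
    adj = {u: set(vs) for u, vs in adjacency.items()}  # work on a copy
    queue = [u for u, vs in adj.items() if len(vs) < k]
    removed = set()
    while queue:
        u = queue.pop()
        if u in removed:
            continue
        removed.add(u)
        for v in adj[u]:
            if v in removed:
                continue
            adj[v].discard(u)
            # A neighbour may now fall below the degree threshold.
            if len(adj[v]) < k:
                queue.append(v)
        adj[u] = set()
    return set(adj) - removed


# Triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
two_core = k_core(graph, 2)
three_core = k_core(graph, 3)
```

Here the pendant node is peeled away for k = 2, leaving the triangle, while no node survives for k = 3.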

Efficient LDPC Code Design based on Genetic Algorithm for IoT Applications

Thanh-Loc Nguyen-Van — 2024-08-01

In this paper, we propose a low-density parity check (LDPC) code design scheme that improves the performance of the existing genetic algorithm-based LDPC scheme. In particular, we enhance the performance of the LDPC code by removing the girth-4 property of the parity check matrix and utilizing the min-sum decoding algorithm instead of the belief propagation decoding algorithm. In addition, we consider different short block-length scenarios, including 64-bit and 128-bit block length. Then, we evaluate the block error rate (BLER) of the LDPC code over the binary input additive white Gaussian noise (BI-AWGN) channel. Finally, extensive simulation results indicate that our proposed approach achieves more than 11% gain in terms of BLER compared with the benchmarked schemes.
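The min-sum decoder mentioned in this abstract replaces the exact check-node computation of belief propagation with a sign product and a minimum of magnitudes. The sketch below shows that single check-node update under the usual log-likelihood-ratio (LLR) convention; it is a textbook formulation rather than the authors' implementation, and the function name `min_sum_check_update` is hypothetical.

```python
def min_sum_check_update(incoming):
    """Min-sum update for one check node.

    `incoming` holds the LLR messages from the variable nodes connected to this
    check node (at least two). The message sent back to variable i uses all
    incoming messages except the one from i: the product of their signs times
    the smallest of their magnitudes.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


messages = min_sum_check_update([2.0, -0.5, 3.0])
```

Because it needs only comparisons and sign flips, this update avoids the hyperbolic-tangent evaluations of belief propagation, which is one reason min-sum is attractive for resource-constrained IoT devices.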

ViMedNER: A Medical Named Entity Recognition Dataset for Vietnamese

Pham Van Duong — 2024-07-11

Named entity recognition (NER) is one of the most important tasks in natural language processing; it identifies entity boundaries and classifies them into pre-defined categories. In the literature, NER systems have been developed for various languages, but little work has been done for Vietnamese. This mainly stems from the scarcity of high-quality annotated data, especially for specific domains such as medicine and healthcare. In this paper, we introduce a new medical NER dataset, named ViMedNER, for recognizing Vietnamese medical entities. Unlike existing works designed for common or too-specific entities, we focus on entity types that can be used in common diagnostic and treatment scenarios, including disease names, the symptoms of the diseases, the causes of the diseases, the diagnostics, and the treatments. These entities help doctors diagnose and treat common diseases. Our dataset is collected from four well-known Vietnamese websites that specialize in drug sales and disease diagnostics, and it is annotated by domain experts with high agreement scores. To create benchmark results, strong NER baselines based on pre-trained language models including PhoBERT, XLM-R, ViDeBERTa, ViPubMedDeBERTa, and ViHealthBERT are implemented and evaluated on the dataset. Experimental results show that the performance of XLM-R is consistently better than that of the other pre-trained language models. Furthermore, additional experiments are conducted to explore the behavior of the baselines and the characteristics of our dataset.

ERKT-Net: Implementing Efficient and Robust Knowledge Distillation for Remote Sensing Image Classification

Huaxiang Song — 2024-07-03

The classification of Remote Sensing Images (RSIs) poses a significant challenge due to the presence of clustered ground objects and noisy backgrounds. While many approaches rely on scaling models to enhance accuracy, the deployment of RSI classifiers often requires substantial computational and storage resources, thus necessitating the use of lightweight algorithms. In this paper, we present an efficient and robust knowledge transfer network named ERKT-Net, which is designed to provide a lightweight yet accurate Convolutional Neural Network (CNN) classifier. This method utilizes innovative yet simple concepts to better accommodate the inherent nature of RSIs, thereby significantly improving the efficiency and robustness of traditional Knowledge Distillation (KD) techniques developed on ImageNet-1K. We evaluated ERKT-Net on three benchmark RSI datasets and found that it demonstrated superior accuracy and a very compact volume compared to 40 other advanced methods published between 2020 and 2023. On the most challenging NWPU45 dataset, ERKT-Net outperformed other KD-based methods with a maximum Overall Accuracy (OA) value of 22.4%. Using the same criterion, it also surpassed the first-ranked multi-model method with a minimum OA value of 0.7 but presented at least an 82% reduction in parameters. Furthermore, ablation experiments indicated that our training approach has significantly improved the efficiency and robustness of classic DA techniques. Notably, it can reduce the time expenditure in the distillation phase by at least 80%, with a slight sacrifice in accuracy. This study confirmed that a logit-based KD technique can be more efficient and effective in developing lightweight yet accurate classifiers, especially when the method is tailored to the inherent characteristics of RSIs.
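As background for the logit-based KD technique this abstract refers to, the sketch below computes the classic temperature-scaled distillation loss: the KL divergence between the softened teacher and student distributions, multiplied by the squared temperature. This is the standard formulation, not ERKT-Net's specific training recipe, and the function names and the default temperature of 4.0 are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    in the classic formulation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0
    )
    return temperature ** 2 * kl


teacher = [5.0, 1.0, -2.0]
loss_same = kd_loss(teacher, teacher)
loss_diff = kd_loss([0.0, 0.0, 0.0], teacher)
```

In practice this term is usually combined with a cross-entropy loss on the ground-truth labels; the weighting between the two is a tuning choice.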