Real Time Lung Cancer Classification with YOLOv5

Accurate categorization of cancer is essential for effective diagnosis and treatment. Deep learning algorithms have shown considerable promise in recent years for automating cancer classification. In this study, we used the deep learning system YOLOv5 to classify lung images into four classes: large cell carcinoma, adenocarcinoma, squamous cell carcinoma, and normal lung tissue. We trained the YOLOv5 model on a publicly available database of lung cancer images divided into these four categories. In addition, we compared YOLOv5's performance with that of older models such as SVM, RF, ANN, and CNN. YOLOv5 outperformed all of these models, indicating its potential for the development of more accurate and efficient automated cancer classification systems. These findings have important implications for cancer identification and therapy: automatic classification systems can increase the accuracy and efficiency of cancer detection, potentially leading to better patient outcomes, and deep learning techniques like YOLOv5 can improve their accuracy and speed, making them more suitable for clinical applications. Our model achieved high accuracy for every class, with an overall accuracy of 97.77%. We assessed the model's performance using accuracy, train loss, and test loss graphs, which showed that the model learned from the data and improved its accuracy over the course of training. The findings are also compiled in a table that reports the accuracy for each class.


Introduction
Cancer is a collection of diseases distinguished by uncontrolled cell proliferation and spread throughout the body. If left untreated, these cells can invade and destroy normal tissues and organs, causing serious health problems and even death. Lung cancer is both one of the most common and one of the deadliest types of cancer. It occurs when abnormal cells in the lungs grow and reproduce uncontrollably, forming a tumor that can infiltrate nearby tissues and spread to other parts of the body. Lung cancer is often fatal because it is detected late, after the cancer has spread beyond the lungs, making treatment more difficult.
Cancer develops when genetic abnormalities arise within normal cells, causing the processes that control cell growth and division to malfunction. These mutations can be inherited or acquired during a person's lifespan because of a variety of circumstances such as carcinogen exposure (e.g., tobacco smoke, some chemicals, radiation), chronic inflammation, or certain infections. The resulting aberrant cells can penetrate and destroy normal tissues and organs.
Because of its aggressive nature, late-stage identification, and high mortality rate, lung cancer is one of the leading causes of cancer-related deaths globally. Early and correct diagnosis is therefore central to improving patient outcomes and survival rates. Computer vision and deep learning techniques have increasingly been applied to medical image analysis in recent years, providing a potent tool for detecting and categorizing lung cancer.
Breast cancer, lung cancer, colorectal cancer, prostate cancer, and many other forms of cancer exist. Each type of cancer behaves differently and necessitates different therapies. Cancer is highly lethal for various reasons:
1. Uncontrolled growth: Cancer cells develop and grow at an uncontrolled rate, creating tumors that can disrupt the normal functioning of important organs and tissues.
2. Metastasis: Cancer cells can break away from the primary tumor and spread to other parts of the body via the bloodstream or lymphatic system. This process permits cancer to form new tumors in different organs, making treatment and control more difficult.
3. Treatment resistance: Cancer cells can acquire resistance to treatments such as chemotherapy, radiation, or targeted medications. This resistance can be attributed to genetic alterations inside cancer cells or to cancer cells' ability to adapt and survive in adverse environments.
4. Impact on organ function: Depending on the location and extent of the malignancy, it might affect normal organ function, resulting in serious consequences and organ failure.
5. Diagnosis at an advanced stage: Cancer may not produce apparent symptoms in its early stages, so in some circumstances it is not identified and diagnosed until it has reached an advanced stage. Cancer is often more difficult to treat successfully at late stages.
Cancer detection is critical for several reasons:
1. Better treatment outcomes: Early detection of cancer raises the likelihood of successful therapy and improves overall patient outcomes. Early discovery allows for less intrusive treatment alternatives, improved organ function preservation, and a higher chance of complete remission or cure.
2. Improved survival rates: Cancer is frequently more curable and associated with improved survival rates when found early. Regular screenings and early detection programs have been shown to reduce cancer-related fatalities by detecting cancers at an earlier stage when they are more treatable.
3. Individualized treatment: Early detection can assist healthcare providers in developing individualized treatment programs for patients depending on the specific characteristics of their cancer. This may include identifying the most appropriate therapy, adjusting dosage, and using targeted treatments depending on the cancer's molecular profile.
Despite advancements in treatment options, lung cancer continues to be a serious public health issue and the top cause of cancer-related fatalities globally. The appearance of tiny growths in the lungs known as pulmonary nodules is one of the early indicators of lung cancer. These nodules can be found with low-dose computed tomography (CT). However, because there are so many CT scans to go through and the nodules can be difficult to see, doctors may miss some of these nodules in practice.
Researchers are creating computer programs known as computer-aided diagnostic (CAD) systems to help clinicians diagnose lung cancer more accurately. These tools can analyze CT scans and identify nodules missed by the radiologist. Using a CAD system as an additional opinion can significantly improve the accuracy of a lung cancer diagnosis. In recent years, artificial intelligence has captivated the attention of society, arousing interest in its potential to improve our lives [1]. Artificial intelligence (AI) has the potential to play a crucial role in cancer detection by analyzing massive volumes of data and identifying patterns that human specialists may find difficult to perceive.
In this study, we present a lung cancer classification method that employs the cutting-edge object identification algorithm YOLOv5 to classify lung images into four classes: adenocarcinoma, large cell carcinoma, normal, and squamous cell carcinoma. The suggested method uses lung computed tomography (CT) scans as input and categorizes them into one of the four groups. The effectiveness of the proposed system is assessed using a publicly accessible dataset of lung cancer cases, and the results show that our suggested method surpasses the techniques it was compared against and achieves high accuracy.
The suggested method may influence clinical practice by making it easier for patients with lung cancer to receive an early diagnosis and a tailored course of therapy, thereby improving patient outcomes and lowering mortality rates. The rest of this paper is structured as follows. The working theory and research methodology are described in Section 3. The experimental results are provided and contrasted with the procedures that were used in Section 4. The work's conclusion and discussion are provided in Section 5.
We used the most recent statistical data available for the year 2023 in our present analysis [2]. By relying on the most recent information, we hope to obtain significant insights into various elements of the data. This approach allows us to investigate patterns, trends, and relationships within the dataset and to draw informed conclusions. We hope that these analyses contribute to a better understanding of cancer-related issues such as estimated new cases and fatalities across various cancer sites, as well as potential gender variations within these numbers.
Our goal is to provide useful information that can help in tackling cancer's issues and promoting further improvements in prevention, diagnosis, and treatment.
Several cancer sites can be evaluated in terms of gender-specific statistics using the data given (Table-1). There were an estimated 54,540 new cases of oral cavity and pharynx cancer, with 11,580 deaths. Males accounted for 39,290 new cases and 8,140 deaths, while females accounted for 15,250 new cases and 3,440 deaths. Similarly, the estimated new cases for the digestive system were 348,840, with 172,010 deaths overall. Males accounted for 194,980 new cases and 99,350 deaths, while females accounted for 153,860 new cases and 72,660 deaths. There were 153,020 new cases of colon and rectal cancer, with 52,550 deaths. Males accounted for 81,860 new cases and 28,470 deaths, while females accounted for 71,160 new cases and 24,080 deaths.
We determined the cancer sites with the highest estimated deaths and new cases in our analysis of the most recent statistical data for the year 2023, considering the gender breakdown for each category. The digestive system is the cancer site with the highest estimated deaths, accounting for a total of 172,010 deaths across both sexes. Furthermore, the respiratory system is a significant contributor to cancer-related mortality, accounting for 132,330 deaths in both sexes. Males account for 71,170 deaths in this group, while females account for 61,160 deaths. In terms of estimated mortality, the genitourinary system, comprising malignancies of the reproductive organs, has a notable presence, accounting for 69,660 deaths: 35,640 in males and 34,020 in females.
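The gender breakdowns quoted above can be cross-checked with simple arithmetic. The sketch below only restates the figures given in the text and verifies that the male and female counts add up to the stated totals; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Estimated 2023 figures quoted in the text, stored as (male, female, total).
NEW_CASES = {
    "oral cavity and pharynx": (39_290, 15_250, 54_540),
    "digestive system": (194_980, 153_860, 348_840),
    "colon and rectum": (81_860, 71_160, 153_020),
}
DEATHS = {
    "oral cavity and pharynx": (8_140, 3_440, 11_580),
    "digestive system": (99_350, 72_660, 172_010),
    "colon and rectum": (28_470, 24_080, 52_550),
    "respiratory system": (71_170, 61_160, 132_330),
    "genitourinary system": (35_640, 34_020, 69_660),
}


def is_consistent(table):
    """Map each site to True when male + female equals the stated total."""
    return {site: male + female == total
            for site, (male, female, total) in table.items()}


def male_share(table):
    """Percentage of each site's total attributed to males, to one decimal."""
    return {site: round(100 * male / total, 1)
            for site, (male, _female, total) in table.items()}


if __name__ == "__main__":
    print(is_consistent(NEW_CASES))
    print(is_consistent(DEATHS))
    print(male_share(DEATHS))
```

All quoted totals pass this consistency check, which suggests the figures were transcribed without arithmetic slips.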

LITERATURE SURVEY
Developing CAD systems for pulmonary nodules is therefore critical for improving lung cancer detection and therapy. AI is now being used extensively to improve disease identification, management, and the efficacy of medications, driven by the expanding number of cancer patients and the vast amount of data collected throughout the treatment process [3]. As a result, AI is required to improve oncologic care. Cancer prognosis can reduce mortality [4].
Apsari et al. [5] use artificial neural networks (ANNs) to classify lung nodules into malignant and benign categories. According to the results, the suggested approach achieves an accuracy of 80% in categorizing lung nodules as malignant or benign.
Rehman et al. [6] suggested a pipeline to categorize lung nodules as malignant or benign using four machine learning methods: k-nearest neighbors (KNN), support vector machines (SVM), random forests (RF), and artificial neural networks (ANN). According to the reported data, the proposed pipeline attained an accuracy of 91.5% for classifying the nodules. The SVM classifier achieved the highest accuracy of 93.3%, followed by the RF classifier at 92.3%.
Shaukat et al. [7] provide a method for classifying lung nodules in CT images as malignant or benign based on intensity, shape, and texture features using artificial neural networks (ANNs). The classifier's performance is measured using accuracy, specificity, sensitivity, and area under the curve (AUC), and it achieves a sensitivity of 95.5%.
Nageswaran et al. [8] suggested a model for the classification and prediction of lung cancer using machine learning and image processing techniques. The suggested approach was evaluated on a dataset of lung cancer images, and the results showed that the SVM algorithm with the chosen features had the highest accuracy of 98.31% in classifying the images as malignant or benign.
Yasriy et al. [9] focus on the use of Convolutional Neural Networks (CNN) for lung cancer diagnosis using CT scans. The suggested method was tested on a dataset of lung cancer CT scans, with the CNN achieving an accuracy of 93.3% in categorizing the CT scans as benign or malignant. The CNN also had a sensitivity of 90% and a specificity of 96.7%, indicating that it could correctly identify both benign and malignant instances.
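Several of the studies above report sensitivity and specificity alongside accuracy. As a reminder of how these three figures relate for a benign/malignant classifier, the following minimal sketch computes them from confusion-matrix counts. The counts in the usage example are hypothetical and are not taken from any of the cited papers; they were chosen only because they produce percentages of the same magnitude as those quoted.

```python
def sensitivity(tp, fn):
    """True positive rate: share of malignant cases that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of benign cases correctly cleared."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all cases, malignant or benign, classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical test set: 30 malignant and 30 benign scans.
    tp, fn, tn, fp = 27, 3, 29, 1
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tn, fp):.3f}")
    print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")
```

The example shows why accuracy alone can hide an imbalance between the two error types: the same overall accuracy is compatible with quite different sensitivity and specificity values.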
Kuruvilla et al. [10] present a computer-aided method for categorizing lung cancer based on CT images, which is critical for increasing a patient's chances of survival. For categorization, the skewness parameter was found to be the most accurate. The research presents two novel training functions for the backpropagation neural network, and the results demonstrate that the first proposed function achieves 93.3% accuracy and 91.4% sensitivity.
Lovneet et al. [11] aim to create an automated approach for detecting lung cancer early, which would improve patient outcomes and lower mortality rates. The study presents the findings of tests carried out to assess the performance of the proposed approach, using a dataset of CT scan images from people with and without lung cancer. The results demonstrate that the suggested method was highly accurate in detecting lung cancer, with a sensitivity of 92.3% and a specificity of 97.8%.
To improve the precision of lung cancer detection, Gordienko et al. [15] offer a method that combines lung segmentation and bone shadow exclusion approaches. The authors tested their proposed methodology using a dataset of 2500 chest X-ray images from individuals with and without lung cancer. The data show that the proposed strategy exceeded earlier methods in terms of lung cancer detection accuracy, obtaining an 88.9% success rate. Furthermore, the authors conducted a sensitivity analysis, which indicated that the proposed technique is robust to multiple hyperparameters.
According to Nasrullah et al. [16], a deep learning model based on customized mixed link network (CMixNet) architectures, along with clinical criteria for nodule detection, can reduce false-positive rates and misdiagnosis in the early stages of lung cancer. Tested on the LIDC-IDRI datasets, the suggested strategy achieved a specificity of 91% and a sensitivity of 94%.
Wu et al. [17] propose a method that employs a deep residual network (ResNet) architecture together with a technique for transferring knowledge from a previously trained model to a new task. With a classification accuracy of 93.44%, the proposed method surpassed multiple existing state-of-the-art methods.
Park et al. [18] offer a two-stage technique that combines a 2D deep convolutional neural network (CNN) with a 3D U-Net network to improve the precision of lung cancer segmentation. The authors used a dataset of 90 PET/CT images of people with lung cancer to evaluate their proposed method. The proposed approach segmented lung cancer with a mean Dice similarity coefficient of 0.66. The authors also compared their findings to those obtained using different methodologies, reporting that their suggested approach outperforms them.
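The Dice similarity coefficient quoted for the segmentation study measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below illustrates the computation on flat binary masks; the function name and the toy masks are illustrative assumptions and do not reproduce the cited authors' implementation.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so treat that case as full overlap.
    return 1.0 if total == 0 else 2 * intersection / total


if __name__ == "__main__":
    # Toy flattened masks (1 = tumor voxel, 0 = background).
    predicted = [1, 1, 1, 0, 0, 0]
    reference = [0, 1, 1, 1, 0, 0]
    print(f"Dice = {dice_coefficient(predicted, reference):.3f}")
```

A value of 1 means the masks coincide and 0 means they do not overlap at all, which gives a scale for interpreting a reported mean of 0.66.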
Yanjie et al. [19] investigate the use of machine learning methods to differentiate benign from cancerous lung nodules discovered using computed tomography (CT) scans. The aim of this research is to identify the most important features from CT scans that may be used to build an accurate classifier using a support vector machine (SVM). According to the study's findings, the SVM-based classifier had an overall accuracy of 89.8%, sensitivity of 91.7%, and specificity of 87.5%. The AUC-ROC value was 0.936.
Roy et al. [20] suggested a three-stage fuzzy inference method comprising image pre-processing, feature extraction, and classification. To increase image quality, the authors apply contrast enhancement and morphological operations to lung images during the pre-processing step. The authors extract 11 features from each image during the feature extraction stage, including texture and shape features. These features are then fed into the fuzzy inference system, which determines whether the image is normal or abnormal and detects the existence of nodules. The authors then tested their algorithm on a lung imaging dataset and compared the results to a typical CAD system. The results show that the fuzzy inference system achieves 94% accuracy.

DATASET:
The accuracy and effectiveness of medical imaging analysis for disease diagnosis and treatment have significantly increased because of the development of machine learning and deep learning techniques. One example of this is the detection of lung cancer, where precise and effective classification of lung nodules can dramatically enhance patient outcomes. In this study, we make use of a Kaggle dataset that is freely available to the public and contains CT scans of lung nodules divided into four categories: adenocarcinoma, large cell carcinoma, normal, and squamous cell carcinoma. Three of the four categories are forms of lung cancer, each with distinct traits and available therapies, while the normal category represents lung tissue without cancer. The most frequent kind of lung cancer is adenocarcinoma, while large cell carcinoma is less common but more dangerous. Squamous cell carcinoma first appears in the lining of the lungs' airways.

3. Large cell carcinoma: Large cell carcinoma is a less common type of lung cancer than adenocarcinoma and squamous cell carcinoma. It may be more difficult to treat than other types of lung cancer due to its propensity for rapid growth and spread. Large cell carcinoma can develop anywhere in the lung, and because it lacks the distinctive features of other forms of lung cancer, it can be challenging to identify.

Small cell lung cancer:
The most dangerous kind of lung cancer, small cell lung cancer grows and spreads swiftly. Smoking is typically a risk factor for small cell lung cancer, which typically appears in the middle of the lung, close to the bronchus. Shortness of breath, chest pain, and coughing are possible symptoms.
Although radiation and chemotherapy may work well to treat small cell lung cancer, the disease frequently returns after treatment.
Lung cancer is a common and fatal illness that has several subtypes with distinct characteristics. For efficient planning of treatment and patient management, accurate identification and classification of lung tumor subtypes is critical. This study intends to fill that gap by using image classification algorithms to differentiate four classes: adenocarcinoma, large cell carcinoma, normal lung tissue, and squamous cell carcinoma. Table-2 gives the complete breakdown of the dataset division.

EAI Endorsed Transactions on Pervasive Health and Technology

YOLOv5, created by Ultralytics and launched in 2020, marks a considerable advance in terms of accuracy and speed over its predecessors. One of the most well-known object identification networks in the world, YOLOv5 now offers more than object detection: as of August 2022, it also supports classification tasks.

EXPERIMENTAL SETUP:
YOLOv5, a well-liked computer vision model for object detection and classification, defines the class names for its classification model using a folder structure. This implies that every image in the dataset is assigned a class according to the folder it is kept in. Although this strategy might seem unorthodox, it has several benefits. For instance, because the folder structure clearly defines the class labels, managing the dataset is made simpler.
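The folder-per-class convention described above can be illustrated with a few lines of standard-library Python. This is a minimal sketch of how class labels can be derived from directory names, not the loader that YOLOv5 itself uses; the function name, the accepted file suffixes, and the example path are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image file types; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_dataset(split_dir):
    """Derive class names and labelled samples from a folder-per-class layout.

    Returns (class_names, samples), where samples is a list of
    (image_path, class_index) pairs and class_index refers to class_names.
    """
    split_dir = Path(split_dir)
    # Each sub-directory name is treated as one class label.
    class_names = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for path in sorted((split_dir / name).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_to_idx[name]))
    return class_names, samples


if __name__ == "__main__":
    # "dataset/train" is a hypothetical path with one sub-folder per class.
    root = Path("dataset/train")
    if root.is_dir():
        names, samples = index_dataset(root)
        print(names, len(samples))
```

Because labels come only from the directory names, moving an image between folders relabels it, which is what makes this layout easy to audit by hand.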

Table-3: Experimental Set-Up
Additionally, by using this method, the likelihood of errors when manually labeling bounding boxes can be decreased. Figure-6 shows the folder structure for the classification model.

Train Loss:
(eq-1)
where:
N is the number of training samples
y is the ground truth label (0 or 1)
y_hat is the predicted probability of the positive class (ranging from 0 to 1)

Test Loss:
Test loss, also known as validation loss or evaluation loss, measures how well the trained model generalizes to unseen data. It is calculated using the same loss function as the training loss but applied to the test/validation dataset. The formula for test loss has the same form as the training loss formula: (eq-2)
where:
N is the number of test/validation samples
y is the ground truth label (0 or 1)
y_hat is the predicted probability of the positive class (ranging from 0 to 1)
Accuracy, by contrast, simply divides the number of correctly predicted samples by the total number of samples in the dataset.
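The expressions labelled (eq-1) and (eq-2) are not reproduced in the text. Given the symbols that are listed (N samples, a 0/1 ground truth label y, and a predicted positive-class probability y_hat), one loss consistent with those definitions is binary cross-entropy. This is an assumption about the intended formula rather than a statement of what the authors used:

```latex
% Assumed form of (eq-1) and (eq-2): binary cross-entropy over N samples.
\[
  \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
\]
% For a four-class problem, the usual generalization is categorical
% cross-entropy with one-hot labels y_{i,c} and class probabilities \hat{y}_{i,c}:
\[
  \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{4}
  y_{i,c} \log \hat{y}_{i,c}
\]
```

Under this reading, the train and test losses differ only in which set of N samples the sum runs over.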
The test loss and train loss are significant metrics for assessing a machine learning model's efficiency and performance. They reveal how well the model is learning and how well it generalizes from the training data to unseen test data.
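To make the relationship between train loss and test loss concrete, the sketch below evaluates a loss on two sets of predictions and reports the gap between them. It assumes binary cross-entropy, which matches the 0/1 labels and positive-class probabilities defined above but is not confirmed by the source; the function names and the example values are illustrative.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def generalization_gap(train_loss, test_loss):
    """Positive values mean the model fits training data better than test data."""
    return test_loss - train_loss


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    train = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
    test = binary_cross_entropy([1, 0, 1, 0], [0.7, 0.4, 0.6, 0.3])
    print(f"train={train:.4f} test={test:.4f} "
          f"gap={generalization_gap(train, test):.4f}")
```

A small gap with both losses decreasing suggests the model is learning patterns that transfer to unseen data, while a test loss that rises as the train loss keeps falling is the usual sign of overfitting.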

RESULT ANALYSIS
We used the cutting-edge classification method YOLOv5 to divide lung images into four classes: large cell carcinoma, adenocarcinoma, normal lung tissue, and squamous cell carcinoma. Our goal was to assess YOLOv5's performance on this task and contrast it with earlier models. We then tested the model on a separate test dataset. With an overall accuracy of 97.77%, our results demonstrated that the model was able to attain excellent accuracy for each class. To further assess the model's performance, we compiled the findings into a table (Table-4), which gives the accuracy for each class as well as the overall accuracy for the model. The table shows that the model had excellent accuracy for each of the four classes, indicating that it was capable of properly classifying each class.

Table-4: Results
In addition, we evaluated our model against earlier models, such as SVM, RF, ANN, and CNN. Our findings demonstrated that YOLOv5 performed better than each of these models, demonstrating its superiority for the classification of cancer (Figure-10, Table-5).
The accuracy findings show the varying effectiveness of the machine learning approaches investigated for classifying lung cancer subtypes.

Conclusion
The objective of this study was to explore the potential of the deep learning algorithm YOLOv5 in distinguishing between four classes: adenocarcinoma, large cell carcinoma, normal, and squamous cell carcinoma. The outcomes of our investigation yielded positive results, as we achieved exceptional accuracy in each class and an overall accuracy rate of 97.77%.
To evaluate the performance of the model, we employed various measures including train loss, test loss, accuracy graphs, and summary tables. These metrics provided a comprehensive assessment of the model's performance and demonstrated that YOLOv5 is a valuable tool for automated cancer classification. Additionally, we compared the performance of YOLOv5 with earlier models such as SVM, RF, ANN, and CNN, revealing the superior capabilities of deep learning algorithms like YOLOv5 in creating more precise and effective automated cancer classification systems. However, it is crucial to acknowledge the limitations of our research. One significant limitation is the size of the dataset used for training and testing the model, which may not be fully representative of all lung cancer cases. This limitation highlights the need for further research with larger and more diverse datasets to thoroughly analyze the model's performance.
Despite these limitations, our research contributes to the expanding body of knowledge on the application of deep learning algorithms in cancer detection and treatment. The findings of this study hold important implications for the development of more accurate and effective automated cancer categorization systems, which have the potential to significantly improve patient outcomes.
Moving forward, it is essential to conduct additional studies that address the limitations identified in this research. Expanding the dataset to include a broader range of lung cancer cases, considering various demographics and pathological characteristics, would enhance the generalizability of the model. Furthermore, investigating the performance of YOLOv5 in real-world clinical settings and comparing it with other existing techniques can provide valuable insights for its practical implementation.
In conclusion, our study demonstrates the promising potential of YOLOv5 as a deep learning algorithm for distinguishing between different types of lung cancer. The exceptional accuracy achieved in our evaluation suggests that YOLOv5 can contribute to the development of automated cancer classification systems, ultimately leading to improved patient outcomes and more effective cancer management strategies.

1. Adenocarcinoma: This form of lung cancer develops from the glandular cells in the lungs that create mucus. The most prevalent form of lung cancer is adenocarcinoma, which often develops in the outer regions of the lung. It is frequently related to smoking; however, it can also occur in nonsmokers. Although adenocarcinoma grows more slowly than other types of lung cancer, it can spread to other parts of the body.
2. Squamous cell carcinoma: This form of lung cancer develops from the flat cells lining the lungs' airways. Squamous cell carcinoma typically develops close to the bronchus in the middle of the lung. It frequently results from smoking and develops more slowly than small cell lung cancer. Chest pain, breathlessness, and coughing are all possible symptoms of squamous cell carcinoma.

A. MODEL:
The You Only Look Once (YOLO) algorithm, first introduced in 2016, is a popular deep learning algorithm for object detection in computer vision. It was created to recognize objects in real time and achieve high accuracy while using relatively few processing resources. YOLO has gone through various versions and changes over the years, with the most recent version being YOLOv5. Table-3 shows the hyperparameters set for our model. Our experiment employs the PyTorch deep learning framework on a Tesla K80 GPU provided by the Google Colab platform.

Figure-6: Complete folder structure of the Dataset

Accuracy is a metric that measures the overall performance of a classification model. It calculates the percentage of correct predictions made by the model on a given dataset. The formula for accuracy is:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \qquad \text{(eq-3)}$$
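The accuracy definition in (eq-3) translates directly into code. The sketch below computes the overall accuracy and a per-class breakdown for a list of true and predicted labels. Per-class accuracy is taken here to mean the fraction of samples of each true class that were predicted correctly, which is an assumption about how the per-class figures were obtained; the labels in the usage example are hypothetical.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["adeno", "adeno", "large", "normal", "squamous", "squamous"]
    preds = ["adeno", "large", "large", "normal", "squamous", "squamous"]
    print(overall_accuracy(truth, preds))
    print(per_class_accuracy(truth, preds))
```

Reporting both views is useful because a high overall accuracy can mask a weak class when the classes are unevenly represented.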

First, we used a dataset of CT image examples from the four classes to train the YOLOv5 model. To track the development of the training process, we plotted the train loss graph (Figure-7), the test loss graph (Figure-8), and the accuracy graph (Figure-9).

Figure-10: A Complete Comparison of Various Models

Table-1: Cancer statistics for the year 2023 [13]

r et al. [12] employ deep learning techniques to categorize lung nodules in CT images. The authors propose a method that extracts deep features from CT images using convolutional neural networks (CNNs), which are then used to train a support vector machine (SVM) classifier. The classifier achieves an overall accuracy of 75.01%, a sensitivity of 83.35%, and a false-positive rate of 0.39 per patient over a 10-fold cross-validation.
Attique et al. [13] proposed a novel feature selection and fusion method that combines classical features with contrast-based features to increase classification accuracy. The suggested method consists of three major steps: feature extraction, feature selection, and feature fusion. In the feature extraction step, the authors employ various conventional features, such as histogram, texture, and shape features, as well as contrast-based features, which are based on the difference in intensity values between adjacent pixels. The findings show that the proposed method achieves an accuracy of 93.75%.

Table-5: Accuracy comparison

YOLOv5 reached the maximum accuracy of 97.77% in classifying lung cancer subtypes, suggesting its better skill in effectively detecting and discriminating between the different subtypes. SVM and CNN both had high accuracy rates of 93% and 93.30%, demonstrating their effectiveness in this challenge. ANN and RF had somewhat lower accuracy rates of 80% and 92%, respectively, indicating their poorer performance.
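The comparison can be summarized by ranking the reported accuracies. The sketch below only uses the percentages stated in the text (97.77% for YOLOv5, 93.30% for CNN, 93% for SVM, 92% for RF, and 80% for ANN); the function names are illustrative, and no additional results are implied.

```python
# Accuracies (in percent) as reported in the text.
REPORTED_ACCURACY = {
    "YOLOv5": 97.77,
    "CNN": 93.30,
    "SVM": 93.0,
    "RF": 92.0,
    "ANN": 80.0,
}


def rank_models(scores):
    """Return (model, accuracy) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def margin_over_runner_up(scores):
    """Percentage-point lead of the best model over the second best."""
    ranked = rank_models(scores)
    return round(ranked[0][1] - ranked[1][1], 2)


if __name__ == "__main__":
    for model, acc in rank_models(REPORTED_ACCURACY):
        print(f"{model:<7}{acc:6.2f}%")
    print("lead over runner-up:", margin_over_runner_up(REPORTED_ACCURACY))
```

Expressing the lead in percentage points makes it clear how much of the improvement separates YOLOv5 from the next-best model rather than from the weakest baseline.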