Enhancing Efficiency and Energy Optimization: Data-Driven Solutions in Process Industrial Manufacturing

This paper reviews the current state of research in data analytics and machine learning techniques, focusing on their applications in process industrial manufacturing, particularly in control and optimization. Key areas for future research include feature selection and transfer learning for process monitoring, addressing time-varying characteristics, and enhancing data-driven optimal control with domain-specific knowledge. Additionally, the paper explores reinforcement learning techniques and robust optimization, including distributionally robust optimization, for high-level decision-making. Emphasizing the importance of prior knowledge of plants and processes, this paper aims to identify knowledge gaps and pave the way for future research in data-driven strategies for process industries, with a particular emphasis on energy efficiency and optimization.


Introduction
As the backbone of the global economy, process industries provide indispensable raw materials and intermediate products for key areas such as electronics, aerospace, transportation and logistics, and healthcare. The sector not only has a profound impact on other manufacturing industries but also plays a role in driving overall economic activity that cannot be ignored. However, with the deep integration of informatization and industrialization, the process industry is facing unprecedented challenges, especially in data-driven intelligent optimized manufacturing.
In the context of increasing efficiency and optimizing energy use, the process industry encounters a series of difficult problems. First, there is a huge gap between the massive accumulation of data and its effective utilization. Although modern sensors and information systems can generate large amounts of data, extracting valuable information from these data to guide the optimization of the production process remains a technical problem. Second, existing data analysis technologies and algorithms often struggle to adapt to the complexity and dynamics of the process industry, resulting in insufficient accuracy of prediction and decision support systems. In addition, cross-department and cross-enterprise data sharing and collaboration mechanisms are not yet complete, which limits the potential of big data across the entire industry chain.
Optimization approaches play a critical role in improving complex industrial processes and product properties. However, their deployment is frequently delayed by the absence of suitable mathematical models that describe the relationship between the dependent and independent variables. Particularly for highly multivariate and nonlinear relationships, conventional analytical models fall short, because the various assumptions they rely on produce variable predictions that are not reliable from an industrial perspective. Process industries, such as those operated by Sinopec, Shell, and ExxonMobil, are critical for global economic growth and societal welfare. The growing complexity and spatial scale of production processes pose challenges for safe and optimal operation. Tight interconnections between processes and plants at the low control level give rise to multi-loop and multiscale coupling phenomena, which make it hard to design control strategies effectively. Furthermore, increased exposure to disturbances and fault sources complicates the design phase and raises the likelihood of abnormal events (1).

Figure 1: Facilitating Industry 4.0 through technological advancements
At the higher planning levels, flexible and real-time decision-making is important for reducing operational costs and increasing economic returns. In smart manufacturing, there is a critical need for state-of-the-art innovations and technologies to address the reliability, efficiency, and safety needs of modern process industries. This presents both opportunities and challenges for Industry 4.0, which marks a shift from reliance on humans to machines that effectively undertake analytical tasks and generate insights autonomously.
In the process industries, data analytics and machine learning have been transformed by the big data era, with applications extending from low-level control loops to optimal control and high-level decision-making. This paper offers perspectives on future research, assesses recent advances, and highlights related literature. It also discusses how the interpretability of machine learning and data analytics models relates to process knowledge (2). As Figure 1 shows, big data has significantly expanded the use of data analytics and machine learning in the process industries, enabling both active applications, namely optimal control and high-level decision-making across the industrial hierarchy, and passive applications such as process monitoring and soft sensing.
Building on developments in data mining and analytics, data-driven models have emerged as a feasible solution to this issue. These models excel at handling complex, nonlinear multivariate systems because they draw on a range of statistical approaches, including regression analysis, as well as modern methods such as machine learning, soft computing, and artificial intelligence (3). For example, in a case study on predicting the mechanical properties of steels for particular applications, industrial data are used to build models with artificial neural networks (ANNs) (4).
These state-of-the-art approaches offer improved capabilities for handling complex, nonlinear multivariate systems. The artificial neural network (ANN) is one of the best-known tools in this family and is the one used in the industrial case study referenced in this paper. Loosely inspired by the workings of the human brain, the ANN is a flexible and powerful tool that can handle large datasets and reveal complex patterns that might otherwise elude traditional analytical techniques. Because of its adaptive nature and ability to learn from data, it is well suited to modeling and forecasting complex industrial processes, where the relationships between inputs and outputs are frequently nonlinear and coupled. In the case study above, the use of ANNs makes it easier to obtain accurate predictions (5)(6)(7). Data-driven, model-based industrial process optimization for manufacturing tasks is illustrated in Figure 2.
Stringent industrial requirements for material properties are met in large part by combining design optimization techniques with data-driven modeling. Optimization strategies such as genetic algorithms (GAs) (8,9) are used for this purpose, but the special nature of the task makes the optimization problem difficult to formulate. Because ANN models relate steel properties to input variables, the goal is to reach a specified performance level, in contrast to typical optimization procedures that aim to minimize or maximize properties. To address this, a desirability function transforms each ANN output to a scale from 0 to 1 that represents how close it is to the intended value. A GA is then used to maximize the composite desirability, which represents the overall steel performance and is computed as the geometric mean of the individual desirability values. Put simply, combining optimization techniques such as GAs with data-driven modeling, specifically ANNs, provides a powerful means of overcoming complex industrial obstacles and achieving the required product characteristics while adhering to stringent regulations.
The industrial manufacturing process optimization workflow based on data-driven models is shown in Figure 3, where a data-driven model is applied to optimize production efficiency and energy usage. As the figure shows, decision-making is optimized on the basis of data collection, through demand analysis, resource data model construction, and knowledge management. Data are first collected from sensors and logs on the production line and then prepared for model training through preprocessing and feature engineering. Next, machine learning algorithms are used to build a data-driven model that predicts key parameters of the production process and provides optimization suggestions. Finally, the production parameters are adjusted according to the model output to optimize the process.
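The desirability-based formulation described in this section can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the linear "target-is-best" desirability shape, the function `surrogate` (a hypothetical stand-in for trained ANN models), the target values and tolerances, and the simple GA operators (tournament selection, blend crossover, Gaussian mutation, elitism). It is not the procedure of the cited case study.

```python
import numpy as np

def desirability(y, target, tolerance):
    """Map outputs to [0, 1]: 1 at the target, falling linearly to 0 at +/- tolerance."""
    y = np.asarray(y, dtype=float)
    return np.clip(1.0 - np.abs(y - target) / tolerance, 0.0, 1.0)

def composite_desirability(d):
    """Geometric mean over the last axis; it is 0 whenever any single value is 0."""
    d = np.asarray(d, dtype=float)
    return np.prod(d, axis=-1) ** (1.0 / d.shape[-1])

def surrogate(x):
    """Hypothetical stand-in for trained ANN models (two inputs, two properties)."""
    strength = 400.0 + 150.0 * x[..., 0] + 50.0 * x[..., 1]
    elongation = 30.0 - 10.0 * x[..., 0] + 5.0 * x[..., 1]
    return strength, elongation

def fitness(x):
    """Composite desirability of the predicted properties (assumed targets)."""
    strength, elongation = surrogate(x)
    d = np.stack([desirability(strength, 500.0, 100.0),
                  desirability(elongation, 25.0, 10.0)], axis=-1)
    return composite_desirability(d)

def genetic_search(fitness, bounds, pop_size=40, generations=60,
                   mutation_scale=0.05, seed=0):
    """Very small real-coded GA that maximizes `fitness` inside box bounds."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    for _ in range(generations):
        scores = fitness(pop)
        # Tournament selection: the better of two random individuals survives.
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[i] >= scores[j])[:, None], pop[i], pop[j])
        # Blend crossover with a randomly assigned partner.
        partners = parents[rng.permutation(pop_size)]
        w = rng.uniform(size=(pop_size, 1))
        children = w * parents + (1.0 - w) * partners
        # Gaussian mutation, then clip back into the feasible box.
        children += rng.normal(scale=mutation_scale * (high - low),
                               size=children.shape)
        children = np.clip(children, low, high)
        # Elitism: carry the best individual of this generation forward.
        children[0] = pop[np.argmax(scores)]
        pop = children
    scores = fitness(pop)
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])

best_x, best_score = genetic_search(fitness, [(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_score)
```

Using the geometric mean rather than the arithmetic mean means that a single property with zero desirability drives the composite score to zero, so the search cannot trade an unacceptable property against an excellent one.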

Multivariate Statistical Process Monitoring (MSPM)
Multivariate statistical process monitoring (MSPM) and soft sensing technology are key tools for achieving efficiency improvements and energy optimization in the process industry.
MSPM watches for irregularities, detects deviations from normal operation, and helps resolve problems before they escalate, exploiting the correlations among process variables. It uses statistical methods to build models of normal process behavior from multivariate data sets and to identify abnormal patterns that call for corrective action.

Learning
Modeling tasks are commonly divided into two main types: supervised learning and unsupervised learning. Unsupervised learning builds descriptive models of the underlying structure of input data, and it is often applied to monitoring the distribution of process data. Supervised learning, which covers both classification and regression, builds a functional mapping from input to output and emphasizes the accuracy of output prediction. It is particularly relevant to soft sensing, where fast-rate process variables are used to infer important quality characteristics in industrial processes. Recently, there has been an increasing focus on feature learning, or representation learning, which emphasizes the use of domain-specific information in the model building process. This approach significantly improves the interpretability of the model, which in turn improves the model as a whole (11).
The use of neural networks with piecewise linear units in computer vision is an example of representation learning: improved model performance can be attained by using piecewise linear units as domain-specific information. Representation learning provides a unified view of supervised and unsupervised learning. Unsupervised learning techniques act as "feature detectors" that identify interpretable core features from incoming data. The performance of supervised learning is then enhanced by using these features as inputs to classifiers or regressors. Deep learning algorithms are built on this notion. In essence, there is no conflict between supervised and unsupervised learning; rather, unsupervised learning significantly augments supervised learning (12).
In summary, an ideal model, irrespective of its complexity, should have a clear physical interpretation. This raises the question: what prior knowledge agrees well with the characteristics of process data? Indeed, this is a common yet implicit focus of many multivariate statistical process monitoring (MSPM) studies. To address it, we review recent advances in MSPM and examine soft sensing approaches.

Feature Learning-based MSPM
MSPM is a technique that identifies and predicts potential problems by analyzing multiple variables in the production process. The basic principles of MSPM include key technologies such as principal component analysis (PCA) and partial least squares regression (PLSR). New developments in MSPM in modern industrial applications, such as adaptive monitoring strategies and machine learning integration methods, can improve the accuracy and efficiency of process monitoring.
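As a minimal sketch of PCA-based monitoring, the code below fits a PCA model on data from normal operation and computes the two statistics commonly used with it: Hotelling's T² (variation inside the principal subspace) and the squared prediction error, SPE (variation outside it). The function names, the dictionary layout of the model, and the use of empirical quantiles as control limits are illustrative assumptions; analytical control limits are often used in practice instead.

```python
import numpy as np

def fit_pca_monitor(X_normal, n_components):
    """Fit a PCA monitoring model on normal-operation data (rows = samples)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std
    # Right singular vectors of the scaled data are the PCA loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T
    eigvals = s[:n_components] ** 2 / (Z.shape[0] - 1)
    return {"mean": mean, "std": std, "P": loadings, "eigvals": eigvals}

def monitoring_statistics(model, X):
    """Return Hotelling's T^2 and SPE for each row of X."""
    Z = (np.atleast_2d(X) - model["mean"]) / model["std"]
    scores = Z @ model["P"]
    t2 = np.sum(scores ** 2 / model["eigvals"], axis=1)
    residual = Z - scores @ model["P"].T
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

def empirical_limits(model, X_normal, quantile=0.99):
    """Control limits taken as empirical quantiles of the normal data (assumption)."""
    t2, spe = monitoring_statistics(model, X_normal)
    return float(np.quantile(t2, quantile)), float(np.quantile(spe, quantile))
```

A new sample would be flagged when either statistic exceeds its limit: a large T² indicates an unusual operating point within the normal correlation structure, while a large SPE indicates that the correlation structure itself has been broken.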
In modern process industries, the consequences of even minor faults can be greatly amplified because of the large scale and strong integration of plants. Continuous monitoring of operational status and timely maintenance interventions are therefore imperative to ensure the safety of manufacturing processes, although at the cost of considerable manual work (13). Since the 1980s, multivariate statistical process monitoring (MSPM) has emerged as a solution to this challenge, using a wide array of classic machine learning algorithms that demonstrate the usefulness of such intelligence in industrial manufacturing. Numerous review articles have provided comprehensive summaries of this field (13).
Recent work has sought to incorporate tailored prior knowledge about continuous manufacturing processes to build more effective MSPM models. Because manufacturing processes normally have long settling times, the whole system under monitoring tends to display inertia. Slowness is therefore a key attribute for capturing process dynamics and improving process descriptions. This has led to the adoption of slow feature analysis (SFA) for modeling process data and for effective monitoring and diagnosis (14,15). Unlike principal component analysis (PCA), independent component analysis (ICA), and canonical variate analysis (CVA), SFA enables separate descriptions of steady states and temporal behaviors in industrial processes (16). By designing monitoring statistics tailored to anomalies in process dynamics, SFA makes it possible to distinguish nominal operating point switches from genuine faults that result in dynamic anomalies. The slowness principle has led to various monitoring approaches, such as adaptive and probabilistic monitoring, which have been successfully applied to a range of processes (14).
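The slowness principle can be made concrete with a minimal sketch of linear SFA: the data are whitened, and the whitened directions are then ordered by the variance of their first differences, so that the first feature is the one that changes most slowly. The function name and return layout are assumptions for illustration, and the sketch omits the monitoring statistics that the cited works build on top of the slow features.

```python
import numpy as np

def linear_sfa(X, n_features):
    """Linear slow feature analysis on a time-ordered data matrix (rows = time)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step 1: whiten the data so that every direction has unit variance.
    cov = np.cov(Xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10 * eigval.max()        # drop numerically null directions
    whitener = eigvec[:, keep] / np.sqrt(eigval[keep])
    Z = Xc @ whitener
    # Step 2: find directions whose first difference has the smallest variance.
    dZ = np.diff(Z, axis=0)
    diff_cov = dZ.T @ dZ / dZ.shape[0]
    slowness, rotation = np.linalg.eigh(diff_cov)   # ascending: slowest first
    W = whitener @ rotation[:, :n_features]
    return mean, W, slowness[:n_features]

# Slow features of (new) data X are obtained as (X - mean) @ W.
```

Small values in the returned `slowness` vector correspond to slowly varying features, which is what allows steady-state variation and temporal dynamics to be described separately.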
Machine learning models are often unimodal, whereas large-scale industrial processes often have multiple operating conditions and frequent mode switches. Multimodality should therefore be considered when designing models for MSPM. The Gaussian mixture model (GMM) (17) is the simplest choice for multimode process monitoring, but it carries no information on transition probabilities between modes. The feature learning approach to process monitoring views conventional monitoring charts as low-level features, which are then fed into a high-level model such as PCA. This method effectively combines information from multiple models that account for different characteristics of process data. The Gaussian distribution of the extracted features justifies the use of PCA as the high-level process monitoring model (18).

Feature Learning-based Soft Sensing
Soft sensing technology estimates process variables that are difficult to measure directly by establishing mathematical models. A soft sensor typically combines a data-driven model with a mechanistic model. Soft sensing is now widely used in various fields, and current work includes the application of deep learning and the fusion of multiple data sources to improve the predictive ability of the model.
Soft sensing, an intelligent sensing technology, originated from the inferential control approach proposed by Brosilow and Tong in 1978 (19). It enables real-time estimation of hard-to-measure parameters such as product quality and environmental indicators by exploiting readily available process variables. Notably, the use of soft sensors is expanding to include key performance indicator (KPI) forecasting, providing real-time estimates of important performance metrics to support operators in decision-making (20). Soft sensor development primarily requires solving a regression problem, which has led to the application of many supervised machine learning algorithms, comprehensively documented in the literature (21).
The potential of employing representation learning in soft sensor construction was first highlighted in a seminal work (22), where probabilistic slow feature analysis (PSFA) was used to extract slowly varying features. These features, which capture underlying process variations, show high correlations with quality indices. Compared with conventional dynamic partial least squares (DPLS), this method achieves higher dynamic estimation accuracy and enables a semi-supervised learning framework by combining fast-rate process data with irregularly sampled quality data. Subsequent extensions introduced Bayesian learning methods to extract dynamic features with varying dynamics and improved regularized SFA for quality estimation in industrial processes (23,24).
The advent of deep learning embodies the spirit of representation learning, and deep learning has been applied to soft-sensing tasks with notable success. Early efforts used deep neural networks (DNNs) to predict critical parameters such as the cut-point temperature of heavy diesel in crude distillation units. The training process of DNNs includes both unsupervised and supervised learning steps, with unsupervised learning efficiently extracting nonlinear latent features that improve the performance of the regression model (25). Deep learning methods have further found applications in different areas such as crude oil grouping and carbon dioxide (CO2) capture process modeling, leveraging the latent features extracted in the unsupervised learning phase for process monitoring and fault analysis (26).
Moreover, soft sensing models can exploit other feature extraction methods, such as those based on correlations within a low-dimensional subspace. Principal component regression (PCR) is a primary example in this domain, using simple principal component analysis (PCA) for feature extraction (27). The use of low-dimensional latent variable models for soft sensing has been widely reviewed, with methods such as neighborhood preserving embedding used to learn the underlying nonlinear structure of the data and to build predictors on top of it (28).
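A minimal sketch of PCR as a soft sensor is given below: PCA extracts a few latent features from the easily measured process variables, and an ordinary least-squares regression maps those features to the quality variable. The function names and the dictionary layout of the model are assumptions for illustration; in practice the number of components would be selected by cross-validation.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA features followed by least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # PCA loadings from the SVD of the centered process data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Xc @ loadings
    # Regress the centered quality variable on the retained scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Fold loadings and regression weights into one coefficient vector.
    return {"x_mean": x_mean, "y_mean": y_mean, "coef": loadings @ gamma}

def predict_pcr(model, X):
    """Estimate the quality variable for each row of X."""
    return (np.atleast_2d(X) - model["x_mean"]) @ model["coef"] + model["y_mean"]
```

Because the regression is carried out on a few uncorrelated scores instead of the raw, strongly correlated process variables, the estimate is less sensitive to collinearity than a direct least-squares fit.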

Optimal Control and High-Level Decision-Making
Optimal control and high-level decision-making are crucial in various industries like manufacturing, energy management, robotics, and autonomous systems. They enable precise manipulation of system variables to achieve desired objectives while minimizing costs and resource utilization. High-level decision-making involves strategic planning, resource allocation, and policy formulation based on data analysis and predictive modeling. These applications drive automation, optimization, and adaptability in complex environments (29).

Leveraging Data Analytics and Machine Learning for Optimal Control
Data-driven monitoring is a critical aspect in the process industries, using historical and real-time data to monitor the health of production processes. The main methods of data-driven monitoring include, but are not limited to, time series analysis, machine learning models, and deep learning technology. In practical applications, problems such as data quality, model generalization ability, and real-time performance are usually encountered. Data-driven estimation technology estimates process variables by establishing a mathematical model between input and output data; however, it also has limitations and shortcomings, such as high dependence on data, poor model interpretability, and applicability issues in complex systems.
Model predictive control (MPC) is a widely recognized methodology for advanced industrial process control. It relies on a mathematical model to describe system dynamics and to develop optimal control strategies (30). The use of data analytics and machine learning in optimal control can be roughly classified into two functional domains. First, prediction models for the uncertainties are established by fitting historical data, thus decomposing the uncertainty into a deterministic component known in advance and a stochastic component representing estimation errors. For example, in smart grid operations, estimates of electricity generation from renewable sources can be derived from weather forecasts and other environmental factors (31). Likewise, where online measurement of product quality and other critical indices is not possible, soft sensor models are employed to provide real-time estimates, which are critical for implementing closed-loop control. Machine learning techniques such as support vector machines (SVMs) and neural networks are widely applied in developing these prediction models. The accuracy of such models is paramount because it directly affects control performance.
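The decomposition of uncertainty into a deterministic forecast and a stochastic error component can be sketched as follows. For brevity, a linear least-squares model stands in for the SVM or neural network predictors mentioned above, and the stochastic component is summarized by empirical quantiles of the historical residuals; both choices, along with the function names, are assumptions made for illustration.

```python
import numpy as np

def fit_forecast_model(features, observed):
    """Fit a linear forecast model; return coefficients and historical residuals."""
    A = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
    residuals = observed - A @ coef          # stochastic component (estimation errors)
    return coef, residuals

def forecast_with_band(coef, residuals, new_features, coverage=0.9):
    """Deterministic forecast plus an empirical residual band around it."""
    A = np.column_stack([np.ones(len(new_features)), new_features])
    point = A @ coef                         # deterministic component
    lo, hi = np.quantile(residuals, [(1.0 - coverage) / 2.0, (1.0 + coverage) / 2.0])
    return point, point + lo, point + hi
```

In an MPC setting, the point forecast would enter the nominal prediction model, while the residual band (or the full residual sample) would describe the disturbance that a robust or stochastic formulation has to guard against.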
Second, data analytics and machine learning are applied to describe uncertainty distributions in an unsupervised manner within the MPC framework. Real-world systems are always subject to uncertain disturbances that can drive system states away from the nominal trajectory. Robust model predictive control (RMPC) and stochastic model predictive control (SMPC) have been developed to tackle this uncertainty using different mathematical tools (32). RMPC employs uncertainty sets to define the regions in which uncertainty realizations may fall, while SMPC directly uses probability distributions. A recent trend in RMPC and SMPC is to model uncertainty dynamically using data-driven methods. Conventional norm-based sets in RMPC often lack the flexibility to represent uncertainty distributions well, which motivates the use of data-driven uncertainty sets constructed by unsupervised learning methods. For example, employing support vector clustering (SVC) to learn a compact high-density region from available data allows the resulting optimal control problem to be cast as a classic robust optimization (RO) problem, thus preserving tractability and reducing conservatism (33). This method has shown significant improvements in irrigation control, demonstrating the potential of extracting meaningful information from data (34). In addition, learning-based schemes integrated into MPC frameworks have been developed to handle repetitive control tasks in autonomous systems (35).

High-Level Decision Making
General techniques for optimization under uncertainty, such as stochastic programming (SP) (36), robust optimization (RO) (37), and distributionally robust optimization (DRO) (38), have found widespread applications in energy system operations and supply-chain design (39). The quality of scenario programs hinges on choosing a sufficient number of representative scenarios, and theoretical results are available to guide this selection (40). However, the induced optimization problems often involve a massive number of constraints, posing significant computational challenges. To address this, many decomposition algorithms and distributed optimization techniques have been developed, supporting the parallel solution of subproblems with limited communication (41).
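For orientation, the three paradigms can be written in a common generic form. The notation is an assumption introduced here rather than taken from the cited works: $x \in \mathcal{X}$ is the decision, $\xi$ the uncertain parameter, $f$ the cost, $\mathbb{P}$ a known distribution, $\mathcal{U}$ an uncertainty set of parameter values, and $\mathcal{D}$ an ambiguity set of candidate distributions.

```latex
\begin{align}
\text{SP:}\quad  & \min_{x \in \mathcal{X}} \; \mathbb{E}_{\xi \sim \mathbb{P}}\left[ f(x, \xi) \right] \\
\text{RO:}\quad  & \min_{x \in \mathcal{X}} \; \max_{\xi \in \mathcal{U}} \; f(x, \xi) \\
\text{DRO:}\quad & \min_{x \in \mathcal{X}} \; \sup_{\mathbb{P} \in \mathcal{D}} \; \mathbb{E}_{\xi \sim \mathbb{P}}\left[ f(x, \xi) \right]
\end{align}
```

SP optimizes the expected cost under a single assumed distribution, RO guards against the worst realization in $\mathcal{U}$, and DRO sits between the two by guarding against the worst distribution in $\mathcal{D}$. If $\mathcal{D}$ contains only one distribution, DRO reduces to SP; if it contains every distribution supported on $\mathcal{U}$, it reduces to RO.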
In data-driven RO, uncertainty sets are constructed directly from uncertainty data, which is an unsupervised learning task. Nevertheless, not every unsupervised learning method is suitable, because of the need to balance the accuracy and tractability of the induced optimization problems. Many unsupervised learning methods have been developed specifically for the data-driven construction of uncertainty sets. For example, a novel methodology using piecewise linear kernel-based SVC has been proposed to capture the distributional geometry of uncertainty data, effectively reducing conservatism in RO problems (42). Similarly, data-driven uncertainty sets built with PCA and kernel density estimation make systematic use of correlations and asymmetry (43). Deep learning is then used as the basic algorithm to build a data-driven model, whose architecture is shown in Figure 4. In Figure 4, the manufacturing process optimization method takes financial and production data as inputs to the deep learning algorithm, specifically an artificial neural network (ANN). Deep learning, characterized by its sophisticated architecture, is adept at processing large datasets. Central to this methodology is the use of deep neural networks, which are loosely modeled on brain cells and can therefore process vast amounts of information and extract appropriate abstract features from it (10).

An Outlook on Future Research Directions
The future of industrial process optimization is set to see significant advancements in the integration of advanced technologies like AI, machine learning, and IoT into control and decision-making frameworks. Researchers aim to develop adaptive control systems using AI algorithms and real-time data analytics, enhancing efficiency and precision. The growing interest in quantum computing and blockchain technology offers new opportunities for security, scalability, and decentralization. Interdisciplinary collaboration between experts in engineering, computer science, and data analytics is crucial for navigating challenges and seizing potential in technology and industrial optimization. This research aims to enhance the sustainability, productivity, and resilience of industrial processes.

Advancements in Process Monitoring
Process monitoring is undergoing significant advancements, leveraging IoT, sensors, and data analytics to enable real-time monitoring and control of complex processes. The integration of predictive analytics and machine learning algorithms minimizes downtime, reduces operational costs, and enhances productivity. Remote monitoring solutions allow operators to oversee critical processes from anywhere, facilitating agile decision-making and improving responsiveness to changing conditions. As process monitoring evolves, it promises to transform industrial operations by optimizing performance and ensuring quality and safety across various applications. Several methods have been proposed in the literature for feature selection in the design of process monitoring models. The extracted features must be closely related to prior knowledge in order to provide comprehensive insight into the data that matter for the process (44,45). A good process model extracts useful features and provides clear interpretations for industrial experts. Future work should focus on high-level features such as slowness, nonstationarity, and causality. Including industrial experts' knowledge, such as monotonicity and range data, can create interpretable models for root cause analysis and error recognition. Transfer learning can be adapted to synthesize information from data collected under diverse operating conditions or on different manufacturing devices (46,47). Designing user-friendly visualization methods can aid decision-making by enabling a better understanding of high-dimensional process data (48).
Considering the diversity and complexity of data in process industries, future research can explore new data integration and fusion techniques. This may include developing algorithms that can handle heterogeneous data from multiple sources and designing methods that can extract deeper information from big data. Through smarter data fusion strategies, the accuracy of monitoring and estimation can be significantly improved in the future.

Advancements in Soft Sensing
Soft sensing is changing data acquisition and interpretation in industrial processes by integrating machine learning algorithms and advanced signal processing techniques. This fusion of data-driven approaches with traditional sensing technologies helps engineers overcome challenges such as sensor degradation, drift, and variability, enhancing the reliability and performance of soft sensing systems. Novel sensor modalities, such as flexible and stretchable electronics, offer real-time, in-situ sensing in harsh or inaccessible environments. As soft sensing evolves, it promises more precise and adaptable understanding and control of industrial processes, leading to smarter, more responsive manufacturing and production systems. The performance of soft sensors in industrial processes can easily degrade over time, requiring a significant workload for model maintenance and updates. Quality prediction should be treated as more than a regression problem, and research should focus on adaptive mechanisms for estimation models, especially in the presence of frequent changes in operating conditions (49).
Furthermore, the imprecision of laboratory data produced by humans should be taken into account, including uncertain time delays, large and varying sampling intervals, and the differing sampling practices of operators. Conventional supervised models usually rest on the strong assumption that data samples are independent and identically distributed (50). The emerging theory of online learning offers innovative solutions for modeling tasks without specific assumptions on the data, potentially addressing the complex mechanisms by which process variables affect product quality (51). Online learning can effectively handle data generated either deterministically or stochastically, making it valuable for quality-control problems in the era of big data. With advancements in imaging technologies, more image and spectral data are being gathered in industry, providing valuable information for creating reliable prediction models (52,53). Researchers have successfully applied image processing, machine learning, and deep learning in various domains, including industrial process optimization, to overcome the challenges of high dimensionality and strong correlations (54,55).

Advancements in Data-Driven Optimal Control
Future research efforts can be directed toward incorporating domain-specific knowledge into the construction of uncertainty sets in RMPC (34). Data-driven optimal control has changed industrial process optimization, offering more efficient, cost-effective, and environmentally friendly solutions. Traditional optimization relied on mathematical models and theoretical frameworks, which struggled to capture complex real-world systems. Key advancements include data collection and integration, machine learning and predictive analytics, real-time control and optimization, model-free control techniques, and scalability and generalization. Data collection and integration involve the use of sensors, IoT devices, and automation systems to capture real-time information about various aspects of the industrial process. Machine learning algorithms, such as neural networks, support vector machines, and random forests, can analyze historical data and make accurate predictions about future outcomes, enabling proactive decision-making and preemptive interventions. With the rapid development of artificial intelligence and machine learning technologies, future research can explore new applications of these technologies in process industries, for example, using deep learning to model complex processes or developing adaptive control algorithms to cope with uncertainty and dynamic changes in production processes. Real-time control and optimization enable operators to adjust process parameters in response to changing conditions, ensuring optimal performance under dynamic operating environments. Model-free control techniques such as reinforcement learning and evolutionary algorithms learn optimal control policies directly from data, enabling more robust and adaptive control strategies (56).

Advancements in High-Level Decision Making
Compared with the other applications, high-level decision-making is the most consequential, since it affects both the environmental and the economic performance of a company's industrial processes. Data-driven RO and DRO are expected to be used in process industries in the future. However, these decisions are often made under uncertainty, making it essential to improve their solution quality and computational efficiency. Recent DRO methods use moment information to describe the uncertainty in the probability distribution, drawing on various types of moment information obtained through simple data analytics (54). The use of unsupervised learning methods is being encouraged to extract high-level information, such as the distribution within high-dimensional feature spaces, in order to reduce uncertainty. Machine learning can help reduce the uncertainty in the probability distribution, leading to less conservative solutions. Kernel-based machine learning algorithms can be used to derive nested sets that capture the majority of data samples, although their construction has not yet been studied systematically (57).
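As an illustration of what "moment information" can mean here, one widely used form of moment-based ambiguity set constrains the mean and covariance of the unknown distribution around their sample estimates. The symbols are assumptions introduced for this sketch: $\hat{\mu}$ and $\hat{\Sigma}$ are the sample mean and covariance, $\Xi$ is the support, and $\gamma_1 \ge 0$ and $\gamma_2 \ge 1$ are confidence parameters chosen by the user. The specific methods in the cited works may differ.

```latex
\mathcal{D} = \left\{ \mathbb{P} \;:\;
  \mathbb{P}(\xi \in \Xi) = 1,\;\;
  \left( \mathbb{E}_{\mathbb{P}}[\xi] - \hat{\mu} \right)^{\top} \hat{\Sigma}^{-1}
  \left( \mathbb{E}_{\mathbb{P}}[\xi] - \hat{\mu} \right) \le \gamma_1,\;\;
  \mathbb{E}_{\mathbb{P}}\!\left[ (\xi - \hat{\mu})(\xi - \hat{\mu})^{\top} \right] \preceq \gamma_2 \hat{\Sigma}
\right\}
```

Smaller values of $\gamma_1$ and $\gamma_2$ shrink the set and yield less conservative decisions, which is where better data-driven estimates of the distribution pay off.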

Conclusion
This paper examines recent research on data-driven monitoring, estimation, control, and optimization in modern process industries. It highlights the importance of interpretability in passive applications and the necessity for functionality in active ones. Despite the significant impact of big data, most data-driven techniques have not yet been applied in practice. The paper emphasizes the need for prior knowledge of plants and processes for successful applications and outlines challenges and opportunities for future research.

Figure 2: Hierarchical applications of data analytics and machine learning in process.

Figure 3: Manufacturing process optimization in industries based on the data-driven models (10)