Gold Returns Prediction: Assessment based on Major Events

INTRODUCTION: Major events such as economic crises, inflation, geopolitical tensions, and changes in interest rates can have a significant impact on the price and returns of gold. OBJECTIVES: In this work, we focus on predicting gold returns around five major events that occurred in Turkey. METHODS: We work with two types of data: text data and financial data. On the financial data, many algorithms are tested, and the Extra Trees Regressor gives the best results on most metrics. For the text-based part, we first create a new dataset and then apply sentiment analysis and topic modeling. RESULTS: Working with data in two different modalities (numeric and text) offers different perspectives. CONCLUSION: We do not advise using sentiment analysis alone to forecast gold returns. To produce more precise and trustworthy estimates of gold returns, additional fundamental and technical factors, including interest rates, inflation, geopolitical concerns, and supply and demand, should also be taken into account.


Introduction
The technique of predicting future price movements of gold based on historical data, market patterns, and numerous other factors that may have an impact on the price of the precious metal is known as "gold returns prediction." For predicting gold returns, a variety of techniques are utilized, such as technical analysis, fundamental analysis, and econometric models [1]. Through the use of statistical methods and price chart analysis, technical analysts can spot patterns, trends, and levels of support and resistance that can be used to forecast future price movements [2,3]. The supply and demand for gold can be affected by a number of economic, geopolitical, and market factors, including inflation, interest rates, central bank policies, and world events [4].
Econometric models evaluate historical data using statistical methods to find correlations between different economic and market variables and the price of gold. Based on the projected impact of these variables on the price of gold, these models can then be used to forecast future price changes. It's crucial to remember that predicting gold returns is not an exact science, and there are numerous unpredictable factors that can affect the price of gold [5]. Furthermore, the past performance of gold may not always predict how it will perform in the future [6]. As a result, any projections should be taken with a grain of salt and used as part of a broader investment strategy [7].
* Corresponding author. Email: suleyman.eken@kocaeli.edu.tr
For investors who buy or sell gold as part of their investment portfolio, gold return prediction can be important for several reasons: (i) Investment decisions: Gold return forecasts can help investors make informed choices. An investor might buy gold if they expect the price to rise, and sell their gold holdings if they expect it to fall. (ii) Risk management: Gold is frequently regarded as a safe-haven asset that can help investors control risk in their portfolios. By forecasting the price of gold, investors can adjust their portfolio allocations to manage their overall risk exposure. (iii) Economic forecasting: The price of gold is frequently used as an indicator of larger-scale movements in the economy. For instance, rising gold prices could signal inflation or other economic uncertainty. By forecasting the price of gold, economists and policymakers can gain insight into broader economic patterns and take well-informed actions. (iv) Hedging strategies: Some investors use gold to hedge against inflation or other economic risks [8]. Gold return prediction can help them determine when to buy or sell gold as part of a hedging plan.
In general, gold return forecasting can be a useful tool that helps investors and policymakers make sound financial choices and understand broader economic trends. To the best of our knowledge, this paper is the first to use both financial data and text data to investigate gold returns during critical periods or major events. These periods are the Gezi Park demonstrations, the July 15th coup attempt, the case of Pastor Andrew Brunson, the Covid-19 pandemic, and the 2021 direct foreign exchange intervention. Table 1 summarizes related work.
The contributions of this work to the literature are as follows: (i) We predict gold returns on Yahoo Finance Data using time series-based methods with different parameters.
(ii) We collect our new dataset, GoldCriticalTweets, and apply sentiment analysis and topic modeling to understand opinions.

Financial data
For the experiments in this study, we need the closing prices of many instruments over the previous ten years. Data can be imported from a variety of paid (Reuters, Bloomberg) and free (IEX, Quandl, Yahoo Finance 1, Google Finance) sources. Because this paper requires multiple asset classes (equities, commodities, debt, and precious metals), we found the yahoofinancials 2 package helpful and straightforward. Once we have the list of instruments (Gold, Silver, Crude Oil, S&P 500, Russell 2000 Index, 10-Year US T-Note futures, 2-Year US T-Note futures, Platinum, Copper, Dollar Index, Volatility Index, Soybean, MSCI EM ETF, EUR/USD, Euronext 100, Nasdaq), we specify the time period for which to import the data. The period we picked is from January 2013 to December 2023. To forecast future gold returns, our methodology uses the lagged returns of the listed instruments as features. We compute short-term returns for each instrument and longer-term returns for a few instruments. The underlying principle is that if a particular asset has significantly outperformed or underperformed, portfolio rebalancing becomes more likely, which in turn affects the returns of other asset classes. In addition to the lagged returns, we examine how far the current gold price is from its moving averages over various windows. Moving averages are a widely used technical analysis indicator that marks support and resistance levels for asset prices. We combine simple and exponential moving averages and add them to the feature space. This completes the feature set. Next, we define the targets. Because we are forecasting returns, we must choose a prediction horizon. We picked the 14-day and 22-day horizons because shorter horizons are highly volatile and have low predictive potential.
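The feature and target construction described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the price series are synthetic random walks standing in for data downloaded with yahoofinancials, and the function name `build_features`, the column names, the lag lengths (1, 5 and 10 days for every instrument, 60 days for gold only) and the moving-average windows (20, 50, 100) are hypothetical choices. Only the 14- and 22-day horizons come from the text.

```python
import numpy as np
import pandas as pd


def build_features(prices: pd.DataFrame, target_col: str = "Gold",
                   lags=(1, 5, 10), long_lags=(60,),
                   ma_windows=(20, 50, 100), horizons=(14, 22)) -> pd.DataFrame:
    """Lagged returns, distance from moving averages, and forward-return targets.

    All lag/window values are illustrative assumptions, not the paper's settings.
    """
    cols = {}
    # Short-term lagged returns for every instrument (known at time t).
    for name in prices.columns:
        for lag in lags:
            cols[f"{name}_ret_{lag}"] = prices[name] / prices[name].shift(lag) - 1
    gold = prices[target_col]
    # Longer-term returns for a subset of instruments (here: gold only).
    for lag in long_lags:
        cols[f"{target_col}_ret_{lag}"] = gold / gold.shift(lag) - 1
    # Relative distance of the current gold price from simple and exponential MAs.
    for w in ma_windows:
        cols[f"{target_col}_dist_sma_{w}"] = gold / gold.rolling(w).mean() - 1
        cols[f"{target_col}_dist_ema_{w}"] = gold / gold.ewm(span=w, adjust=False).mean() - 1
    # Targets: forward gold return over each horizon (uses future prices, never a feature).
    for h in horizons:
        cols[f"target_{h}d"] = gold.shift(-h) / gold - 1
    # Drop warm-up rows (incomplete windows) and final rows lacking a full horizon.
    return pd.DataFrame(cols).dropna()


# Synthetic daily closes for three instruments (geometric random walks).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2013-01-01", periods=300)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(300, 3)), axis=0)),
    index=dates, columns=["Gold", "Silver", "SP500"],
)
data = build_features(prices)
X = data.drop(columns=["target_14d", "target_22d"])
y = data["target_22d"]
print(X.shape, y.shape)
```

Because the targets look forward in time, any train/test split of `data` should be chronological rather than shuffled, otherwise future prices leak into training.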
Now that the entire dataset is available, we can begin modelling. We test various algorithms and report the results in the next section. Figure 1 shows the monthly free-market price of gold per kilogram in Turkey from 2011 to 2021 3.

Text-based data
We gathered our own data, the GoldCriticalTweets datasets (see Table 2), using BeautifulSoup 4 and the Twitter API 5. Different keywords are used to collect data: gold, ons, gold/sell, goldpricehourly, goldjewelry, goldexchange, electronicgoldreceipts, goldinvestment, buyinggold, goldprice, and so on. The raw data are subjected to several pre-processing operations. First, redundant columns and NaN and NaT rows are removed, and the tweets' body column is converted to string type. Then, using regular expressions, all tweets are converted to lowercase, and non-alphabetic characters, URLs, hyperlinks, emojis 6, special characters, extra new lines, and references to other users are removed. After that, the texts are lemmatized with spaCy and cleaned with Gensim 7. The time ranges of the collected datasets are also given in a table.

Table 1. Summary of related works.
[...]: They follow a decision tree methodology to investigate the behavior of gold prices using both traditional financial variables [...]
Gkillas et al. [4]: They examine the forecasting power of geopolitical risks (over and above economic uncertainty) for the conditional distribution of gold return volatility, using quantile regression.
Risse [5]: Combines the discrete wavelet transform with support vector regression to forecast gold-price dynamics.
Bentes [9]: Employs three GARCH-family volatility models to examine the volatility behavior of gold returns.
Chai et al. [10]: They study the dynamic relationship between gold price returns and the factors affecting them.
Plakandaras and Ji [11]: They propose a model based on a short- and long-run decomposition of the input variables using the EEMD algorithm, forecasting each component separately with SVR.
Our work: We collect our new dataset, GoldCriticalTweets, apply sentiment analysis and topic modeling to understand opinions, and predict gold returns on Yahoo Finance data using time series-based methods with different parameters.
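The regular-expression cleaning step of the tweet pre-processing described above can be sketched as below. The patterns are our own assumptions about what "URLs, references to other users, non-alphabetic characters and emojis" covers (the paper does not list its expressions), the function name `clean_tweet` is hypothetical, and the Turkish letters in the allowed character class assume that some tweets are in Turkish. The spaCy lemmatization and Gensim steps are not reproduced.

```python
import re

# Assumed patterns; the paper does not list its exact regular expressions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not a lowercase Latin/Turkish letter or whitespace
# (digits, punctuation, '#', emojis, other special characters).
NON_ALPHA_RE = re.compile(r"[^a-zçğıöşü\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, non-letters and extra whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove URLs before punctuation is stripped
    text = MENTION_RE.sub(" ", text)    # remove references to other users
    text = NON_ALPHA_RE.sub(" ", text)  # remove digits, symbols and emojis
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse spaces and new lines


print(clean_tweet("Gold price UP 2% today!! 🚀 @trader see https://t.co/abc\n\n#goldprice"))
# -> gold price up today see goldprice
```

URLs and mentions are removed before the character filter on purpose: stripping punctuation first would break `https://...` into ordinary-looking words that survive cleaning.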

Regression modelling
This subsection explains how the financial data is pre-processed and how the models are trained. All basic data transformations, such as removing IDs, one-hot encoding categorical features, and imputing missing values, are completed before model training. Several learning models are trained on the pre-processed data, including CatBoost Regressor (CatBoost), K-Neighbors (KNN) Regressor, Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (EGB), Random Forest (RF) Regressor, and Extra Trees (ET) Regressor, among others. To fine-tune the models, we use a randomized grid search to find the hyperparameters that maximize the selected metric (in this case, R2). Outliers affect not only the model's performance but also its ability to generalize, so it is worthwhile to remove them. We then check whether boosting or bagging improves the model's performance. Finally, the best models are combined to test whether the combined model is more effective.

5 https://developer.twitter.com/en/docs/twitter-api
6 https://pypi.org/project/emoji/
7 https://pypi.org/project/gensim/
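A randomized hyperparameter search that maximizes R2 for an Extra Trees Regressor, preceded by an outlier filter, might look like the scikit-learn sketch below. It is an illustration, not the authors' pipeline: the data are synthetic (`make_regression`), the 1.5 × IQR outlier rule is a hypothetical choice because the paper does not name its method, and the parameter grid and the chronological `TimeSeriesSplit` folds are our assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic stand-in for the engineered feature matrix and return target.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Hypothetical outlier rule (the paper does not name one): drop rows whose
# target lies outside 1.5 * IQR of the target distribution.
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
keep = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)
X, y = X[keep], y[keep]

# Candidate hyperparameter values; the search samples n_iter combinations.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": [1.0, 0.5, "sqrt"],
}
search = RandomizedSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,
    scoring="r2",                    # the metric maximized in the text
    cv=TimeSeriesSplit(n_splits=3),  # assumption: chronological folds for time-ordered data
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Sampling a fixed number of combinations (here 8 of 81) is what distinguishes a randomized search from an exhaustive grid search; the same `search` object then exposes the refitted best model as `search.best_estimator_`.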
Even though the blended model performs very well, we want to explore whether a few additional basis points of R2 can be extracted from the data. To do so, we build a multi-level stack of models. Stacking differs from blending: in stacking, the predictions of each layer are passed, together with the original features, to the next layer.
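The difference between blending and multi-level stacking can be illustrated with scikit-learn's `VotingRegressor` (a plain average of base predictions) and nested `StackingRegressor` objects with `passthrough=True`, which forwards the original features alongside each layer's predictions. This is a sketch on synthetic data; the choice of base models, layer composition and `RidgeCV` meta-learners is our assumption, not the configuration used in the paper.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              StackingRegressor, VotingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=1)
# Chronological split (no shuffling), as one would use for time-ordered returns.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

base_models = [
    ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# Blending: the final prediction is a plain average of the base predictions.
blend = VotingRegressor(estimators=base_models)

# Stacking, level 2: its inputs are level-1 predictions plus original features.
level_two = StackingRegressor(
    estimators=[("et2", ExtraTreesRegressor(n_estimators=50, random_state=1)),
                ("ridge", RidgeCV())],
    final_estimator=RidgeCV(),
    passthrough=True,
)
# Level 1: passthrough=True forwards the original features alongside the predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=level_two,
                          passthrough=True)

scores = {}
for name, model in [("blend", blend), ("stack", stack)]:
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
print({k: round(v, 3) for k, v in scores.items()})
```

Inside `StackingRegressor`, the level-1 predictions used to train the next layer are produced out-of-fold by internal cross-validation, which keeps the meta-learner from simply memorizing the base models' training-set fit.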

Sentiment analysis and topic modeling
Sentiment analysis is a method for determining the sentiment or emotion expressed in a piece of text. In addition to fundamental and technical analysis, it can be used as a supplementary strategy to forecast gold returns. Sentiment analysis can help determine the general attitude of market participants toward gold, which may affect demand for the precious metal and, in turn, its price. For instance, sentiment analysis can be applied to news stories and social media posts to gauge how people feel about gold. If attitudes toward gold are mostly favorable, demand for the metal may rise and drive up its price. Conversely, a negative attitude toward gold can lead to lower demand and a decline in its price. After the data cleaning steps on the GoldCriticalTweets dataset, text classification is performed. We analyze the performance of many classification algorithms, including Support Vector Machine (SVM) [12,13], Convolutional Neural Network (CNN) [14], Long Short-Term Memory (LSTM) [15,16], Multi-layer Perceptron (MLP) [17], Bidirectional Encoder Representations from Transformers (BERT) [18], ELECTRA [19], and Robustly Optimized BERT Pretraining Approach (RoBERTa) [20].
Topic modeling is a natural language processing (NLP) method that identifies hidden themes or topics within a body of text. It is a type of unsupervised learning: the algorithm groups the input without predefined categories or labels. The objective of topic modeling is to uncover the most significant and frequently occurring topics in a text corpus by extracting the underlying structure of the data.
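Returning to the three-class sentiment classification step described above, the simplest of the listed models, a linear SVM, can be sketched with scikit-learn as follows. The tweets and labels are made up for illustration (they are not GoldCriticalTweets), and the TF-IDF unigram/bigram representation is our assumption, since the paper does not state which features its SVM uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical, already-cleaned tweets with made-up labels (not GoldCriticalTweets).
texts = [
    "gold prices rally strongly great time to hold gold",
    "buying gold today feeling confident about gains",
    "gold crashed again terrible losses for investors",
    "sold my gold price keeps falling very disappointed",
    "gold price unchanged at the market close today",
    "central bank publishes weekly gold reserve figures",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF features (unigrams and bigrams) feeding a linear SVM (one-vs-rest for 3 classes).
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
preds = model.predict(["gold rally brings great gains", "gold losses keep falling"])
print(preds)
```

The neural models in the list (CNN, LSTM, BERT, ELECTRA, RoBERTa) replace the TF-IDF step with learned representations, but the input (cleaned tweet text) and output (one of three sentiment labels) stay the same.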
Latent Dirichlet Allocation (LDA), a probabilistic graphical model that assumes each document in the corpus is a mixture of a small number of latent topics, is the most widely used topic modeling technique [21,22]. The algorithm estimates the topic distribution of each document as well as the word distribution of each topic, where each topic is a probability distribution over the words in the vocabulary. A topic modeling method thus produces a set of topics, each represented by a distribution over words, and a set of documents, each represented by a distribution over topics. Here, we use Non-negative Matrix Factorization (NMF) to identify the underlying topics in the collected tweets. In topic modeling with NMF, the corpus is represented as a document-term matrix in which each row denotes a document and each column denotes a word from the vocabulary. Each entry records how often a word appears in a document, so all entries are non-negative [23]. The objective of NMF in topic modeling is to factorize this matrix into two non-negative matrices, W and H, where W holds the topic weights of each document and H holds the word weights of each topic.

Experimental results on financial data
This subsection presents the experimental results on financial data. Table 3 shows the performance of different regression models for gold return prediction. Among all models, the Extra Trees Regressor has an advantage over the others. A key benefit of Extra Trees is reduced bias, which relates to how the training data are sampled when the trees are built. A Random Forest typically trains each tree on a bootstrap sample, and different subsets of the data may introduce different biases into the results; Extra Trees avoids this by building each tree on the complete dataset. It also draws split thresholds at random, which reduces variance.
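The two properties mentioned above, using the full training sample for every tree and drawing split thresholds at random, can be made concrete with a small pure-Python sketch of how one Extra-Trees node chooses its split. This follows the general Extremely Randomized Trees idea for regression; the function names `sse` and `extra_trees_split` and the toy data are hypothetical, and this is not scikit-learn's implementation.

```python
import random
from statistics import pvariance


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    return pvariance(values) * len(values) if values else 0.0


def extra_trees_split(X, y, n_candidates, rng):
    """Choose a node split the Extra-Trees way.

    For n_candidates randomly chosen features, draw ONE threshold uniformly
    between the feature's min and max in this node (no search over all
    cut-points), then keep the candidate with the largest drop in SSE.
    Returns (feature_index, threshold, gain) or None if no feature varies.
    """
    parent = sse(y)
    best = None
    for j in rng.sample(range(len(X[0])), n_candidates):
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature in this node: cannot split on it
        threshold = rng.uniform(lo, hi)
        left = [t for v, t in zip(column, y) if v < threshold]
        right = [t for v, t in zip(column, y) if v >= threshold]
        gain = parent - sse(left) - sse(right)
        if best is None or gain > best[2]:
            best = (j, threshold, gain)
    return best


rng = random.Random(42)
# Toy node: feature 0 drives the target, feature 1 is noise.
X = [[i, rng.random()] for i in range(20)]
y = [0.0 if row[0] < 10 else 5.0 for row in X]
# Unlike a bootstrap-based Random Forest, every tree would see all 20 rows here.
print(extra_trees_split(X, y, n_candidates=2, rng=rng))
```

A full tree applies this split rule recursively to each child node, and the ensemble averages many such trees; scikit-learn's `ExtraTreesRegressor` follows this scheme with `bootstrap=False` by default.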

Gezi Park demonstrations.
A series of rallies known as the Gezi Park demonstrations took place in Turkey in 2013. The demonstrations started as a peaceful sit-in against plans to redevelop Gezi Park, a small park at Istanbul's Taksim Square, but quickly grew into a broader anti-government movement. As the demonstrations spread to other Turkish cities, protesters demanded greater democracy and press freedom. The protests attracted considerable domestic and international attention, in part because they used social media to organize and promote their message [24]. Table 4 shows the performance of different sentiment analysis methods (positive, negative, and neutral) on tweets related to the Gezi Park demonstrations. Across all models, CNN and MLP achieve the best performance metrics. The topics extracted from the tweets are shown in Figure 2, each represented as a bar plot of its top words by weight. The tweets show that different vocabularies are used. According to the World Gold Council (WGC), gold demand in the third quarter of 2013 (Q3'13) totaled 868.5 tonnes, worth $37 billion. Demand decreased year over year as western investors continued to draw down their ETF holdings. The Gezi Park protests were the primary cause of the Turkish currency market bubble.

The July 15th coup attempt.
In the early hours of July 16th, the government eventually regained control of key areas. More than 250 people were killed and nearly 2,000 others were injured as a result of the coup attempt [25]. Table 5 shows the performance of different sentiment analysis methods (positive, negative, and neutral) on tweets related to the July 15th coup attempt. Across all models, CNN achieves the best performance metrics. The topics extracted from the tweets are shown in Figure 3. According to the WGC, gold demand in Q2'16 continued the pattern of the previous quarter: massive ETF inflows were offset by weak jewelry demand amid rising prices. Gold demand dropped 10% to 992.8 tonnes in Q3'16.
Despite being the weakest quarter since Q2'15, this [...]. Inflation and interest rates increased, employment data deteriorated, and growth slowed in Turkey.

Covid-19 pandemic.
The novel coronavirus SARS-CoV-2 is responsible for the worldwide health crisis known as COVID-19. The outbreak, first identified in Wuhan, China, in December 2019, developed into a global pandemic [27][28][29][30]. Table 7 shows the performance of different sentiment analysis methods (positive, negative, and neutral) on tweets related to the Covid-19 pandemic. Across all models, CNN, MLP, and LSTM achieve the best performance metrics. The topics extracted from the tweets are shown in Figure 5. According to the WGC, gold demand increased slightly to 1,107.9t in Q3'19, and annual gold demand dipped to 4,355.7t. The global COVID-19 pandemic fueled demand for safe-haven investments such as gold, offsetting a pronounced weakening in consumer-facing market segments. Negative news and comments have more of an impact on stocks, gold, and interest rates.

Direct foreign exchange intervention in Turkey.
In an effort to stabilize the Turkish lira, the Central Bank of the Republic of Turkey (CBRT) carried out a number of direct foreign exchange interventions in 2021 8. To increase the availability of foreign currency and lower its price relative to the lira, the interventions involved the CBRT selling foreign currency on [...]. Considering all models, RoBERTa has the best performance metrics. The topics extracted from the tweets are shown in Figure 6. The WGC estimates that gold demand for 2021 as a whole (excluding OTC) rose to 4,021t, driven by an almost 50% increase in Q4 demand to a 10-quarter high. After the fifth consecutive intervention in the foreign exchange market, dollar and gold prices rose in Turkey.

Conclusion
Many variables, such as monetary policy, geopolitical tensions, and economic conditions, can influence gold returns, so investors should take these factors into account when making decisions about their gold investments. In this paper, we use financial data and text data to investigate gold returns during critical periods or major events. On the financial data, different machine learning models are used to predict gold returns; according to the results, the Extra Trees Regressor has a clear advantage over the others. On our new text-based data, we perform sentiment analysis and extract latent topics. In topic modelling, each major event is characterized by its top few words by weight, and these words can be used to interpret the events. However, we do not advise using sentiment analysis alone to forecast gold returns. To produce more precise and trustworthy estimates of gold returns, additional fundamental and technical factors, including interest rates, inflation, geopolitical concerns, and supply and demand, should also be taken into account. In future work, we will consider the following: (i) enriching the dataset for more accurate results, (ii) collecting text-based data from more platforms, (iii) using news sites, political news, and articles as text-based data, and (iv) using interest rate, inflation, and employment data.