Article

Forecasting Meteorological Drought Conditions in South Korea Using a Data-Driven Model with Lagged Global Climate Variability

by Seonhui Noh 1 and Seungyub Lee 2,*
1 Department of Civil Engineering, Chungnam National University, Daejeon 34134, Republic of Korea
2 Department of Civil and Environmental Engineering, Hannam University, Daejeon 34134, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(15), 6485; https://doi.org/10.3390/su16156485
Submission received: 27 May 2024 / Revised: 14 July 2024 / Accepted: 25 July 2024 / Published: 29 July 2024

Abstract

Drought prediction is crucial for early risk assessment, for preventing negative impacts, and for the timely implementation of mitigation measures for sustainable water management. This study investigated the relationship between climate variations in three seas and the prediction of December meteorological droughts in South Korea, using the Standardized Precipitation Evapotranspiration Index (SPEI). Climate indices with multiple time lags were integrated into multiple linear regression (MLR) and Random Forest (RF) models and evaluated using Pearson’s correlation coefficients (PCCs) and the Root Mean Square Error (RMSE). The results indicated that the MLR model outperformed the RF model in the western inland region, with a PCC of 0.52 for predicting SPEI-2. On the other hand, the RF model effectively predicted drought states of ‘moderate drought’ or worse (SPEI < −1) nationwide, achieving an average hit rate of 47.17% and a Heidke skill score (HSS) of 0.56, and particularly excelling in coastal areas. Among the indices from the Pacific, Atlantic, and Indian Oceans, Nino 3.4 with a three-month lag turned out to be the most influential factor for short-period extreme droughts (SPEI-2). For periods of four months or longer, climate variations had a lower predictive value. However, integrating autocorrelation functions to account for the previous month’s drought status improved the accuracy. A HYBRID model, which blends linear and nonlinear approaches, further enhanced reliability, making the proposed model more applicable for drought forecasting in neighboring countries and valuable for South Korea’s drought monitoring system to support sustainable water management.

1. Introduction

Drought is one of the major natural disasters, affecting wide regions globally and significantly impacting water resources and ecosystems, ultimately jeopardizing the sustainability of our communities. Meteorological drought, characterized by prolonged periods of deficient precipitation, occurs in various regions of the world as a normal part of the climate. Other types of droughts include agricultural, hydrological, and socioeconomic droughts. Among these, hydrological droughts refer to reduced water levels in rivers, reservoirs, and groundwater, often occurring following meteorological droughts due to delayed water system responses.
In South Korea, droughts generally occur in cycles of 2–3 years and 5–8 years [1,2]. Additionally, short-term droughts lasting 3 to 4 months [3] and long-term droughts such as those in 1988 and 1994–1995 [4] have been observed. Recently, severe hydrological droughts in 2014 and 2015 affected the Han River and Geum River basins due to lower-than-average rainfall, leading to restricted water supply in dam-dependent areas [5,6]. A major cause of agricultural and hydrological droughts is the persistence of meteorological droughts without returning to normal conditions [7]. To mitigate droughts, reduce their impacts, and achieve sustainability, early-stage strategies must be implemented, starting with the development of systems for monitoring and predicting meteorological droughts.
The key to drought monitoring is a drought index, which is dimensionless and can be used to predict and monitor drought states [8,9]. The Standardized Precipitation Index (SPI) [10,11] is a widely used index for quantifying the meteorological drought state, providing critical spatiotemporal information for decision-makers [12,13]. Despite the simplicity and suitability of the SPI for drought monitoring, its accuracy can be compromised as it only considers precipitation data [14]. Instead, the Standardized Precipitation Evapotranspiration Index (SPEI), which considers both precipitation and evapotranspiration measurements, can offer better performance and higher drought representation under global warming scenarios than the SPI [15,16]. This index has been widely applied due to its high reproducibility and accuracy in assessing drought conditions [17,18,19,20]. The SPEI was specifically considered for meteorological drought analysis [17,21,22,23].
Predicting meteorological drought involves understanding its causative relationship with large-scale climate variations. These droughts arise from variations in sea surface temperatures (SSTs) or atmospheric circulation patterns in remote conditions, i.e., climate variability [24]. Climate variability, which occurs over months, seasons, or years, can correlate with drought occurrence [25,26,27,28,29,30] and serves as a basis for drought prediction [31,32,33,34,35]. In Australia, factors influencing precipitation variation have been identified and used as predictors due to its proximity to sources of large-scale climate variability [31,33,36]. While South Korea is more distant and shows less significant relationships, studies have used correlation, synthesis, and principal component analysis to explain climate variability. Climate indices can be dynamically linked to monthly precipitation and temperature in South Korea through lagged correlation coefficients [37,38]. Additionally, they can be utilized as predictor variables for precipitation forecasts [35], with various variables, rather than specific ones, influencing the relationships. The Arctic Oscillation (AO) plays a role in shaping the climate of the Northern Hemisphere [31] and maintains a significant out-of-phase relationship with winter temperatures in East Asia, including South Korea. During the negative AO phase, the Siberian high pressure creates colder conditions favorable for cold surges [32]. Jeong et al. [39] found that the negative AO might have influenced the intensification of early winter cold waves in 2016. During this period, South Korea experienced prolonged droughts, suggesting that the abnormal atmospheric circulation in the Northern Hemisphere might be related to the occurrence of these droughts [40,41]. The El Niño–Southern Oscillation (ENSO), despite occurring near the equator, significantly impacts climate variability. 
ENSO-related indices like the ONI (Oceanic Niño Index), MEI (Multivariate ENSO Index), and SOI (Southern Oscillation Index) correlate with winter precipitation over summer in a Pacific-East Asian teleconnection pattern [28].
Even during winter, the correlation between early and late winter differs. There is a significant positive correlation in early winter (mid-January to early December) precipitation in Korea, but in late winter, the correlation is significantly weakened by the presence of the Kuroshio high pressure [42]. El Niño, characterized by increased SSTs in the tropical Pacific, affects South Korea through the East Asia winter monsoon, causing anomalous weather [34]. Additionally, barometric pressure from the North Atlantic to East Asia is teleconnected via the Rossby wave pattern, and Atlantic SST variations can predict six-month cumulative precipitation from May to October as well as early winter precipitation anomalies [34,43].
Many studies focused on the winter months when discussing climate change impacts. Winter snow cover conserves water resources until early spring, and an increase in minimum temperatures can alter snow cover and melt timing, leading to spring droughts. Extreme spring droughts, which affect rice cultivation, a staple in South Korea, highlight the need for winter drought prediction. Additionally, the nature of precipitation during the flood season influences the use of water resources secured from the previous year, which is heavily impacted by winter drought conditions. In South Korea, winter is the season with the least amount of rainfall, and the accumulated precipitation during winter determines the drought conditions for the following year. Therefore, predicting droughts in winter, especially in early winter such as December, is noteworthy as it provides early warnings for adverse weather conditions in the following year, helping to prevent natural disasters and mitigate damage [42,44,45,46].
Meteorological drought forecasting employs both statistical and dynamical methods [24]. Data-driven statistical models, which require minimal data and compute quickly, include linear regression models [31,36,47]. However, because the relationship between climate variability and drought is often nonlinear [25,34,48], various nonlinear machine learning algorithms have also been used. Machine learning’s versatility in modeling nonlinear relationships makes it useful for drought monitoring and prediction [17,31,32,33,35,49,50]. Although many studies have used linear or nonlinear regression to forecast drought, further research is needed, when considering various climate variabilities in South Korea, to enhance model reliability by comparing these linear and nonlinear models either independently or simultaneously.
Although large-scale climate variability is known to correlate with precipitation and temperature fluctuations in South Korea, effective drought forecasting requires the simultaneous consideration of these variables. This study aims to quantify these fluctuations using the SPEI as a meteorological drought index. Incorporating general climate variability from the three seas surrounding South Korea is crucial for identifying impactful factors and understanding their lag time, using both linear and nonlinear regression approaches, either independently or in combination as a HYBRID model. By predicting drought conditions and identifying dominant factors through these models, this study seeks to account for lag times ranging from 2 to 12 months in conjunction with large-scale climate fluctuations, treating the forecasting model as data-driven.

2. Study Area and Data

2.1. Study Area Description

The study area is South Korea, a peninsular country located between 125° and 132° east longitude and 33° and 39° north latitude, in the northeastern part of the Asian continent (Figure 1a). The country has an elongated shape stretching from north to south, covering an area of 100,378 km², and is surrounded by seas on three sides [51]. South Korea experiences a monsoon-influenced climate with four distinct seasons. During winter, the climate is dominated by continental forces, bringing cold and dry winds mainly from the north and west. In contrast, the summer season is characterized by hot and humid winds from the North Pacific Ocean, with prevailing winds from the southwest, south, and southeast. The climate in South Korea shows notable regional variations, heavily influenced by the orientation of mountain ranges running predominantly from north to south. This topography results in substantial climate differences between the northern and southern regions as well as between the eastern and western areas.

2.2. Data Description

2.2.1. Precipitation and Temperature Data

To compute the SPEI drought index, we utilized precipitation and temperature data sourced from the South Korean High-Resolution Daily Rainfall (K-Hidra) dataset, version 2020 [34,52], which features a resolution of 0.25° × 0.25°. This dataset spans 205 grids across South Korea, including Jeju Island and Dokdo, covering the period from 1973 to 2022, providing a comprehensive 50-year timeframe. K-Hidra is a dataset developed using observations from the Korea Meteorological Administration’s Automated Synoptic Observing System (ASOS), Automatic Weather System (AWS), and the World Meteorological Organization (WMO) Information System. This dataset effectively captures the spatiotemporal variability of the complex mountainous regions in the eastern and northern parts of Korea and has been evaluated to be comparable to data from the Climate Prediction Center, demonstrating its applicability [52]. Additionally, it has been appropriately utilized to assess drought occurrence in South Korea [34]. K-Hidra version 2020 originally had 458 grids across the Korean Peninsula, including South Korea and North Korea. For the reliability of observation data, the biased North Korean region must be excluded [52], and only 205 grids in the South Korean region were considered.
Based on the 50-year data, annual precipitation in South Korea ranges from 880.2 mm to 1816.5 mm, exhibiting uneven distribution and significant variation. The average annual precipitation was 1278.3 mm. Figure 1a illustrates the spatial distribution of anomalies for annual average and the 50-year average (1973–2022) precipitation by grid. Regionally, the southern part of the country receives more precipitation than the national average, while coastal areas in the west and both inland and coastal areas in the east experience below-average precipitation. Jeju Island’s anomalies range from 188.7 to 821.7 mm, indicating higher-than-average precipitation. Figure 1b,c present anomalies of the national average annual total precipitation and average annual temperature for the period 1973–2022. During the periods 1994–1997 and 2013–2017, precipitation was below average, and temperatures were above average. Particularly, the last five years (2015–2019) have seen the most severe drought in 50 years, with precipitation rates approximately 20% below the 1951–2019 average [4,53], resulting in restricted water supplies in certain areas. The temperature trend over the past 50 years has risen significantly, and temperature variability was used as important data for tracking and predicting meteorological droughts [15].

2.2.2. Climate Indices

A climate index is a value designed to characterize various aspects of ocean activity, serving to describe features of a geophysical system, such as global circulation patterns. These indices play a crucial role in monitoring phenomena like droughts, El Niño events, wind patterns, and internal climate variability, offering insights into changes across the world’s oceans, including the Pacific, Atlantic, and Indian Oceans [54,55].
The ENSO [56,57] represents a prominent climate index associated with Pacific Ocean events. It includes SST-based indices such as Nino 1 + 2, Nino 3, Nino 3.4, Nino 4, ONI, and atmosphere-based indices like SOI. The Korea Meteorological Administration (KMA) defines the onset of El Niño (La Niña) when the SST anomaly of the three-month moving average of the tropical Pacific Nino 3.4 region (5° S to 5° N, 170° W to 120° W) exceeds +0.5 °C (falls below −0.5 °C) for more than five months [58]. Indices related to the Atlantic Ocean include AMO, NAO, TNA, and TSA [59], while those linked to the Indian Ocean are DMI, SETIO, and WTIO [60].
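The KMA criterion quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the KMA's operational procedure: the function names are hypothetical, the three-month moving average is assumed to be centered, and "more than five months" is read here as a run of at least `min_months` consecutive months (adjust the parameter if a stricter reading is intended).

```python
def moving_average_3(values):
    """Centered three-month moving average (assumption: centered window).

    The first and last months have no complete window and are set to None.
    """
    out = [None] * len(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out


def enso_episodes(anomalies, threshold=0.5, min_months=5):
    """Return (start_index, end_index, label) for each qualifying episode.

    anomalies: monthly Nino 3.4 SST anomalies in deg C.
    A run counts as an episode when the smoothed anomaly stays above
    +threshold (El Nino) or below -threshold (La Nina) for at least
    min_months consecutive months (assumed interpretation).
    """
    smoothed = moving_average_3(anomalies)
    episodes = []
    start, label = None, None
    # The trailing None acts as a sentinel that closes a run at the end.
    for i, value in enumerate(smoothed + [None]):
        if value is not None and value > threshold:
            current = "El Nino"
        elif value is not None and value < -threshold:
            current = "La Nina"
        else:
            current = None
        if current != label:
            if label is not None and i - start >= min_months:
                episodes.append((start, i - 1, label))
            start, label = i, current
    return episodes
```

For example, calling `enso_episodes` on a monthly anomaly list returns index ranges that can be mapped back to calendar months by the caller.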
This study incorporates a total of 10 large-scale indices as potential predictors, chosen for their influence and widespread use in capturing variability across the three oceans. These include Nino 3.4, PDO, SOI, AMO, NAO, TNA, TSA, DMI, SETIO, and WTIO. Their brief descriptions are provided in Table 1, and the climate indices’ monthly data spanning from 1972 to 2022 were collected to consider a maximum 12-month lag as a predictor variable.

3. Methodologies

3.1. Standardized Precipitation Evapotranspiration Index, SPEI

The SPEI, introduced by Vicente-Serrano et al. [15], is a sophisticated meteorological drought index that incorporates both temperature and evapotranspiration variables. It can be calculated as shown in Equation (1), which results in a probability-based value reflecting the degree of deviation from the mean state. Potential evapotranspiration (PET) can be generally computed using Thornthwaite, Hargreaves, and Penman–Monteith methods, with this study specifically utilizing the equation proposed by Thornthwaite [61], as shown in Equation (2). The SPEI follows a log-logistic distribution with three parameters, and Probability Weighted Moments, as outlined by Beguería et al. [62], are recommended for parameter estimation. The calculation of the SPEI was conducted using the “SPEI” package (https://cran.r-project.org/web/packages/SPEI, accessed on 13 November 2023) within the R(>= 3.5.0) Software environment.
$$D_i = P_i - PET_i \tag{1}$$
$$PET = 16K\left(\frac{10\,t_n}{J}\right)^{a} \tag{2}$$
$$J = \sum_{n=1}^{12}\left(\frac{t_n}{5}\right)^{1.514} = \sum_{n=1}^{12} 0.0875\,t_n^{1.514} \tag{3}$$
where $D_i$ is the difference between precipitation and evapotranspiration, $t_n$ is the average monthly temperature, and $J$ is Thornthwaite’s [61] thermal monthly index summed over 12 months.
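Equations (1) to (3) can be illustrated with a short Python sketch. The study itself used the R “SPEI” package, so the code below is only an illustration under stated assumptions: the function names are hypothetical, the exponent a is taken from the standard Thornthwaite polynomial in J (not given in the text above), the correction coefficient K defaults to 1.0 instead of being derived from latitude and month, and months with temperatures at or below 0 °C are assigned zero PET.

```python
def heat_index(monthly_temps_c):
    """Annual heat index J from 12 mean monthly temperatures, Equation (3).

    Assumption: negative temperatures contribute zero to the index.
    """
    return sum((max(t, 0.0) / 5.0) ** 1.514 for t in monthly_temps_c)


def thornthwaite_pet(t_month_c, j, k=1.0):
    """Monthly PET in mm, Equation (2).

    k is the latitude/month correction coefficient (assumed 1.0 here).
    The exponent a uses the standard Thornthwaite polynomial in J.
    """
    if t_month_c <= 0.0 or j <= 0.0:
        return 0.0
    a = 6.75e-7 * j**3 - 7.71e-5 * j**2 + 1.792e-2 * j + 0.49239
    return 16.0 * k * (10.0 * t_month_c / j) ** a


def water_balance(precip_mm, temps_c, k=None):
    """Monthly climatic water balance D = P - PET, Equation (1), for one year."""
    j = heat_index(temps_c)
    k = k if k is not None else [1.0] * len(temps_c)
    return [p - thornthwaite_pet(t, j, kk)
            for p, t, kk in zip(precip_mm, temps_c, k)]
```

The resulting D series is what the SPEI procedure subsequently accumulates and standardizes; that standardization step (fitting the three-parameter log-logistic distribution) is not reproduced here.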
SPEIs are versatile for monitoring various types of droughts across different time scales. For instance, a 1- to 6-month scale is reliable for meteorological and agricultural droughts, a 12-month scale is related to hydrological droughts, and a 24-month scale or longer impacts socioeconomic droughts [17,63]. The KMA employs the six-month SPI (SPI-6), with a value of −1.0 or less as a criterion for drought warnings, triggering the ‘attention’ stage (weak drought) when SPI-6 remains below −1.0 [13]. This study considered SPEIs at 2-month (SPEI-2), 4-month (SPEI-4), and 6-month (SPEI-6) time scales. The data period for each SPEI (SPEI-2, SPEI-4, SPEI-6) is 50 years, from January 1973 to December 2022, giving a total of 600 monthly values. Table 2 presents the classification of drought conditions by the SPEI [15].
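The time scales above refer to how many months of the water balance D are accumulated before standardization. A minimal sketch of that accumulation step, assuming a plain (unweighted) running sum and a hypothetical function name, is:

```python
def accumulate(d_series, scale):
    """Running sum of the monthly water balance over `scale` months.

    Entries without a complete window are None. The sum for month i covers
    months i-scale+1 .. i, so a December value at scale 2 combines
    November and December.
    """
    out = [None] * (scale - 1)
    for i in range(scale - 1, len(d_series)):
        out.append(sum(d_series[i - scale + 1:i + 1]))
    return out
```

Under this sketch, SPEI-2, SPEI-4, and SPEI-6 would start from `accumulate(d, 2)`, `accumulate(d, 4)`, and `accumulate(d, 6)`, respectively, before the accumulated values are standardized.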

3.2. Model Description

3.2.1. Multiple Linear Models (MLR)

Regression analysis involves determining the output based on inputs in a model that best represents the data. A simple linear regression is used for one independent variable, while a multiple linear regression (MLR) is used when there are multiple independent variables. The general form of the MLR equation is given in Equation (4).
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \tag{4}$$
where $y$ is the dependent variable (SPEI), $x_1, \ldots, x_p$ are the independent variables (the climate indices listed in Table 1, each taken at a given lag time), $\beta_0, \ldots, \beta_p$ are the regression coefficients or weights, $p$ is the number of independent variables, and $\varepsilon$ is the error term.
The assumption in the MLR model is that the effect of each independent variable on the dependent variable is linear, and it is also assumed that the drought index uniformly varies with changes in climate variables while keeping the other independent variables constant. In this study, the “stats” package in R Software was used for developing the MLR model.
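To make the MLR formulation concrete, the following Python sketch fits the coefficients by ordinary least squares through the normal equations. It is an illustration only: the study used the R “stats” package, the function names here are hypothetical, and the solver assumes the predictors are not perfectly collinear (otherwise the pivot is zero and the division fails).

```python
def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + sum(bj * xj).

    rows: list of predictor tuples, one per observation.
    Solves (X^T X) b = X^T y by Gauss-Jordan elimination with partial pivoting.
    """
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(x[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(xi[r] * xi[c] for xi in x) for c in range(n)]
         + [sum(xi[r] * yi for xi, yi in zip(x, y))]
         for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_mlr(beta, row):
    """Evaluate the fitted linear model for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))
```

Each coefficient then expresses the change in the SPEI per unit change in one lagged index with the others held constant, which is the linearity assumption described above.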

3.2.2. Random Forest (RF)

The Random Forest (RF) algorithm is a machine learning algorithm frequently used for regression and classification tasks [33,64,65,66,67]. The RF algorithm generates multiple decision trees (prediction models) through an ensemble learning process and classifies data by combining the prediction results of these trees. By making decisions based on the consensus of multiple models, the likelihood of making accurate predictions is enhanced, and utilizing a variety of complementary models can result in an overall improved prediction model compared to a single model [68]. This ensemble approach not only improves prediction accuracy but also enhances the robustness of the model in handling nonlinear relationships and managing large datasets with high-dimensional features, effectively reducing overfitting.
The RF algorithm uses out-of-bag (OOB) data for cross-validation. OOB data are the observations excluded from the bootstrap sample used to build a given tree; once the tree has grown, these OOB data are employed as a test set to evaluate the robustness of the model. The increase in prediction error when the OOB data for a variable are permuted is a measure of the “importance of the variable”. In this study, the “randomForest” package in R (https://cran.r-project.org/web/packages/randomForest, accessed on 13 November 2023) was used for developing the RF model, with the parameters set to default values: 500 trees (ntree = 500) and 3 predictors (mtry = 3). Variable importance was determined using the “importance” function.
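The two ideas in this paragraph, OOB samples and permutation-based importance, can be sketched without a full forest implementation. The Python code below is a simplified illustration with hypothetical function names: it draws one bootstrap sample and measures the error increase for an arbitrary `predict` function on a supplied evaluation set, whereas the “randomForest” package computes the increase per tree on that tree's own OOB data and averages over trees.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample (with replacement) and return (in_bag, oob).

    oob lists the sample indices that were never drawn for this tree.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]
    return in_bag, oob


def permutation_importance(predict, rows, targets, column, rng):
    """Increase in mean squared error after shuffling one predictor column."""
    def mse(data):
        return sum((predict(r) - t) ** 2
                   for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = []
    for r, v in zip(rows, shuffled):
        new_row = list(r)
        new_row[column] = v  # break the link between this column and the target
        permuted.append(new_row)
    return mse(permuted) - baseline
```

A predictor the model never uses yields an importance of zero under this measure, because shuffling it leaves every prediction unchanged.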

3.2.3. Variable Selection and Validation

Given the variation in drought conditions across 205 grids in South Korea and the differing impact of certain climate factors, the SPEI forecast model is constructed on a grid-by-grid basis. The dependent variable is derived from precipitation and temperature data at these grids from 1973 to 2022. Independent variables are climate indices with lag times ranging from 2 to 12 months. Each model includes 110 independent variables (10 indices multiplied by 11 months) for each grid. Including a large number of predictor variables in a model can lead to increased computation time and overfitting due to the curse of dimensionality. Therefore, during the training process, variables with the smallest relative importance (RI) values should be removed to optimize the model.
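The construction of the 110 candidate predictors can be sketched as follows. This is an assumed data layout, not the authors' code: each index is stored as a dictionary keyed by `(year, month)`, the function name is hypothetical, and a lag of k months is taken to mean the index value k calendar months before the target December.

```python
def lagged_features(index_series, target_year, target_month=12,
                    lags=range(2, 13)):
    """Build {"<index>_lag<k>": value} for one target month.

    index_series: {index_name: {(year, month): value}} (assumed layout).
    With 10 indices and lags 2..12 this yields 10 * 11 = 110 features.
    """
    features = {}
    for name, series in index_series.items():
        for lag in lags:
            # Count months from year 0 so that lags can cross year boundaries.
            months = target_year * 12 + (target_month - 1) - lag
            year, month_index = divmod(months, 12)
            features[f"{name}_lag{lag}"] = series[(year, month_index + 1)]
    return features
```

For a December 1973 target, lag 2 reads October 1973 and lag 12 reads December 1972, which is why the index records must begin one year before the SPEI record.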
The methods for selecting variables differ between the MLR and RF models. For the MLR model, we first calculated the Pearson’s correlation coefficient (PCC) between the SPEI and each of the 10 climate indices to identify the lag time with the highest correlation, selecting 10 variables accordingly. For instance, the MLR equation for estimating SPEI-2 for one of the 205 grids is as follows:
y = Nino3.4_lag2 + PDO_lag4 + SOI_lag2 + AMO_lag8 + NAO_lag10 + TNA_lag7 + TSA_lag6 + DMI_lag8 + SETIO_lag4 + WTIO_lag5
Each of the 10 variables has a different lag time that correlates with the SPEI. Next, we identified multicollinearity using the Variance Inflation Factor (VIF) [31]. Multicollinearity occurs when there is a high correlation among independent variables, which can cause significant changes in the predicted values due to small changes in the data or model. In this study, a VIF threshold of 5 was set; variables with a VIF greater than 5 were considered problematic for prediction and were removed [69,70]. Finally, to generalize the model, we used k-fold cross-validation and selected the model with the lowest Root Mean Square Error (RMSE). Consequently, the final equation derived for the example grid is as follows:
y = Nino3.4_lag2 + PDO_lag4 + SOI_lag2 + NAO_lag10 + DMI_lag8
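The first two steps of the MLR variable selection can be sketched in Python. Several points are assumptions rather than details from the text: the function names are hypothetical, “highest correlation” is interpreted as the largest absolute PCC, and the VIF is written in its general form, VIF = 1 / (1 − R²), where R² comes from regressing one predictor on the others (that auxiliary regression is left to the caller).

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(target, candidates):
    """Pick the lag whose series has the largest |PCC| with the target.

    candidates: {lag: series}. Returns (lag, pcc).
    """
    scored = {lag: pearson(series, target) for lag, series in candidates.items()}
    lag = max(scored, key=lambda k: abs(scored[k]))
    return lag, scored[lag]


def vif(r_squared):
    """Variance Inflation Factor for a predictor with auxiliary R^2."""
    return 1.0 / (1.0 - r_squared)


def passes_vif(r_squared, threshold=5.0):
    """True when the predictor is kept under the VIF <= threshold rule."""
    return vif(r_squared) <= threshold
```

With the threshold of 5 used in the study, a predictor is dropped once more than 80% of its variance is explained by the other predictors, since 1 / (1 − 0.8) = 5.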
The RF model can utilize the Boruta approach [33,71] to exclude predictor variables with low importance without the need for the user to identify the variables with the highest correlation. Detailed explanations can be found in Kursa and Rudnicki [71], and the package “Boruta” (https://cran.r-project.org/web/packages/Boruta, accessed on 13 November 2023) was used. This approach repeatedly excludes variables with lower importance than random probes through statistical tests, capturing only the data relevant to the dependent variable. Unlike the MLR model, the RF model can simultaneously select multiple lag times related to Nino 3.4. For instance, Nino3.4_lag2, Nino3.4_lag3, and Nino3.4_lag4 can all be selected. Even though redundant lag time variables are included, the RF model can effectively learn interactions and nonlinear relationships between variables, making it an appropriate predictive model. The nonlinearity and robust variable selection capability of the RF model allow it to capture various characteristics of the variables, enhancing the prediction accuracy for complex systems like climate indices.
For each grid, the data were divided into two sets: training data (80%) and test data (20%), with the training data from 1983 to 2022 and test data from 1973 to 1982. The training data were used to develop the prediction model and address the overfitting problem, and the model’s performance was then evaluated using the test data.
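The chronological split described here is simple to express in code. The sketch below assumes each record is a dictionary with a `"year"` key; the function name and record layout are hypothetical.

```python
def split_by_year(records, test_years=range(1973, 1983)):
    """Split records into (train, test) by calendar year.

    Defaults follow the text: 1973-1982 for testing, the rest for training.
    """
    test_set = set(test_years)
    train = [r for r in records if r["year"] not in test_set]
    test = [r for r in records if r["year"] in test_set]
    return train, test
```

With one December value per year over 1973 to 2022, this yields 40 training and 10 test samples per grid, which matches the stated 80%/20% proportions.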

3.2.4. HYBRID Model

To leverage the strengths of both linear and nonlinear models, a HYBRID model that integrates an autocorrelation function (ACF) and inter-model combination is introduced. The ACF plays a crucial role in time series analysis and forecasting [72]. It represents a collection of autocorrelations over time, indicating the degree of correlation between observations of the same variable at different time points. These correlations consider various components such as trends, seasonality, cycles, and residuals [73]. Utilizing the ACF allows for the incorporation of antecedent precipitation conditions and the impact of climate variability, offering an effective approach for modeling precipitation forecasts [35] and flow predictions [74] at a specific location. This approach involves using the MLR model to identify linear relationships and the RF model to capture complex interactions. The HYBRID model combines predictions from both models to enhance overall accuracy.
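The two ingredients named in this paragraph can be illustrated as follows. The sample autocorrelation is a standard estimator; the blending rule, however, is purely an assumption, because the text does not state how the MLR and RF predictions are combined. A weighted average with a hypothetical `weight` parameter is used here only as a placeholder.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (lag >= 0)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def hybrid_forecast(mlr_pred, rf_pred, weight=0.5):
    """Blend two model outputs (ASSUMPTION: simple weighted average)."""
    return weight * mlr_pred + (1.0 - weight) * rf_pred
```

A large autocorrelation at lag 1 would indicate that the previous month's drought state carries information about the current one, which is the motivation for adding it as a predictor.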

3.3. Model Evaluation

Model performance is evaluated using the PCC and RMSE. The PCC measures the linear association between predicted and observed values (Equation (7)), while the RMSE represents the difference between estimated and observed values (Equation (8)).
$$PCC = \frac{\sum_{i=1}^{N}(O_i-\bar{O})(S_i-\bar{S})}{\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{N}(S_i-\bar{S})^2}} \tag{7}$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(S_i-O_i)^2} \tag{8}$$
where $i$ indexes the time step, $N$ is the number of time steps, $S_i$ and $O_i$ are the simulated and observed values for each time step, respectively, and $\bar{S}$ and $\bar{O}$ are the average simulated and observed values.
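Equations (7) and (8) translate directly into code. The Python sketch below uses hypothetical function names and assumes both series have the same length and nonzero variance.

```python
import math


def pcc(observed, simulated):
    """Pearson's correlation coefficient, Equation (7)."""
    n = len(observed)
    o_mean = sum(observed) / n
    s_mean = sum(simulated) / n
    num = sum((o - o_mean) * (s - s_mean)
              for o, s in zip(observed, simulated))
    den = (math.sqrt(sum((o - o_mean) ** 2 for o in observed))
           * math.sqrt(sum((s - s_mean) ** 2 for s in simulated)))
    return num / den


def rmse(observed, simulated):
    """Root Mean Square Error, Equation (8)."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2
                         for o, s in zip(observed, simulated)) / n)
```

The two metrics are complementary: the PCC is insensitive to a constant bias or scaling in the predictions, while the RMSE penalizes both.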
Drought prediction is evaluated using the hit rate and Heidke skill score (HSS) based on a contingency table, which categorizes the results into four types: ‘hit (a)’ and ‘miss (c)’ if the observation is ‘drought’ and the prediction is ‘drought’ and ‘non-drought’, respectively; ‘false alarm (b)’ and ‘correct non-event (d)’ if the observation is ‘non-drought’ and the prediction is ‘drought’ and ‘non-drought’, respectively. The hit rate (Equation (9)), also known as the POD (Probability Of Detection), measures the proportion of correctly predicted drought events and ranges between 0 and 1. The HSS (Equation (10)) measures the improvement of a prediction over a random prediction. If the HSS < 0, a random prediction is better than the model prediction; when the HSS = 0, this indicates no skill; and when the HSS = 1, this signifies a perfect prediction.
$$Hit\ rate = \frac{a}{a+c} \tag{9}$$
$$HSS = \frac{2(ad-bc)}{(a+c)(c+d)+(a+b)(b+d)} \tag{10}$$
where $a$ is the number of ‘hits’ (correct predictions of droughts), $b$ is the number of ‘false alarms’, $c$ is the number of ‘misses’, and $d$ is the number of ‘correct non-events’.
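The contingency-table scores in Equations (9) and (10) can be sketched in Python as follows. The function names are hypothetical, and the drought threshold is assumed to be SPEI ≤ −1 for both observations and predictions; undefined ratios (empty denominators) are returned as NaN.

```python
def contingency(observed, predicted, threshold=-1.0):
    """Count hits (a), false alarms (b), misses (c), correct non-events (d)."""
    a = b = c = d = 0
    for o, p in zip(observed, predicted):
        obs_drought = o <= threshold
        pred_drought = p <= threshold
        if obs_drought and pred_drought:
            a += 1
        elif not obs_drought and pred_drought:
            b += 1
        elif obs_drought and not pred_drought:
            c += 1
        else:
            d += 1
    return a, b, c, d


def hit_rate(a, c):
    """Probability of detection, Equation (9)."""
    return a / (a + c) if (a + c) else float("nan")


def hss(a, b, c, d):
    """Heidke skill score, Equation (10)."""
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denom if denom else float("nan")
```

As a small worked case, six observation/prediction pairs with a = 2, b = 1, c = 1, and d = 2 give a hit rate of 2/3 and an HSS of 2(4 − 1) / (9 + 9) = 1/3.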

4. Results

4.1. Descriptive Statistics of SPEI and Climate Indices

The analysis of the SPEI over 205 grids in South Korea revealed significant variability in drought conditions over the past 50 years. Figure 2 illustrates the distribution of the December SPEI-2 values for each year using a boxplot. SPEI-4 and SPEI-6 can be found in the Supplementary Materials (Figure S1). Each box reflects the SPEI-2 distribution for one year, with the first and third quartiles as the lower and upper hinges. The blue dotted line at y = −1 marks the threshold below which values indicate ‘moderate drought’, as discussed earlier. The mean values over the 205 grids are connected by a red solid line, and outliers are marked with dark orange X-shapes. Over the observed period, moderate droughts (mean SPEI values below −1) occurred notably in the years 1983, 1988, 1995, 1998, 1999, and 2008, with frequent occurrences of severe dry conditions. Typically, these drought events lasted 1–2 years before transitioning to wetter conditions. These patterns are possibly linked to the variable activity of oceans [33]. The significant fluctuations in climate across different locations within the same year, as shown by the range of the boxplot crossing the y = 0 line, underscore the necessity of examining the relationship between drought conditions and climate variability on a location-specific basis.
Figure 3 presents the mean PCC results between the December SPEI-2 and the lagged climate indices across all grids, with gradient colors indicating the strength of the correlation. Note that the numbers in each cell indicate the number of stations with significant correlations (p < 0.05). In general, Pacific Ocean-related indices (Nino 3.4, PDO, and SOI) showed a decrease in correlation with an increasing time lag, while Indian Ocean-related indices (DMI, SETIO, and WTIO) showed an increasing correlation over time. Specifically, Nino 3.4 exhibited a strong, time-dependent correlation with the SPEI, transitioning from negative to positive correlations from April to November, while October had the highest correlation (201 out of 205 grids showing significant correlation). This correlation confirms the influence of tropical Pacific SST anomalies on early winter precipitation in South Korea, which aligns with conclusions of previous research highlighting a 2–3 month lag between SST anomalies in the tropical Pacific and precipitation conditions for South Korea [28,42]. The Supplementary Materials include plots for SPEI-4 and SPEI-6 (Figure S2). Unlike SPEI-2, the climate indices for SPEI-4 and SPEI-6 did not show high correlations, and their relationships were different. This can be explained by the characteristics of South Korea’s climate, which is influenced by different air masses in each season. As the accumulation period extends beyond four months, the influence of specific climate indices becomes dispersed, resulting in lower correlations.

4.2. Model Performance

We developed a model for December SPEI-2 forecasts for 205 grids in South Korea. The performance metrics, including the PCC and RMSE, are depicted in Figure 4. Results for SPEI-4 and SPEI-6 can be found in the Supplementary Materials (Figures S3 and S4).
The national average forecasting performance of the MLR and RF models was as follows. For SPEI-2, the MLR model had a PCC value of 0.52 and an RMSE of 0.84, while the RF model had a PCC value of 0.28 and an RMSE of 0.92. For SPEI-4, the MLR model had a PCC value of 0.16 and an RMSE of 0.99, while the RF model had a PCC value of −0.15 and an RMSE of 1.09. For SPEI-6, the MLR model had a PCC value of 0.10 and an RMSE of 0.99, while the RF model had a PCC value of −0.18 and an RMSE of 1.09. Comparing the two models, the MLR model produced predictions closer to the observations and performed better for SPEI-2, SPEI-4, and SPEI-6. Model suitability varied with the data characteristics. The low PCC values indicate that the models did not predict the data effectively, partly because the test period was limited to 10 years. Additionally, if the variables exhibit a linear pattern, the MLR model may offer more accurate predictions, indicating a linear causal relationship at certain stations.
To compare the spatial differences between model predictions and observed values, we analyzed the SPEI for December 2015, as shown in Figure 5. In the Han River and Geum River basins, where a meteorological drought occurred in 2015 for a 6-month accumulation period (Figure 5a), the PCC values for the MLR and RF models were 0.68 and 0.80, respectively. As shown in Figure 5c, the RF model provided predictions that were closer to the observed values. The prediction results for SPEI-2 and SPEI-4 can be found in the Supplementary Materials (Figure S5), where the RF model outperformed the MLR model. During this period, the drought primarily occurred due to long-term accumulated precipitation deficits rather than short-term deficits, which is why drought occurrence was rarely observed in SPEI-2.
Figure 6 compares the time series of observed and predicted values. In Figure 6a, where the MLR model performed best, the PCC values of the MLR and RF models during the test period were 0.88 and −0.15, respectively. In Figure 6b, where the RF model performed best, the PCC values during the test period were 0.87 and 0.80, respectively. Grids that are well predicted by the MLR model may have less need for nonlinear models, while those that are well predicted by the RF model can benefit from both linear and nonlinear models. The western part of the Baekdudaegan region (Figure 6b) showed superior prediction performance compared to other regions, indicating the effective prediction of drought conditions using the climate index. However, the coastal areas in Figure 6c,d were not well estimated by the model, revealing limitations in the predictive power of both general linear and nonlinear models.
When comparing the performance of the forecasting models across different time scales, SPEI-2 performed the best, and the longer the accumulation time, the lower the predictive power. While it was initially expected that predicting frequent fluctuations in a 2-month drought period would be challenging due to the short accumulation period, it actually exhibited better estimation, which is likely attributed to its strong seasonal dependency. On the other hand, droughts in SPEI-4 and SPEI-6, with longer cumulative periods, showed limited predictive power, suggesting a lower contribution to the causal relationship with the variability of climate indices.

4.3. Evaluation of Model Skillfulness for Drought Forecasts

To assess the effectiveness of the data-driven models in forecasting meteorological drought, the hit rate and HSS were investigated for the entire dataset. The results are presented in Figure 7, with detailed hit rate and HSS maps for SPEI-4 and SPEI-6 available in the Supplementary Materials (Figures S6 and S7).
SPEI values of −1 or less were classified as ‘drought’ and all other values as ‘non-drought’, and a 2 × 2 contingency table was constructed from the predicted and observed classes. When comparing the hit rate and HSS of the two models, the RF model demonstrated superior performance in predicting drought conditions. The average hit rates of the MLR and RF models for the 205 grids were 21.29% and 47.17%, and the average HSS values were 0.22 and 0.56, respectively.
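The classification step described above can be sketched as follows. The sketch assumes the usual definitions of the hit rate (hits divided by observed drought events) and of the Heidke skill score for a 2 × 2 table. The function names and the `threshold` parameter are illustrative and are not taken from the study's software.

```python
def contingency(obs, pred, threshold=-1.0):
    """Count (hits, false alarms, misses, correct negatives) for SPEI <= threshold."""
    a = b = c = d = 0
    for o, p in zip(obs, pred):
        obs_drought, pred_drought = o <= threshold, p <= threshold
        if pred_drought and obs_drought:
            a += 1          # hit: drought predicted and observed
        elif pred_drought:
            b += 1          # false alarm: drought predicted, not observed
        elif obs_drought:
            c += 1          # miss: drought observed, not predicted
        else:
            d += 1          # correct negative
    return a, b, c, d

def hit_rate(a, b, c, d):
    """Fraction of observed droughts that were predicted (NaN if none observed)."""
    return a / (a + c) if (a + c) else float("nan")

def hss(a, b, c, d):
    """Heidke skill score: accuracy relative to that expected by chance."""
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2 * (a * d - b * c) / denom if denom else float("nan")
```

Under these definitions, a grid with no observed drought in the evaluation period has an undefined hit rate, so such grids would need to be handled explicitly when averaging over the 205 grids.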
Notably, the RF model excelled in predicting coastal drought conditions in the southern region of South Korea. Despite the lower overall prediction performance of the RF model compared to the MLR model, it effectively captured drought conditions, particularly for predictions below ‘−1’.

4.4. Model Comparison of Predictor Selection and Importance

A predictive model can reveal the relative influence of each variable, with the MLR model represented by the partial determination coefficient and the RF model by the predictor importance value. Figure 8 displays the number of independent variables included in the MLR and RF models for all grids.
Nino 3.4 stands out as the most frequently selected variable for predicting SPEI-2, chosen by 170 out of 205 grids, followed by NAO. The high frequency of ENSO-related parameters being selected implies that short-term drought conditions in South Korea are associated with El Niño and La Niña. For SPEI-4 and SPEI-6, TNA and NAO, linked to the Atlantic Ocean, were chosen more frequently, followed by ENSO-related parameters. In the RF model for SPEI-6, the predictors were unbiased and selected with low frequency, suggesting that no specific climatic index gained higher relative importance as the accumulation period of the drought index increased. The impact of climatic index fluctuations seems somewhat diluted over a 6-month accumulation period, as evidenced by the significant correlation of Nino 3 in early winter, followed by a weaker relationship in January and February [42].
The selected variables exhibited varying importance across lag times from 2 months to 12 months, as depicted in Figure 9. Across all grids and lag times, the ENSO-related group of predictors over the Pacific, including Nino 3.4, PDO, and SOI, is notable and color-coded in red. This ENSO-related group makes a substantial contribution, with a 69% selection with a 2-month lag in the MLR model and a 50% selection with a 3-month lag in the RF model for predicting SPEI-2 (Figure 9a,b). Specifically, in the MLR model, Nino 3.4 in September (lag time 3 months) contributes 46% to the variation of SPEI-2, and Nino 3.4 in August (lag time 4 months) contributes 60% to the variation of SPEI-4, indicating a relatively higher linear correlation than other variables. This observation is crucial for interpreting the causal relationship of drought variation. Regardless of the cumulative time of the drought period, the proportion of the ENSO predictor group tends to contribute preferentially within a 4-month time difference, owing to the relatively close proximity to Korea among the three oceans, and subsequently decreases as the time difference increases.
The Atlantic-related group, comprising AMO, NAO, TNA, and TSA, is color-coded in green. This group contributed strongly to drought variability, with shares of 100% and 73% in the MLR and RF models, respectively, at a lag of 5 months for predicting SPEI-2. Notably, in the MLR model, among the four variables in the Atlantic group, TSA and TNA contribute 79% and 21%, respectively. This indicates that drought fluctuations can be predicted by the influence of the Atlantic, independent of other variables. In SPEI-4 and SPEI-6, both MLR and RF models showed high contributions from Atlantic variables with a time lag of more than 9 months. In particular, in the MLR model, the AMO variable in February (lag time 10 months) contributes 50% and 61% to the variation of SPEI-4 and SPEI-6, respectively. Temperature anomalies in parts of the Atlantic Ocean, farther away from the Pacific Ocean, contributed significantly to drought variability in South Korea with a long-term lag of more than 5 months. This suggests that it may be possible to forecast precipitation in South Korea based on Atlantic SST over 6 months, as confirmed by Noh and Ahn [34].
The Indian Ocean-related group, consisting of DMI, SETIO, and WTIO, is color-coded in purple. Although this group was less frequently selected, and its percentage was lower compared to the Pacific and Atlantic groups in most time lags, it makes significant contributions of 68% and 55% with time lags of 11 and 12 months, respectively, to predict SPEI-4 in the RF model. This group was selected for longer lag times, indicating its influence on drought prediction with an extended time horizon.
The importance of variables varied between MLR and RF models, and there was no significant inconsistency across lag times. The group related to the Pacific Ocean contributes within a close time lag, followed by the Atlantic Ocean, and then the Indian Ocean. The variable importance value in the MLR model represented the explained variance to clarify the coefficient of determination between model predictions and observations. The performance of the model in predicting SPEI-4 and SPEI-6 was lower than that of SPEI-2, as indicated by low coefficient of determination values and low explained variance, which had a minimal impact on predicting drought variability. Therefore, it is crucial not to overinterpret the specific contribution of a particular lag time but to identify overall patterns.

4.5. HYBRID Model Results and Discussion

In the context of meteorological drought forecasting in South Korea, while the RF methodology has demonstrated reliability in various application domains, MLR forecasting tends to outperform the RF model for several reasons. RF models might struggle to effectively integrate known relationships between response and predictor variables, particularly in extrapolation problems that involve forecasting beyond the domain covered by the training dataset [75]. This limitation becomes evident in years with extremely hot and dry weather, where temperature and precipitation values fall outside the range of the training dataset, posing challenges for accurate predictions.
To address this issue, the HYBRID model was used. For each grid, the HYBRID model first fits the single most highly weighted variable in linear form using the MLR model and then applies the RF model with the 10 climate indices to the residuals. In Figure 10, the HYBRID model that combines the ACF with the climate indices (light blue box on the right) exhibits a higher PCC for predicting the SPEI than the model that includes only the 10 climate indices (light green box). Notably, SPEI-6, a drought index spanning more than two seasons, was challenging to estimate using only the MLR and RF models, but employing the ACF significantly enhanced prediction skill.
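The two-stage structure of the HYBRID model can be sketched as below: a linear fit on one predictor, followed by a nonlinear model fitted to the residuals, with the final prediction being the sum of both stages. This is only a schematic under stated assumptions. To keep the sketch dependency-free, a small ensemble of bootstrapped depth-1 regression trees (stumps) stands in for the Random Forest used in the study, and all function names are hypothetical.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Stage 1: ordinary least squares for one predictor, y ~ a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def fit_stump(X, r):
    """Best single-split regression stump (feature, threshold, left mean, right mean)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:                      # no split possible: constant prediction
        m = mean(r)
        return 0, float("inf"), m, m
    return best[1:]

def fit_bagged_stumps(X, r, n_trees=50, seed=0):
    """Stand-in for the RF stage: stumps fitted on bootstrap resamples of the residuals."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [r[i] for i in idx]))
    return stumps

def predict_stumps(stumps, row):
    """Average the stump predictions for one row of predictors."""
    return mean(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

def fit_hybrid(x_lin, X, y, **kwargs):
    """Fit the linear stage, then fit the ensemble to its residuals."""
    a, b = fit_line(x_lin, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x_lin, y)]
    return a, b, fit_bagged_stumps(X, resid, **kwargs)

def predict_hybrid(model, x_lin, row):
    """Linear prediction plus the nonlinear correction of the residual."""
    a, b, stumps = model
    return a + b * x_lin + predict_stumps(stumps, row)
```

Because the linear stage is not bounded by the range of the training targets, a design of this kind can follow a trend beyond the training domain, while the tree-based stage only corrects what the line leaves unexplained.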
While models have been developed for predicting the SPEI in December using MLR and RF models, their effectiveness is limited in certain regions, and the predictions may lack accuracy due to the constraints of data-driven models. For example, in Figure 7, the MLR model struggles to predict drought conditions across South Korea, and the RF model excels at predicting drought conditions only in coastal areas. Thus, to enhance the reliability of drought prediction by closely mimicking actual values, we sought to improve the accuracy using the HYBRID model. The December meteorological drought index (SPEI) in South Korea shows a strong auto-correlation with the SPEI of November (Figure 11). Due to the seasonal dependence of South Korea’s precipitation, it is possible to include SPEIs with lags of 2 to 3 months as predictive variables. However, including multiple variables raises concerns about overfitting. Additionally, similar to the selection process for climate index variables, where only the lag with the maximum correlation was chosen, the ACF included only the value from 1 month prior with the highest correlation. Although winter precipitation in South Korea with a 1-month lag shows a relatively lower correlation compared to summer [76], it exhibits self-similarity [77], making it suitable for use as an additional variable.
As illustrated in Figure 12, not only does the predictive accuracy of coastal areas significantly improve, but grids located inland were also well-estimated. The HYBRID model proved the capability of predicting droughts across the entire area of South Korea, with enhanced utility for coastal areas. Concerning SPEI-4 and SPEI-6, the HYBRID model demonstrated better drought prediction in coastal areas. However, the prediction skill for SPEI-4 and SPEI-6 was lower than that for SPEI-2. This was attributed to the diminishing influence of the climate index as the accumulation period increases, making it challenging to explain the causal relationship in the regression model. To address this, utilizing ACF as shown in Figure 10 can enhance prediction skill.
The range of the hit rate and HSS in the contingency table of the predicted and observed values by applying the MLR, RF, and HYBRID models is shown in Figure 13. The hit rate and HSS maps of SPEI-4 and SPEI-6 are presented in Figure S8 in the Supplementary Materials. In the case of SPEI-2, the HYBRID model exhibited a significantly better hit rate and HSS compared to the MLR and RF models when predicting drought conditions, achieving a national average of 67.37% and 0.70, respectively, and reaching 100% in some regions. The 100% prediction indicates the model’s ability to forecast drought conditions with a value of ‘−1’ or less.
The proposed HYBRID approach demonstrated that integrating the ACF with traditional models can substantially improve drought prediction accuracy. By leveraging the strengths of the ACF in capturing time-dependent correlations and the predictive power of climate indices, the HYBRID model provides a more robust and reliable tool for forecasting drought conditions, particularly for extended periods such as SPEI-6. This advancement is crucial for early warning systems and proactive disaster management, enabling better preparedness and mitigation strategies for severe weather events.

5. Conclusions

This study developed a model for forecasting droughts in South Korea by employing a data-driven approach and a multi-temporal climate variability index for sustainable water management. The evaluation of drought conditions was based on observed precipitation and temperature data, with the SPEI chosen as the reference data. The findings indicated that the Nino 3.4 index is a key factor contributing significantly to droughts in South Korea. As an ENSO-related variable in the Pacific Ocean, it can effectively explain the two-month cumulative December drought conditions in South Korea. Utilizing Nino 3.4 data from three months prior demonstrated significant contributions to the forecasting process.
The study incorporated commonly used climate variability indices from three oceans and underscored the importance of further research into global SST, atmospheric phenomena, and oscillatory patterns. Climate variability in the three oceans exhibited a staggered and delayed effect on drought conditions in South Korea, with the most significant contribution observed in the 2-month cumulative drought. When considering cumulative periods of 4 months or more, spanning multiple seasons, climate variability proved effective in providing reliable drought forecasts by incorporating the drought conditions of the preceding month.
Moreover, the HYBRID model, which integrates variables with linear patterns and applies them nonlinearly, successfully addressed the limitation of extrapolation in data-based models. This can lead to the improved estimation of drought conditions. The gridded drought forecast in this study offers the advantage of simulating unmeasured areas, and the proposed forecasting model can be used as a valuable source of information for anticipating future drought conditions with a high resolution.
Given its potential to predict meteorological drought conditions based on SST even at considerable distances, this model could be explored for predicting drought in neighboring countries. It has the potential to contribute significantly to enhancing the national drought monitoring system in South Korea by providing a supplementary basis for such predictions. This approach demonstrated that integrating the ACF with traditional models can substantially improve drought prediction accuracy. By leveraging the strengths of the ACF in capturing time-dependent correlations and the predictive power of climate indices, the HYBRID model provided a more robust and reliable tool for forecasting drought conditions, particularly for extended periods such as SPEI-6. This advancement is crucial for early warning systems and proactive disaster management, enabling better preparedness and mitigation strategies for severe weather events and sustainable water management.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su16156485/s1, Figure S1: Time series of 50 years (1973–2022) of December SPEI-4 and SPEI-6 for all K-Hidra grids across South Korea; Figure S2: Pearson’s correlation coefficients between SPEI-4 and SPEI-6 and lagged large-scale climate indices for the K-Hidra grids across South Korea; Figure S3: Model performance metrics for predicting December SPEI-4 using 10 lagged climate indices; Figure S4: Model performance metrics for predicting December SPEI-6 using 10 lagged climate indices; Figure S5: Comparison between observed and model-predicted SPEI values in December 2015; Figure S6: SPEI-4 model proficiency map for drought prediction; Figure S7: SPEI-6 model proficiency map for drought prediction; Figure S8: Proficiency map of a HYBRID model for predicting drought in SPEI.

Author Contributions

Conceptualization, S.N. and S.L.; methodology, S.N. and S.L.; software, S.N.; validation, S.L.; formal analysis, S.N.; investigation, S.N.; resources, S.L.; data curation, S.N.; writing—original draft preparation, S.N.; writing—review and editing, S.L.; visualization, S.N.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation of Korea (NRF), grant number 2021R1C1C2004896. The APC was funded by NRF.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Min, S.K.; Kwon, W.T.; Park, E.H.; Choi, Y. Spatial and Temporal Comparisons of Droughts over Korea with East Asia. Int. J. Climatol. 2003, 23, 223–233.
  2. Kim, D.W.; Byun, H.R.; Choi, K.S.; Oh, S.B. A Spatiotemporal Analysis of Historical Droughts in Korea. J. Appl. Meteorol. Climatol. 2011, 50, 1895–1912.
  3. Song, Y.; Park, M. Rainfall Standard of Disaster Prediction for Agricultural Droughts in S. Korea. Appl. Sci. 2020, 10, 7423.
  4. Hong, I.; Lee, J.H.; Cho, H.S. National Drought Management Framework for Drought Preparedness in Korea (Lessons from the 2014-2015 Drought). Water Policy 2016, 18, 89–106.
  5. Jung, W.; Noh, S.; Kim, Y. Research on Boryeong Dam Water Supply Capacity Evaluation and Drought Response Method; ChungNam Institute: Gongju-si, Republic of Korea, 2016. (In Korean)
  6. Lim, G.; Noh, S.; Son, M.; Jung, K. Boryeong Dam Emergency Water Diversion Facility: Ensuring Operational Flexibility and Resilient Response to Climate Change. J. Korean Soc. Hazard Mitig. 2021, 21, 11–22.
  7. Achite, M.; Bazrafshan, O.; Azhdari, Z.; Wałęga, A.; Krakauer, N.; Caloiero, T. Forecasting of SPI and SRI Using Multiplicative ARIMA under Climate Variability in a Mediterranean Region: Wadi Ouahrane Basin, Algeria. Climate 2022, 10, 36.
  8. Keyantash, J.; Dracup, J.A. The Quantification of Drought: An Evaluation of Drought Indices. Bull. Am. Meteorol. Soc. 2002, 83, 1167–1180.
  9. Svoboda, M.; LeComte, D.; Hayes, M.; Heim, R.; Gleason, K.; Angel, J.; Rippey, B.; Tinker, R.; Palecki, M.; Stooksbury, D.; et al. The Drought Monitor. Bull. Am. Meteorol. Soc. 2002, 83, 1181–1190.
  10. McKee, T.B.; Doesken, N.J.; Kleist, J. The Relationship of Drought Frequency and Duration to Time Scales. In Proceedings of the 8th Conference on Applied Climatology, Anaheim, CA, USA, 17–22 January 1993; pp. 17–22.
  11. McKee, T.B.; Doesken, N.J.; Kleist, J. Drought Monitoring with Multiple Time Scales. Appl. Climatol. 1995, 233–236.
  12. Mo, K.C.; Lyon, B. Global Meteorological Drought Prediction Using the North American Multi-Model Ensemble. J. Hydrometeorol. 2015, 16, 1409–1424.
  13. National Drought Information Portal (NDIP) Home Page. Available online: https://www.drought.go.kr (accessed on 13 November 2023).
  14. WMO. WMO Standardized Precipitation Index User Guide. No. 1090; World Meteorological Organization: Geneva, Switzerland, 2012.
  15. Vicente-Serrano, S.M.; Beguería, S.; López-Moreno, J.I. A Multiscalar Drought Index Sensitive to Global Warming: The Standardized Precipitation Evapotranspiration Index. J. Clim. 2010, 23, 1696–1718.
  16. Uddin, M.J.; Hu, J.; Islam, A.R.M.T.; Eibek, K.U.; Nasrin, Z.M. A Comprehensive Statistical Assessment of Drought Indices to Monitor Drought Status in Bangladesh. Arab. J. Geosci. 2020, 13, 323.
  17. Tian, Y.; Xu, Y.P.; Wang, G. Agricultural Drought Prediction Using Climate Indices Based on Support Vector Regression in Xiangjiang River Basin. Sci. Total Environ. 2018, 622–623, 710–720.
  18. Zhao, L.; Wu, J.; Fang, J. Robust Response of Streamflow Drought to Different Timescales of Meteorological Drought in Xiangjiang River Basin of China. Adv. Meteorol. 2016, 2016, 1634787.
  19. Tirivarombo, S.; Osupile, D.; Eliasson, P. Drought Monitoring and Analysis: Standardised Precipitation Evapotranspiration Index (SPEI) and Standardised Precipitation Index (SPI). Phys. Chem. Earth 2018, 106, 1–10.
  20. Liu, C.; Yang, C.; Yang, Q.; Wang, J. Spatiotemporal Drought Analysis by the Standardized Precipitation Index (SPI) and Standardized Precipitation Evapotranspiration Index (SPEI) in Sichuan Province, China. Sci. Rep. 2021, 11, 1280.
  21. Rhee, J.; Im, J. Meteorological Drought Forecasting for Ungauged Areas Based on Machine Learning: Using Long-Range Climate Forecast and Remote Sensing Data. Agric. For. Meteorol. 2017, 237–238, 105–122.
  22. Araneda-Cabrera, R.J.; Bermudez, M.; Puertas, J. Revealing the Spatio-Temporal Characteristics of Drought in Mozambique and Their Relationship with Large-Scale Climate Variability. J. Hydrol. Reg. Stud. 2021, 38, 100938.
  23. Moazzam, M.F.U.; Rahman, G.; Munawar, S.; Farid, N.; Lee, B.G. Spatiotemporal Rainfall Variability and Drought Assessment during Past Five Decades in South Korea Using SPI and SPEI. Atmosphere 2022, 13, 292.
  24. Hao, Z.; Singh, V.P.; Xia, Y. Seasonal Drought Prediction: Advances, Challenges, and Future Prospects. Rev. Geophys. 2018, 56, 108–141.
  25. Namias, J. Spring and Summer 1988 Drought over the Contiguous United States—Causes and Prediction. J. Clim. 1991, 4, 54–65.
  26. Sehgal, V.; Sridhar, V. Effect of Hydroclimatological Teleconnections on the Watershed-Scale Drought Predictability in the Southeastern United States. Int. J. Climatol. 2018, 38, e1139–e1157.
  27. Forootan, E.; Khaki, M.; Schumacher, M.; Wulfmeyer, V.; Mehrnegar, N.; van Dijk, A.I.J.M.; Brocca, L.; Farzaneh, S.; Akinluyi, F.; Ramillien, G.; et al. Understanding the Global Hydrological Droughts of 2003–2016 and Their Relationships with Teleconnections. Sci. Total Environ. 2019, 650, 2587–2604.
  28. Lee, J.H.; Ramirez, J.A.; Kim, T.W.; Julien, P.Y. Variability, Teleconnection, and Predictability of Korean Precipitation in Relation to Large Scale Climate Indices. J. Hydrol. 2019, 568, 12–25.
  29. Singh, J.; Ashfaq, M.; Skinner, C.B.; Anderson, W.B.; Mishra, V.; Singh, D. Enhanced Risk of Concurrent Regional Droughts with Increased ENSO Variability and Warming. Nat. Clim. Chang. 2022, 12, 163–170.
  30. Nguyen, T.T.H.; Li, M.H.; Vu, T.M.; Chen, P.Y. Multiple Drought Indices and Their Teleconnections with ENSO in Various Spatiotemporal Scales over the Mekong River Basin. Sci. Total Environ. 2023, 854, 158589.
  31. Mekanik, F.; Imteaz, M.A.; Gato-Trinidad, S.; Elmahdi, A. Multiple Regression and Artificial Neural Network for Long-Term Rainfall Forecasting Using Large Scale Climate Modes. J. Hydrol. 2013, 503, 11–21.
  32. Seibert, M.; Merz, B.; Apel, H. Seasonal Forecasting of Hydrological Drought in the Limpopo Basin: A Comparison of Statistical Methods. Hydrol. Earth Syst. Sci. 2017, 21, 1611–1629.
  33. Feng, P.; Wang, B.; Luo, J.J.; Liu, D.L.; Waters, C.; Ji, F.; Ruan, H.; Xiao, D.; Shi, L.; Yu, Q. Using Large-Scale Climate Drivers to Forecast Meteorological Drought Condition in Growing Season across the Australian Wheatbelt. Sci. Total Environ. 2020, 724, 138162.
  34. Noh, G.H.; Ahn, K.H. Long-Lead Predictions of Early Winter Precipitation over South Korea Using a SST Anomaly Pattern in the North Atlantic Ocean. Clim. Dyn. 2022, 58, 3455–3469.
  35. Lee, J.; Kim, C.G.; Lee, J.E.; Kim, N.W.; Kim, H. Basin-Scale Monthly Rainfall Forecasts with a Data-Driven Model Using Lagged Global Climate Indices and Future Predicted Rainfall of an Adjacent Basin. Int. J. Climatol. 2023, 43, 3139–3158.
  36. Esha, R.; Imteaz, M.A. Pioneer Use of Gene Expression Programming for Predicting Seasonal Streamflow in Australia Using Large Scale Climate Drivers. Ecohydrology 2020, 13, e2242.
  37. Kim, M.; Kim, Y.; Lee, W. Seasonal Prediction of Korean Regional Climate from Preceding Large-Scale Climate Indices. Int. J. Climatol. 2007, 27, 925–934.
  38. Cha, S.; Jeong, J.; Lee, K.; Lim, Y.-J.; Kim, G. Drought Index Forecast Using an Additive Model and the Double Penalty Approach. J. Korean Soc. Hazard Mitig. 2017, 17, 53–62.
  39. Jeong, J.-H.; Park, T.-W.; Choi, J.-H.; Son, S.-W.; Song, K.; Kug, J.-S.; Kim, B.-M.; Kim, H.; Yim, S.-Y. Assessment of Climate Variability over East Asia-Korea for 2015/16 Winter. Atmosphere 2016, 26, 337–345.
  40. Kim, S.; Park, C.-K.; Kim, M.-K. The Regime Shift of the Northern Hemispheric Circulation Responsible for the Spring Drought in Korea. J. Korean Meteorol. Soc. 2005, 41, 571–585.
  41. Sohn, S.J.; Ahn, J.B.; Tam, C.Y. Six Month-Lead Downscaling Prediction of Winter to Spring Drought in South Korea Based on a Multimodel Ensemble. Geophys. Res. Lett. 2013, 40, 579–583.
  42. Son, H.Y.; Park, J.Y.; Kug, J.S.; Yoo, J.; Kim, C.H. Winter Precipitation Variability over Korean Peninsula Associated with ENSO. Clim. Dyn. 2014, 42, 3171–3186.
  43. Myoung, B.; Rhee, J. Long-Lead Predictions of Warm Season Droughts in South Korea Using North Atlantic SST. J. Clim. 2020, 33, 4659–4677.
  44. Sohn, S.J.; Tam, C.Y. Long-Lead Station-Scale Prediction of Hydrological Droughts in South Korea Based on Bivariate Pattern-Based Downscaling. Clim. Dyn. 2016, 46, 3305–3321.
  45. Han, B.; Lim, Y.; Kim, H.; Son, S. Development and Evaluation of Statistical Prediction Model of Monthly-Mean Winter Surface Air Temperature in Korea. Atmosphere 2018, 28, 153–162.
  46. Park, C.K.; Ho, C.H.; Park, D.S.R.; Park, T.W.; Kim, J. Interannual Variations of Spring Drought-Prone Conditions over Three Subregions of East Asia and Associated Large-Scale Circulations. Theor. Appl. Climatol. 2020, 142, 1117–1131.
  47. Tigkas, D.; Vangelis, H.; Tsakiris, G. Drought and Climatic Change Impact on Streamflow in Small Watersheds. Sci. Total Environ. 2012, 440, 33–41.
  48. Jiménez-Esteve, B.; Domeisen, D.I.V. Nonlinearity in the North Pacific Atmospheric Response to a Linear ENSO Forcing. Geophys. Res. Lett. 2019, 46, 2271–2281.
  49. Gong, D.Y.; Wang, S.W.; Zhu, J.H. East Asian Winter Monsoon and Arctic Oscillation. Geophys. Res. Lett. 2001, 28, 2073–2076.
  50. Jehanzaib, M.; Shah, S.A.; Yoo, J.; Kim, T.W. Investigating the Impacts of Climate Change and Human Activities on Hydrological Drought Using Non-Stationary Approaches. J. Hydrol. 2020, 588, 125052.
  51. NGII National Drought Information Portal. Available online: https://www.ngii.go.kr/kor/main.do (accessed on 17 January 2024).
  52. Noh, G.H.; Ahn, K.H. New Gridded Rainfall Dataset over the Korean Peninsula: Gap Infilling, Reconstruction, and Validation. Int. J. Climatol. 2022, 42, 435–452.
  53. WMO. The Global Climate in 2015–2019; Deutscher Wetterdienst: Offenbach, Germany, 2019.
  54. Schneider, D.P.; Deser, C.; Fasullo, J.; Trenberth, K.E. Climate Data Guide Spurs Discovery and Understanding. Eos, Trans. Am. Geophys. Union 2013, 94, 121–122.
  55. NCAR National Center for Atmospheric Research. Available online: https://ncar.ucar.edu/ (accessed on 18 December 2023).
  56. Walker, G.T. Correlation in Seasonal Variations of Weather—A Further Study of World Weather. Mon. Weather Rev. 1925, 53, 252–254.
  57. Bjerknes, J. Survey of El Nino 1957–58 in Its Relation to Tropical Pacific Meteorology. Inter-Am. Trop. Tuna Comm. Bull. 1966, 12, 1–62.
  58. Kug, J.; An, S.; Yeh, S.; Ham, Y. A White Paper on El Nino 2016; Korea Meteorological Administration (KMA): Seoul, Republic of Korea, 2017. (In Korean)
  59. Enfield, D.B.; Mestas-Nuñez, A.M.; Mayer, D.A.; Cid-Serrano, L. How Ubiquitous Is the Dipole Relationship in Tropical Atlantic Sea Surface Temperatures? J. Geophys. Res. Ocean. 1999, 104, 7841–7848.
  60. Saji, N.H.; Goswami, B.N.; Vinayachandran, P.N.; Yamagata, T. A Dipole Mode in the Tropical Indian Ocean. Nature 1999, 401, 360–363.
  61. Thornthwaite, C.W. An Approach toward a Rational Classification of Climate. Geogr. Rev. 1948, 38, 55.
  62. Beguería, S.; Vicente-Serrano, S.M.; Reig, F.; Latorre, B. Standardized Precipitation Evapotranspiration Index (SPEI) Revisited: Parameter Fitting, Evapotranspiration Models, Tools, Datasets and Drought Monitoring. Int. J. Climatol. 2014, 34, 3001–3023.
  63. Potop, V.; Boroneanţ, C.; Možný, M.; Štěpánek, P.; Skalák, P. Observed Spatiotemporal Characteristics of Drought on Various Time Scales over the Czech Republic. Theor. Appl. Climatol. 2014, 115, 563–581.
  64. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  65. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
  66. Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B.; et al. Modeling Flood Susceptibility Using Data-Driven Approaches of Naïve Bayes Tree, Alternating Decision Tree, and Random Forest Methods. Sci. Total Environ. 2020, 701, 134979.
  67. Rahmati, O.; Falah, F.; Dayal, K.S.; Deo, R.C.; Mohammadi, F.; Biggs, T.; Moghaddam, D.D.; Naghibi, S.A.; Bui, D.T. Machine Learning Approaches for Spatial Modeling of Agricultural Droughts in the South-East Region of Queensland Australia. Sci. Total Environ. 2020, 699, 134230.
  68. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer New York: New York, NY, USA, 2009; ISBN 978-0-387-84857-0.
  69. Hair, J.F.J.; Anderson, R.E.; Tatham, R.L.; Black, W.C. Multivariate Data Analysis, 3rd ed.; Prentice Hall: New York, NY, USA, 1995.
  70. Hair, J.F.; Risher, J.J.; Sarstedt, M.; Ringle, C.M. When to Use and How to Report the Results of PLS-SEM. Eur. Bus. Rev. 2019, 31, 2–24.
  71. Kursa, M.B.; Rudnicki, W.R. Feature Selection with the Boruta Package. J. Stat. Softw. 2010, 36, 1–13.
  72. Parmar, A.; Mistree, K.; Sompura, M. Machine Learning Techniques For Rainfall Prediction: A Review. In Proceedings of the 2017 International Conference on Innovations in information Embedded and Communication Systems, Coimbatore, India, 17–18 March 2017.
  73. Ridwan, W.M.; Sapitang, M.; Aziz, A.; Kushiar, K.F.; Ahmed, A.N.; El-Shafie, A. Rainfall Forecasting Model Using Machine Learning Methods: Case Study Terengganu, Malaysia. Ain Shams Eng. J. 2021, 12, 1651–1663.
  74. Van Dijk, A.I.J.M.; Peña-Arancibia, J.L.; Wood, E.F.; Sheffield, J.; Beck, H.E. Global Analysis of Seasonal Streamflow Predictability Using an Ensemble Prediction System and Observations from 6192 Small Catchments Worldwide. Water Resour. Res. 2013, 49, 2729–2746.
  75. Zhang, H.; Nettleton, D.; Zhu, Z. Regression-Enhanced Random Forests. arXiv 2019, arXiv:1904.10416.
  76. Azam, M.; Maeng, S.J.; Kim, H.S.; Lee, S.W.; Lee, J.E. Spatial and Temporal Trend Analysis of Precipitation and Drought in South Korea. Water 2018, 10, 765.
  77. Kim, J.S.; Seo, G.S.; Jang, H.W.; Lee, J.H. Correlation Analysis between Korean Spring Drought and Large-Scale Teleconnection Patterns for Drought Forecasting. KSCE J. Civ. Eng. 2017, 21, 458–466.
Figure 1. (a) Study area and spatial distribution of average annual precipitation anomalies; temporal anomalies and trends of (b) precipitation and (c) temperature in South Korea during the period 1973–2022.
Figure 2. Time series of 50 years (1973–2022) of December SPEI-2 for all K-Hidra grids across South Korea. The box for each year reflects the distribution of SPEI values across all stations. The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). Values below y = −1 indicate ‘moderate drought’ and are demarcated by a blue dotted line. The average values from the 205 observation stations are connected by a red solid line. Outliers are shown as dark orange X-shaped markers.
Figure 3. Pearson’s correlation coefficients between SPEI-2 and lagged large-scale climate indices for K-Hidra grid data across South Korea.
Figure 4. Model performance metrics for predicting December SPEI-2 using 10 lagged climate indices. (a,b) display PCC values, while (c,d) show RMSE values. The left panels represent the MLR model, and the right panels represent the RF model. Redder colors indicate better performance.
Figure 5. Comparison between observed and model-predicted SPEI-2 values in December 2015: (a) observed values, (b) values predicted by the MLR model, and (c) values predicted by the RF model.
Figure 6. Time series comparison of MLR and RF models for SPEI-2 prediction at individual grids. Observed SPEI-2 values are shown in gray, MLR predictions in blue, and RF predictions in red. (a,c) show the best and worst skill of the MLR model against observations, and (b,d) show the best and worst skill of the RF model, respectively.
Figure 7. Model proficiency map for predicting December SPEI-2 drought conditions. Panels (a,b) show hit rates, while panels (c,d) present HSS values. The left panels represent the MLR model, and the right panels represent the RF model. Redder colors indicate better performance in both hit rates and HSS.
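The hit rate and HSS mapped in Figure 7 are forecast-verification scores. The following is a minimal sketch, assuming the usual 2×2 contingency-table definitions (this section does not spell out the exact formulas used in the study) and treating a drought event as SPEI < −1. The function name `drought_skill` and the sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def drought_skill(
    observed: Sequence[float],
    predicted: Sequence[float],
    threshold: float = -1.0,
) -> Tuple[float, float]:
    """Return (hit rate, Heidke skill score) for a binary drought event.

    Assumption: an event occurs when SPEI < threshold, and both scores use
    the standard 2x2 contingency table (hits, false alarms, misses,
    correct negatives).
    """
    hits = false_alarms = misses = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs < threshold
        pred_event = pred < threshold
        if obs_event and pred_event:
            hits += 1
        elif pred_event:
            false_alarms += 1
        elif obs_event:
            misses += 1
        else:
            correct_negatives += 1

    # Hit rate: fraction of observed droughts that were also predicted.
    n_events = hits + misses
    hit_rate = hits / n_events if n_events else math.nan

    # HSS: accuracy relative to what random forecasts would achieve.
    denom = (hits + misses) * (misses + correct_negatives) + (
        hits + false_alarms
    ) * (false_alarms + correct_negatives)
    hss = (
        2 * (hits * correct_negatives - false_alarms * misses) / denom
        if denom
        else math.nan
    )
    return hit_rate, hss


# Hypothetical December SPEI-2 values for six years at one grid.
obs = [-1.5, -1.2, 0.3, 0.8, -0.2, -1.1]
pred = [-1.3, -0.5, 0.1, -1.4, 0.0, -1.6]
hit_rate, hss = drought_skill(obs, pred)
```

A hit rate is often reported as a percentage (multiply by 100). An HSS of 1 indicates a perfect forecast, and 0 indicates no skill beyond chance.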
Figure 8. Number of times each of the 10 predictor variables was selected across the 205 grids for the MLR and RF models.
Figure 9. Proportion of selections for each predictor out of 110 possible selections (11 lag times for 10 variables). Predictors are the predefined teleconnection indices, color-coded by ocean group: red for Pacific, green for Atlantic, and purple for Indian Ocean.
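The count of 110 candidate predictors in Figure 9 follows from pairing each of the 10 climate indices with each of 11 lag times. The sketch below enumerates those pairs for a December target. It assumes the lags run from 1 to 11 months, which this section states only for the SPEI autocorrelation, so treat that range, the function name `candidate_predictors`, and the tuple layout as illustrative.

```python
from itertools import product
from typing import Iterable, List, Tuple

# The 10 climate indices listed in Table 1.
CLIMATE_INDICES = [
    "Nino 3.4", "PDO", "SOI",          # Pacific
    "AMO", "NAO", "TNA", "TSA",        # Atlantic
    "DMI", "SETIO", "WTIO",            # Indian Ocean
]

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def candidate_predictors(
    target_month: int = 12,
    lags: Iterable[int] = range(1, 12),  # assumed lag range: 1..11 months
) -> List[Tuple[str, int, str]]:
    """List (index name, lag in months, source month) for every candidate."""
    candidates = []
    for name, lag in product(CLIMATE_INDICES, lags):
        source_month = target_month - lag  # 1-based month within the same year
        if source_month < 1:
            raise ValueError("lag reaches into the previous year")
        candidates.append((name, lag, MONTHS[source_month - 1]))
    return candidates


candidates = candidate_predictors()
print(len(candidates))  # 10 indices x 11 lags
```

Under these assumptions, a lag of 1 pairs the December SPEI with the November value of an index, and a lag of 11 pairs it with the January value.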
Figure 10. Boxplots comparing the skill of MLR and RF models for predicting SPEI-2, SPEI-4, and SPEI-6. The upper panel shows the PCC, and the lower panel shows the RMSE. Boxes are colored by model: MLR (green) and RF (light blue). C.I. (climate index) denotes a model using the 10 climate indices; C.I + ACF (climate index plus autocorrelation function) denotes a model using the 10 climate indices together with the one-month-lagged SPEI as an autocorrelation variable.
Figure 11. Bar plots of the ACF of December SPEI with lagged SPEI values from 1 to 11 months earlier. lag1 is the correlation between the December and November SPEI values of the same year, lag2 is the correlation between the December and October values, and lag11 is the correlation between the December and January values of the same year.
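The lag definition in the Figure 11 caption can be sketched as follows. This is an illustrative example, assuming the SPEI series for one grid is arranged as a years × 12 months array and that each lag is a plain Pearson correlation across years. The function name `december_lag_correlations` and the random data are hypothetical.

```python
import numpy as np


def december_lag_correlations(spei_monthly: np.ndarray) -> np.ndarray:
    """Correlate December SPEI with SPEI from 1 to 11 months earlier.

    spei_monthly: array of shape (n_years, 12), columns January..December.
    Returns an array of 11 correlations; element 0 is lag1 (November),
    element 10 is lag11 (January).
    """
    spei_monthly = np.asarray(spei_monthly, dtype=float)
    if spei_monthly.ndim != 2 or spei_monthly.shape[1] != 12:
        raise ValueError("expected an array of shape (n_years, 12)")

    december = spei_monthly[:, 11]
    correlations = [
        # Column 11 - lag holds the month that precedes December by `lag`.
        np.corrcoef(december, spei_monthly[:, 11 - lag])[0, 1]
        for lag in range(1, 12)
    ]
    return np.array(correlations)


# Hypothetical 50-year record of monthly SPEI values at one grid.
rng = np.random.default_rng(0)
spei = rng.normal(size=(50, 12))
acf = december_lag_correlations(spei)
```

Each element of `acf` corresponds to one bar in a plot of this kind, computed separately for each grid.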
Figure 12. Proficiency map of the HYBRID model for predicting drought in SPEI-2 (December). The left map (a) shows hit rate percentage, and the right map (b) shows HSS. The redder the map color for hit rates and HSS, the better the performance.
Figure 13. Boxplots comparing the proficiency of MLR, RF, and HYBRID models for predicting SPEI-2, SPEI-4, and SPEI-6. The upper panel shows hit rate percentage box plots, and the lower panel shows Heidke skill score (HSS). Bars are colored according to the model: MLR (green), RF (light blue), HYBRID (light red).
Table 1. List of the selected 10 climate indices for the analysis in this study.
| Origin | Abbreviation | Climate Indices | Climatology Period | Dataset | Calculation | Data Source |
| --- | --- | --- | --- | --- | --- | --- |
| Pacific | Nino 3.4 | East Central Tropical Pacific SST | 1991–2020 | ERSSTv5 | Mean SST over the Nino 3.4 region (5° N–5° S, 120° W–170° W) | https://psl.noaa.gov/data/correlation/nina34.anom.data (accessed on 13 November 2023) |
| Pacific | PDO | Pacific Decadal Oscillation | 1991–2020 | ERSSTv5 | Leading pattern (EOF) of SST anomalies in the North Pacific basin (typically, polewards of 20° N) | https://www.ncei.noaa.gov/pub/data/cmb/ersst/v5/index/ersst.v5.pdo.dat (accessed on 13 November 2023) |
| Pacific | SOI | Southern Oscillation Index | 1981–2010 | NCAR | Standardized based on the observed sea level pressure differences between Tahiti (18° S, 150° W) and Darwin (10° S, 130° E) | https://psl.noaa.gov/data/correlation/soi.data (accessed on 13 November 2023) |
| Atlantic | AMO | Atlantic Multidecadal Oscillation | 1901–1970 | ERSSTv5 | SST anomaly for the North Atlantic region (0–80° N) with a period of about 60–80 years | https://www1.ncdc.noaa.gov/pub/data/cmb/ersst/v5/index/ersst.v5.amo.dat (accessed on 13 November 2023) |
| Atlantic | NAO | North Atlantic Oscillation | 1981–2010 | NCAR | The difference in sea level pressure between Ponta Delgada, Azores (38° N, 26° W) and Akureyri, Iceland (66° N, 18° W) | https://psl.noaa.gov/data/correlation/nao.data (accessed on 13 November 2023) |
| Atlantic | TNA | Tropical Northern Atlantic Index | 1971–2000 | HadISST, OISSTv2 | Anomaly of the average of the monthly SST from 5.5° N–23.5° N and 15° W–57.5° W | https://psl.noaa.gov/data/correlation/tna.data (accessed on 13 November 2023) |
| Atlantic | TSA | Tropical Southern Atlantic Index | 1971–2000 | HadISST, OISSTv2 | Anomaly of the average of the monthly SST from Equator–20° S and 10° E–30° W | https://psl.noaa.gov/data/correlation/tsa.data (accessed on 13 November 2023) |
| Indian | DMI | Dipole Mode Index | 1981–2010 | HadISST1.1 | The difference of the WTIO and SETIO | https://psl.noaa.gov/gcos_wgsp/Timeseries/Data/dmi.had.long.data (accessed on 13 November 2023) |
| Indian | SETIO | Southeastern Tropical Indian Ocean | 1981–2010 | HadISST1.1 | SST anomalies in the box 90° E–110° E and 10° S–0° | https://psl.noaa.gov/gcos_wgsp/Timeseries/Data/dmieast.had.long.data (accessed on 13 November 2023) |
| Indian | WTIO | Western Tropical Indian Ocean | 1981–2010 | HadISST1.1 | SST anomalies in the box 50° E–70° E and 10° S–10° N | https://psl.noaa.gov/gcos_wgsp/Timeseries/Data/dmiwest.had.long.data (accessed on 13 November 2023) |

Abbreviations: HadISST, Hadley Centre Global Sea Ice and Sea Surface Temperature; NCAR, National Center for Atmospheric Research; JRA-55, the Japanese 55-year Reanalysis; ERSST, the Extended Reconstructed Sea Surface Temperature.
Table 2. Standardized Precipitation Evapotranspiration Index classification [15].
| SPEI Values | Drought Category |
| --- | --- |
| ≥2.0 | Extreme wet |
| 1.50 to 1.99 | Severe wet |
| 1.00 to 1.49 | Moderate wet |
| −0.99 to 0.99 | Normal |
| −1.49 to −1.00 | Moderate drought |
| −1.99 to −1.50 | Severe drought |
| ≤−2.0 | Extreme drought |
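The classification in Table 2 amounts to a simple threshold lookup. The sketch below is one way to express it. The function name `classify_spei` is hypothetical, and because the table lists ranges to two decimals, the handling of values that fall between listed bounds (for example, 0.995) is an assumption: each class is taken to extend up to the next threshold.

```python
def classify_spei(value: float) -> str:
    """Map an SPEI value to a Table 2 category.

    Assumption: class boundaries are treated as continuous thresholds, so a
    value such as -1.495 (between the tabulated ranges) counts as
    'Moderate drought', and exactly -1.0 counts as 'Moderate drought'.
    """
    if value >= 2.0:
        return "Extreme wet"
    if value >= 1.5:
        return "Severe wet"
    if value >= 1.0:
        return "Moderate wet"
    if value > -1.0:
        return "Normal"
    if value > -1.5:
        return "Moderate drought"
    if value > -2.0:
        return "Severe drought"
    return "Extreme drought"


# Hypothetical December SPEI-2 values.
for spei in (2.3, 0.4, -1.2, -1.7, -2.1):
    print(spei, classify_spei(spei))
```

Any value classified as 'Moderate drought', 'Severe drought', or 'Extreme drought' here falls into the 'moderate drought or worse' state that the drought-event scores are built on.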
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Noh, S.; Lee, S. Forecasting Meteorological Drought Conditions in South Korea Using a Data-Driven Model with Lagged Global Climate Variability. Sustainability 2024, 16, 6485. https://doi.org/10.3390/su16156485

AMA Style

Noh S, Lee S. Forecasting Meteorological Drought Conditions in South Korea Using a Data-Driven Model with Lagged Global Climate Variability. Sustainability. 2024; 16(15):6485. https://doi.org/10.3390/su16156485

Chicago/Turabian Style

Noh, Seonhui, and Seungyub Lee. 2024. "Forecasting Meteorological Drought Conditions in South Korea Using a Data-Driven Model with Lagged Global Climate Variability" Sustainability 16, no. 15: 6485. https://doi.org/10.3390/su16156485
