Conditional Extreme Values Theory and Tail-related Risk Measures: Evidence from Latin American Stock Markets

The purpose of this work is to extend McNeil and Frey's (2000) methodology by combining two-component GARCH models and extreme value theory to evaluate the performance of the value at risk (VaR) and expected shortfall (ES) measures in the Latin American stock markets. In the in-sample analysis, the backtesting results indicate that no single model dominates the others in the estimation of VaR at any confidence level. However, the P-values of the Kupiec test confirm the out-of-sample predictive ability of the CGARCH-EVT models to estimate the VaR for long and short financial positions in Argentina and Mexico, although their performance is insufficient to provide accurate estimates of the ES. The modeling of fat tails, asymmetry and long memory has important implications for risk management and hedging strategies in volatile stock markets.


INTRODUCTION
Value-at-risk (VaR) methodology has been adopted as one of the main paradigms to measure market risk in the financial industry. According to Jorion (2007), VaR is determined by a quantile of the distribution of gains and losses of a financial position, and is defined as the maximum possible loss over a time horizon at a given probability. Since VaR was adopted as a regulatory risk measurement tool for commercial banks, several approaches have been developed to estimate it. Among the most prominent are the parametric methods based on the assumption of conditional normality and the non-parametric models represented by historical simulation (HS). The literature has shown that conditional volatility models, such as GARCH models, improve VaR predictions by capturing changing market conditions. However, the assumption of normality underestimates the true market risk because it ignores the fat tails and leptokurtosis caused by extreme events in financial time series (Duffie and Pan, 1997; Vlaar, 2000; Su and Hung, 2011).
The empirical evidence has supported the use of GARCH models based on Student-t and generalized error distributions (GED) for the estimation of VaR in stock and currency markets (Huang and Lin, 2004; Bams et al., 2005; Ané, 2006; So and Yu, 2006; Angelidis et al., 2004; Marcucci, 2005; Su and Knowles, 2006). However, the GED cannot capture the asymmetry in financial returns, which leads to biased estimates of VaR (Brooks and Persand, 2003). Giot and Laurent (2003) and Diamandis et al. (2011) incorporate an asymmetric distributional assumption for the innovations in the returns of industrialized and emerging equity markets, using the skewed Student-t distribution, and achieve a better predictive performance of VaR relative to symmetric distributions.
In contrast to the inconsistencies of conventional VaR measures in capturing the magnitude and likelihood of extreme returns, extreme value theory (EVT) provides a set of robust tools for modeling the behavior of extreme and catastrophic outcomes that fall in the tails of empirical distributions. Numerous studies have demonstrated the potential of EVT to estimate the extreme quantiles of the distribution of returns in the stock and currency exchange markets using the block maxima technique and generalized extreme value distributions (GEVD) (Longin, 2000; Ho et al., 2000; Da Silva and de Melo Mendes, 2003; Gilli and Kellezi, 2006; Cotter, 2006; Aggarwal and Qi, 2009; De Jesús and Ortiz, 2011; De Jesús et al., 2013). The peaks over threshold (POT) technique represents another alternative for the treatment of extreme risk in the tails of a distribution; the cornerstone of this approach is the generalized Pareto distribution (GPD). Studies that have applied the GPD to the equity markets of industrialized and emerging countries have demonstrated that VaR models based on the unconditional EVT outperform conventional parametric models in the estimation of VaR, particularly at the extreme quantiles (Gencay and Selcuk, 2004; Mutu et al., 2011). In contrast, Kittiakarasakun and Tse (2011) provide empirical evidence that VaR-GARCH models achieve a greater predictive performance than the static EVT-VaR models in the stock markets of Asia. The main disadvantage of the unconditional EVT is that it does not capture the impact of unexpected changes in market conditions on the estimation of VaR; in addition, it assumes that returns are independently and identically distributed (i.i.d.), which is not valid in modern financial markets.
To overcome that problem, McNeil and Frey (2000) propose the use of conditional EVT (CEVT) to estimate VaR in two stages. The first stage consists of fitting a GARCH model to estimate the conditional volatility and filter the series of returns to obtain a standardized series of i.i.d. residuals. In the second stage, CEVT is applied to the distribution of standardized residuals to capture the heteroscedasticity in extreme returns that is caused by stochastic volatility. In the context of end-to-end market risk management, the work of Bystrom (2004) confirms the potential of CEVT to estimate losses in the stock markets of the United States and Sweden during periods of relative calm and extreme volatility. Fernandez (2005) and Cotter (2007) conclude that the estimates of quantiles based on CEVT provide better predictions of risk for the stock market indices of Latin America, Asia and Europe. Ghorbel and Trabelsi (2008) evaluate the predictive performance of several parametric and non-parametric VaR models, and find that EVT-GARCH models produce better VaR estimates than traditional models for the indices of the stock markets of Paris and Tunisia.
More recently, Dimitrakopoulos et al. (2010) estimate a number of VaR models under different market conditions. Although the selected approaches are CEVT and filtered HS, these risk measures tend to overestimate and underestimate the VaR in the portfolios of emerging markets and industrialized countries. However, Karmakar (2013) estimates the VaR and expected shortfall (ES) under EVT-GARCH with Student-t innovations for short and long positions in the index of the Bombay Stock Exchange, and his results reveal strong stability of the VaR estimates and conditional ES at high confidence levels. Other studies that extend EVT to asymmetric GARCH structures include Furió and Climent (2014). Their findings show that CEVT-based VaR models are superior to traditional GARCH models with normal and Student-t innovations in estimating the market risk of in-sample and out-of-sample returns of the S&P 500, FTSE 100 and NIKKEI 225. Also, Karmakar and Shukla (2015) conduct a comparative analysis of the predictive power of different VaR approaches that empirically confirms the superiority of CEVT in modeling the tails of the distributions, and in the estimation and prediction of VaR for six stock markets in Asia, Europe and the United States.
The bulk of the literature on CEVT-based VaR measurement has focused primarily on the stock markets of industrialized countries, especially the United States and Europe; thus, research on the estimation of extreme quantiles in the equity markets of emerging economies is still very limited. In particular, it is well known that Latin American stock markets are fragile and have different structural features relative to the more liquid and efficient developed equity markets, which makes the former experience higher volatility as well as more volatility persistence in both the short and the long run. This work applies CEVT to the stock market indices of Argentina, Brazil, Chile, Colombia, Mexico and Peru, analyzes the asymptotic behavior of the tails of the empirical return distributions, and estimates the risk of long and short positions for a sample period from January 2, 1992 through December 31, 2015.
So far, tail-risk studies have applied CEVT to TGARCH, GARCH, EGARCH and PGARCH structures. However, these traditional specifications do not capture the long-term effects of asymmetry and persistence in the prediction of volatility. Therefore, the first contribution of this work consists in the use of a family of symmetric and asymmetric CGARCH models to capture these common features of highly volatile stock markets. Since VaR estimates based on CEVT are sensitive to the specification of the GARCH model, a second original contribution of this work consists in the assessment of the out-of-sample predictive performance of symmetric and asymmetric standard GARCH and CGARCH models for the 2010-2015 period using the predictive ability test of Hansen (2005), under four measures of predictive errors. The conditional ES measure is also estimated for the long and short financial positions because VaR has the weakness of not complying with the sub-additivity property. This alternative, introduced by Artzner et al. (1999), is more consistent for estimating the severity of losses that exceed VaR levels.

Basic Statistics and Estimation of GARCH and CGARCH Models
This work uses the daily closing prices of the stock market indices of Argentina, Brazil, Chile, Colombia, Mexico and Peru¹ to study the potential benefits of using CEVT to estimate market risk. The data cover the period from January 2, 1992 to December 31, 2015, for a total of 6,206 observations. All the data were retrieved from Bloomberg. The return series exhibit skewness, indicating that positive shocks are more frequent than negative shocks, and also show leptokurtosis, i.e., longer and fatter tails (in particular the upper tail) than a normal distribution. The Jarque-Bera statistic clearly rejects the null that the series follow a normal distribution at the 1% significance level in all cases. The Ljung-Box Q(10) and Q²(10) statistics in Table 1 suggest the presence of significant linear and non-linear dependencies. Also, the LM test shows strong evidence of ARCH effects and time-varying volatility, and suggests the use of a two-component GARCH model to filter the data series as a preliminary step to the conditional EVT.
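The specification of the conditional mean, a first-order autoregressive process, may be expressed as follows:

$$ r_t = \mu + \phi\, r_{t-1} + \varepsilon_t, \qquad \varepsilon_t = z_t \sqrt{h_t} $$

The conditional variance of the residuals is governed by a standard GARCH(1, 1) process:

$$ h_t = \omega + \alpha\, \varepsilon_{t-1}^{2} + \beta\, h_{t-1} $$

where ω ≥ 0, α ≥ 0, β ≥ 0 and α + β < 1, which ensures that $h_t > 0$.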
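The filtering step implied by the AR(1) mean equation and the GARCH(1, 1) variance recursion can be sketched as follows. This is a minimal illustration, not the estimation routine used in the paper: the parameters are taken as given rather than estimated, the function name `ar1_garch11_filter` is hypothetical, and starting the recursion at the unconditional variance is an assumption.

```python
import math


def ar1_garch11_filter(returns, mu, phi, omega, alpha, beta):
    """Filter returns with an AR(1) mean and a GARCH(1,1) variance.

    Returns (h, z): conditional variances and standardized residuals,
    both aligned with returns[1:].
    """
    # omega > 0 is required here (stricter than omega >= 0) so that h_t stays positive.
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("parameters violate the GARCH(1,1) constraints")
    # eps_t = r_t - mu - phi * r_{t-1}; the first return has no lag and is dropped.
    eps = [returns[t] - mu - phi * returns[t - 1] for t in range(1, len(returns))]
    # Assumed start value: the unconditional variance omega / (1 - alpha - beta).
    h = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(eps)):
        h.append(omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1])
    # Standardized residuals z_t = eps_t / sqrt(h_t), the input to the EVT step.
    z = [e / math.sqrt(v) for e, v in zip(eps, h)]
    return h, z
```

With α = β = 0 the variance is constant at ω, so the standardized residuals are simply the demeaned returns divided by the square root of ω, which is a convenient sanity check.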
A shortcoming of GARCH models is that they cannot capture a slow mean-reverting long-run component in the conditional volatility. Engle and Lee (1999) propose the CGARCH model as an alternative that captures high persistence in volatility by decomposing it into a transitory and a permanent component (for a more detailed technical explanation of the CGARCH model, see Maheu, 2005).
In this model, $q_t$ is the permanent component of the conditional volatility $h_t$, and $h_t - q_t$ is the transitory component. The conditional volatility is mean-reverting around the permanent volatility ($q_t$). For the permanent component, the speed of mean reversion is determined by δ, which typically satisfies (α+β) < δ < 1.
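The two-component recursion can be sketched in code under one common parameterization of the Engle and Lee (1999) model. The symbols ω, δ, α and β follow the text; the name `theta` for the coefficient on the shock in the permanent equation, the function name `cgarch_variance`, and the choice of ω as the starting value of both components are assumptions made only for this illustration.

```python
def cgarch_variance(eps, omega, delta, theta, alpha, beta):
    """Component GARCH recursion (one common parameterization, assumed here).

    Permanent:  q_t = omega + delta * (q_{t-1} - omega) + theta * (eps_{t-1}^2 - h_{t-1})
    Total:      h_t = q_t + alpha * (eps_{t-1}^2 - q_{t-1}) + beta * (h_{t-1} - q_{t-1})
    Returns (h, q), each the same length as eps.
    """
    # Assumed start: both components begin at the long-run level omega.
    q = [omega]
    h = [omega]
    for t in range(1, len(eps)):
        shock = eps[t - 1] ** 2
        q_t = omega + delta * (q[-1] - omega) + theta * (shock - h[-1])
        h_t = q_t + alpha * (shock - q[-1]) + beta * (h[-1] - q[-1])
        q.append(q_t)
        h.append(h_t)
    return h, q
```

After a single large shock, the transitory gap h_t - q_t shrinks at rate α + β while q_t returns to ω at the slower rate δ, which is the persistence pattern described in the text.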
Since positive and negative shocks of the same magnitude can have a different impact on volatility, the flexibility of the CGARCH model can be extended to capture asymmetric effects in the short and the long run. This is done with a dummy variable governed by the Heaviside function I(·), which is equal to 1 if $\varepsilon_{t-1} < 0$ and 0 otherwise, and whose effects are measured by the asymmetry parameters γ and ψ. Leverage effects are observed when γ > 0 and ψ > 0.
The estimation results of the GARCH and CGARCH specifications, and the diagnostic tests on the standardized residuals and their squares, are reported in Table 2 in the appendix for the indices of Argentina, Brazil and Mexico. The parameter μ in the conditional mean equation is statistically significant at the 1% level, except for the asymmetric CGARCH model for Argentina. All the ϕ parameters of the first-order autoregressive process are positive and significant at the 1% level. The evidence suggests that changes in stock prices in one direction tend to be followed by further changes in the same direction in the next period. All the models successfully capture the dynamic patterns of conditional volatility in the short run; most of the estimated parameters are positive and statistically significant at conventional levels. Also, the sum of α and β is less than one, indicating the presence of high persistence in the transitory volatility component, especially in the CGARCH-A2 model. Moreover, the estimates of the δ parameter, between 0.9555 and 0.9881, clearly reveal that the permanent volatility component is more persistent and decays at a slower rate than the transitory volatility component under the CGARCH and CGARCH-A1 models.
Judging by the statistical significance and sign of the asymmetry parameters that capture the impact of bad news in the short and the long run, results are mixed under the asymmetric CGARCH models. However, a significant presence of asymmetric effects is revealed in the transitory volatility of the CGARCH-A1 model. Hence, the use of asymmetric CGARCH models is empirically justified. The diagnostic tests for the GARCH and CGARCH model specifications are reported at the bottom of Table 2, with only one exception: the specification of the conditional mean fails to remove the autocorrelation in the standardized residuals, and something similar occurs in the case of Brazil. This fact will have an impact on the assessment of the predictive ability of the volatility models.

Evaluating the Predictive Accuracy of Volatility Models
The superior predictive ability (SPA) test of Hansen (2005) is used to identify the best volatility-forecasting model based on out-of-sample forecasts. Four symmetric and asymmetric statistical loss functions are implemented to assess the volatility forecast performance of the GARCH and CGARCH models. The first two are the mean squared error (MSE) and the mean absolute error (MAE), defined as follows:

$$ \mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\left(h_t - \hat{h}_t\right)^{2}, \qquad \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|h_t - \hat{h}_t\right| $$

where T indicates the number of out-of-sample forecasts, $h_t$ is a proxy for the unobservable volatility, obtained from the squared returns, and $\hat{h}_t$ is the volatility forecast for day t.
According to Brailsford and Faff (1996), asymmetric loss functions penalize the over- and under-predictions of volatility with different weights. To account for the potential asymmetry in the loss function, they develop mean mixed error statistics that penalize under-predictions (MME(U)) and over-predictions (MME(O)) of volatility more heavily, and define them in the following way:

$$ \mathrm{MME(U)} = \frac{1}{T}\left[\sum_{t=1}^{O}\left|h_t - \hat{h}_t\right| + \sum_{t=1}^{U}\sqrt{\left|h_t - \hat{h}_t\right|}\right] $$

$$ \mathrm{MME(O)} = \frac{1}{T}\left[\sum_{t=1}^{O}\sqrt{\left|h_t - \hat{h}_t\right|} + \sum_{t=1}^{U}\left|h_t - \hat{h}_t\right|\right] $$

where U is the number of under-predictions, and O the number of over-predictions.

Table note: Q(36) and Q²(36) are the values of the Ljung-Box test for simple and squared residuals with 36 lags, and their probabilities are reported in square brackets. *, **, and *** indicate significance at 1%, 5% and 10%, respectively. The standard errors are reported in parentheses.
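The four loss functions (MSE, MAE, MME(U) and MME(O)) can be computed along the following lines. This is a small sketch assuming the usual Brailsford and Faff form, in which the square root is applied to the absolute errors of the side that is to be penalized; the function name `volatility_losses` is hypothetical.

```python
import math


def volatility_losses(h, h_hat):
    """Return MSE, MAE, MME(U) and MME(O) for a volatility proxy h and forecasts h_hat."""
    T = len(h)
    # Forecast error: positive means over-prediction, negative means under-prediction.
    errors = [f - a for a, f in zip(h, h_hat)]
    over = [abs(e) for e in errors if e > 0]
    under = [abs(e) for e in errors if e < 0]
    mse = sum(e * e for e in errors) / T
    mae = sum(abs(e) for e in errors) / T
    # The square root inflates errors smaller than one, which is the usual
    # magnitude for daily return variances, so that side is penalized more.
    mme_u = (sum(over) + sum(math.sqrt(e) for e in under)) / T
    mme_o = (sum(math.sqrt(e) for e in over) + sum(under)) / T
    return {"MSE": mse, "MAE": mae, "MME(U)": mme_u, "MME(O)": mme_o}
```

Note that the heavier penalty relies on absolute errors below one; for errors above one the square root would reduce the penalty instead.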
The SPA test evaluates the performance of a set of alternative models, indexed by k, with respect to a benchmark model through the relative performance variable, defined as:

$$ d_{k,t} = L_{0,t} - L_{k,t} $$

where $L_{0,t}$ is the value of the loss function at time t for a benchmark model $M_0$, and $L_{k,t}$ is the corresponding loss function value, also at time t, for a competing model $M_k$.
Under the assumption that the vector $d_{k,t}$ is strictly stationary, the null hypothesis that the benchmark model is not outperformed by any alternative model may be expressed as follows:

$$ H_0: \max_{k} \mu_k \le 0, \qquad \mu_k \equiv E[d_{k,t}] $$

The estimator of $\mu_k$ should reduce the impact of models with poor predictive performance while controlling for the impact of alternative models with $\mu_k = 0$. Hansen proposes the following consistent estimator for $\mu_k$:

$$ \hat{\mu}_k^{c} = \bar{d}_k \, 1\left\{ n^{1/2}\,\bar{d}_k / \hat{\omega}_k \le -\sqrt{2\log\log n} \right\} $$

where $1\{\cdot\}$ is an indicator function, $\bar{d}_k$ is the sample mean of $d_{k,t}$ over the n out-of-sample observations, and $\hat{\omega}_k^{2}$ is a consistent estimator of the variance of $n^{1/2}\bar{d}_k$. An immediate result of the stationarity assumption is that the threshold rate $\sqrt{2\log\log n}$ guarantees the consistency of the estimator $\hat{\mu}_k^{c}$ for n sufficiently large, even for alternative models with $\mu_k = 0$.
The test statistic is defined as:

$$ T_n^{SPA} = \max\left[\, \max_{k} \frac{n^{1/2}\,\bar{d}_k}{\hat{\omega}_k},\; 0 \,\right] $$

with $\bar{d}_k$ the sample mean of $d_{k,t}$ and $\hat{\omega}_k^{2}$ a consistent estimate of the variance of $n^{1/2}\bar{d}_k$. To compute the P-value of the SPA statistic $T_n^{SPA}$, Hansen (2005) suggests the use of a stationary bootstrap procedure, similar to that in Politis and Romano (1994), to obtain the distribution of the test statistic under the null hypothesis. A high P-value indicates that the benchmark model is not outperformed by the alternative models. Table 3 shows the P-values of SPA tests based on 10,000 stationary bootstrap samples under different loss functions: MSE, MAE, MME(U) and MME(O). The null that the benchmark model is not outperformed out of sample by the alternative models cannot be rejected. In the cases of Argentina, Brazil and Mexico, the CGARCH-A1 and CGARCH-A2 models give superior volatility predictions of financial returns, compared with the GARCH and CGARCH models. The empirical results clearly suggest that the volatility response of the Argentine, Brazilian and Mexican stock markets differs depending on whether arriving news is good or bad. In turn, this means that negative shocks in these exchanges have a stronger impact in the short run, but little effect in the long run. The SPA test values indicate that the CGARCH-A2 model displays the best out-of-sample performance for Argentina and Mexico under the MSE, MAE and MME(O) loss functions. For the MME(U), the CGARCH-A1 model provides the highest SPA test values for both countries. This is explained by the fact that the MME(U) loss function penalizes the under-prediction of volatility which, in this case, represents approximately 60.45% and 77.28% of the sample for the Argentine and the Mexican markets, respectively.
The SPA P-value for the CGARCH-A2 model is above the 5% significance level, so it can still be considered an excellent benchmark model to predict the future volatility of the Mexican stock market while, in the case of the Argentine market, it is second only to a CGARCH model, in line with the insignificant asymmetric short- and long-term parameters. In the case of Brazil, the predictive ability of the asymmetrical CGARCH model is
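The mechanics of the SPA test with a stationary bootstrap can be sketched as follows. This is an illustrative sketch rather than the procedure used to produce Table 3: the function names, the restart probability `p_restart`, the default number of resamples, the use of the bootstrap variance as the estimate of the scale of each loss differential, and the convention of counting ties when both statistics are zero are all assumptions made here.

```python
import math
import random


def stationary_bootstrap_indices(n, p_restart, rng):
    """Resample indices in blocks of geometric length (mean 1 / p_restart), wrapping around."""
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p_restart:
            idx.append(rng.randrange(n))      # start a new block
        else:
            idx.append((idx[-1] + 1) % n)     # continue the current block
    return idx


def spa_test(loss_benchmark, loss_alternatives, n_boot=500, p_restart=0.5, seed=0):
    """Return (T_spa, p_value) for H0: no alternative beats the benchmark."""
    rng = random.Random(seed)
    n = len(loss_benchmark)
    root_n = math.sqrt(n)
    # Loss differentials d_{k,t} = L_{0,t} - L_{k,t}; positive values favour model k.
    d = [[l0 - lk for l0, lk in zip(loss_benchmark, alt)] for alt in loss_alternatives]
    d_bar = [sum(dk) / n for dk in d]
    samples = [stationary_bootstrap_indices(n, p_restart, rng) for _ in range(n_boot)]
    boot_mean = [[sum(dk[i] for i in idx) / n for idx in samples] for dk in d]
    # Assumed scale estimate: bootstrap standard deviation of sqrt(n) * mean differential.
    omega = [math.sqrt(sum((root_n * (m - db)) ** 2 for m in means) / n_boot)
             for means, db in zip(boot_mean, d_bar)]
    t_spa = max(max(root_n * db / w for db, w in zip(d_bar, omega)), 0.0)
    # Recentring: models that are not clearly poor are centred at zero under the null,
    # while clearly poor models keep their negative mean and so cannot drive the maximum.
    threshold = math.sqrt(2.0 * math.log(math.log(n)))
    centre = [db if root_n * db / w >= -threshold else 0.0 for db, w in zip(d_bar, omega)]
    exceed = 0
    for b in range(n_boot):
        t_b = max(max(root_n * (boot_mean[k][b] - centre[k]) / omega[k]
                      for k in range(len(d))), 0.0)
        if t_b >= t_spa:                      # ties at zero count as no evidence against H0
            exceed += 1
    return t_spa, exceed / n_boot
```

A high P-value from `spa_test` is read as in the text: the benchmark is not outperformed by the alternatives. The sketch needs at least three observations and non-constant loss differentials.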

The Peak over Threshold Approach
There are two alternative approaches to model the asymptotic behavior of extreme values: the block maxima (BM) approach, based on the GEVD, and the POT approach, based on the GPD. The first one focuses on the collection of minimum and maximum observations drawn from each of the blocks or subsamples over a fixed period of time, and represents the cornerstone of classic EVT through the Fisher-Tippett-Gnedenko theorem.
The BM procedure is efficient when there are enough extreme values. However, the way the samples are built produces a loss of key information that may be important for the estimation. Since extreme events are rare by nature and usually appear in clusters, their study requires large samples and sophisticated statistical techniques.³ The POT approach makes more efficient use of the data, to the extent that extreme values tend to appear in clusters through time. Let $R_1, R_2, \ldots, R_n$ be a sequence of i.i.d. random variables that represent losses with an unknown distribution, $F(r) = \Pr(R_i \le r)$.
Since the analysis is interested in those losses that exceed a threshold u, the distribution function of the excess losses $y_i = r_i - u$, given that $r_i$ exceeds u, is defined as:

$$ F_u(y) = \Pr(r - u \le y \mid r > u) = \frac{F(u + y) - F(u)}{1 - F(u)} $$

For a sufficiently high threshold u, the theorems of Balkema and de Haan (1974) and Pickands (1975) show that the excess distribution converges to a GPD as follows:

$$ G_{\xi,\sigma}(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi}, & \xi \neq 0 \\[6pt] 1 - \exp\left(-\dfrac{y}{\sigma}\right), & \xi = 0 \end{cases} $$

Here ξ is the tail index, and σ > 0 is the scale parameter.
Similarly, F may be defined as $F(r) = (1 - F(u))\,G_{\xi,\sigma}(y) + F(u)$. The non-parametric estimate of F(u) is (n − k)/n, where n is the total number of observations and k is the number of observations that exceed u. Substituting the estimated value of F(u) and the GPD of Equation (18) in F(r), the following expression is obtained for the tail estimator:

$$ \hat{F}(r) = 1 - \frac{k}{n}\left(1 + \hat{\xi}\,\frac{r - u}{\hat{\sigma}}\right)^{-1/\hat{\xi}} $$

where $\hat{\xi}$ and $\hat{\sigma}$ are the maximum likelihood estimators of ξ and σ at the selected threshold u.
The value of the parameter ξ may be positive, negative, or zero, and determines the properties of the tails of the GPD. When ξ>0, the GPD takes the form of an ordinary Pareto distribution, which is more appropriate for modeling heavy-tailed distributions, as in the case of financial returns. When ξ=0 and ξ<0, the GPD has the form of the exponential and Pareto type II distributions, respectively.
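The GPD distribution function and the resulting tail estimator can be written compactly in code. The sketch below is only an illustration of the formulas described in this section; the function names `gpd_cdf` and `tail_estimator` are hypothetical.

```python
import math


def gpd_cdf(y, xi, sigma):
    """GPD distribution function for an excess y >= 0, tail index xi and scale sigma > 0."""
    if y < 0:
        return 0.0
    if abs(xi) < 1e-12:
        # Limiting case xi = 0: exponential distribution.
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # For xi < 0 the support is bounded above at -sigma / xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def tail_estimator(r, u, xi, sigma, n, k):
    """Estimate F(r) for r >= u, using (n - k) / n as the empirical estimate of F(u)."""
    return 1.0 - (k / n) * (1.0 - gpd_cdf(r - u, xi, sigma))
```

At r = u the tail estimator returns (n − k)/n, the empirical probability of not exceeding the threshold, and it approaches one as r grows.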

Threshold Selection in Extreme Value Analysis
In practice, the selection of suitable thresholds is crucial to properly determine the region of the tail over which the GPD is fitted, as well as to reduce the bias and variance of the estimated model. According to Coles (2001), thresholds that are too low violate the asymptotic basis of the model, leading to biased estimates. On the contrary, a threshold that is too high generates estimates with high standard errors as a result of the limited number of observations in the sample.
There are two standard tools that can be used to determine the appropriate threshold, which are the mean excess function (MEF) and the Hill-plot.
This analysis uses the MEF, defined as:

$$ e_n(u) = \frac{\sum_{i=1}^{n} (r_i - u)\, 1\{r_i > u\}}{\sum_{i=1}^{n} 1\{r_i > u\}} $$

The sample MEF is the sum of the excesses over the threshold u divided by the number of data points that exceed the threshold u. It is an estimate of the mean excess function, which describes the expected overshoot of a threshold once an exceedance occurs. The MEF is a linear function of the threshold u when the excess distribution has the form of a GPD.
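The sample MEF is simple to compute, and evaluating it over a grid of candidate thresholds gives the points of a mean excess plot. The following sketch assumes plain lists of standardized residuals; the function names `mean_excess` and `mean_excess_curve` are hypothetical.

```python
def mean_excess(data, u):
    """Average excess over the threshold u among the observations that exceed u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(data, thresholds):
    """Return (u, e_n(u)) pairs for every threshold that has at least one exceedance."""
    top = max(data)
    return [(u, mean_excess(data, u)) for u in thresholds if u < top]
```

For the lower tail, the same functions can be applied to the residuals multiplied by -1, for example `mean_excess([-x for x in data], u)`.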

Estimating VaR and ES Based on the CEVT
For a given probability p, VaR and ES can be defined as:

$$ VaR_p = F^{-1}(p), \qquad ES_p = E\left[R \mid R > VaR_p\right] $$

where $F^{-1}$ is the so-called quantile function, defined as the inverse of the distribution F of losses (returns with a negative sign), while the ES for risk R at a given confidence level p is expressed by the conditional expectation of a loss that exceeds $VaR_p$.
So, the extreme quantile and the ES of the GPD for probability p are defined as:

$$ VaR_p = u + \frac{\hat{\sigma}}{\hat{\xi}}\left[\left(\frac{n}{k}(1 - p)\right)^{-\hat{\xi}} - 1\right], \qquad ES_p = \frac{VaR_p}{1 - \hat{\xi}} + \frac{\hat{\sigma} - \hat{\xi}\,u}{1 - \hat{\xi}} $$

The literature on tail-risk measurement has documented the success of the unconditional EVT in the estimation of extreme quantiles. However, the application of EVT directly to raw data can lead to biased estimates, due to the conditional heteroscedasticity, volatility clustering, and strong time dependence exhibited by most financial return series. To ease this problem, this study estimates GARCH and CGARCH models on the original return series to obtain standardized residuals that are closer to an independent and identically distributed (i.i.d.) series than the raw return series.
EVT is applied to the standardized residuals, $Z_t$, and the VaR and ES measures are estimated for a one-day horizon according to the following expressions:

$$ VaR_p^{t+1} = \mu_{t+1} + \sqrt{h_{t+1}}\; VaR_p(Z), \qquad ES_p^{t+1} = \mu_{t+1} + \sqrt{h_{t+1}}\; ES_p(Z) $$

where $\mu_{t+1}$ and $h_{t+1}$ are the predictions of the conditional mean and variance for period t+1, respectively, and $VaR_p(Z)$ and $ES_p(Z)$ are the GPD-based quantile and expected shortfall of the standardized residuals.
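The two steps of the conditional approach, a GPD quantile and ES for the standardized residuals followed by scaling with the one-day-ahead mean and variance forecasts, can be sketched as follows. The closed forms are the standard POT expressions (valid for a tail index below one); the function names and the numerical inputs in the usage note are hypothetical.

```python
import math


def gpd_var(p, u, xi, sigma, n, k):
    """GPD-based quantile at confidence level p (n observations, k exceedances of u)."""
    tail_ratio = (n / k) * (1.0 - p)
    if abs(xi) < 1e-12:
        # Limiting exponential case.
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)


def gpd_es(p, u, xi, sigma, n, k):
    """GPD-based expected shortfall; the mean excess is finite only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("expected shortfall is not finite for xi >= 1")
    return gpd_var(p, u, xi, sigma, n, k) / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)


def conditional_var_es(mu_next, h_next, z_var, z_es):
    """Scale the residual quantile and ES by the forecasts for period t + 1."""
    scale = math.sqrt(h_next)
    return mu_next + scale * z_var, mu_next + scale * z_es
```

For instance, with hypothetical values u = 1.5, ξ = 0.2, σ = 0.6, n = 1000 and k = 100, `gpd_var(0.99, 1.5, 0.2, 0.6, 1000, 100)` gives the 99% quantile of the residuals, which `conditional_var_es` then converts into a one-day VaR given forecasts of the mean and variance.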

Backtesting for VaR and ES
The quality and accuracy of VaR and ES models require a statistical validation process to prove that the risk measures meet certain theoretical properties required by regulatory authorities to estimate sufficient capital requirements. The backtesting procedure consists in comparing the VaR and ES with the realized returns of the next period. The likelihood ratio test proposed by Kupiec (1995) is used to examine whether the failure rate is statistically equal to the expected failure rate, α=1-p, where p is the confidence level used to estimate the VaR and the ES. If T indicates the overall number of observations, then the number of failures n follows a binomial distribution with probability α.
Kupiec's likelihood ratio test statistic is computed as:

$$ LR = -2\ln\left[(1 - \alpha)^{T - n}\,\alpha^{n}\right] + 2\ln\left[\left(1 - \frac{n}{T}\right)^{T - n}\left(\frac{n}{T}\right)^{n}\right] $$

where LR is asymptotically distributed as χ² with one degree of freedom under the null $H_0: n/T = \alpha$. If the value of LR is smaller than the critical value, the null is not rejected, which means that the estimation of VaR is reliable, while the alternative hypothesis is chosen if the model generates a number of failures that is either too large or too small.
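Kupiec's test reduces to a few lines of code. The sketch below computes the likelihood ratio and its P-value using the identity that the survival function of a χ² variable with one degree of freedom equals erfc(√(x/2)); the function name `kupiec_test` is hypothetical.

```python
import math


def kupiec_test(n_failures, T, alpha):
    """Return (LR, p_value) for H0: the failure rate n_failures / T equals alpha."""

    def loglik(prob):
        # log of prob^n * (1 - prob)^(T - n), treating 0 * log(0) as 0.
        out = 0.0
        if n_failures > 0:
            out += n_failures * math.log(prob)
        if T - n_failures > 0:
            out += (T - n_failures) * math.log(1.0 - prob)
        return out

    lr = -2.0 * (loglik(alpha) - loglik(n_failures / T))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

As an illustration, with T = 1,547 backtesting days and α = 1%, observing 7 failures against 15.47 expected gives an LR of about 5.9 and a P-value of about 0.015, so the null would be rejected at the 5% level.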

Determination of the Threshold
Optimal thresholds are needed to determine the tail region of an empirical distribution. To do this, the positive and negative standardized residuals from the AR(1)-GARCH and AR(1)-CGARCH models are used. The MEF is applied directly to the positive residuals to estimate the threshold and identify the relevant tail region, while the negative standardized residuals are transformed into positive values by multiplying them by -1, so that the MEF for the minima can be obtained in the same way as that for the maxima. Figure 1 shows the empirical MEF for both tails of the distribution of residuals for the six sampled stock markets.⁴ For positive standardized residuals, the MEF plots show a similar downward trend but, after reaching a certain value, they display an upward trend, particularly in the cases of Brazil and Colombia. Nevertheless, this upward trend is reversed in Argentina, while it is relatively stable in Chile and more volatile in the cases of Mexico and Peru. For negative standardized residuals, the MEF plots show signs of an upward trend at thresholds exceeding 2 for Argentina and Chile, while the MEF plots keep ascending up to a value of 3.75 for Mexico and Peru; thereafter, they tend to decline rapidly. Moreover, the trend in the MEF plots is highly unstable in Brazil and Chile. This increasing instability in the MEF plots is usually a feature of the technique, attributable to the dispersion of observations in the ranges of higher thresholds. The positively sloped straight line indicates that the asymptotic behavior of the standardized residuals of the Latin American stock indexes is largely explained by a GPD with a positive and stable shape parameter. On the other hand, the horizontal or negatively sloped MEF for Chile and Colombia indicates that their positive and negative standardized residuals appear to follow an exponential distribution.

Estimation of GPD Parameters
The scale and shape parameters of the GPD are required to analyze independently the asymptotic behavior of the tails of the distribution of the standardized residuals, and to estimate the losses related to the long and short financial positions in the sample of Latin American stock markets. For that reason, the maximum likelihood method is used to estimate the unknown parameters. The lower-tail estimates of the scale parameter tend to be higher than the upper-tail estimates, with values between 0.4544 and 0.7022, probably because the selected thresholds for the negative residuals are also higher than those for the positive residuals, leading to a reduction in the number of exceedances, particularly for the Argentine, Brazilian and Mexican market series. In most cases, the shape parameter estimate is positive and stable, i.e., significantly different from zero. This fact confirms that a GPD properly models the tail behavior of the standardized residuals. Notwithstanding, for Chile and Colombia the lower-tail estimates are very unstable and close to zero, and even attain a negative value in the case of the upper tail under the ACGARCH1 model. The upper tail tends to be fatter and riskier than the lower tail for Argentina, Chile, Colombia and Peru, confirming that the upper tail is more stable. However, the opposite occurs in the case of Brazil and Mexico, which implies that these markets are more exposed to financial crashes than to economic booms. These results are supported by the selected thresholds for the lower tail. Hence, these results will probably lead to more conservative estimates of the VaR and ES as the tails become fatter.
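Maximum likelihood estimation of the GPD parameters amounts to minimizing the negative log-likelihood of the excesses over the threshold. The sketch below states that objective and, purely for illustration, minimizes it with a coarse grid search that stands in for the numerical optimizer a real application would use; the function names and the grids in the usage note are hypothetical.

```python
import math


def gpd_neg_loglik(excesses, xi, sigma):
    """Negative log-likelihood of GPD excesses; infinite outside the parameter space."""
    if sigma <= 0.0:
        return math.inf
    n = len(excesses)
    if abs(xi) < 1e-12:
        # Exponential limit of the GPD.
        return n * math.log(sigma) + sum(excesses) / sigma
    total = 0.0
    for y in excesses:
        base = 1.0 + xi * y / sigma
        if base <= 0.0:
            return math.inf  # observation outside the support implied by (xi, sigma)
        total += math.log(base)
    return n * math.log(sigma) + (1.0 + 1.0 / xi) * total


def fit_gpd_grid(excesses, xi_grid, sigma_grid):
    """Return the (xi, sigma) pair on the grid with the smallest negative log-likelihood."""
    best = min((gpd_neg_loglik(excesses, xi, s), xi, s)
               for xi in xi_grid for s in sigma_grid)
    return best[1], best[2]
```

A call such as `fit_gpd_grid(excesses, [0.02 * i for i in range(-10, 31)], [0.2 + 0.02 * j for j in range(51)])` searches tail indices between -0.2 and 0.6 and scales between 0.2 and 1.2; the precision of the result is limited by the grid step.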

Estimations of VaR and ES, and Backtesting
To illustrate the usefulness of the CEVT for tail-risk management and measurement in Latin American stock markets, raw returns are initially filtered with a family of AR-GARCH and AR-CGARCH models to obtain independently and identically distributed standardized residuals on which EVT can be implemented. To evaluate the in-sample forecasting performance of VaR and ES, their estimates are compared with the next day's return. The backtesting process covers the period from January 4, 2010 through December 31, 2015, for a total of 1,547 daily observations. Tables 5 and 6 show the estimates of VaR and ES. The number of failures is reported in parentheses, for short and long positions and for different quantiles, together with the P-values of Kupiec's statistic to evaluate the performance of the risk measures. The expected numbers of failures for the 5%, 2.5%, 1%, 0.5% and 0.1% levels are 77.35, 38.68, 15.47, 7.74 and 1.55, respectively. The VaR and ES are expressed in percentages, in line with the actual return series. In most stock markets, the VaR estimates range between 1.77% and 7.59% for a short position, and between 1.52% and 8.54% for a long position, across the 95% to 99.9% quantiles, except in the case of Chile, where they range between 1.24%-2.46% and 1.08%-2.69% for the short and long positions, respectively.
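Counting failures for both positions is the raw input of the backtest. The sketch below assumes that VaR forecasts are stored as positive magnitudes in the same units as the returns, that a long position fails when the return falls below minus its VaR, and that a short position fails when the return rises above its VaR; the function names are hypothetical.

```python
def count_failures(returns, var_long, var_short):
    """Return (long_failures, short_failures) for aligned returns and VaR forecasts."""
    # Long position: the loss is a negative return larger in size than the lower-tail VaR.
    long_failures = sum(1 for r, v in zip(returns, var_long) if r < -v)
    # Short position: the loss is a positive return larger than the upper-tail VaR.
    short_failures = sum(1 for r, v in zip(returns, var_short) if r > v)
    return long_failures, short_failures


def expected_failures(n_obs, alpha):
    """Expected number of failures under a correct model: n_obs * alpha."""
    return n_obs * alpha
```

With 1,547 backtesting observations, `expected_failures(1547, 0.01)` returns 15.47, matching the figure reported for the 1% level; the observed counts from `count_failures` can then be compared with it through Kupiec's test.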
Another important finding is that the VaR estimates increase significantly as the quantiles rise from 95% to 99.9% for both short and long positions. The most risk-exposed markets are those of Argentina and Brazil, followed by Colombia and Peru. The results of the ES are relatively more conservative. These findings are in line with the stylized facts observed in Latin American stock markets, which experience relatively frequent extreme events (booms and crashes), as well as high and persistent volatility over time. To select the best performing model, either the highest P-values of Kupiec's test, or  may be adopted. If P ≤ 0.05, the CEVT model underestimates or overestimates the VaR and ES at the 5% significance level, depending on whether the number of failures is above or below the expected number. For the short position, the P-values of Kupiec's test reveal the predictive ability of the CGARCH-EVT and ACGARCH2-EVT models to forecast VaR in the Argentine stock market at the 95%, 97.5% and 99% quantiles. However, the performance of all CEVT models deteriorates at the 99.5% and 99.9% extreme quantiles; this is even more noticeable in the case of the ES performance. This implies that the CEVT models overestimate market risk because the number of actual failures is significantly smaller than the expected number of failures, except in the case of the CGARCH-EVT model, which improves the ES performance at the 99% and 99.5% quantiles. Regarding long positions, the ACGARCH2-EVT, GARCH-EVT, and CGARCH-EVT models provide the best predictive performance in estimating VaR at the 97.5%, 99% and 99.5% quantiles, respectively. In addition, the ACGARCH1-EVT model provides the best ES results for quantiles above 97.5%; even for the 99% quantile, any CEVT model captures the lower-tail behavior of the Argentine stock market returns.
The asymmetric CGARCH-EVT models provide the best in-sample VaR performance for any quantile and financial position in the Brazilian stock market. More specifically, the ACGARCH2-EVT and ACGARCH1-EVT models deliver the best VaR estimates for the upper and lower tail, respectively. According to Kupiec's test P-values, the number of failures is not significantly different from the expected number of failures. However, the CEVT models provide more conservative estimates of ES for any financial position and quantile, except for the 99.9% quantile, in which the ACGARCH1-EVT model performs better for the long position than all other models. These findings are warning signals for risk managers, investors and bank regulators regarding the uncertainty and magnitude of extreme events.

Table note: The GPD parameters are estimated by maximum likelihood. The values in parentheses represent the standard errors of the estimators, and k indicates the number of exceedances.
According to Kupiec's test P-values, the analysis of the short position in the Chilean market reveals that the ACGARCH1-EVT and ACGARCH2-EVT models provide the best predictive performance at the 95%, 97.5% and 99.9% quantiles. For the 99% and 99.5% quantiles, the CEVT models show a poor performance in estimating VaR because the expected number of failures is significantly greater than the observed number of failures. Nevertheless, the CGARCH-EVT, ACGARCH1-EVT and ACGARCH2-EVT models deliver accurate ES estimates for the extreme quantiles. For long positions, the GARCH-EVT model provides a better in-sample VaR performance than the symmetric and asymmetric CGARCH-EVT models at the 95% quantile, but all models based on CEVT perform similarly at the 97.5% quantile. However, underestimated VaRs lead to poor inference on the true risk of the position at the 99%, 99.5% and 99.9% quantiles, although the symmetric and asymmetric CGARCH-EVT models significantly improve the ES estimates.
In the case of the Colombian market, the results of the VaR estimates for the short position are mixed. All the CEVT-based models perform well only at the 99% and 99.5% quantiles, while the CGARCH-EVT and ACGARCH1-EVT models show the best predictive performance in estimating ES at the 99.5% quantile.
In the case of the long position, the predictive performance of the ACGARCH2-EVT approach is superior to all other models at any quantile, except for the 95% quantile, where the GARCH-EVT model offers the most robust estimation. On the other hand, the CEVT models suffer from too few losses beyond the VaR level, implying an overestimation of ES forecasts at any confidence level.
In the case of the Mexican stock market, the backtesting results reveal that the performance of the ACGARCH2-EVT model is robust when used to estimate the short-position risk, since its predictive ability is rejected only at the 95% level, and it is outperformed by the CGARCH-EVT model for the 97.5% quantile. The ACGARCH1-EVT model provides the best VaR estimates for a long position at any quantile, except for the 99.9% quantile, where several alternative models perform similarly. The strong evidence of short- and long-term asymmetric effects on conditional volatility explains these findings. On the other hand, the ES performance under the CEVT models is satisfactory for neither short nor long positions at the 95%, 97.5% and 99% quantiles. The reason is that the expected number of failures exceeds the number of actual failures, although the models improve the ES estimates at extreme quantiles.
For the Peruvian stock market, the small P-values of Kupiec's test reveal that the four models have a poor VaR performance for the short position, since the number of failures is significantly different from the expected number of failures for the 95% and 99.9% quantiles, but not for the symmetric or asymmetric CGARCH-EVT models at the 97.5% and 99% quantiles. Nevertheless, predictive ability is significantly improved with the GARCH-EVT and CGARCH-EVT models for long positions at any quantile. In the case of ES performance, most CEVT models are preferred for measuring market risk, since Kupiec's test P-values are highly acceptable in all cases (both short and long positions) at the 99%, 99.5% and 99.9% quantiles.
The qualitative relationship between the daily returns and the 1-day-ahead VaR estimates for short and long positions, using different models for the stock markets of Argentina and Mexico, is also of interest. The out-of-sample backtesting analysis uses a fixed rolling window of 4660 observations based on a rolling sample of 1500 observations, for the period from January 4, 2010 to December 31, 2015. This technique consists of removing the first observation of the series and adding the most recent observation, in order to re-estimate the parameters of each model and update the standardized residuals daily. The main advantage of the rolling-window technique is that it captures the statistical characteristics of returns and the time-varying nature of market risk in different time periods. However, it is important to highlight that backtesting over longer periods can be problematic, because it is difficult to choose the GARCH or CGARCH specification with the best parameterization every time. To ease this problem, the superiority of these models in forecasting the out-of-sample volatility of returns is assumed to be supported by the SPA test. Figures 2 and 3 show the performance of the four models from 2010 to 2015. The out-of-sample VaR estimates for the 95-99.5% quantiles present very similar dynamic patterns for both short and long positions, and respond quickly to the time-varying volatility experienced by the stock markets. In periods of financial crashes or high volatility, the GARCH-EVT and CGARCH-EVT models provide more conservative estimates of VaR because the gap between the lower and upper quantiles is wider for a 1-day time horizon, in particular for the Argentine stock market. In periods of relative calm, the out-of-sample VaR estimates tend to be more stable as the volatility is less persistent, particularly for the Mexican stock market.
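The rolling-window procedure described above can be sketched as follows. This is only an outline of the mechanics: the forecaster used here is a plain empirical quantile of the window (historical simulation), standing in for the re-estimation of the GARCH-EVT or CGARCH-EVT models at each step, which is considerably more involved. The function names and the simulated returns are assumptions made for illustration.

```python
import random


def empirical_quantile(sample, level):
    """Placeholder forecaster (NOT the paper's model): interpolated sample quantile."""
    ordered = sorted(sample)
    pos = level * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def rolling_backtest(returns, window, coverage, forecast_fn=empirical_quantile):
    """One-day-ahead quantile forecasts from a fixed-length rolling window.

    At each step the oldest observation leaves the window and the newest
    enters it, and the forecaster is applied to the updated window.
    Returns the lower and upper quantile paths and the failure counts
    for the long position (lower tail) and the short position (upper tail).
    """
    lower, upper = [], []
    long_failures = short_failures = 0
    for start in range(len(returns) - window):
        sample = returns[start:start + window]
        q_low = forecast_fn(sample, 1.0 - coverage)
        q_up = forecast_fn(sample, coverage)
        realized = returns[start + window]   # next-day return, not in the window
        long_failures += realized < q_low    # loss on a long position beyond VaR
        short_failures += realized > q_up    # loss on a short position beyond VaR
        lower.append(q_low)
        upper.append(q_up)
    return lower, upper, long_failures, short_failures


# Simulated returns, for illustration only
random.seed(0)
simulated = [random.gauss(0.0, 1.0) for _ in range(700)]
low, up, n_long, n_short = rolling_backtest(simulated, window=500, coverage=0.95)
print(len(low), n_long, n_short)
```

The failure counts produced this way are the inputs to Kupiec's test, with the number of forecasts equal to the length of the series minus the window length.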
Another important feature is that the out-of-sample VaR estimates depend on volatility models that take into account heteroscedasticity, fat tails and asymmetric volatility effects.
In the out-of-sample VaR analysis for Mexico, the backtesting results reveal that all EVT-based models have a very poor forecasting performance for the short position at any quantile. According to Kupiec's test, the number of failures is significantly different from the expected number of failures. However, asymmetric CGARCH-EVT models improve the VaR forecasting accuracy for the long position, particularly the ACGARCH1-EVT approach. Similarly, in the case of Argentina, asymmetric CGARCH-EVT models outperform the alternative models in both short and long positions for quantiles above 95%, although for the 99% quantile the CGARCH-EVT model provides the best forecasting performance for the short position. The results are supported by the P-values of the SPA test, which confirm the ability of the asymmetric CGARCH-EVT models to forecast the out-of-sample volatility. These findings confirm the importance of the long-term effects of asymmetry and persistence for improving hedging and tail-risk management in volatile stock markets.
In the case of out-of-sample ES estimates, the CEVT models overestimate the losses of long and short positions in the Argentine and Mexican stock markets, as a result of excessive failures. However, this information may be relevant and useful for risk-averse investors who participate in global financial markets that often experience episodes of high volatility caused by asymmetric effects and fat tails in financial returns, and who require dynamic and flexible models to estimate the tail quantiles during financial crises and stock-market crashes.

CONCLUSION
This work presents an extension to the GARCH-EVT based risk measure proposed by McNeil and Frey (2000), with the aim of improving the in-sample and out-of-sample predictive performance of the VaR and ES measures. It proposes a family of two-component volatility models, symmetric and asymmetric, known in the literature as CGARCH models, which combine the characteristics of asymmetry and long memory in volatility in the short and long term. The statistical validation of the symmetric and asymmetric CGARCH-EVT models is carried out for six Latin American stock markets, for the period from January 2, 1992 to December 31, 2015, and the study examines the risk-management implications for the upper and lower tails of the returns distribution at confidence levels of 95%, 97.5%, 99%, 99.5% and 99.9%.
In terms of the in-sample P-values, the backtesting results indicate that no EVT-based model provides a satisfactory predictive performance in the estimation of VaR and ES for all the stock markets in the sample and for any confidence level, and/or complies with all financial and mathematical assumptions. It is important to mention that the asymmetric CGARCH-EVT models often provide excellent tail-risk estimates in the stock markets of Mexico and Brazil. However, participants in the Brazilian market must be careful when making investment decisions or implementing hedging strategies, because neither the GARCH nor the CGARCH specifications were able to reduce the autocorrelation in the simple standardized residuals.
For the out-of-sample backtesting, the P-values of Kupiec's test confirm the predictive power of the asymmetric CGARCH-EVT models, in particular the ACGARCH1-EVT model, in the estimation of VaR for short and long positions in the stock markets of Argentina and Mexico. These models capture the long-term asymmetry and persistence in volatility, as well as the fat tails in the filtering process, through CEVT to improve the performance of the VaR and ES. Although the ES overestimates the risk, its implementation can be a useful tool for risk-averse investors. One of the shortcomings of the study is the limited capacity of the proposed volatility models to provide independent and identically distributed standardized residuals for the stock markets of Brazil, Chile, Colombia and Peru. Therefore, future research should model long-memory volatility by including FIAPARCH, FIEGARCH and FIGARCH models, in an attempt to obtain better estimates of VaR and ES.