1. Introduction
The pillar of traditional finance is the Efficient Market Hypothesis (EMH), which states that all relevant information is immediately reflected in stock prices and that investors are rational in their trading decisions. Because of these strict assumptions, however, traditional finance fails to explain unusual phenomena in the market that are not “reasonable”, such as the January effect, the day-of-the-week effect, or the market bubbles that lead to stock market crashes. Behavioral finance was developed to offer a different view, based on the basic assumption that financial asset prices are not always driven by reasonable expectations of future returns. Its supporters argue that human beings, including market participants, are emotional rather than purely rational entities. Sentiment can lead investors to overreact, underreact, or herd. As a central concept of behavioral finance, sentiment, including investor sentiment and news sentiment, has been widely recognized and measured. According to Baker and Wurgler (2006), finding an appropriate and practical way to measure sentiment is a challenging but necessary task for researchers because of the growing role of sentiment in stock pricing. Recently, with the development of public media and its appeal to investors, a news article about the financial market not only provides information but can also affect investors’ moods, depending on the content of its message. News is positive if it is favorable to the stock market and negative if it is unfavorable. The sentiment conveyed by new information should therefore be extracted and analyzed to understand its effects on investors.
In Vietnam, sentiment has been analyzed to explain investor behavior, but research on this topic mainly focuses on constructing a comprehensive sentiment index (Phan et al. 2021). Despite its popularity, a sentiment index does not capture investors’ emotional responses to public information related to the stock market. Textual analysis of news sentiment has been applied globally by many scholars, such as Nguyen et al. (2015); Renault (2017); Huang et al. (2020); Petropoulos and Siakoulis (2021); and Liu et al. (2023). In Vietnam, however, there is a research gap in news sentiment because of the lack of a well-trained textual analysis model for the Vietnamese language. The complexity of Vietnamese may prevent the application of existing models that work with English texts. This study therefore aims to be one of the first to apply a Natural Language Processing model that reads, analyzes, and classifies Vietnamese-language articles on finance and the securities market according to their sentiment.
Studies on news sentiment have reported relationships between news sentiment and stock price movements, volatility, and trading volume. News sentiment can even predict the returns of specific stocks as well as of the market index (Li et al. 2020). However, the magnitude and sign of these effects depend on market characteristics and on the time frame used for analysis. Another objective of this work is therefore to assess the role of news sentiment in the Vietnamese stock market, a market that is still at an early stage of its development.
Our research is expected to make several contributions. First, it is one of the earliest studies to train a language model on a large corpus of news articles from Vietnamese financial and economic websites to analyze news sentiment. Second, it provides insights into how investors react to news, which helps explain investor behavior in the Vietnamese stock market. A well-trained model of this kind is therefore valuable for researchers, investors, stock analysts, and firm managers.
2. Literature Review
Efficiency exists when stock prices fully reflect all available information (Fama 1970) and follow a random walk. Building on this general definition of market efficiency, researchers have conducted several tests to measure the level of efficiency. For example, Chow et al. (2016) apply variance ratio tests to assess the role of market liberalization in improving market efficiency in Latin American countries. Market efficiency theory assumes that the risk premium of a stock depends solely on its systematic risk because investors hold well-diversified portfolios, so investors cannot earn higher returns without accepting more risk. However, in studies testing market efficiency, practitioners and scholars have discovered anomalies and market inefficiencies. Schwert (2003) discusses return anomalies, which are deviations from the returns expected under traditional equity pricing models, and mentions several factors besides systematic risk that affect stock returns, such as firm size, value, the weekend effect, and dividend yield. Cho et al. (2007) also examine the day-of-the-week effect and find that bad news on Monday worsens returns more than bad news on the other days of the week. Similarly, Chui et al. (2020) analyze calendar anomalies by examining the Halloween effect in Western markets.
Behavioral finance has examined sentiment from different angles to explore its effects on stock return patterns in global markets. Sentiment is defined as a belief about the expected returns and risks of a stock that is not justified by facts (De Long et al. 1990). When such beliefs are held by a large number of traders, they may lead to pricing errors. In general, sentiment makes investors bullish or bearish in their trading behavior (Brown and Cliff 2004). According to Pandey and Sehgal (2019), the existence of sentiment cannot be denied; the question is how to identify and measure it. Sentiment can be identified with a market-based method using composite indexes or with a survey-based method using questionnaires. It can also be identified with a text-based method that interprets investors’ messages on social media and financial news articles (Shen et al. 2022). In other words, sentiment can be measured with a direct approach, an indirect approach, or any other approach through which sentiment-related data can be collected.
2.1. Sentiment Measures
In a direct method, investor sentiment is collected through regular surveys or through analysis of information-searching behavior on the internet. According to Brown and Cliff (2005), survey results are an appropriate proxy for investor sentiment. The most popular surveys are the Michigan Consumer Confidence survey (Aggarwal 2018; Qiu and Welch 2004); the Investors Intelligence (II) survey (Brown and Cliff 2005; Verma and Verma 2008); and the American Association of Individual Investors (AAII) survey (Fisher and Statman 2000; Brown and Cliff 2004; Verma and Verma 2008). The AAII sentiment index, for example, measures the percentage of respondents who are bullish, bearish, or neutral. The survey is conducted weekly to collect members’ views on the stock market over the coming six months. The main shortcoming of direct surveys is the possible gap between investors’ actual behavior and their survey responses. In addition, the value of survey results depends significantly on the number of respondents and on the frequency of the survey (Aggarwal 2018).
The most widely applied indirect approach to measuring investor sentiment is the construction of a sentiment index from several proxies. Baker and Wurgler (2006) created a composite sentiment index from six proxies: the closed-end fund discount, market turnover, the number of Initial Public Offerings (IPOs), the first-day returns of IPO stocks, the equity share in new issues, and the dividend premium. Baker et al. (2012) removed three of these variables (the closed-end fund discount, the equity share in new issues, and the dividend premium) from the set of proxies used by Baker and Wurgler (2006) and added the volatility premium as a new proxy. Other researchers infer investor sentiment from trading activities, including margin borrowing, changes in short interest, and short sales by specialists (Brown and Cliff 2004). A sentiment index succeeds in capturing market sentiment over a period, but it fails to measure how rapidly investors react to new information in the market.
Thanks to the development of machine learning and data mining, textual analysis has been applied to read, interpret, and extract sentiment from many online platforms. This approach applies Natural Language Processing models to different categories of text depending on the research purpose. Petropoulos and Siakoulis (2021) extracted sentiment from central bank speeches on the economic and financial outlook and relevant policies, and showed that the resulting sentiment index can predict financial market turmoil. Huang et al. (2020) analyzed all public news about a corporation from “a wide array of major news sources” and found a relationship between news and institutional trading: institutions mainly trade on news tone immediately after the first news release and move market returns in the following weeks. Baker et al. (2019) also analyzed policy news to track market volatility. Daudert (2021) even assigned sentiment scores to analyses of a company’s financial performance.
Sentiment can be discovered by analyzing the emotions behind investors’ comments on social media such as message boards, Facebook, or Twitter. Sentiment can also be extracted from stock market news or economic news, because news articles convey tone and emotion to their readers. News articles therefore reflect investors’ expectations about the market in general and about specific stocks, according to the information they provide. This measure is called media-based investor sentiment (Sun et al. 2016; Nguyen et al. 2015).
Antweiler and Frank (2004) measured sentiment by collecting and interpreting a large number of messages on finance websites such as Yahoo! Finance. Others extracted sentiment from investors’ posts on media including blogs, Facebook, or Twitter (Barber and Odean 2008; Bar-Haim et al. 2011; Bollen et al. 2011; Dougal et al. 2012). According to Li et al. (2020), interest in textual sentiment analysis is increasing among behavioral finance researchers because this method can reduce the bias that may arise in survey-based sentiment measures (Schumaker and Chen 2009; Renault 2017; Shapiro et al. 2022).
Liu et al. (2023) used news sentiment related to each firm as a proxy for investor sentiment toward that firm’s stock. Their “Overall Sentiment Score”, ranging from −1 to 1, reflects the level of optimism or pessimism: the higher the score, the greater investors’ optimism about the stock.
Bali et al. (2016) state that increases in market volatility are related to unusual news, which induces disagreement among investors about firm valuations. “Given the high costs of short selling, pessimistic investors sit on the sidelines, while optimistic investors bid up stock prices to reflect their own valuations” (Bali et al. 2016).
2.2. News Sentiments and Investor Response
As emotional entities, investors may not interpret new information rationally or respond to it appropriately. Behavioral finance identifies belief biases from how investors behave when news appears. Barberis et al. (1998) argue that investors overreact to information in some cases and underreact in others. According to Montier (2002), the stock market tends to underweight fundamental information on dividend payments or a firm’s earnings. Investors are therefore said to exhibit conservatism bias when they anchor their investments solely on their own forecasts about the company. Conservatism bias makes investors react very slowly to news. For example, when unfavorable information appears about stocks that investors hold, they may hesitate to sell; they may hold the stocks for too long and be forced to sell only after suffering unnecessary losses.
Many researchers divide new information into groups according to its mood or sentiment (Vu et al. 2012; Nguyen et al. 2015; Renault 2017; Costola et al. 2020). They all find a close relationship between news sentiment and stock market movements. Sentiment extracted from general news (Nguyen et al. 2015; Renault 2017) or from news on the COVID-19 pandemic (Costola et al. 2020; Baker et al. 2020) has significant power to predict future stock returns.
Feng et al. (2022) explored the relationship between news sentiment and stock market volatility in Japan and found that news sentiment has observable effects on the volatility of stock returns. Liu et al. (2023) suggest that firm-specific news sentiment can boost stock trading activity when it is optimistic. However, news with a pessimistic tone has stronger power to predict stock returns than news with an optimistic tone. Similarly, Shen et al. (2022) document the role of news tone in volatility and stock returns in China.
Cho et al. (2007) analyzed market reactions to negative news in combination with the day-of-the-week effect and concluded that a negative return on the previous Friday worsens the return on the following Monday, whereas the market underreacts to news on the other days of the week. They also found that the strength of the reaction differs across market indexes depending on the number of stocks each index covers.
3. Methodology
News sentiment was extracted by training a Natural Language Processing (NLP) model to interpret the texts of daily news. In this study, news on the stock market, domestic economics, and international finance was collected from high-traffic financial and economic online platforms in Vietnam. News on specific firms was excluded because the study aimed to test market reactions to general news rather than to news on any specific stock. As in Huang et al. (2020), this study collects a broad range of news appearing in mass media. However, Huang et al. (2020) constructed a “news cluster” that combined news on a particular firm.
Discovering news sentiment with the NLP model involved two steps. In the first step, news articles were collected from websites for model training. The news was taken from three websites, Cafef.vn, Vneconomy.vn, and Stockbiz.vn, accessed on 12 July 2021. The BeautifulSoup and Pandas libraries were used to collect and organize the articles. After collection and processing, the set of 40,000 articles was split into a training sample (70%) and a testing sample (30%). According to Petropoulos and Siakoulis (2021), it is necessary to construct sentiment dictionaries. In this study, however, a Convolutional Neural Network (CNN) layer, discussed later in this section, was applied, so no sentiment dictionary was required. For model training, the news was labeled manually by the researchers to avoid mistakes. Specifically, 2738 news articles were each labeled as belonging to one of two groups: positive sentiment or negative sentiment. Articles were assigned to a sentiment group by recognizing words expressing emotion not only in the title but also throughout the content. Words and phrases expressing positive sentiment included “a significant increase”, “attractive”, “net buying”, and “price ceiling increase”. Words and phrases in the negative (pessimistic) group included “market washout”, “net selling”, “floor price”, “decrease”, and “shrinkage”. One example of positive news was an article reporting the positive expectation of Dragon Capital, an investment fund in Vietnam, for firms’ earnings growth in 2021. News was labeled 1 for negative sentiment and 2 for positive sentiment.
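The labeling convention and the 70%/30% split described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the shuffling, the fixed seed, and the helper name `split_train_test` are illustrative choices, not details reported in the study, which gives only the proportions and the label codes.

```python
import random

# Label codes used in the study: 1 = negative sentiment, 2 = positive sentiment.
LABELS = {1: "negative", 2: "positive"}


def split_train_test(items, train_frac=0.7, seed=2021):
    """Shuffle a copy of `items` and cut it into training and testing lists.

    The shuffle and the seed are assumptions made for illustration; the study
    reports only the 70%/30% proportions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Placeholder corpus standing in for the 40,000 collected articles.
corpus = [f"article-{i}" for i in range(40_000)]
train, test = split_train_test(corpus)
print(len(train), len(test))  # 28000 12000
```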
In the second step, the news labeled in the first step was used to train the NLP model. An NLP model is an artificial intelligence application that analyzes text in a way that resembles the human brain. Because all news was in Vietnamese, PhoBERT (an extension of the BERT model) was trained in this study. BERT stands for Bidirectional Encoder Representations from Transformers, a model developed by Devlin et al. (2019). We built a backbone-based model by starting from PhoBERT, a pre-trained language model, and fine-tuning it. To enhance its capabilities, we introduced several additional techniques. Into the PhoBERT base architecture, we integrated Convolutional Neural Network (CNN) layers with varying kernel sizes. These CNN layers extracted context representation vectors, improving the model’s understanding of contextual information; Pham et al. (2021) showed this technique to be very effective. However, because of limited resources, entire news articles could not be fed into the model during training. The spaCy library was therefore used to synthesize the opinion of each paragraph of an article into a unified view of the whole article. spaCy split each article into sentences, which were then combined into paragraphs of no more than 200 words. Because such a paragraph did not always represent the main content of the entire article, while the headline summarizes what the whole text is about, the headline was placed in front of each paragraph to give it more context. The final sentiment of an article was then determined by majority voting over the sentiments predicted for its paragraphs.
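The paragraph-level pipeline described above (sentence splitting, grouping sentences into chunks of at most 200 words, prefixing the headline, and majority voting) can be sketched as follows. This is a sketch under stated assumptions: a naive regular-expression splitter stands in for spaCy, `classify_chunk` is a hypothetical placeholder for the fine-tuned PhoBERT model, and the tie-breaking rule is our own choice because the study does not describe one.

```python
import re
from collections import Counter

MAX_WORDS = 200  # paragraph length limit reported in the study


def split_sentences(text):
    # Naive stand-in for spaCy sentence segmentation: split after . ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_article(headline, body, max_words=MAX_WORDS):
    """Greedily pack consecutive sentences into chunks of at most `max_words`
    words (an over-long sentence forms its own chunk), then prefix each chunk
    with the headline so it carries the article's main context."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(body):
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return [f"{headline}\n{chunk}" for chunk in chunks]


def majority_vote(labels, tie_label=1):
    """Most frequent chunk label; ties fall back to `tie_label`
    (hypothetical rule: 1 = negative)."""
    counts = Counter(labels)
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    return winners[0] if len(winners) == 1 else tie_label


def classify_article(headline, body, classify_chunk, max_words=MAX_WORDS):
    """`classify_chunk` is a hypothetical callable returning 1 or 2 per chunk."""
    chunks = chunk_article(headline, body, max_words)
    return majority_vote([classify_chunk(c) for c in chunks])


def toy_classifier(chunk):
    # Illustration only: the study used a trained model, not a keyword rule.
    return 1 if "net selling" in chunk else 2


body = ("Foreign investors kept net selling. Banks led the rally. "
        "Net buying returned in the afternoon.")
# A tiny word limit forces three chunks; their votes are [1, 2, 2].
print(classify_article("VN30 closes higher", body, toy_classifier, max_words=6))  # 2
```

Packing whole sentences rather than cutting at exactly 200 words keeps each chunk grammatically complete, which is the point of sentence segmentation in the described pipeline.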
For training, we employed the Adam optimizer with a learning rate schedule that included linear warmup and linear decay. The peak learning rate was set to 1 × 10⁻⁵. A dropout with a rate of 0.1 was applied before the classifier layer. Training used a batch size of 16 and ran for 20 epochs.
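The learning rate schedule can be written as a function of the optimizer step. This is a minimal sketch assuming that 10% of the steps are used for warmup and a hypothetical training set of 2,000 chunks; neither figure is reported in the study, which gives only the peak rate (1 × 10⁻⁵), the batch size (16), and the number of epochs (20). The shape is the same as that of `get_linear_schedule_with_warmup` in Hugging Face Transformers, although the study does not say which implementation it used.

```python
import math

PEAK_LR = 1e-5
BATCH_SIZE = 16
EPOCHS = 20


def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Rise linearly from 0 to `peak_lr` over `warmup_steps`, then decay
    linearly back to 0 at `total_steps`."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


n_train_chunks = 2_000                                   # hypothetical training-set size
steps_per_epoch = math.ceil(n_train_chunks / BATCH_SIZE)  # 125
total_steps = EPOCHS * steps_per_epoch                    # 2500
warmup_steps = total_steps // 10                          # 250 (assumed 10% warmup)

for step in (0, 125, 250, 1375, 2500):
    print(step, linear_warmup_decay(step, total_steps, warmup_steps))
```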
The model’s performance was assessed by its ability to identify the sentiment of news in the testing data and to classify news accordingly. Classification performance was measured with Accuracy, Precision, Recall (Sensitivity), Specificity, and F1-score, computed from the following counts:
TP: True positive, the number of news articles with positive sentiment that are classified as Positive.
TN: True negative, the number of news articles with negative sentiment that are classified as Negative.
FP: False positive, the number of news articles with negative sentiment that are classified as Positive.
FN: False negative, the number of news articles with positive sentiment that are classified as Negative.
Accuracy is calculated over the whole dataset and shows the model’s overall ability to classify news correctly into the Negative or the Positive sentiment group; the higher the accuracy, the better the model predicts sentiment. Precision is the ratio of True Positives to the sum of True Positives and False Positives, while Recall (Sensitivity) is the ratio of True Positives to all actual Positive cases. Similarly, Specificity is the ratio of True Negatives to all actual Negative cases. The F1-score is the harmonic mean of Precision and Recall; because it combines the two, it is an appropriate summary of a classification model’s accuracy.
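The five measures follow directly from the four counts TP, TN, FP, and FN. The sketch below treats label 2 (positive) as the positive class, following the labeling convention in this section; the function names and the example labels are hypothetical and are not results from the study.

```python
def confusion_counts(y_true, y_pred, positive=2):
    """Count TP, TN, FP, FN with `positive` (2 = positive sentiment) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall (Sensitivity), Specificity and F1-score."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean
    }


# Hypothetical labels (1 = negative, 2 = positive), not the study's test data.
y_true = [2, 2, 2, 2, 1, 1, 1, 1]
y_pred = [2, 2, 2, 1, 1, 1, 2, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 3, 1, 1)
print(classification_metrics(*counts))
```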
To understand the relationship between news and the market, the study also examined changes in the stock market in response to positive and negative news. To understand market reactions to news better, the return movements of more than one index should be analyzed (Cho et al. 2007; Chow et al. 2016). We collected returns of the two main indices, the VN30-Index and the HNX30-Index, which are price indices of the 30 largest stocks by market capitalization on the Hochiminh Stock Exchange and the Hanoi Stock Exchange, respectively.
The release date of a piece of news was treated as the event date. To examine rapid market reactions to news, returns of the VN30-Index and HNX30-Index were calculated over a 10-day window: 5 days before and 5 days after each event date. To check market movements over a longer period, returns were also measured over a 60-day window: 30 days before and 30 days after the event date. To explore the effects of news on market returns, we performed several tests comparing the variances and means of returns measured before and after the event dates. The variance ratio test was performed to assess the following null hypotheses:
H01. There is no significant difference in the ratio of market return variances before and after positive news releases.
H02. There is no significant difference in the ratio of market return variances before and after negative news releases.
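One common way to test H01 and H02 is the two-sided F-test of equal variances, whose statistic is the ratio of the sample variance of returns after the event to the sample variance before it. We assume this classical F-test here; the study does not spell out its exact variance ratio statistic, so this is one plausible reading of the hypotheses. To stay dependency-free, the sketch evaluates the F distribution through the regularized incomplete beta function, computed with the standard continued-fraction (modified Lentz) algorithm; where SciPy is available, `scipy.stats.f.cdf` serves the same purpose. The example returns are hypothetical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F distribution with (d1, d2) degrees of freedom."""
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def variance_ratio_test(before, after):
    """Two-sided F-test of H0: Var(after) = Var(before); returns (ratio, p)."""
    ratio = statistics.variance(after) / statistics.variance(before)
    cdf = f_cdf(ratio, len(after) - 1, len(before) - 1)
    p_value = min(1.0, 2.0 * min(cdf, 1.0 - cdf))
    return ratio, p_value


# Hypothetical daily index returns for the 5 days before and after one event date.
before = [0.004, -0.002, 0.001, 0.003, -0.001]
after = [0.012, -0.015, 0.009, -0.011, 0.014]
ratio, p = variance_ratio_test(before, after)
print(f"variance ratio = {ratio:.2f}, two-sided p = {p:.4f}")
```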
To highlight the importance of risk in investment analysis, Cho et al. (2007), Chow et al. (2016), and Fang and Post (2022) ranked the risks of returns collected in different periods using the stochastic dominance method. In this work, the Cumulative Distribution Function (CDF) was therefore estimated together with the variance tests to rank the risk of each index. For the mean comparison, the t-test was employed to explore significant return changes associated with news sentiment. Two hypotheses were used for the t-tests:
H03. There is no significant difference in the market returns before and after positive news releases.
H04. There is no significant difference in the market returns before and after negative news releases.
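For H03 and H04, a two-sample t-test compares mean returns before and after each release. The study does not say which variant it used, so this sketch assumes Welch's unequal-variance t-test; the two-sided p-value uses the identity P(|T| > |t|) = I_{ν/(ν+t²)}(ν/2, 1/2) for a Student t variable with ν degrees of freedom, with the regularized incomplete beta function computed by the standard continued-fraction algorithm. Up to the sign of t, `scipy.stats.ttest_ind(before, after, equal_var=False)` is the library equivalent. The example returns are hypothetical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(before, after):
    """Welch's two-sample t-test of H0: mean(after) = mean(before).

    Returns (t, df, two-sided p); t > 0 means mean returns rose after the event.
    """
    w_b = statistics.variance(before) / len(before)  # variance of the "before" mean
    w_a = statistics.variance(after) / len(after)    # variance of the "after" mean
    t = (statistics.fmean(after) - statistics.fmean(before)) / math.sqrt(w_a + w_b)
    df = (w_a + w_b) ** 2 / (w_a ** 2 / (len(after) - 1) + w_b ** 2 / (len(before) - 1))
    p_value = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p_value


# Hypothetical daily index returns for the 5 days before and after one event date.
before = [0.004, -0.002, 0.001, 0.003, -0.001]
after = [0.002, 0.006, -0.001, 0.005, 0.004]
t, df, p = welch_t_test(before, after)
print(f"t = {t:.3f}, df = {df:.1f}, two-sided p = {p:.3f}")
```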
All the above statistical tests were applied to 100 separate event dates: 50 dates with positive news and 50 dates with negative news.
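The event-window construction and the CDF-based risk comparison can be sketched as follows. This sketch rests on three assumptions, none of which the study reports: returns are simple daily returns, windows count trading days and exclude the event day itself, and stochastic dominance is checked on empirical CDFs over the pooled sample. Under second-order dominance, the sample whose integrated CDF lies everywhere at or below the other's is weakly preferred by every investor with increasing, concave utility; this is the sense in which such comparisons rank risk. The price series is hypothetical.

```python
import bisect


def simple_returns(prices):
    """Daily simple returns r_t = P_t / P_{t-1} - 1 (assumed return definition)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def event_windows(returns, event_idx, k):
    """Returns of the k trading days before and the k trading days after the
    event day at `event_idx` (event day excluded by assumption); k = 5 or 30."""
    if event_idx - k < 0 or event_idx + k >= len(returns):
        raise ValueError("window extends beyond the available data")
    return returns[event_idx - k:event_idx], returns[event_idx + 1:event_idx + 1 + k]


def ecdf(sample):
    """Empirical CDF: F(x) = share of observations <= x."""
    data = sorted(sample)
    return lambda x: bisect.bisect_right(data, x) / len(data)


def stochastic_dominance(a, b, tol=1e-12):
    """Check whether sample `a` dominates sample `b` to first and second order.

    First order: F_a(x) <= F_b(x) at every pooled observation.
    Second order: the integrated CDFs satisfy the same inequality. Both ECDFs
    are constant between consecutive pooled points, so the integrals are sums.
    """
    f_a, f_b = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    first = all(f_a(x) <= f_b(x) for x in grid)
    area_a = area_b = 0.0
    second = True
    for lo, hi in zip(grid, grid[1:]):
        area_a += f_a(lo) * (hi - lo)
        area_b += f_b(lo) * (hi - lo)
        if area_a > area_b + tol:
            second = False
            break
    return first, second


# Hypothetical index closes; the event falls on return index 5.
prices = [100, 101, 100.5, 101.2, 100.8, 101.0, 99.0,
          100.7, 98.9, 101.5, 99.2, 100.9, 100.1]
rets = simple_returns(prices)
before, after = event_windows(rets, event_idx=5, k=5)
print(stochastic_dominance(before, after))
```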
5. Conclusions and Future Research
New information can be favorable or unfavorable to investors, and it requires careful assessment. This study constructs an NLP model to identify the sentiment of news posted daily on three Vietnamese websites covering finance and the economy. After training, the model achieves a high accuracy (over 81%) in assigning an article to the positive or the negative group by reading its title and content. As one of the first textual analysis models for the Vietnamese language, the NLP model in this paper can be used by investors to extract the sentiment of any news article. It can serve as a tool for assessing whether wider market or economic conditions are favorable or unfavorable to stock investment, allowing investors to understand the current market without extensive reading. Researchers interested in textual analysis may find the results noteworthy because the study provides a well-trained model for Vietnamese.
Regarding the objective of examining the effects of news sentiment on the stock market, this study concludes that there is no significant change in mean index returns before and after news releases. However, using the variance ratio test and the CDF comparison, this study shows that index risk changes when negative news is released.
Future research could use broader indices that reflect the prices of a large number of stocks to explore the reactions of the whole market to news. Examples of such indices in Vietnam are the VN-Index, which measures the price changes of all stocks traded on the Hochiminh Stock Exchange, and the HNX-Index, which covers the prices of all stocks on the Hanoi Stock Exchange. To detect possible news leakage, future research could also rearrange the time periods to identify any effects of news on the market before public announcements.