Money has poured into environmental, social and governance funds. But even as ESG takes its place among major market factors, data remains a problem. Machine learning can help.
For many experts, the key to mitigating climate change involves reducing our carbon footprint. That may sound straightforward. It’s not.
The collection and analysis of carbon-emissions data is an art. Carbon data can be measured in two ways: directly, as so-called Scope 1 emissions, which are recorded “at source” by companies; or indirectly, as Scope 2 and 3 emissions. Data providers use economic input–output life-cycle analysis models to convert activity data into usable carbon-emission data. There are downsides to this process. First, direct measurement is expensive, which shifts a lot of carbon accounting to indirect methods to cut costs. Scope 2 emissions data come from a company’s consumption of various forms of energy. Scope 3 data are based on activities of suppliers and distributors, fuel usage, waste disposal and the end use of sold products — sources often beyond a company’s direct control. Without standardized reporting guidelines, the data may vary because of assumptions that companies have to make on a variety of issues.
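To see how the indirect approach works in practice, consider a minimal sketch of a Scope 2 estimate: purchased energy multiplied by an emission factor for the supplying grid. The figures below are purely illustrative, not official factors.

```python
# Hypothetical Scope 2 estimate: activity data (purchased electricity) times
# an assumed grid emission factor. All numbers are illustrative.
electricity_purchased_kwh = 1_000_000      # annual purchased electricity
grid_emission_factor_kg_per_kwh = 0.4      # assumed grid average, kg CO2e/kWh

scope2_tonnes = electricity_purchased_kwh * grid_emission_factor_kg_per_kwh / 1000
print(f"Estimated Scope 2 emissions: {scope2_tonnes:,.0f} tCO2e")  # 400 tCO2e
```

Every assumption in that calculation, from the grid factor to the treatment of renewable-energy contracts, is a place where two providers can diverge.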
Second, data providers have to make modeling choices to fill gaps in time series and to estimate emissions for companies that do not provide data directly. MSCI ESG Research found that some methodologies based on these models may overstate emissions by more than 200 percent relative to actual emissions once they are eventually measured and disclosed. A third source of variation arises from the selection of an appropriate peer universe as data providers try to generate normalized carbon-emissions scores. Beyond data collection, treatment and augmentation, variation can also come from how vendors control for country, region, sector and risk factors, with the result that, say, Chevron or ExxonMobil ranks higher or lower relative to another oil major on a given vendor’s ESG score.
Carbon is obviously an important factor in sustainability measures. For all the complexities, the underlying data are supported by multiple data providers, and the emissions can be quantified. Other data, particularly when it comes to social and governance metrics, are harder to collect and standardize.
Datasets tracking ESG factors often leave much to be desired. Investors continue to struggle with questions about ESG data collection and interpretation.1 The absence of standardized ESG datasets and reporting methodologies makes it difficult for issuers to disclose meaningful information on sustainability. Data may not be widely available, or it may have to be collected manually by analysts, leaving data providers little choice but to produce subjective, qualitative assessments. Such an approach means that different ESG research companies and data providers use their own, often inconsistent methodologies to generate ESG scores.
Better data and better data tools could resolve many of these issues. The good news is that tools now exist that were not available even a few years ago. Sophisticated players are learning how to use artificial intelligence (AI) techniques like machine learning, deep learning and neural networks to significantly improve the quality and amount of usable data, and to analyze it more effectively.
The issues around ESG data are symptomatic of the underlying challenge of gathering and interpreting information across three very different domains: environmental, social and governance. Environmental considerations, though hardly perfectly determinative, have the most standardized data of the three. As understanding of the factors driving climate science and sustainability improves, the relation between the “E” component and market performance should strengthen further.
Social and governance components are less quantitative and standardized. They involve social sciences rather than physical sciences. Measuring data on corporate social practices and tying that to market performance involves a degree of subjectivity — and proxy data. How do you measure, for example, “bad” labor practices or score the ability to balance the needs of different stakeholders?2
Governance may be even murkier. The conventional wisdom on what constitutes good governance (and the underlying legal context) has shifted over time. From the Great Depression to the 1970s, governance was based on balancing stakeholders — then the emphasis switched to shareholders.3 That’s been the prevailing model since then, though it appears to be eroding now, with a renewed call for longer-term thinking and greater consideration for stakeholders. Today, those definitional differences limit so-called corporate social responsibility efforts,4 and views of “good” corporate governance differ around the world.5
Even if those issues are effectively resolved, how should an investor weigh three very different components in a single ESG score or grade?
This question captures the essential challenge of ESG, and of the use of data in markets more generally. Even if investors can gather, standardize and use data on the three components, are those data meaningful? Some skeptics still argue that there is little historical evidence of excess returns from ESG factors; others maintain that by screening out bad companies, investors raise those companies’ cost of capital, translating into higher returns for those willing to buy their stock.6
A multifactor portfolio typically has a slightly higher ESG score than a market benchmark, partly because investing styles such as low volatility and quality are inherently skewed toward better-behaving companies. A portfolio that purely maximizes ESG scores, however, would skew style exposures and trail a regular factor strategy.7 Some investors believe that ESG funds more or less track broader markets but outperform when they have a defensive tilt and when carbon-intensive industries such as airlines and oil underperform the broader market.
ESG’s Growth Track
Despite the data limitations, ESG has continued to grow as an investment strategy. RBC Global Asset Management found that 75 percent of respondents in its 2020 Responsible Investment Survey of more than 800 institutional investors had integrated ESG principles into their investment approach, up from 67 percent in the 2017 survey.8 In its recently released biennial report, the Forum for Sustainable and Responsible Investment estimates that U.S.-domiciled assets employing sustainable investment strategies hit $17.1 trillion at the start of 2020, up from $12 trillion in 2018.9 And despite the pandemic, Morningstar reported that $51.1 billion flowed into “sustainable” exchange-traded funds (ETFs) in 2020, more than double the inflows of 2019 and ten times those of 2018.10 In fact, all three years set records. Morningstar defines sustainable funds as those that incorporate ESG criteria throughout the investment process.
Clearly, ESG investing is taking its place among major market factors, such as value, momentum and volatility. The taxonomy of ESG factors has proved adaptive, as the market empirically prices new indicators (see Figure 1 for index provider MSCI’s taxonomy). In addition, recent advances in quantifying the effect of ESG factors on performance, in developing a regulatory and legal framework for ESG, and in establishing new ESG ratings should continue to have a positive effect on asset flows into ESG-related strategies.11
ESG researchers have long sought to show that the positive correlations between ESG characteristics and financial performance reflect causality. Recent research by MSCI analyzed three channels (valuation, risk and performance) through which ESG information is transmitted to the equity market.12 Using MSCI ESG ratings data, the research found statistically significant evidence that companies with high ESG scores had higher profitability and shareholder yield, reduced tail risk, less systematic risk, lower market beta and higher valuations.
By examining historical data, researchers have also identified potential linkages between a company’s ESG-related metrics and its corporate financial performance and market returns. Although results vary and a number of studies on the topic contradict one another, a 2015 meta-study found that 90 percent of 2,000 empirical studies reported a small but positive correlation between ESG and performance, with an average positive statistical correlation of 0.15.13
Today, investor demand for financial products that allow trading on ESG factors has led to the creation of ESG index funds, which have historically outperformed non-ESG index funds on risk-adjusted measures, including excess returns, ten-year returns and conditional value at risk, across all countries studied.14 Inflows over this period have likely helped ESG performance. And ten years is not long enough to establish the statistical significance of these factors, especially because their performance amid outflows from the strategy has not been clearly tested.
The Data Problem Revisited
It’s easy to forget, but before 2015, just six years ago, ESG data was still generated and aggregated subjectively by research analysts. This not only left room for potential analyst bias, it also made it difficult to update ESG scores frequently. As a result, there was little correlation among ESG data providers. Some scores were updated only annually, a long way from real time.
ESG investing continues to face similar obstacles. In particular, data remains noisy and incomplete, not fully standardized, not integrated and not transparent. The subjective and qualitative aspects of ESG have persisted.
Given the rising number of ESG data sources, each one must not only add value to security selection but also add incremental value beyond the sources already in use. In theory, this is a straightforward matter of integrating ESG information with alternative databases. In practice, it is easier said than done: because underlying factors are not standardized, the calculation of similar factors by two data providers can be materially different.15
One underlying reason for this discrepancy is that ESG data is “unstructured.” In machine learning parlance, structured data has a schema and well-defined relationships. ESG data has much less structure, with some fields based purely on the mining of unstructured text documents or records. This increases the subjectivity, and the intuition required from data researchers, in converting the same ESG factor into a quantifiable metric. Varying amounts of noise can be introduced into the measurement of factors even from the same provider, owing to the parameters used in mapping text or records to a quantified metric. As data collection and mapping methodologies evolve and become mainstream, the amount of noise in some ESG factors can be expected to recede.
Not surprisingly, many factors have short histories, as providers have only begun sourcing or recording underlying data. Data can be missing for some fields or not updated, particularly for specific time periods. Noise, short histories and gaps pose challenges in the use of ESG data.
There’s also a lack of a standard taxonomy. ESG factors continue to evolve, and the dictionary changes as investors move through sectors and industries. ESG factors are difficult to reproduce over time and across geographies, partly because of differences in data across regions and in how the recording of data has evolved. Moreover, there may be discrepancies between what certain factors are expected to do and what they actually end up doing.
Amid the lack of standardization, most governments and regulators have few requirements for ESG reporting, so the choice of factors to report and even their qualitative content are often at a company’s discretion. As a result, there is poor comparability across companies for the same ESG factors, and among the 125-plus ESG data providers, small to moderate differences in data analysis and investment strategy cascade into disparate results.
A 2019 study comparing the rating systems of four popular ESG data providers — MSCI, Sustainalytics, RobecoSAM and Bloomberg — found weak pairwise correlations.16 In particular, the MSCI ESG score was correlated only about 0.5 with each of the other three scores. As the study suggests, many aspects of data analysis, from the inclusion and exclusion of ESG factors to the statistical algorithms used for estimating unreported ESG data, may contribute to significant differences among ESG indices.
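The kind of comparison the study ran is easy to reproduce in outline. The sketch below computes pairwise rank correlations across four providers for a handful of issuers; the provider names and scores are invented for illustration and do not come from the cited study.

```python
import pandas as pd

# Invented ESG scores for the same five issuers from four hypothetical feeds.
scores = pd.DataFrame({
    "ProviderA": [7.1, 4.2, 8.0, 5.5, 6.3],
    "ProviderB": [6.0, 5.1, 7.2, 4.0, 6.8],
    "ProviderC": [5.5, 4.8, 6.9, 5.0, 7.1],
    "ProviderD": [6.4, 3.9, 7.5, 5.8, 6.0],
}, index=["Issuer1", "Issuer2", "Issuer3", "Issuer4", "Issuer5"])

# Pairwise Spearman (rank) correlations; low off-diagonal values would signal
# that the providers disagree on issuers' relative ESG quality.
print(scores.corr(method="spearman"))
```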
The Promise of AI
Machine learning can be used to address some of these challenges, particularly the integration of ESG data into more stable, comprehensive databases. Advances in deep learning-based natural language processing (NLP), combined with downstream machine learning ensemble techniques, make it possible to integrate similar fields from different datasets, reducing noise while retaining most of the information and value. Given the plethora of unstructured data available on public companies, over long periods and at daily frequency, it’s possible to mine industry publications, regulatory filings, news, government studies and social media to compose a score for each ESG factor for a broad array of U.S. companies.
NLP algorithms have the ability to read articles, categorize items and extract positive and negative sentiments to produce an array of potential predictive indicators. Investors can use such algorithms to dig into a broad range of categories of underlying data, or they can roll up all the ESG scores at the portfolio level to see how they are exposed to specific ESG factors.
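As a minimal sketch of such a pipeline (not any provider’s actual methodology), the snippet below uses the open-source Hugging Face transformers library to score a few invented headlines and roll the signed sentiment up into a crude per-company, per-pillar indicator. The company, headlines and pillar tags are all hypothetical.

```python
from collections import defaultdict
from transformers import pipeline  # pip install transformers

# Invented headlines, each tagged with a company and an ESG pillar.
headlines = [
    ("ACME", "E", "ACME fined for wastewater violations at two plants"),
    ("ACME", "S", "ACME expands apprenticeship program for local workers"),
    ("ACME", "G", "ACME board rejects proposal to separate chair and CEO roles"),
]

classifier = pipeline("sentiment-analysis")  # loads a default English model

scores = defaultdict(list)
for company, pillar, text in headlines:
    result = classifier(text)[0]  # e.g. {'label': 'NEGATIVE', 'score': 0.99}
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    scores[(company, pillar)].append(signed)

# Average signed sentiment per (company, pillar) as a crude factor score.
for key, vals in sorted(scores.items()):
    print(key, round(sum(vals) / len(vals), 3))
```

A production system would add entity resolution, deduplication and an ESG-specific taxonomy on top of this skeleton, but the shape of the pipeline (read, classify, aggregate) is the same.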
Many rating agencies are now integrating alternative datasets and using methods such as machine learning to provide more flexible and up-to-date information. For example, Sensefolio uses NLP to analyze different sources of alternative data to provide ESG ratings on more than 20,000 companies worldwide, based on 150 ESG metrics.17 Using more than 50,000 sources of news signals across 20 languages, Arabesque not only delivers aggregate scores based on principles identified by the U.N. Global Compact (corporate sustainability guidelines that preceded the U.N. Principles for Responsible Investment), it also provides industry-specific analysis. RepRisk uses advanced machine learning to screen more than 500,000 documents daily to try to identify ESG event risk.
These Big-Data-driven ESG signals use self-learning quantitative models and data from unbiased sources, offering greater frequency and granularity, and more real-time analysis, than signals based on traditionally sourced ESG data. NLP can be employed to identify and extract both entity graphs, which can be used to auto-extract data with specific attributes from a database, and ESG factors, which can then be validated for accuracy against previous taxonomies. (Graphs are structures that store data in the form of nodes and edges.)
Data augmentation can increase the diversity of data used in training models without collecting new data; it can fill in gaps in time or cross sections, thereby taking another step toward standardization. Variational autoencoders (VAEs) and generative adversarial networks (GANs)18 can be used to augment data in ways that go beyond simple autoencoders, which compress an input and then recreate it with minimal data loss. VAEs improve on that process by assuming the input follows some underlying probability distribution and then attempting to replicate the parameters of this distribution when recreating data. GANs take a game theory approach to augmenting data: two neural nets, a generator and a discriminator, compete with one another, as the generator manufactures artificial outputs that could be mistaken for real data and the discriminator tries to identify which of the data it receives is artificially generated. The technique learns to generate new data with the same characteristics as the original data.
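To make the VAE mechanics concrete, here is a minimal PyTorch sketch. A random tabular feature matrix stands in for normalized company-level ESG data; the architecture, sizes and training loop are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal variational autoencoder for a tabular feature matrix."""
    def __init__(self, n_features=16, n_latent=4):
        super().__init__()
        self.enc = nn.Linear(n_features, 32)
        self.to_mu = nn.Linear(32, n_latent)      # mean of the latent code
        self.to_logvar = nn.Linear(32, n_latent)  # log-variance of the latent code
        self.dec = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                 nn.Linear(32, n_features))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence from the standard-normal prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Train on stand-in data (random rows playing the role of normalized features).
x = torch.randn(256, 16)
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = model(x)
    loss = vae_loss(recon, x, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling the latent prior and decoding yields augmented, synthetic rows.
with torch.no_grad():
    synthetic = model.dec(torch.randn(10, 4))
```

A GAN-based variant would swap the encoder-decoder pair for the generator-discriminator game described above.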
Tensor-completion techniques at the forefront of machine learning show promise in addressing the problem of ESG data gaps; they have been used successfully in other fields.19, 20, 21 Interpolation can fill missing data in time series, and tensor completion can be used to extrapolate fields with short histories based on related fields with longer histories. The main consideration is how to extend relationships among known data points to the missing ones. Most traditional techniques implicitly assume that missing entries mainly depend on their “neighbors,” that is, closely related known points in either the cross section or the time series. Tensor completion goes beyond traditional interpolation by combining cross-sectional and temporal information in a way that carries many characteristics of the original data over to the missing entries. This enables such techniques to capture global structure in the data, even when missing entries depend on entries that are far away.
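A sketch of the idea, under assumptions of our own: a hypothetical company × field × time tensor, a low-rank CP model fitted to the observed entries by masked gradient descent, and imputed values read off the reconstruction. The rank, step size and data are illustrative, and a production version would tune or replace them (for example, with alternating least squares).

```python
import numpy as np

def cp_complete(X, mask, rank=2, lr=0.01, n_iter=3000):
    """Fill missing entries of a 3-way tensor with a low-rank CP model.
    mask is 1 where X is observed and 0 where it is missing."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A = 0.1 * rng.standard_normal((I, rank))   # e.g., company factors
    B = 0.1 * rng.standard_normal((J, rank))   # e.g., ESG-field factors
    C = 0.1 * rng.standard_normal((K, rank))   # e.g., time factors
    for _ in range(n_iter):
        Xhat = np.einsum("ir,jr,kr->ijk", A, B, C)
        resid = mask * (Xhat - X)              # error on observed entries only
        gA = np.einsum("ijk,jr,kr->ir", resid, B, C)
        gB = np.einsum("ijk,ir,kr->jr", resid, A, C)
        gC = np.einsum("ijk,ir,jr->kr", resid, A, B)
        A, B, C = A - lr * gA, B - lr * gB, C - lr * gC
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Toy example: a rank-2 tensor with roughly 30 percent of entries hidden.
rng = np.random.default_rng(1)
true = np.einsum("ir,jr,kr->ijk", rng.standard_normal((10, 2)),
                 rng.standard_normal((8, 2)), rng.standard_normal((6, 2)))
mask = (rng.random(true.shape) > 0.3).astype(float)
Xhat = cp_complete(true * mask, mask)
filled = np.where(mask.astype(bool), true, Xhat)  # keep observed, impute the rest
```

Because every imputed entry is built from factors shared across all companies, fields and dates, the fill respects global structure rather than just local neighbors.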
The next logical step is to use these new sets of concepts and categories (“ontologies”) to update current predictive models, which can then be applied to specific areas of focus, used to validate expert hypotheses and deployed to support decision-making. Traditional machine learning techniques like classification and regression, combined with ensemble approaches, can subsequently be used to blend different ontologies with expert hypotheses. Information in the entity graph can be used to propagate relevant data to related entities, producing a richer dataset that leverages the graph’s latent knowledge, as the sketch below illustrates. These steps can be incorporated into a holistic AI-supported workflow in which prediction algorithms based on the enriched data generate ESG scores.
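As a toy illustration of that propagation step, the sketch below runs one round of neighbor averaging on a hypothetical entity graph using the open-source networkx library. The entities and scores are invented; real systems use richer label-propagation or graph neural network methods.

```python
import networkx as nx  # pip install networkx

# Hypothetical entity graph: a parent company linked to a subsidiary
# and to suppliers.
G = nx.Graph()
G.add_edge("ParentCo", "SubsidiaryA")
G.add_edge("ParentCo", "SupplierB")
G.add_edge("SupplierB", "SupplierC")

# Known ESG scores for some entities; the others are missing.
scores = {"ParentCo": 7.2, "SupplierC": 4.1}

# One propagation step: fill a missing entity's score with the mean of its
# scored neighbors, a crude stand-in for label propagation on the graph.
for node in G.nodes:
    if node not in scores:
        neighbor_scores = [scores[n] for n in G.neighbors(node) if n in scores]
        if neighbor_scores:
            scores[node] = sum(neighbor_scores) / len(neighbor_scores)

print(scores)  # SubsidiaryA and SupplierB inherit estimates from their neighbors
```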
Conclusion
Recent ESG performance has benefited from strong asset inflows, as corporate executives and investors alike appear to be growing more open to considering ESG factors alongside companies’ primary objective of maximizing shareholder value. The consensus is that transparency and ESG integration will deepen, and the ability to operate with reliable data will play a key role in that process. AI can become a key factor in helping investors and risk managers analyze ESG data collected in both structured and unstructured formats. Not only can AI help extract relevant information from existing data sources, it also offers exciting opportunities to create new ones.
Michael Kozlov is Senior Executive Research Director at WorldQuant and has a PhD in theoretical particle physics from Tel Aviv University.
Ashish Kulkarni is a Vice President, Research, at WorldQuant and has a master’s in information systems from MIT and an MS in molecular dynamics from Penn State University.
Xin Li is a Vice President, Research, at WorldQuant and has a master’s in computer science from Tsinghua University.
Duc Ho is a Senior Investment Software Researcher at WorldQuant and has a master’s in mathematics from the University of Chicago.