Problems with Aggregate Measures
Written on 18 September 2009
by Ruth Fisher, PhD
A recent article in the NYT, “G.D.P. Seen as Inadequate Measure of Economic Health” by David Jolly, discussed the inadequacy of using GDP alone as an indicator of a society’s well-being (“It’s not a question of replacing G.D.P. It’s a question of complementing it with other indicators that can provide other measures of well-being.”):
G.D.P. is the measure of the market value of all the goods and services produced in the economy. Its development in the 1930s, when the U.S. government was looking for new tools to measure national income and output more accurately, has been described as one of the most important advances in macroeconomics.
However, there has long been criticism that, while it accurately captures the growth or contraction of the overall economy, it is a crude tool for describing social health…
The article mentions some other measures of well-being, in addition to GDP, that countries around the world report in an attempt to better capture their total well-being. For example,
[T]he Himalayan kingdom of Bhutan has chosen to focus on “gross national happiness…
The United Nations Development Program’s human development index…also seek[s] to incorporate the value of a long and healthy life, access to knowledge and a decent standard of living.”
The Problems with Aggregate Measures
GNP, Net Income, SAT, GPA. Everybody loves aggregate metrics because they’re so easy to use. They provide simple snapshots of complex situations. And because everyone wants things to be easy, they convince themselves (and/or others convince them) that the single number proffered is all they really need to know to have a good understanding of what’s going on. Unfortunately, this is generally not the case.
Some of the bigger problems associated with aggregate metrics include the following.
1. Aggregate metrics often do not capture all issues of importance.
As the article points out, aggregate metrics often fail to capture important issues associated with the picture the metric is trying to paint:
One of the most glaring problems with using economic growth as a proxy for well-being was the fact that it excluded the damage to society and ultimately to the economy of environmentally non-sustainable activities.
In particular, aggregate metrics often fail to include important issues when those factors are difficult to measure. GNP does not include the costs associated with depletion of beneficial resources or the increase in harmful emissions. Net income does not capture the depletion or generation of intangible assets. SAT does not capture perseverance. GPA does not capture class difficulty.
Because aggregate measures are generally incomplete, comparisons of aggregate measures across actors (companies, countries, individuals, etc.) often provide apples-to-oranges comparisons. This means any conclusions drawn from the cross-actor comparisons of the aggregate statistics are bound to be flawed.
2. Aggregate metrics, by definition, are derived from a larger number of individual events, "transactions", or measures.
This means that the aggregate metric tends to mask a lot of potentially important information about the individual measures that are combined to generate the aggregate measure:
- How many individual transactions were combined to create the aggregate measure?
- How does the overall aggregate metric compare with the same metric calculated for different subgroups of transactions? For example, what is the contribution of private vs. public activities to GNP? Products vs. services? Large organizations vs. small organizations? How much of a company’s net income was generated from primary or core activities vs. secondary activities?
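As a minimal illustration of the questions above, the Python sketch below uses two hypothetical companies (the segment names and all numbers are invented for illustration). Both produce the same aggregate total, and only counting the transactions and breaking the total down by segment reveals how differently each company arrived at it.

```python
from collections import defaultdict

# Hypothetical transactions as (segment, amount) pairs; the numbers are invented.
company_a = [("core", 80.0), ("core", 40.0), ("secondary", -20.0)]
company_b = [("core", 10.0), ("secondary", 50.0), ("secondary", 25.0), ("secondary", 15.0)]


def aggregate(transactions):
    """Collapse every transaction into one total: the aggregate metric."""
    return sum(amount for _, amount in transactions)


def by_segment(transactions):
    """Recompute the same total separately for each segment (subgroup)."""
    totals = defaultdict(float)
    for segment, amount in transactions:
        totals[segment] += amount
    return dict(totals)


for name, txns in (("A", company_a), ("B", company_b)):
    print(name, "count:", len(txns), "total:", aggregate(txns), "by segment:", by_segment(txns))
```

Both companies report a total of 100.0, but A earns 120.0 from three transactions, mostly core, offset by a secondary loss, while B earns 90.0 of its 100.0 from secondary activities across four transactions.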
3. Cross-actor comparisons of aggregate metrics can be problematic if different actors calculate the metric differently.
A recent article in the WSJ, “Hate Calculus? Try Counting Cow Carbon” by Jeffrey Ball, provides a perfect description of this issue as it relates to measuring the carbon dioxide emissions associated with the provision of a pint of whole milk:
Tesco PLC, the big U.K. retailer, began last month labeling milk sold under its store brand. Its studies concluded that a pint of whole milk generates an amount of greenhouse gas equivalent to about two pounds of carbon dioxide...
Another study by the U.S. dairy industry came up with a preliminary footprint that is about 15% lower, when expressed in terms of a comparably sized container of milk.
What may account for some of the difference is another set of dizzying variables in the carbon calculation. Some farms have more energy-efficient machinery. Some cows eat less corn, which typically is grown with petroleum-based fertilizers. And some kinds of feed cause cows to burp more methane, a potent source of carbon. That bovine belching is widely agreed to be the biggest source of carbon emissions in milk production.
But some parts of the equation are subjective. Cows produce multiple sellable goods: milk while they are alive, and, once they are slaughtered, products including beef, leather and bones. So how much of the emissions from the dairy farm should be blamed on the milk, and how much on the making of the steak and shoes?
Tesco attempts to resolve that question by splitting the emissions according to the relative economic value of the milk versus the cow's carcass. If, say, a dairy farm got 90% of its revenue from selling milk and 10% from selling the cow, then 90% of its emissions would be ascribed to the milk and 10% to the other products...
The U.S. dairy industry is updating its own study, and the new version uses a more-complicated calculation preferred by the International Organization for Standardization. It seeks essentially to look inside the cow, separating the portion of the animal's biological functions that go to producing milk from the portion that go to producing the cow itself. Those functions include the cow's eating, burping, flatulence and waste.
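The revenue-share rule the excerpt attributes to Tesco amounts to a proportional split of the farm's emissions. The minimal Python sketch below assumes a hypothetical farm-level emissions total in arbitrary units and reuses the excerpt's 90%/10% illustration. The second weighting is only a hypothetical placeholder for "some other allocation method"; it is not the ISO procedure, whose details the article does not give.

```python
def allocate(total_emissions, weights):
    """Split a total across co-products in proportion to each product's weight.

    With revenue as the weight, this is the revenue-share rule described above.
    """
    total_weight = sum(weights.values())
    return {product: total_emissions * w / total_weight for product, w in weights.items()}


FARM_EMISSIONS = 1000.0  # hypothetical farm-level total, arbitrary units

# Revenue-share rule, using the excerpt's 90% milk / 10% carcass illustration.
revenue_based = allocate(FARM_EMISSIONS, {"milk": 90, "carcass": 10})

# Hypothetical alternative weights standing in for a different allocation method.
alternative = allocate(FARM_EMISSIONS, {"milk": 85, "carcass": 15})

print(revenue_based["milk"], alternative["milk"])  # 900.0 850.0
```

The farm's total is identical in both cases. Only the allocation rule changes, yet the footprint ascribed to the milk differs, which is why cross-actor comparisons require knowing how each actor calculated the metric.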
4. Giving attention to particular metrics tends to cause problems over time.
In merit-based systems individual actors are, by definition, rewarded for better performance. Higher company profits mean higher employee compensation. Higher SAT scores or GPA mean admittance to better colleges. Higher GNP means greater country “wealth” and reelection for politicians. Lower emissions mean greater recognition for being socially responsible.
Aggregate metrics are usually designed to be used with other measures of performance, which, taken together, paint a more complete picture of a situation than the aggregate metric alone is able to provide. Unfortunately, the simplicity and ease of use associated with aggregate metrics tend to lead people to focus on the metric alone, to the exclusion of all else.
When the metric is used as a standalone measure of performance, and when individuals want to optimize their performance, they tend to focus on optimizing the metric. Over time, trying to optimize the metric, rather than the situation the metric represents, will tend to cause actors to focus on the wrong things:
- Maximizing current GNP, instead of optimizing long run total social welfare.
- Maximizing short-term profits and stock price, instead of maximizing long run company value.
- Teaching to the test, instead of providing a well-rounded education.
5. Aggregate metrics don't provide any indication as to how to improve the situation.
Because aggregate metrics report only the net outcome and give no information on the pluses and minuses behind it, they reveal nothing about the underlying dynamics or about how to improve the situation:
- What are the strengths and what are the weaknesses?
- Have the strengths been changing, have the weaknesses been changing, or have both been changing?
- Is one subgroup benefiting at the cost of other subgroups?
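A small Python sketch, using hypothetical subgroup contributions for two periods (the group names and numbers are invented), shows how an unchanged net figure can hide large offsetting changes underneath it, which is exactly what the questions above try to surface.

```python
# Hypothetical subgroup contributions in two periods; the numbers are invented.
period_1 = {"group_a": 40.0, "group_b": 35.0, "group_c": 25.0}
period_2 = {"group_a": 60.0, "group_b": 30.0, "group_c": 10.0}


def net(components):
    """The single aggregate figure: the sum of all subgroup contributions."""
    return sum(components.values())


def changes(before, after):
    """Per-subgroup change between two periods (assumes the same keys in both)."""
    return {key: after[key] - before[key] for key in before}


delta = changes(period_1, period_2)
gainers = [k for k, d in delta.items() if d > 0]
losers = [k for k, d in delta.items() if d < 0]

print("net:", net(period_1), "->", net(period_2))  # net: 100.0 -> 100.0
print("gainers:", gainers, "losers:", losers)      # gainers: ['group_a'] losers: ['group_b', 'group_c']
```

The aggregate reads as "no change," while the breakdown shows one subgroup gaining 20.0 at the expense of the other two.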