interactive lorenz curve

15 March 2015

I've been playing around with fractals while redoing the splash page and testing the implementation of the various algorithms. While fractals have mathematical value of their own and can make pretty neat art, their intuitive appeal often comes from real-world analogies to things like rivers and broccoli. Can we find an appealing application of self-similarity in economics?

It's worth mentioning that modern information/learning theory (the economics branch, anyway) relies relatively heavily on the Poisson process. In part, this is because Poisson arrivals have a constant hazard rate (the waiting time until the next arrival is exponentially distributed, hence memoryless), and can therefore impart some nice stationarity properties to the economic model they're part of. Although this is certainly self-similarity, it hardly has the intuitive appeal of “economics” as considered by the layperson.

For a more engaging perspective, it's worth looking at current events. Income inequality has been in the news a lot lately; can we find some aspect of self-similarity there?(1) That is, if we look at a small segment of the population, will this segment perceive income inequality the same as the population as a whole? To check, I've loaded CPS income data into the interactive inequality graph below.(2) Inequality is graphed as the Lorenz curve, which plots the aggregate income earned by everyone up to the \( x^{\text{th}} \) income percentile, as a fraction of aggregate income. Perfect equality — when, say, 40% of people earn 40% of the aggregate income — is displayed as the grey “Equity” line; the further the red line is from the grey, the worse inequality is. What you can see from playing around is obvious: there is no self-similarity to income inequality; mathematically it's clear why this is the case (but you'll have to keep reading, and endure some equations).

Instructions: first select a data month and year.(3) Drag the sliders below to constrain the income percentiles: setting the left slider to, say, 20% and the right slider to 80% will show the Lorenz curve — and associated Gini coefficient — for people whose income is between the 20th and 80th percentiles.

[Interactive Lorenz curve graph]

why do we care?

Economists, as a rule, care more about efficiency than inequality. Or anything else, for that matter. However, it's been suggested that while some inequality is accepted by a body of people and can provide a good motivator (extra effort has to have some chance of paying off before you'll exert it in the first place), larger amounts of income inequality can harm economic growth, which ultimately harms efficiency. When we consider the recent rise of populist politicking in the U.S. and the inequality-pandering on which it's based, the idea of inequality-messaging is apropos.

Why is self-similarity relevant for messaging? Our friends, our neighbors, and our coworkers ultimately look a lot like us. Call this implicit bias, or neighborhood selection, or the fact that we're doing the same jobs at the same company: it doesn't matter. If income inequality is not self-similar, we'll have a tendency to observe less drastic inequality than actually exists. If you're a politician trying to leverage this for election, or an economist trying to motivate policy changes, this is pertinent. People respond more strongly to their own observations than to scientific rigor; aggregate inequality statistics are irrelevant if the 1% doesn't recognize the existence of the 99%, or vice versa.

As for the Gini coefficient in the graph (essentially the gap between your income percentile and the percent of aggregate income generated by you and everyone poorer, averaged over all percentiles and scaled to lie between 0 and 1), there are many other ways to measure inequality in a particular dimension, but Gini is the most popular. There's an interesting follow-up question that I know of no particular work on: do people identify inequality according to the Gini coefficient, or some other measure? Is it even perceived in a one-dimensional way? For our purposes here it is sufficient as a global measure of inequality (lower Gini is more equal), but I'm sure this isn't the only way to summarize the data; that's part of the motivation of the subset-conditioned graph.
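For the curious, here is a minimal sketch of how a Lorenz curve and Gini coefficient could be computed from a plain list of incomes. It is not the code behind the graph above; the function names `lorenz_curve` and `gini` are made up for illustration, and the Gini is taken as one minus twice the area under the Lorenz curve, approximated with the trapezoid rule.

```python
def lorenz_curve(incomes):
    """Return (xs, ls): population shares and cumulative income shares.

    Hypothetical helper: sorts incomes from poorest to richest and
    accumulates each person's share of the total.
    """
    ys = sorted(float(v) for v in incomes)
    total = sum(ys)
    n = len(ys)
    xs, ls = [0.0], [0.0]
    running = 0.0
    for i, y in enumerate(ys, start=1):
        running += y
        xs.append(i / n)            # share of people counted so far
        ls.append(running / total)  # share of income they earn
    return xs, ls


def gini(incomes):
    """Gini = 1 - 2 * (area under the Lorenz curve), trapezoid rule."""
    xs, ls = lorenz_curve(incomes)
    area = sum(
        (ls[i] + ls[i - 1]) / 2 * (xs[i] - xs[i - 1])
        for i in range(1, len(xs))
    )
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))  # everyone earns the same
    print(gini([0, 0, 0, 1]))  # one person earns everything
```

With four identical incomes the curve sits on the equity line and the Gini is zero; with one earner out of four it comes out at 0.75, which is as unequal as a four-person sample can get under this discretization.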

but about self-similarity

Since this originally started as a question about the self-similarity of income inequality, it's helpful to address: what does inequality look like if its measure is identical no matter which slice of society we're looking at? The answer is that there could be no inequality.

To see this, it's helpful to return to the Lorenz curve, which is the basis for most measures of inequality. If inequality is self-similar, the Lorenz curve shouldn't change when we restrict attention to a subset of the population. Letting \( L(x) \) denote the standard Lorenz value, “The percentage of aggregate income earned by individuals up to the \( x^{\text{th}} \) income percentile,” we'll start by defining a restricted Lorenz curve,

\[ L\left( x; \underline{x}, \overline{x} \right) = \frac{L\left( \underline{x} + \left( \overline{x} - \underline{x} \right) x \right) - L\left( \underline{x} \right)}{L\left( \overline{x} \right) - L\left( \underline{x} \right)}. \]

That is, \( L(x;\underline{x},\overline{x}) \) is the percentage of the constrained population's income earned by individuals up to the \( x^{\text{th}} \) income percentile of the constrained population lying between percentiles \( \underline{x} \) and \( \overline{x} \). It's slippery to phrase verbally, so: \( L(\overline{x}) - L(\underline{x}) \) is the total percentage of aggregate income earned by individuals in the \( \underline{x}^{\text{th}} \) to \( \overline{x}^{\text{th}} \) income percentiles. The \( x^{\text{th}} \) percentile of the constrained population sits at percentile \( \underline{x} + (\overline{x} - \underline{x}) x \) of the full population, so \( L(\underline{x} + (\overline{x} - \underline{x}) x) - L(\underline{x}) \) is the total percentage of aggregate income earned by individuals between the \( \underline{x}^{\text{th}} \) percentile and that point. Divide the latter by the former and we have the within-population percentage.
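The restricted curve can be sketched directly from this definition. The snippet below is only an illustration: `restricted_lorenz` is a hypothetical helper, and \( L(x) = x^2 \) is a made-up Lorenz curve chosen because it is easy to check by hand, not something estimated from the CPS data.

```python
def restricted_lorenz(L, lo, hi):
    """Return the function x -> L(x; lo, hi) for a Lorenz function L.

    L maps a population share in [0, 1] to an income share in [0, 1];
    lo and hi are the lower and upper percentiles of the slice.
    """
    span = L(hi) - L(lo)  # income share earned by the whole slice
    if span <= 0:
        raise ValueError("the slice must earn a positive share of income")

    def L_restricted(x):
        # x is a percentile *within* the slice, so map it back to the
        # full population before evaluating L.
        return (L(lo + (hi - lo) * x) - L(lo)) / span

    return L_restricted


def example_lorenz(x):
    """Hypothetical Lorenz curve used purely for illustration."""
    return x ** 2


if __name__ == "__main__":
    middle = restricted_lorenz(example_lorenz, 0.2, 0.8)
    print(example_lorenz(0.5))  # bottom half's share in the full population
    print(middle(0.5))          # bottom half's share within the 20-80 slice
```

Under this made-up curve the bottom half of the full population earns 25% of income, while the bottom half of the 20th-to-80th percentile slice earns roughly 35% of the slice's income, so the slice sits closer to the equity line than the population does.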

Now, if the Lorenz curve is everywhere self-similar with respect to constraining attention to subsets of the population, it must be that for all \( (x;\underline{x},\overline{x}) \) we have

\[ \begin{split} L\left( x \right) &= L\left( x; \underline{x}, \overline{x} \right)\\&\implies\;\;\left( L\left( \overline{x} \right) - L\left( \underline{x} \right) \right) L\left( x \right) = L\left( \underline{x} + \left( \overline{x} - \underline{x} \right) x \right) - L\left( \underline{x} \right). \end{split} \]

Since small changes in \( x \) won't change this equality — we've assumed it holds everywhere — we can take the derivative with respect to \( x \) and find

\[ \left( L\left( \overline{x} \right) - L\left( \underline{x} \right) \right) L^\prime\left( x \right) = L^\prime\left( \underline{x} + \left( \overline{x} - \underline{x} \right) x \right) \left( \overline{x} - \underline{x} \right). \]

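We can move the left-hand multiplier over to the right-hand side by dividing through by \( L(\overline{x}) - L(\underline{x}) \) (which is positive whenever the slice earns any income), leaving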

\[ L^\prime\left( x \right) = L^\prime\left( \underline{x} + \left( \overline{x} - \underline{x} \right) x \right) \left[ \frac{\overline{x} - \underline{x}}{L\left( \overline{x} \right) - L\left( \underline{x} \right)} \right]. \]

For any \( (\underline{x},\overline{x}) \) with \( \underline{x} < \overline{x} \) we can find \( \varepsilon > 0 \) so that \( \overline{x} = \underline{x} + \varepsilon \). Letting \( \varepsilon \searrow 0 \) (that is, letting \( \underline{x} \) and \( \overline{x} \) become close), and assuming \( L^\prime \) is continuous with \( L^\prime(\underline{x}) \neq 0 \), we have

\[ \begin{split} L^\prime\left( x \right) &= \lim_{\varepsilon \searrow 0} L^\prime\left( \underline{x} + \varepsilon x \right) \left[ \frac{\varepsilon}{L\left( \underline{x} + \varepsilon \right) - L\left( \underline{x} \right)} \right] \\ &= L^\prime\left( \underline{x} \right) \left[ \lim_{\varepsilon \searrow 0} \frac{\varepsilon}{L\left( \underline{x} + \varepsilon \right) - L\left( \underline{x} \right)} \right] \\ &= L^\prime\left( \underline{x} \right) \left[ \frac{1}{L^\prime\left( \underline{x} \right)} \right] = 1. \end{split} \]

So the only way to have a globally self-similar Lorenz curve is to have \( L^\prime(x) = 1 \), which (given \( L(0) = 0 \) and \( L(1) = 1 \)) requires \( L(x) = x \). That is, the aggregate income earned by those up to the \( x^{\text{th}} \) income percentile is \( x \) percent;(4) there can be no inequality. \( \square \)
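As a quick check on why the condition has to hold for every slice, take the hypothetical family \( L(x) = x^p \) with \( p \geq 1 \) (chosen purely for illustration, not fitted to the CPS data). Slices that start at the bottom, \( \underline{x} = 0 \), reproduce the curve exactly:

\[ L\left( x; 0, \overline{x} \right) = \frac{\left( \overline{x} x \right)^p}{\overline{x}^p} = x^p = L\left( x \right). \]

With \( p = 2 \) and a slice that starts above the bottom, though,

\[ L\left( x; \underline{x}, \overline{x} \right) = \frac{\left( \underline{x} + \left( \overline{x} - \underline{x} \right) x \right)^2 - \underline{x}^2}{\overline{x}^2 - \underline{x}^2} = \frac{2 \underline{x} x + \left( \overline{x} - \underline{x} \right) x^2}{\overline{x} + \underline{x}}, \]

which equals \( x^2 \) for every \( x \) only when \( \underline{x} = 0 \). For the top half, \( (\underline{x}, \overline{x}) = (1/2, 1) \), this gives \( L(1/2; 1/2, 1) = 5/12 \approx 0.42 \) against \( L(1/2) = 1/4 \): the slice looks more equal than the whole. Only \( p = 1 \) survives every choice of slice.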

This presupposes that the Lorenz curve is well-behaved and differentiable, but this isn't such a huge leap. Ultimately the curve is meant to capture an underlying “true” distribution of income, not necessarily the empirical reality. In economics we have a grand tradition of assuming that variables behave continuously, even if we would never capture such behavior when measured.

The larger question is why the equality must hold everywhere. For pointwise measures of inequality, such as the 20:20 ratio or the Palma ratio, holding everywhere is necessary; for integrated measures of inequality, such as the Gini coefficient or the Theil index, it is plausible to allow for measure-zero deviations from self-similarity. But if this is a problem, it will be the case that the Lorenz curve has a discontinuity, raising the same issues as previously mentioned.

Nonetheless, I suspect that with some additional work the self-similar-income-inequality-is-no-inequality result will go through in these more general contexts.


  1. Personally I'm quite sympathetic to arguments in favor of wealth inequality as a superior measure of social dispersion. That being said, I know where to find income data; I'm less clear on reliable wealth data.
  2. The data I've loaded understates the Gini measure of inequality (as reported by the government) by a significant amount. This is due to a number of factors. First, I'm not an expert with Census data and the sort of “standard” adjustments that some researchers might make are not a factor here. Second, the general consensus is that the best figures come from the March Annual Social and Economic Supplement, which to the best of my knowledge is not publicly available. Third, depending on the time period 2-3% of survey responses are top-coded (that is, their reported value is replaced with an imposed maximum value), which will squish down inequality due to high earners.
  3. This seems to have only a minimal effect on the output generated, and things are relatively stable over time. This is not entirely true over larger date ranges, at least with respect to the Gini coefficient, but local to this dataset and the general shape of the Lorenz curve it seems to be true enough.
  4. This raises a defined-ness issue: when everyone's income is the same, the Lorenz curve is not necessarily well-defined. Nonetheless, it is clear what this implies in a limiting sense.

