Originally published March 3, 2024 by Paul Rayburn. Updated March 4, 2024 with additional graphics and further explanations.

I thought I would take this opportunity to highlight one of the most common issues I have seen with visual representations of data. There is no doubt that generative AI, like ChatGPT, is a significant current development that is reshaping our future, so I like to talk about it as often as possible. I also like to keep things in perspective whenever possible.

Earlier today, I came across a post with a compelling video showing an animated chart of the trends and relationship between Nvidia and Intel, driven by demand for Nvidia GPU chips first for cryptocurrency mining and subsequently for generative AI. It is a visually striking presentation, but when you drill down into the data and how it’s presented, you may be surprised. Line graphs of this type can shock, move, and possibly deceive viewers through the visualization alone.

Link to the video

This sometimes happens in real estate when comparing two markets. I have seen cases where individuals misinterpret such a chart and assume that one market was a better (or worse) investment based mainly on the visual scale. Maybe you have experienced this.

Here’s the issue: if the starting values of two separate datasets (markets) presented on the same graph are far apart in percentage terms, or worse, differ by an order of magnitude or more, then the resulting graph can easily be misinterpreted visually. In this case, Intel (INTC) has roughly 15 times the market cap of NVDA at about the point when the two begin to diverge in 2015, with Intel at 150 billion and NVDA at 10 billion. However, when we take a moment to measure the actual change, it is not so dramatic.

Visually, on screen, it is. Measured on my screen, NVDA’s rise from 10 billion to 250 billion covers just 57 pixels in height. Printed on 8.5 x 11 paper, that would probably be less than an inch. Yet going from 10 billion to 250 billion is a 25-fold rise (2,500% of the starting value), while going from 250 billion to 2 trillion is only about 8-fold (800%), even though the second move covers almost 10 times as many pixels in height, probably closer to 5 inches on the sheet of paper.
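To make the arithmetic concrete, here is a minimal Python sketch using the approximate NVDA figures quoted above (10 billion, 250 billion, 2 trillion). The function names and the zero-based axis topping out at 2 trillion are my own assumptions for illustration, not measurements from the video.

```python
# Approximate NVDA market caps quoted in the text, in billions of dollars.
start, middle, end = 10, 250, 2000


def fold_change(old, new):
    """How many times larger the new value is than the old one."""
    return new / old


def linear_share(old, new, axis_max):
    """Fraction of a zero-based linear axis's height covered by the move."""
    return (new - old) / axis_max


early_fold = fold_change(start, middle)   # 25.0, i.e. 2,500% of the start
late_fold = fold_change(middle, end)      # 8.0, i.e. 800% of the start

# Assumption: the linear axis runs from 0 to 2,000 billion.
early_share = linear_share(start, middle, end)
late_share = linear_share(middle, end, end)

print(f"early: {early_fold:.0f}x, {early_share:.1%} of the axis height")
print(f"late:  {late_fold:.0f}x, {late_share:.1%} of the axis height")
```

On an ideal linear axis, the larger 25-fold move takes up about 12% of the height while the smaller 8-fold move takes up about 88%. The exact pixel ratio on any given screen depends on how the chart is drawn and on the values read off it.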

In my annotated graph below, we can see NVDA rose about 25-fold (2,500%) from 2014 to 2020 and then only about 8-fold (800%) to 2024.

Image captured from the YouTube video. Source: https://www.eeagli.com/#ultimate-wordpress

In these types of cases, I prefer to either use or include a log scale, which helps to eliminate this visual scale issue. The following chart is simplified, as I use only three data points each for Intel and Nvidia, but it should help to highlight my point and show how the log scale can be helpful for viewing this type of data.

Created using ChatGPT Excel

There is no one correct way to plot this, and either plot has some relevance. A linear y-axis, with values increasing in regular intervals (in this case, 250 billion per interval, shown on the left-hand side), is generally easier to read, and estimating points between equally spaced intervals is more intuitive. On the other hand, for visualizing changes in value over fixed date intervals between these types of datasets, equal spacing by percentage on the y-axis can often be helpful. In this instance, I applied a base-10 log scale, where every equal increment of height represents an equal percentage increase (as shown on the right-hand side of the chart above). Because we do not typically measure time in percentages for date series, this method is not usually applied to the date axis, leaving the dates on the x-axis in ordinary fixed increments.
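As a rough sketch of what "equal height for equal percentage" means, the Python snippet below computes where a value would sit on a base-10 log axis. The helper name `log_axis_position` and the axis limits (10 billion to 10 trillion) are hypothetical choices for illustration, not the limits of the chart above.

```python
import math


def log_axis_position(value, axis_min, axis_max):
    """Fractional height (0 to 1) of a value on a base-10 log axis."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


# Hypothetical axis limits, in billions of dollars.
axis_min, axis_max = 10, 10_000

for value in (10, 100, 1_000, 10_000):
    height = log_axis_position(value, axis_min, axis_max)
    print(f"{value:>6} billion sits at {height:.3f} of the axis height")

# Each tenfold step climbs the same share of the axis.
first_step = (log_axis_position(100, axis_min, axis_max)
              - log_axis_position(10, axis_min, axis_max))
last_step = (log_axis_position(10_000, axis_min, axis_max)
             - log_axis_position(1_000, axis_min, axis_max))
print(f"10 -> 100: {first_step:.3f}, 1,000 -> 10,000: {last_step:.3f}")
```

In a plotting library such as matplotlib, the same mapping is applied by switching the axis with `ax.set_yscale("log")`, so there is normally no need to compute positions by hand.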

To further illustrate this and make it more specific to the real estate market, I created a dataset from a reasonably conforming market and have shown three different visualizations of the same data. These are all exactly the same sales; they are just presented differently. The first chart uses the standard linear scale, and visually, it would appear the market is increasing almost exponentially in the more recent sales, with what appear to be significant outliers.

The second set is shown with the log scale, with the base value trimmed to $100,000.

The third shows the full scale untrimmed. Once again, these are unedited. It is merely the effect of the scale that creates the illusion of apparent dispersion in the data. This also prompts us to contemplate what is more relevant: dollars or percentages. I will argue for percentages, but let’s examine a few more charts and recap in the summary.

Here, I’ve created a few more graphics specifically for an actual real estate market. I compiled two datasets: one focuses on the higher-end areas with more predominant lake views, typically considered executive areas, and selects newer, larger homes. The other dataset focuses on areas without lake views, in generally lower-priced areas, although I never used price as a search criterion. Once again, the visualization is striking, and one might be led to believe that the higher-priced “executive” market increased at a much more dramatic rate, with more significant outliers, than the “average” market. However, I’m sure you are now either guessing that this is not the case, or you are drilling down into the numbers and ignoring the graphical influence.

To save you the trouble of doing the calculations, here is the annotated graph. At both the beginning and the end, the trends are approximately 250 percent apart, or to put it another way, they both increased by about 460 percent.
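A small worked example shows why two trend lines can fan apart in dollars while staying the same distance apart in percentage terms. All of the prices below are hypothetical, and I am assuming here that "250 percent apart" means the executive trend sits at about 2.5 times the average trend and that "about 460 percent" means each trend ends at about 4.6 times its starting value; other readings of those figures are possible.

```python
# Hypothetical starting trend values, in dollars (not the actual dataset).
average_start = 90_000
executive_start = 225_000

# Assumption: both trends end at about 4.6 times their starting value.
growth_factor = 4.6

average_end = average_start * growth_factor
executive_end = executive_start * growth_factor

gap_start = executive_start - average_start
gap_end = executive_end - average_end
ratio_start = executive_start / average_start
ratio_end = executive_end / average_end

print(f"dollar gap: {gap_start:,.0f} at the start, {gap_end:,.0f} at the end")
print(f"ratio:      {ratio_start:.2f} at the start, {ratio_end:.2f} at the end")
```

The dollar gap grows by the same factor as the prices themselves, which is what makes the lines appear to diverge on a linear chart, while the ratio between the two markets does not change at all.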

This is not a straightforward answer regarding the difference between these two markets, and I would advise caution when using these linear trends for specific calculations beyond this type of broad analysis. However, it is fairly obvious that there was no significant trend difference between the two markets, and the following log scale more accurately reflects this.

These log scales help clarify the rate of change in the market by adjusting the vertical scale in relation to the horizontal time axis. As I previously stated, the percentage relationship versus dollars represents a more relevant measurement of markets. The dollar value difference between the two markets in 2000 has no relevance today. The fact that in 2000 the average executive home was $135,000 more expensive than homes in the average to lower-end market is not relevant without converting that difference to a current value, which can only logically be done using a percentage relationship. What about 2012 or 2020? The answer would be the same: the percentage relationship is logical and scalable, which is why I make this statement and highlight the importance of the logarithmic scale for visualizing markets with apparent dispersion, best described as heteroskedastic characteristics.
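The apparent dispersion can be sketched the same way. Assuming, purely for illustration, that sales scatter by the same plus or minus 10% around the trend at two hypothetical price levels, the Python snippet below compares the width of that band in dollars and in log units. The `spread` helper and the price levels are my own inventions, not values from the datasets above.

```python
import math


def spread(center, pct):
    """Return the (dollar, log10) width of a +/- pct band around center."""
    low, high = center * (1 - pct), center * (1 + pct)
    return high - low, math.log10(high) - math.log10(low)


# Hypothetical price levels with the same +/-10% scatter around each.
early_dollars, early_log = spread(100_000, 0.10)
late_dollars, late_log = spread(400_000, 0.10)

print(f"dollar spread: {early_dollars:,.0f} vs {late_dollars:,.0f}")
print(f"log10 spread:  {early_log:.4f} vs {late_log:.4f}")
```

In dollars the later band is four times as wide, so on a linear chart the later sales look far more scattered. In log units the two bands are the same width, which is why the log scale removes the fanning-out effect when the scatter is proportional to price.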
