
Update Feb 2, 2025 by Paul Rayburn
Here is a video using a new data set, focusing on a similar time period and highlighting the same issue.
Original Jan 29, 2025 post follows
I will try to make this as simple as possible while retaining the critical context. You may prefer a somewhat more detailed version previously published on April 1, 2023. CLICK HERE
The subject of how we select data for analysis came up in the context of removing price outliers from our subject market analysis. The point was that we should be thinking more like a buyer: if the subject was worth just under a million dollars, buyers probably were not looking at $2 million houses. The easiest and most apparent way to avoid irrelevant data would seem to be filtering by price, but is it?
Market condition adjustments, also referred to as date of sale adjustments, can be among the most reliable adjustments when appropriately developed. Whether you have a reasonable amount of recent data or need to look years back in history for more complex assignments, getting this right is equally important.
Filtering by price alone can introduce analytical bias.
“Most anybody, even remotely familiar with a market, can come up with a probable price range for a subject’s competing market.”
Would that million-dollar buyer look at $750,000 to $2,000,000 homes, or would they narrow in to something as tight as $900,000 to $1,100,000? Well, that’s reasonably probable.
Seems logical so far, or maybe not if that price range is the very market we are trying to solve for. You probably can’t visualize what is occurring unless you have seen the data plotted out; I’d be surprised if anyone could. But once you’ve seen the issue, hopefully you will never unsee it.
This data set was generated purely for this discussion and not specifically for how I would analyze a particular subject market. Generally, the more data points available, the more consistent the trends.
For this example, to avoid complex issues of market trend trajectories, I chose a date range where values were trending consistently, to keep things reasonably simple. *I am also showing only linear trends for the sake of simplicity; given the intentionally chosen period and data, other best-fit trend lines may not provide any more relevant results.
In the following, Chart 1 shows all of the data even remotely indicative of a competing market. The formula in the lower-right corner of each chart gives the price change per day from the slope of the data. In Chart 1, for example, the slope is $693.03 per day, and our data date range is 850 days, which works out to an 82% increase over the period. But that is a very broad market which may not really represent a conforming market; therefore, we want to refine it.
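To make the arithmetic explicit: the percentage is the dollar change implied by the slope over the full date range, expressed against the value of the trend line at the start of the period (the exact base the charting tool uses is not shown, so treating the trend-line starting value as the denominator is my assumption). Here is a minimal Python sketch of that calculation; the DataFrame and column names (`sales`, `sale_date`, `price`) are hypothetical stand-ins, not the article’s actual data set.

```python
# Minimal sketch: derive a market-conditions rate from the linear trend slope.
# Column names (sale_date, price) are hypothetical, not from the article's data set.
import numpy as np
import pandas as pd

def trend_percent_change(sales: pd.DataFrame) -> tuple[float, float]:
    """Return (slope in $/day, percent change over the date range)."""
    days = (sales["sale_date"] - sales["sale_date"].min()).dt.days.to_numpy()
    prices = sales["price"].to_numpy()

    slope, intercept = np.polyfit(days, prices, 1)   # linear best fit
    span = days.max() - days.min()                   # e.g. 850 days in the article's example
    start_value = intercept                          # trend-line value at the first sale date
    pct_change = slope * span / start_value * 100    # dollars gained over the span vs. starting value
    return slope, pct_change
```

With a slope of roughly $693 per day and an 850-day range, this kind of calculation is what produces a figure in the neighborhood of the 82% shown in Chart 1.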

In the search field boxes, entering a simple maximum price of $1,100,000 and a minimum price of $900,000 seems innocent; nobody is trying to bias the results, are they? The issue becomes clear when the results are presented visually. In Plot 1a, the triangles mark significant and relevant data that would be excluded by searching this tight price range at the outset, which arbitrarily produces a nearly level linear data set. You wouldn’t see this unless you plotted it out in a scatterplot, such as below.
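For readers who want to reproduce this kind of plot, here is a minimal sketch of one way to visualize what a price band silently excludes. It is not the article’s charting tool; the DataFrame and column names are hypothetical.

```python
# Minimal sketch of the scatterplot described above: plot every sale, and mark the
# points a $900,000-$1,100,000 price filter would exclude (the "triangles").
# The DataFrame and column names (sales, sale_date, price) are hypothetical.
import matplotlib.pyplot as plt

def plot_price_band(sales, lo=900_000, hi=1_100_000):
    inside = sales[(sales["price"] >= lo) & (sales["price"] <= hi)]
    outside = sales[(sales["price"] < lo) | (sales["price"] > hi)]

    plt.scatter(inside["sale_date"], inside["price"], label="kept by price filter")
    plt.scatter(outside["sale_date"], outside["price"], marker="^",
                label="excluded by price filter")
    plt.axhline(lo, linestyle="--")
    plt.axhline(hi, linestyle="--")
    plt.xlabel("Sale date")
    plt.ylabel("Sale price ($)")
    plt.legend()
    plt.show()
```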

Chart 2 presents the result of that price filtering graphically: the calculation drops to 2% over the same 850 days, which is completely inaccurate and not an actual reflection of the market.
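The mechanics of that comparison look something like the sketch below, reusing the hypothetical `trend_percent_change()` helper from earlier. It will not reproduce the article’s exact figures; it only shows how a tight price band flattens the fitted slope.

```python
# Minimal sketch comparing the trend before and after the price filter,
# using the hypothetical trend_percent_change() helper sketched above.
broad = sales                                                # everything remotely competitive
banded = sales[sales["price"].between(900_000, 1_100_000)]   # the "innocent" price filter

slope_all, pct_all = trend_percent_change(broad)     # e.g. roughly $693/day, ~82% in the article
slope_band, pct_band = trend_percent_change(banded)  # nearly flat, ~2% in the article's example

print(f"Unfiltered:   {slope_all:,.2f} $/day  ({pct_all:.0f}% over the period)")
print(f"Price-banded: {slope_band:,.2f} $/day  ({pct_band:.0f}% over the period)")
```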

If we instead filter the data first by parameters of physical similarity, we get a more reliable initial data set and can then, if need be, remove outliers with a more surgical approach. Chart 3 is filtered by building size and acreage, and its linear trend shows a 75% increase over the 850 days.

The final set in Chart 4 is further filtered to year built between 2010 and 2017, bringing the adjustment down to 73%. This gives us a more manageable set for individual analysis of specific properties and their similarities to the subject. In this case the filtering also eliminated the outliers, so they did not need to be removed individually, although on occasion that may still be required.
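A sketch of that combined physical filtering might look like the following. The column names (`gla_sqft`, `acres`, `year_built`) and the size and acreage thresholds are hypothetical; only the 2010–2017 year-built range comes from the article.

```python
# Minimal sketch of filtering on physical similarity instead of price.
# Column names and the size/acreage thresholds are hypothetical assumptions;
# the year-built range is the Chart 4 refinement described in the article.
similar = sales[
    sales["gla_sqft"].between(1_800, 2_600)    # building size bracket around the subject (assumed)
    & sales["acres"].between(0.15, 0.50)       # comparable site size (assumed)
    & sales["year_built"].between(2010, 2017)  # the Chart 4 refinement
]

slope, pct = trend_percent_change(similar)     # market-conditions rate from the refined set
```

The design point is the order of operations: filter on physical similarity first, fit the trend, and only then remove any remaining outliers individually if needed.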

The HPI for this same period is 60%. An appraiser may be challenged to decide whether the HPI is reasonably accurate for the subject market. The average HPI price for this broad market is $971,000, but does it represent the relevant market for the subject? Looking at more refined markets may be too limiting. The HPI is not adequately adapted for quality, condition, and other relevant components not reported within the aggregated data, and it only provides an adjustment for the most common homes in that market. A $600,000 property, a $2 million lakeshore home, or vacant land would not be well represented by the HPI, and on occasion the HPI can produce erroneous results; after all, it is a program designed by humans, and errors or inappropriate model applications are possible. Therefore, we should have our own additional methods for comparison.

With advances in automated valuation models (AVMs) taking away much of the lending business for typical properties, appraisers face the increasing challenge of analyzing more complex markets. However, we also have, or should have, increased access to data, and our ability to meaningfully interact with that data has become increasingly important.
As always, I encourage everyone to take George Dell’s Stats, Graphs, and Data Science (SGDS) courses and join the Community of Asset Analysts (CAA).
You might like my YouTube channel; you can check it out here. If you want to start out with a bit of a blooper outtake featuring “not Shiny Psaul” and my CAA friends, you could start at about the 22-minute mark HERE