The Black Knight property dataset provides information on each property’s assessed value, number of bedrooms and bathrooms, building area, age of property, and etc. These variables help us gauge the household characteristics of the subscribers and, in the difference-in-difference analysis, these will serve as control variabls that make sure our control group is similar to the subscribers.
Any systematic bias or vacancy in this data is thus particularly worrisome. Though we expected a certain percentage of missing records, it was to our surprise that in our North Dakota sampled data, all properties saw their number of bedrooms, bathrooms, and square footage missing.
To facillitate the analysis for our projects, we experimented using web scraping to find the missing information using search engines. Our script can be described as performing the following checklist: for each address,
Bing is used as our primary search engine for its relative friendliness to frequent requests, and we included Zillow and Realtor as the two sources whose search results will be taken for further analysis.
To ensure the accuracy of our analysis, we only accept the properties whose number of bedrooms, bathrooms, and square footage can all be scrapped from the sources listed above. Under this standard, we managed to get information for 455 properties out of the total of 2963 properties in the North Dakota project buffer zone.
The success rate of 15.3% in this particular sample may appear to be too low, but this information is otherwise missing completely. Moreover, in other areas where the assessors’ office publicize a more detailed record of properties, our scrapping script also tend to perform significantly better.
![]() |
Program Contacts: Joel Thurston and Cesar Montalvo |