Implications

Our model predicts that, as of today, 41.75% of Fairfax County businesses are minority owned. We hope that community programs and policymakers across the entire National Capital Region can use this model to foster healthier and more diverse business climates within their communities, drawing on accurate information to encourage initiatives for underserved businesses. These improvements would not only strengthen our economy but also promote representation for underserved groups and prioritize inclusion for all.

Limitations

Throughout our research, we encountered several limitations and challenges. One challenge was that, despite web scraping from various sites, we could not produce an extensive list of businesses. As a result, our list is smaller than our main sources. To mitigate this challenge, we balanced the training set so that 50% of its businesses were included in the listings and 50% were not. The training set is therefore balanced, but it is still smaller than it should be.
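One common way to obtain a 50/50 split like the one described above is to downsample the larger class to the size of the smaller one. The following is a minimal sketch of that idea, not necessarily the procedure we used; the function name `balance_by_downsampling`, the `in_listing` field, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def balance_by_downsampling(records, label_key="in_listing", seed=0):
    """Return a shuffled subset with equally many records per label value.

    Hypothetical helper: `label_key` names the field that says whether a
    business appeared in the scraped listings.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record)
    if not groups:
        return []

    # The smallest class caps how many records each class can contribute.
    n_per_class = min(len(group) for group in groups.values())

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, n_per_class))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 3 businesses found in the listings, 7 not found.
businesses = (
    [{"name": f"listed_{i}", "in_listing": True} for i in range(3)]
    + [{"name": f"unlisted_{i}", "in_listing": False} for i in range(7)]
)
training_set = balance_by_downsampling(businesses)
```

Downsampling keeps the classes even but discards records from the larger class, which is one reason a balanced training set can still be smaller than desired.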

Additionally, we acknowledge a potential limitation in our approach: census demographic information is based on the number of people living in a particular area. By using this data to infer information about business owners, we assumed that owners reside in the same area where their businesses are located.

Lastly, we cannot be fully confident that our model can be applied to the entirety of the data provided by Mergent Intellect. This is because we still need to do more extensive web scraping to find both businesses that are likely to be minority owned and businesses that are likely to be non-minority owned. Once we do this, we can fully test the efficiency and correctness of our model and, after evaluation, apply it to the Mergent data again to get better results.

Next Steps

Our next steps are to grow our training set and our data sets to gain more confidence in our model. We also hope to explore how minority ownership is distributed geographically and across specific industries. Finally, we hope to apply our classifier model to other geographies covered by the Social Impact Data Commons, such as the National Capital Region.


Program Contacts: Joel Thurston and Cesar Montalvo