
Project Background

The Business Diversity project is a significant part of the Social and Impact Data Commons project. The Data Commons is an open knowledge repository that compiles data from trusted open-access sources and provides tools for tracking issues over time and across geographic locations. Our project aims to help policymakers and outreach programs track economic diversity, with a focus on minority-owned businesses, within Fairfax County, Virginia.

How is a minority-owned business defined? A minority-owned business is a US-based enterprise predominantly owned (51% or more) by one or more members of a socially and economically disadvantaged minority group based on race.
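The 51% threshold in this definition can be expressed as a simple check. The sketch below is illustrative only: the function name, the `(is_minority_group_member, percent_owned)` input format, and the sample ownership splits are assumptions, not part of the project's data or model.

```python
def is_minority_owned(owners):
    """Return True if minority-group owners hold 51% or more of the business.

    `owners` is a hypothetical list of (is_minority_group_member, percent_owned)
    pairs, with percent_owned given as a whole-number percentage.
    """
    minority_percent = sum(pct for is_member, pct in owners if is_member)
    return minority_percent >= 51


# Hypothetical ownership splits, for illustration only.
majority_case = is_minority_owned([(True, 51), (False, 49)])
even_split_case = is_minority_owned([(True, 50), (False, 50)])
combined_case = is_minority_owned([(True, 30), (True, 25), (False, 45)])
print(majority_case, even_split_case, combined_case)
```

Under this reading, the shares of several minority-group owners are added together, so two owners holding 30% and 25% jointly meet the threshold, while an even 50/50 split does not.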

Figure 1: Fairfax County Census Tracts


Microdata enables the study of minority-owned business activity at small geographic levels. However, current business microdata sources do not adequately identify minority-owned businesses. A case study conducted in Fairfax County found that, according to the Annual Business Survey (ABS), approximately 38% of businesses were minority-owned in 2017. For the same period, Mergent Intellect, our primary data source, reported only 7%. Although we do not have Mergent Intellect's methodology for identifying minority-owned businesses, our preliminary findings suggest that Mergent Intellect flags only registered minority-owned businesses and therefore underrepresents those that are not registered. The inconsistency between these sources leads us to ask the following question.

Motivating Question: How are minority-owned businesses distributed geographically across Fairfax County?

Project Goals

To help answer this research question, our goal was to create a binary classification model that reduces the error in predicting and tracking minority business ownership in Fairfax County, thereby accounting for the underrepresentation in Mergent Intellect's data. Our classification model consists of three inputs:

We also kept ethical considerations in mind. For this reason, we chose a binary classification model and do not disclose any business owner's racial identifiers. This safeguards the model's intended application.

Summary of Findings

We applied our final classification model to the businesses that Mergent Intellect had not flagged as minority-owned. By doing this, we increased the reported percentage of minority-owned businesses in the Mergent Intellect data to 41.75%. We also reduced Mergent Intellect's misclassification error by 12%.
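As a rough illustration of how a reported share changes when model predictions are added to the existing flags, the sketch below counts a business as minority-owned if it is either already flagged or predicted as such by a classifier. The function name, the boolean-list inputs, and the ten-business toy data are assumptions made for this example; they are not the project's data, model, or results.

```python
def minority_owned_share(flagged, predicted):
    """Percentage of businesses counted as minority-owned.

    flagged:   list of bools, True where the data source already flags the business.
    predicted: list of bools, True where a (hypothetical) classifier predicts
               minority ownership; it only matters where the flag is False.
    """
    if len(flagged) != len(predicted):
        raise ValueError("flagged and predicted must have the same length")
    if not flagged:
        return 0.0
    combined = [f or p for f, p in zip(flagged, predicted)]
    return 100.0 * sum(combined) / len(combined)


# Toy data for ten hypothetical businesses (not real records).
flagged = [True, False, False, False, False, False, False, False, False, False]
predicted = [False, True, True, False, True, False, False, False, False, False]

share_before = minority_owned_share(flagged, [False] * len(flagged))
share_after = minority_owned_share(flagged, predicted)
print(f"before: {share_before:.1f}%  after: {share_after:.1f}%")
```

In this toy example the existing flag is never overridden: the classifier can only add businesses to the minority-owned count, which mirrors applying the model solely to non-flagged businesses.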

Sources

Decoding State-County Census Tracts versus Tribal Census Tracts: https://www.census.gov/newsroom/blogs/random-samplings/2012/07/decoding-state-county-census-tracts-versus-tribal-census-tracts.html

Yelp Business Review Company:

https://www.yelp.com/

Mergent Intellect (MI) Database:

https://www.mergentintellect.com/index.php/search/index

The Virginia Small Business Supply Directory (SBSD):

https://sbsd.virginia.gov/

Chambers of Commerce:

Hispanic: https://www.novahispanicchamber.com/

Black: https://www.northernvirginiabcc.org/

Asian: https://www.aabac.org/

Fairfax County ACS (American Community Survey) census:

https://www.census.gov/quickfacts/fact/table/fairfaxcountyvirginia/POP010220#POP010220

North Carolina Voter Registration Data (Statewide and County Level data available): https://www.ncsbe.gov/results-data/voter-registration-data

Natural Language Processing Documentation:

RaceBert: https://pypi.org/project/racebert/

Rethnicity: https://www.sciencedirect.com/science/article/pii/S2352711021001874

Ethnicolr: https://ethnicolr.readthedocs.io/ethnicolr.html#install

spaCy: https://spacy.io/

LangDetect: https://pypi.org/project/langdetect/

1964 Information: https://storymaps.arcgis.com/stories/f74a8fbad837435b8e901cc9c04aa345

1964 Information: https://www.richmondfed.org/-/media/richmondfedorg/publications/research/econ_focus/2004/winter/pdf/economic_history.pdf