Sectoring Open Source Software
Symposium Presentation of the 2020 DSPG Project
Sectoring Open Source Software:
Where Do GitHub Contributions Come From?
Crystal Zang, Morgan Klutzke, Daniel Bullock,
Brandon Kramer, Gizem Korkmaz, and José Bayoán Santiago Calderón
Sponsors: Carol Robbins (NCSES) and Ledia Guci (NCSES)
Why Study Open Source Software?
Current NCSES and other economic indicators do not measure
the scope and impact of OSS developed outside the business sector
Sectoring Open Source Software on GitHub
Our two main goals for the 2020 DSPG Summer Project were to:
(1) Classify GitHub users into one of five economic sectors
(Academic, Business, Household, Government and Non-Profit)
(2) Examine where GitHub users are located around the world
Methods
We relied on aspects of computational text analysis to standardize entries
(regular expressions, list matching, and bigrams)
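The three techniques named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the `CANONICAL` lookup list, the `SUFFIXES` pattern, and the function names `clean`, `standardize`, and `bigram_counts` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical lookup list (cleaned text -> canonical name); illustrative only.
CANONICAL = {
    "university of virginia": "University of Virginia",
    "uva": "University of Virginia",
    "google": "Google",
}

# Hypothetical set of legal suffixes to strip from organization names.
SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|co)\b")


def clean(entry: str) -> str:
    """Regular expressions: lowercase, drop punctuation and suffixes, collapse spaces."""
    text = entry.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    text = SUFFIXES.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def standardize(entry: str) -> str:
    """List matching: map a cleaned entry to a canonical name, else keep the cleaned text."""
    cleaned = clean(entry)
    return CANONICAL.get(cleaned, cleaned)


def bigram_counts(entries):
    """Bigrams: count adjacent word pairs across entries to surface frequent phrases."""
    counts = Counter()
    for entry in entries:
        words = clean(entry).split()
        counts.update(zip(words, words[1:]))
    return counts
```

In a workflow like this, frequent bigrams would be one way to spot phrases worth adding to the lookup list, so the list can grow as new spelling variants turn up.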
Sectoring Results
Twenty percent of the users in the GHTorrent data (~2.1 million) provide an email address or
work affiliation that can be used for sectoring, which gives us ~420,000 GitHub users.
Most users fall into the business sector followed by
the academic, household and government sectors.
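As a rough illustration of how an email address might be mapped to one of the five sectors, the sketch below applies simple domain rules. Every rule here is an assumption made for illustration (the `PERSONAL_DOMAINS` list, the suffix checks, and the default to Business); the deck does not describe the project's actual classification rules.

```python
from collections import Counter

# Hypothetical list of personal email providers, treated here as "Household".
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def classify_email(email):
    """Guess a sector from the email domain; return None if there is no usable domain."""
    if not email or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].strip().lower()
    if not domain:
        return None
    if domain in PERSONAL_DOMAINS:
        return "Household"
    if domain.endswith(".edu") or ".ac." in domain:
        return "Academic"
    if domain.endswith(".gov") or domain.endswith(".mil"):
        return "Government"
    if domain.endswith(".org"):
        return "Non-Profit"
    # Assumption: any remaining domain is treated as a company domain.
    return "Business"


def sector_counts(emails):
    """Tally sector labels, skipping emails that could not be classified."""
    labels = (classify_email(e) for e in emails)
    return Counter(label for label in labels if label is not None)
```

Suffix rules like these are crude. Many non-profits and companies do not follow the `.org` and `.com` conventions, which is one reason a work-affiliation field is a useful second signal.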
Business Sector
Most OSS-producing companies are large tech companies based in Silicon Valley
Academic Sector
US-based academic institutions are the largest producers of OSS
Most of the top OSS-producing universities are close to
major tech hubs in CA, MA, NY, TX and WA
Geographic Analyses
Countries Where GitHub Users are Located
Most GitHub users are based in the US, which has around 4 times as many users as China
U.S. States Where GitHub Users are Located
Within the US, most GitHub users are based on the coasts and near major tech hubs
Cities Where GitHub Users are Located
Silicon Valley is the world’s most prominent OSS hotspot
followed by London, NYC, Moscow and Beijing
City-Level GitHub User Collaboration Network
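One way a city-level network could be derived from user-level data is to aggregate user-to-user collaboration pairs into weighted city-to-city edges. The sketch below assumes a hypothetical `user_city` mapping and a list of collaborating user pairs (for example, two users who contributed to the same repository). The deck does not specify how collaboration was defined, so treat this as an illustration only.

```python
from collections import Counter


def city_network(user_city, collaborations):
    """Aggregate (user, user) pairs into undirected, weighted (city, city) edges.

    Pairs where either user has no known city are skipped. Collaborations
    within a single city are kept as self-loops such as ("London", "London").
    """
    edges = Counter()
    for user_a, user_b in collaborations:
        city_a = user_city.get(user_a)
        city_b = user_city.get(user_b)
        if city_a is None or city_b is None:
            continue
        # Sort the pair so (A, B) and (B, A) count toward the same edge.
        edge = tuple(sorted((city_a, city_b)))
        edges[edge] += 1
    return edges


# Usage with made-up users and pairs (not project data).
user_city = {"u1": "London", "u2": "Moscow", "u3": "London"}
pairs = [("u1", "u2"), ("u3", "u2"), ("u1", "u3"), ("u1", "u4")]
network = city_network(user_city, pairs)
```

The resulting edge weights could then be passed to a graph library for layout or centrality measures.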
Main Findings
The majority of users come from the business sector followed
by the academic, government and household sectors
Most OSS production seems to be coming from the business sector
Most GitHub users are based in the US (both in general and in the academic sector)
Major universities in California may also be benefitting
from their proximity to Silicon Valley’s OSS production
Challenges & Future Directions
Scrape more user data to improve classification accuracy
Improve the government, non-profit and household classification systems
Determine how to classify contributions at the intersection of multiple sectors
Conduct network analysis within and across sectors to understand collaboration tendencies