The goal of the first part of the project was to use Army documents to create a dataset of officer assignments that could assist the Social and Decision Analytics Division (SDAD) and other Army analysts in modeling soldier career progression.




A sample instance of the desired output from this portion of the project might look something like this:


[Figure: Desired Output Format of Data From Army Documents]


Extracting a simple dataset from complex source material led us to explore several methods for pulling data out of a corpus of documents whose content and formatting are inconsistent. Learn more about the documents and their challenges by viewing our data source page.


Program Contacts: Joel Thurston and Cesar Montalvo