A total of 257,123 abstracts were identified as relating to big data when TF-IDF was used to vectorize the abstracts and KNN was used as the classifier.
A sample of the abstracts identified as big data using TF-IDF is shown above. For each abstract, information such as department, agency, project dates, and costs was recorded. This information will later be used to model further trends in the abstracts.
A total of 484,362 abstracts were identified as relating to big data when Doc2Vec was used to vectorize the documents and KNN was used to classify them.
A sample of the abstracts identified as big data using Doc2Vec is shown above. For each abstract, information such as department, agency, project dates, and costs was recorded. This information will later be used to model further trends in the abstracts.
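To make the TF-IDF plus KNN approach concrete, the following is a minimal sketch in plain Python and not the pipeline used for the results reported here. The toy abstracts, the labels "big_data" and "other", the choice of k = 3, the smoothed IDF formula, and the use of cosine similarity are all assumptions made for illustration; the original text does not specify them. In practice a library such as scikit-learn would normally take the place of these helper functions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Return an L2-normalized sparse TF-IDF vector as a dict; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Majority vote over the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled abstracts, invented for this sketch.
train_docs = [
    "large scale data analytics with hadoop and distributed machine learning",
    "big data mining of massive datasets using distributed computing",
    "scalable machine learning pipelines for massive data streams",
    "protein folding in yeast cells under heat stress",
    "clinical trial of a vaccine for influenza in adults",
    "soil erosion and crop yield in arid farmland",
]
train_labels = ["big_data", "big_data", "big_data", "other", "other", "other"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_docs]

# Classify two unseen, equally hypothetical abstracts.
pred_big = knn_predict(
    tfidf_vector("distributed analytics for massive genomic datasets", idf),
    train_vecs, train_labels, k=3,
)
pred_other = knn_predict(
    tfidf_vector("crop yield under heat stress in farmland", idf),
    train_vecs, train_labels, k=3,
)
print(pred_big, pred_other)
```

The Doc2Vec variant would follow the same pattern, with the TF-IDF vectors replaced by learned document embeddings before the nearest-neighbor vote.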