Business Insights from Multimedia Data: Text and Audio

The objective of this project is to gain business insights from firms’ unstructured data, both textual and audio, using big data technology and machine learning methods.
Dependencies: Python 3.6, unidecode, pandas, edgar
Usage: python request_edgar.py
The notebook file "data collection" is for developing and testing the download of firms' 10-K filings.
The notebook file "read pickles" shows how to use the data once it has been downloaded.
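
As a rough illustration of what reading the stored data can look like, here is a minimal sketch using Python's standard pickle module. The file name filings.pkl and the record fields (ticker, year, text) are assumptions made for this example, not the project's actual layout; the "read pickles" notebook remains the reference for the real format.

```python
import os
import pickle
import tempfile

# Hypothetical records standing in for downloaded 10-K data;
# the real project may store different fields or a pandas object.
records = [
    {"ticker": "AAA", "year": 2017, "text": "Item 1. Business ..."},
    {"ticker": "BBB", "year": 2018, "text": "Item 1A. Risk Factors ..."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filings.pkl")  # hypothetical file name

    # Write the records to disk, as a download step might do.
    with open(path, "wb") as fh:
        pickle.dump(records, fh)

    # Read them back, as an analysis step would do.
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

tickers = [row["ticker"] for row in loaded]
print(len(loaded), tickers)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.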
The data for the S&P 500 companies is around 3 GB, which is too large to upload to GitHub.
The matching and segmentation rules may not be perfect; you can change the matching rules to suit your project.
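
To give a sense of what a matching rule can look like and how it might be adapted, here is a minimal sketch that splits filing text into sections at "Item N." headings with a regular expression. The pattern ITEM_HEADING and the function split_items are hypothetical names for this example and are not the rules shipped with this project.

```python
import re

# Hypothetical rule: a section starts at a line that begins with
# "Item", a one- or two-digit number, an optional letter, then "." or ":".
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-z]?)\s*[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text):
    """Return {item_id: section_text} for each heading matched by ITEM_HEADING."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).upper()] = text[match.end():end].strip()
    return sections


sample = (
    "Item 1. Business\nWe sell widgets.\n"
    "Item 1A. Risk Factors\nDemand may fall.\n"
    "Item 7. Management's Discussion\nRevenue grew.\n"
)
sections = split_items(sample)
print(list(sections))
```

In this sketch a repeated item id overwrites the earlier one, so a filing with a table of contents would keep only the last occurrence of each item. Adjusting the pattern or that overwrite behaviour is the kind of change the note above refers to.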