BIMD

Business Insights from Multimedia Data: Text and Audio

The objective of this project is to gain business insights from firms' unstructured data, both textual and audio, using big data technology and machine learning methods.

Dependencies: Python 3.6, unidecode, pandas, edgar

Usage: python request_edgar.py

The notebook file "data collection" is for developing and testing the download of firms' 10-K filings.

The notebook file "read pickles" shows how to use the downloaded data.

The data for the S&P 500 companies is around 3 GB, which is too large to upload to GitHub.

The matching and segmentation rules may not be perfect; you can change the matching rules to suit your project.
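
To give a sense of what adapting a matching rule could involve, here is a minimal sketch of regex-based segmentation of 10-K text into its "Item" sections. It is not the code in request_edgar.py: the pattern `ITEM_HEADING`, the function `segment_filing`, and the sample text are all hypothetical names and data chosen for illustration, and it assumes each section starts with a line such as "Item 1A." at the beginning of a line.

```python
import re

# Hypothetical matching rule: a line starting with "Item <number><optional letter>."
# This pattern is an illustration, not the rule used in request_edgar.py.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\s*[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def segment_filing(text):
    """Split filing text into sections keyed by item label (e.g. "1A")."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for current, following in zip(matches, matches[1:] + [None]):
        # A section runs from the end of its heading match to the next heading.
        end = following.start() if following else len(text)
        label = current.group(1).upper()
        # A later occurrence of the same label overwrites an earlier one,
        # so a table-of-contents entry is replaced by the real section.
        sections[label] = text[current.end():end].strip()
    return sections


sample = """Item 1. Business
We sell widgets.
Item 1A. Risk Factors
Demand may fall.
Item 7. Management's Discussion
Revenue grew.
"""

sections = segment_filing(sample)
print(sections["1A"])
```

Changing the matching rule then amounts to editing the single pattern, for example to accept headings written as "ITEM 1A -" or to match other section types your project needs.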