The aim is to develop a model that accurately predicts the sentiment of customer feedback (in Russian) on electronic goods. There is no training set, so it has to be collected.
The first step was to build a model on the provided film review data. At this step, I learned how to analyze text, which metrics to use, and which classifiers and settings cope best with this task. The second step is sentiment analysis of product reviews from an online store. This task is divided into the following sub-tasks:
- Data collection / parsing of a real online store
- Data processing and cleaning
- Model selection, cross-validation tests
- Interactive demonstration of the algorithm

The project is split into two parts:
- parsingModelDev - steps related to data collection and model development:
  - data parsing (BeautifulSoup, bs4)
  - data processing
  - feature extraction (CountVectorizer and TfidfVectorizer from sklearn.feature_extraction.text)
  - creating a Pipeline
  - model selection (cross_val_score and GridSearchCV from sklearn.model_selection)
  - considered models: SVC, SGD and logistic regression
  - final Jupyter Notebook with all steps
- webServer - steps related to creating a small server on Flask
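
The feature extraction, Pipeline and model selection steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the toy reviews, the step names `vectorizer` and `classifier`, the choice of logistic regression (one of the three considered models) and the parameter grid values are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Made-up placeholder reviews (not project data): 1 = positive, 0 = negative.
texts = [
    "отличный телефон, очень доволен",
    "прекрасный экран и хорошая батарея",
    "хороший ноутбук, рекомендую",
    "отличное качество, всем советую",
    "очень доволен покупкой",
    "хорошая камера, отличный звук",
    "ужасный телефон, сломался через неделю",
    "плохая батарея, не рекомендую",
    "очень плохое качество, разочарован",
    "ужасный звук и плохой экран",
    "сломался сразу, не советую",
    "разочарован покупкой, плохой ноутбук",
]
labels = [1] * 6 + [0] * 6

# Chain feature extraction and the classifier so that cross-validation
# refits the vectorizer on each training fold only.
pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: parameters are addressed as <step name>__<parameter>.
param_grid = {
    "vectorizer__ngram_range": [(1, 1), (1, 2)],
    "classifier__C": [0.1, 1.0, 10.0],
}

# Exhaustive search over the grid with 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

# Cross-validated accuracy of the best configuration found.
scores = cross_val_score(search.best_estimator_, texts, labels, cv=3)
print("best parameters:", search.best_params_)
print("mean CV accuracy:", scores.mean())

# Predict the label of a new, unseen review.
prediction = search.predict(["отличный экран"])
print("predicted label:", prediction[0])
```

Swapping `TfidfVectorizer` for `CountVectorizer`, or `LogisticRegression` for `SVC` or `SGDClassifier`, only requires changing the corresponding pipeline step and its grid entries. On a dataset this small the scores are not meaningful; the sketch only shows how the pieces fit together.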