ADAProject

Project Structure:

Swiss products - Our main notebook, where we work in pandas and compute all the statistics

Project - Swiss products on Amazon - Spark notebook with some exploratory analysis of the whole dataset

Companies list xtractor - Helper notebook to extract company lists from Wikipedia

Product comparisons - Helper notebook with some experiments

Interactive viz - Helper notebook with some visualisations in Bokeh

Results - PDF with our interpretation of the results

companies - Folder with the extracted company lists

Spark_scripts - Folder with the final scripts we used on the cluster

Datasets - Folder with the extracted datasets we used

Charts - Folder with some static visualisations used in the posters

Poster - Folder with the posters

Abstract:
The goal of our project is to study how Swiss products are perceived around the world. To do so, we will use a dataset of Amazon reviews, each of which assigns a rating to a product. We will consider only relevant ratings, judged by whether other people found the review helpful. Finally, we will use statistical tests to check whether there is a significant difference between the ratings of Swiss products and those of products from other countries in the same category.
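As a rough illustration of the two steps described above, the sketch below first keeps only ratings from reviews that readers found helpful, then compares two groups of ratings with a permutation test on the difference of means. The abstract does not say which test or which helpfulness rule is used, so the permutation test, the `(stars, helpful_votes, total_votes)` tuple layout, the availability of a total vote count, and the `min_votes` and `min_ratio` thresholds are all assumptions made for this example.

```python
import random
from statistics import mean


def helpful_ratings(reviews, min_votes=1, min_ratio=0.5):
    """Keep the star ratings of reviews that enough readers found helpful.

    Each review is a (stars, helpful_votes, total_votes) tuple. This layout
    and both thresholds are assumptions, not taken from the project code.
    """
    return [
        stars
        for stars, helpful, total in reviews
        if total > 0 and total >= min_votes and helpful / total >= min_ratio
    ]


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of mean ratings.

    Returns an estimated p-value for the hypothesis that both groups
    come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign ratings to the two groups, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

In use, one would build one list of helpful ratings for Swiss products and one for other products in the same category, then pass both lists to `permutation_test`; a small p-value suggests the mean ratings differ.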

Data description:
The data set consists of around 550 thousand products from Amazon. For each product, the data set contains the following information: product id, Amazon Standard Identification Number (ASIN), title, product group, Amazon sales rank, ASINs of co-purchased products, location in the product category hierarchy, and reviews. For the reviews specifically, we have the average number of stars, the number of stars given by each individual review, and the number of people who found each review helpful.
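One way to picture a single product record is as a small pair of data classes. This is only a sketch of the fields listed above: the class names, attribute names, and types are hypothetical and do not come from the dataset's own format or from the notebooks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Review:
    stars: int        # star rating given by this individual review
    helpful: int = 0  # number of people who found the review helpful


@dataclass
class Product:
    product_id: int
    asin: str        # Amazon Standard Identification Number
    title: str
    group: str       # product group
    sales_rank: int  # Amazon sales rank
    # ASINs of co-purchased products
    similar: List[str] = field(default_factory=list)
    # Location(s) in the product category hierarchy
    categories: List[str] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)

    def average_stars(self) -> Optional[float]:
        """Mean star rating over all reviews, or None if there are none."""
        if not self.reviews:
            return None
        return sum(r.stars for r in self.reviews) / len(self.reviews)
```

Here the average rating is recomputed from the individual reviews for illustration, although the data set also provides it directly.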

Feasibility and Risks:
The project should be feasible within the required time frame, as the data set is relatively small and none of the manipulations we will perform are computationally expensive. However, there is a substantial risk that the data set does not contain enough samples of Swiss products. In that case, we would need to either recrawl Amazon to extract more samples or scrape the data from an existing crawl such as Common Crawl. Either option would probably take a substantial amount of time.

Deliverables:
All the code used for the processing of the data

Written report of the findings

Web site with a visualisation

Time plan:
1. Processing the data set, additional crawling and data preparation (Mid November - Mid December)
2. Performing data analysis and statistical tests on clean data (Mid December - End of December)
3. Project website and summary (Beginning of January)
