GitHub - coderjolly/news-recommender: This is a news recommender system that uses Beautiful Soup to scrape news articles, their categories and descriptions to create a data dump. It then uses text vectorisation and word embedding techniques such as TF-IDF and word2vec for content-based news recommendation, and LightRF and LightFM to explore hybrid and collaborative-filtering-based recommender models.

News Recommender

You have been recruited as data scientists by a start-up, JhakaasNewsVala, based out of Mumbai. The company is developing an app that promises to deliver a unique news experience to its app users.

The company has identified its target market as working professionals. Recognising that retention (defined here as a visit after the first visit) is a huge issue for apps, they understand the need to make an impact on the first visit itself. The problem, however, is that at that point they know nothing about the user's interests or demographics, so they cannot personalise the news feed.

What to do?

The task requires us to build a news article recommender that utilises a user profile (ratings given by the user or interests selected by the user). However, the company, JhakaasNewsVala, has provided neither such data nor a textual news corpus.

So, using Beautiful Soup, Selenium and Python data-wrangling techniques for web scraping, a corpus of news articles is generated, along with their categories and descriptions, to create a data dump.

As shown in the above figure, a CSV data dump named 0_news_articles.csv is generated after scraping news articles from a well-known Indian news website. The CSV dump is as follows:

Data Dump	Characteristic Fields or Columns
0_news_articles.csv	`Articles ID`, `Title`, `Description`, `Date`, `Category` and `URL`
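
The scraping step above can be sketched roughly as follows. This is a minimal illustration, not the project's scraper: the standard-library `html.parser` stands in for Beautiful Soup and Selenium so the snippet runs without extra packages, and the HTML markup, class names, `SAMPLE_HTML`, `ArticleParser` and `scrape_to_csv` are all hypothetical, since the README does not describe the real site's structure. Only the column names come from the table above.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: the real website's HTML structure is not given here.
SAMPLE_HTML = """
<div class="article" data-category="business">
  <a class="title" href="https://example.com/a1">Markets rally</a>
  <p class="desc">Stocks closed higher on Monday.</p>
  <span class="date">2021-01-04</span>
</div>
<div class="article" data-category="sports">
  <a class="title" href="https://example.com/a2">Cup final preview</a>
  <p class="desc">Two teams meet in the final.</p>
  <span class="date">2021-01-05</span>
</div>
"""

# Column names as listed for 0_news_articles.csv.
FIELDS = ["Articles ID", "Title", "Description", "Date", "Category", "URL"]
# Assumed mapping from CSS class to output column.
CLASS_TO_FIELD = {"title": "Title", "desc": "Description", "date": "Date"}


class ArticleParser(HTMLParser):
    """Collects one dict per <div class="article"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # article currently being parsed
        self._field = None  # column that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class", "")
        if tag == "div" and cls == "article":
            self._row = {
                "Articles ID": len(self.rows) + 1,
                "Category": attrs.get("data-category", ""),
            }
        elif self._row is not None and cls in CLASS_TO_FIELD:
            self._field = CLASS_TO_FIELD[cls]
            if tag == "a":
                self._row["URL"] = attrs.get("href", "")

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] = self._row.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "div" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        self._field = None


def scrape_to_csv(html):
    """Parse the HTML and return (rows, CSV text)."""
    parser = ArticleParser()
    parser.feed(html)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, buf.getvalue()


rows, csv_text = scrape_to_csv(SAMPLE_HTML)
```

In a real run the HTML would come from the fetched pages and the CSV text would be written to `0_news_articles.csv` rather than kept in memory.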

To understand the impact after the first visit, variables such as User ID, Article Ranks, Clickstream and Time Spent would need to be provided by the company, but the problem statement does not include them. These variables have therefore been generated with random generator functions to populate the dataset, producing user_interests.csv.
To make recommendations based on ratings, the ratings also have to be generated, as the problem statement doesn't provide any. So a user_ratings.csv is generated in the same way, using random generator functions. Ratings range from 0 to 5, where 0 marks the least interesting article and 5 the most interesting one. This ratings dataframe is added to the user_interests.csv dataframe.
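
A minimal sketch of how such synthetic interaction data could be generated is shown below. The function name `generate_user_ratings`, the exact column names, the number of articles per user and the time-spent range are assumptions made for illustration; only the 0 to 5 rating scale comes from the description above.

```python
import random


def generate_user_ratings(n_users, article_ids, per_user, seed=0):
    """Return a list of synthetic (user, article) interaction rows.

    All values are random; a fixed seed keeps the output reproducible.
    """
    rng = random.Random(seed)
    rows = []
    for user_id in range(1, n_users + 1):
        # Each user interacts with a random subset of the articles.
        for article_id in rng.sample(article_ids, per_user):
            rows.append({
                "User ID": user_id,
                "Article ID": article_id,
                "Time Spent": rng.randint(5, 300),  # seconds; assumed range
                "Rating": rng.randint(0, 5),        # 0 = least, 5 = most interesting
            })
    return rows


article_ids = list(range(1, 21))
ratings = generate_user_ratings(n_users=10, article_ids=article_ids, per_user=5)
```

Because the values are purely random, any model trained on them can only demonstrate the pipeline, not real user preferences.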

Next, using the news articles and user_interests, a Content Based Recommendation notebook is prepared, applying text vectorisation and word embedding techniques such as TF-IDF and word2vec to perform content-based news recommendation.
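
To illustrate the TF-IDF part of this idea, here is a small pure-Python sketch that ranks articles by cosine similarity to one the user liked. It is not the notebook's code: the sample descriptions and the helper names `tfidf_vectors`, `cosine` and `recommend_similar` are made up, and it uses a plain `tf * log(N / df)` weighting, which differs slightly from the smoothed variants found in common libraries.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_similar(liked_index, docs, k=2):
    """Indices of the k documents most similar to docs[liked_index]."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[liked_index], vec), idx)
        for idx, vec in enumerate(vectors)
        if idx != liked_index
    ]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Made-up article descriptions for illustration only.
descriptions = [
    "stock markets rally as banks report profits",
    "banks and stock markets fall on weak profits",
    "cricket team wins the cup final",
    "football team loses the cup final",
]
similar_to_first = recommend_similar(0, descriptions, k=1)
similar_to_third = recommend_similar(2, descriptions, k=1)
```

Here the first article shares vocabulary only with the second, and the third only with the fourth, so each is paired with its topical neighbour.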

Finally, using user_rated_articles and the news articles, a Collaborative Filtering Recommendation notebook is prepared, feeding the user-defined ratings and rankings into LightRF and LightFM to perform collaborative-filtering-based news recommendation.

Credits and Guidance
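
The general idea of collaborative filtering can be sketched without any library. The example below is deliberately not LightFM: it is a simple item-based neighbourhood method in pure Python, chosen only because it is short and runs anywhere. The ratings dictionary and the names `item_similarities` and `recommend_for_user` are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity between articles, based on who rated them.

    ratings: {user: {article: rating}}
    """
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating

    sims = {}
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = set(by_item[a]) & set(by_item[b])
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            norm_a = math.sqrt(sum(v * v for v in by_item[a].values()))
            norm_b = math.sqrt(sum(v * v for v in by_item[b].values()))
            sims[(a, b)] = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return sims


def recommend_for_user(user, ratings, k=1):
    """Top-k unseen articles, scored by similarity to what the user rated."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    all_items = {item for items in ratings.values() for item in items}
    scores = {
        cand: sum(sims[(cand, item)] * rating for item, rating in seen.items())
        for cand in all_items - set(seen)
    }
    # Highest score first; ties broken by article id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Made-up ratings on the 0 to 5 scale described earlier.
sample_ratings = {
    "u1": {"a1": 5, "a2": 4, "a3": 1},
    "u2": {"a1": 4, "a2": 5, "a4": 1},
    "u3": {"a3": 5, "a4": 4},
    "u4": {"a1": 5},
}
top_for_u4 = recommend_for_user("u4", sample_ratings, k=1)
```

User `u4` has only rated `a1`, and the users who rated `a1` highly also rated `a2` highly, so `a2` comes out on top. A factorisation model such as LightFM learns latent user and item factors instead of using these explicit pairwise similarities.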

Karanjot Vilkhu
