Scrapy spider to crawl and scrape betting tips from the biggest betting community website in Portugal (www.academiadasapostas.com, available in PT-PT, EN and ES)
The betting tips are provided every day to anyone who accesses the website, but checking all the available tips requires a tedious process: opening one page per tip/game and getting past some annoying ads.
This spider checks the webapp on the homepage, which provides game information such as the game identifier (id), whether a tip is available, the starting hour, the teams playing and the live/final score.
Not every match in the webapp has a betting tip available, so the spider checks only the ones that do, following the link to the game page and extracting just the tip from it. Quick and simple.
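A minimal sketch of that flow, assuming hypothetical CSS selectors and a hypothetical game-page URL pattern (the real site markup will differ):

```python
import scrapy


class BetsSpider(scrapy.Spider):
    name = "bets"
    start_urls = ["https://www.academiadasapostas.com/"]

    def parse(self, response):
        # The homepage webapp lists every match of the day; each entry
        # carries an id, the teams, the starting hour and a flag telling
        # whether a tip exists (all selectors here are illustrative).
        for match in response.css("div.match-row"):
            if not match.css("span.has-tip"):
                continue  # skip matches without a published tip
            game_id = match.attrib.get("data-game-id")
            yield response.follow(
                f"/stats/game/{game_id}",  # hypothetical game-page URL
                callback=self.parse_tip,
                cb_kwargs={
                    "game_id": game_id,
                    "teams": match.css("span.teams::text").get(),
                    "hour": match.css("span.hour::text").get(),
                },
            )

    def parse_tip(self, response, game_id, teams, hour):
        # On the game page, extract only the tip text and keep the
        # metadata already gathered from the homepage webapp.
        yield {
            "game_id": game_id,
            "teams": teams,
            "hour": hour,
            "tip": response.css("div.tip-box::text").get(),
        }
```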
- Scrapes tips every morning for the day's matches
- Saves them to a txt file
- Uploads the txt file to a Google Sheets spreadsheet using the Google Drive and Google Sheets APIs
- Scrapes the final scores every night for the tipped matches
- Updates the online worksheet with the final scores (not working 100%)
This last scraping step, getting the final score, is part of an interesting feature yet to be implemented: an automatic check of whether each tip predicted correctly. With this information, the project could calculate every night the day's profit balance for someone who followed every tip given.
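A sketch of what that unimplemented calculation could look like, for a fixed 1-unit stake per tip; the field names (`tip`, `result`, `odd`) are assumptions, not the project's real schema:

```python
# Compare each tip with the final result and sum the day's return,
# assuming a 1-unit stake on every tip (field names are hypothetical).
def daily_balance(tipped_games):
    balance = 0.0
    for game in tipped_games:
        stake = 1.0
        if game["tip"] == game["result"]:  # tip predicted correctly
            balance += stake * game["odd"] - stake
        else:                              # tip failed
            balance -= stake
    return balance

# Example: one winning tip at odd 1.90 and one losing tip -> -0.10 units
print(daily_balance([
    {"tip": "1", "result": "1", "odd": 1.90},
    {"tip": "X", "result": "2", "odd": 3.20},
]))
```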
SPREADSHEET: Google Sheets spreadsheet with the daily scraping (text in Portuguese)
(The server is not running anymore, so it isn't up to date.)
$ scrapy crawl bets
$ scrapy crawl scores
There are 2 different spiders, one to scrape the tips (bets) and the other to scrape the scores (scores). Both are called every day by a cron job running on my server: the bets one at 10am and the scores one at 11pm, after the games are finished.
Afterwards, I call a Python script to upload/update the Google Sheets spreadsheet:
$ python3 sheets.py
The cron script also runs this Python script after the scrapy executions.
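A minimal sketch of that upload step using the gspread client; the original script talks to the Google Drive and Sheets APIs, so the library choice, the credentials file, the spreadsheet key and the tips.txt format (one tab-separated row per line) are all assumptions:

```python
import gspread

# Authenticate with a service-account key file (hypothetical filename).
gc = gspread.service_account(filename="credentials.json")
ws = gc.open_by_key("SPREADSHEET_KEY").sheet1

# Read the scraped tips from the txt file, one tab-separated row per line.
with open("tips.txt", encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]

ws.append_rows(rows)  # append today's tips below the existing data
```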
- Scores spider scrapes successfully but breaks when run from cron
- Sometimes one scrapy crawl isn't enough to get all the available data, most likely due to failed requests; my workaround is crawling 3 times in the cron script with a 1-minute interval (see the sketch after this list)
- Data gathering only works for the day's matches, even though the webapp also shows tips for the next days after some user interaction
- The spreadsheet/txt file orders games/tips from oldest to newest; for an easier view, the most recently gathered data should appear on top
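The retry workaround mentioned above could also live in a small Python wrapper instead of the cron script; a sketch, assuming the spider name `bets` from the commands in this README:

```python
import subprocess
import time

# Run the crawl up to 3 times with a 1-minute pause between attempts,
# the same pattern the cron script follows to cope with failed requests.
for attempt in range(3):
    subprocess.run(["scrapy", "crawl", "bets"], check=False)
    if attempt < 2:
        time.sleep(60)  # give the site a minute before retrying
```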