
A Docker app that automates extracting images or text from websites. The REST API is documented using Swagger.


Matixo55/web-scraper


Text and images downloader

Description

A Docker application consisting of a Flask server and a PostgreSQL database. Use the REST API to create requests for text or images from a website. Automated tests are included.

Prerequisites

  • Docker
  • All other requirements are installed automatically when the Docker image is built.
    The pip requirements are listed in requirements.txt and pytest_requirements.txt.

Installing

  • Download and install Docker
  • In a terminal, navigate to the downloaded project folder
  • Build and start the applications with:
docker-compose up -d --build
  • WARNING: after the first build the tests might fail (depending on your hardware). You can rerun them to make sure everything is correct.
  • If you modified the database/table format or properties, run the following before rebuilding
    (this deletes all previous requests from the database):
docker-compose down -v

Usage

The Flask application listens for requests on localhost:5000 (or 0.0.0.0:5000).
Available methods, their usage, and responses are described in swagger.yaml (see swagger.io).

Methods

POST

/get/text/

Creates a request for text from a website. Returns the request ID.

/get/images/

Creates a request for images from a website. Returns the request ID.
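The exact request body for these two endpoints is defined in swagger.yaml and is not reproduced in this README. The Python sketch below only illustrates the call pattern using the standard library; it assumes the default address localhost:5000 and a JSON body with a hypothetical "url" field, so both may need adjusting to match swagger.yaml. The function names are hypothetical as well.

```python
import json
import urllib.request

# Default address from the Usage section.
BASE_URL = "http://localhost:5000"


def build_scrape_request(kind: str, website: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /get/text/ or /get/images/.

    ASSUMPTION: the JSON body uses a hypothetical "url" field;
    the real request schema is defined in swagger.yaml.
    """
    if kind not in ("text", "images"):
        raise ValueError("kind must be 'text' or 'images'")
    body = json.dumps({"url": website}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/get/{kind}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_scrape_request(kind: str, website: str,
                          base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body (expected to hold the request ID)."""
    # Bypass any system HTTP proxy, since the server runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = build_scrape_request(kind, website, base_url)
    with opener.open(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the containers running, create_scrape_request("images", "https://example.com") would send the request and return the response body, which should contain the request ID.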

GET

/

Test page to check whether the server is running.
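A minimal health check for this test page, assuming the default address localhost:5000 and that the page answers with HTTP 200 when the server is up (the function name server_is_up is hypothetical):

```python
import urllib.request


def server_is_up(base_url: str = "http://localhost:5000", timeout: float = 2.0) -> bool:
    """Return True if GET / on the Flask server answers with HTTP 200."""
    # Bypass any system HTTP proxy, since the server runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (e.g. connection refused), HTTPError (4xx/5xx) and
        # socket timeouts are all subclasses of OSError.
        return False
```

This can be called after docker-compose up to confirm the Flask container accepts requests before sending POST requests.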

/download/text/<ID>

Downloads the requested text to the app/Text folder.

/download/images/<ID>

Downloads the requested images to the app/Images folder.
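The files appear to be written by the server into app/Text or app/Images, so a client presumably only has to call the endpoint. A sketch, assuming the request ID returned by the POST methods can be placed in the path as-is (build_download_url and trigger_download are hypothetical helper names):

```python
import urllib.parse
import urllib.request

# Default address from the Usage section.
BASE_URL = "http://localhost:5000"


def build_download_url(kind: str, request_id, base_url: str = BASE_URL) -> str:
    """Return the URL for /download/text/<ID> or /download/images/<ID>."""
    if kind not in ("text", "images"):
        raise ValueError("kind must be 'text' or 'images'")
    # Percent-encode the ID so it is always a single path segment.
    return f"{base_url}/download/{kind}/{urllib.parse.quote(str(request_id), safe='')}"


def trigger_download(kind: str, request_id, base_url: str = BASE_URL,
                     timeout: float = 30.0) -> int:
    """Call the download endpoint and return the HTTP status code."""
    # Bypass any system HTTP proxy, since the server runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_download_url(kind, request_id, base_url), timeout=timeout) as resp:
        return resp.status
```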

/list

Returns a selected number of requests from the database.
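This README does not say how the number of requests is passed to /list. The sketch below assumes a hypothetical query parameter named "number" and a JSON response body; the real parameter name and response format are described in swagger.yaml.

```python
import json
import urllib.parse
import urllib.request

# Default address from the Usage section.
BASE_URL = "http://localhost:5000"


def build_list_url(count: int, base_url: str = BASE_URL) -> str:
    """Return the /list URL for the given number of requests.

    ASSUMPTION: "number" is a hypothetical query parameter name.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    return f"{base_url}/list?{urllib.parse.urlencode({'number': count})}"


def list_requests(count: int, base_url: str = BASE_URL, timeout: float = 10.0):
    """Fetch the list and decode it, assuming the server responds with JSON."""
    # Bypass any system HTTP proxy, since the server runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_list_url(count, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```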

Testing

After the applications are built, the automated tests run. You can check the results in the tests container.

Info

Tests might fail after the first start (the database may not be created in time). You can rerun them manually to make sure everything works properly.
Running more than one instance at the same time might cause problems.
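One common way to reduce the first-start test failures described above (not something this README says the project does) is to make the tests wait until PostgreSQL accepts TCP connections. A sketch using only the standard library; the helper name wait_for_port is hypothetical:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once a connection succeeds, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, assuming the database service is reachable under a hypothetical host name "db" on PostgreSQL's default port 5432, wait_for_port("db", 5432) could run before the test session starts. An open port does not guarantee that the tables already exist, so this narrows the race rather than removing it.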

Built With

Docker - microservice host
Flask - local server
PostgreSQL - database
Pytest - automatic tests
swagger.io - API documentation
Postman - used for REST API testing
