webcrawling

DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-w…

crawler csharp dotnetcore scraping crawling webscraper scrapy entity-framework-core webcrawler webscraping scrapy-crawler ddd-architecture htmlagilitypack webcrawling webcrawler-htmlagilitypack

Updated Dec 20, 2022
C#

DedSecInside / gotor

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

go docker cli golang osint command-line service rest-api tor information-extraction http-server command-line-tool webcrawler webscraping hacktoberfest golang-server webcrawling torbot osint-tools

Updated Apr 21, 2024
Go

feddelegrand7 / ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

r rstats webscraping webcrawling webscraper-website dataextraction

Updated Jul 16, 2024
R

DwarfThief / Raspagem-de-dados-para-iniciantes

Data scraping for beginners using Scrapy and other basic libraries

python opensource web-crawler jupyter-notebook scrapy hacktoberfest spyder estudo datascraping webcrawling raspagem-de-dados

Updated Jun 5, 2024
Python

voliveirajr / seleniumcrawler

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site

python scraper scraping selenium scrapy selenium-webdriver asp-net webcrawler scrapper scraping-websites webcrawling

Updated Feb 28, 2019
Python

andersonkrs / malheatmap

An extension for tracking your activities on myanimelist.net

ruby rails myanimelist webcrawling

Updated Nov 23, 2024
Ruby

scrapyman / data-api

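Scrapyman data API service. Provides: Taobao, Xiaohongshu, JD.com, Douyin (e-commerce), Douyin (video), Kuaishou, Pugongying, Xingtu, Pinduoduo, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, Shopee, SHEIN, Baidu Index, Ctrip, Boss Zhipin, Zhilian Zhaopin, Lagou, Toutiao, Facebook, YouTube, Instagram, Twitter. Crawler, data collection, scrapy, interface, API.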

api data crawl taobao jingdong webcrawling kuaishou douyin pinduoduo xiaohongshu taobao-api xiaohongshu-api pugongying

Updated Nov 28, 2024

datawizard1337 / ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting the hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

python scraping crawling scrapy webscraping scrapyd webcrawling

Updated Jan 13, 2022
Python

Aavache / LLMWebCrawler

A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.

python nlp api machine-learning raylib distributed-computing transformer ray webcrawler webcrawling rag pydantic fastapi huggingface milvus vector-database large-language-models llm

Updated Oct 15, 2023
Python

kafagy / fifa-FUT-Data

Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB

mysql python csv database video-game soccer dataset webscraping fifa fifa-ultimate-team webcrawling fifa18 futhead fifa19 futbin-prices futbin player-data

Updated Nov 26, 2019
Python

flickz / newspaperjs

News extraction, scraping, and article parsing

nodejs crawler scraper news news-aggregator webscraping webcrawling

Updated Mar 4, 2023
HTML

Skumarr53 / Stock-Fundamental-data-scraping-and-analysis

A project that builds a web crawler to collect stock fundamentals and review their performance in one go

automation selenium python3 web-scraping webcrawling datacollection stock-fundamentalplots

Updated Mar 8, 2021
Jupyter Notebook

spieredd / Ultimate-Guide-to-Sneaker-Bot-Creation

The Ultimate Guide to Sneaker Bot 🤖 Creation using JavaScript and NodeJS ☣️. Learn how to get the most out of tools like the Chrome DevTools and JS libraries like Puppeteer or Axios.