This document contains a list of libraries and resources for web scraping in PHP.
- Libraries
- Popular Web Scraping Stacks
- Guides and Tutorials
Note: All selected libraries are either actively maintained or widely used.
- PHP cURL: A built-in PHP library based on libcurl to connect to and communicate with many different types of servers over many different protocols
- Guzzle: A PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services [Guzzle proxy integration]
- HttpClient: A Symfony component implementing a low-level HTTP client
- Buzz: A lightweight (<1000 lines of code) PHP 7.1 library for issuing HTTP requests
- amphp/http-client: An advanced async HTTP client library for PHP, enabling efficient, non-blocking, and concurrent requests and responses
- Requests: A humble HTTP request library. It simplifies how you interact with other sites and takes away all your worries.
- HTTPFul: A chainable, REST-friendly PHP HTTP client and a sane alternative to cURL, with support for both PHP stream wrappers and cURL
- Sockets: A built-in PHP extension that implements a low-level interface to the socket communication functions based on the popular BSD sockets, providing the possibility to act as a socket server as well as a client
- Ratchet: A PHP library for asynchronously serving WebSockets
- Pawl: An asynchronous WebSocket client in PHP
- Simple Html Dom Parser for PHP: A modern simple HTML DOM parser for PHP
- HTML5-PHP: An HTML5 parser and serializer for PHP
- DiDOM: A simple and fast HTML and XML parser
- QueryPath: A PHP library for HTML(5)/XML querying (CSS 4 or XPath) and processing (like jQuery) with PHP 8.3 support
- DomCrawler: A Symfony component that eases DOM navigation for HTML and XML documents
- PHP Html Parser: An HTML DOM parser. It allows you to manipulate HTML. Find tags on an HTML page with selectors just like jQuery
- DOM: A built-in PHP extension that allows operations on XML and HTML documents through the DOM API with PHP
- XML Document Parser PHP: A framework-agnostic package that provides a simple way to parse XML into an array without having to write complex logic
- ParseCsv: A CSV data parser for PHP
- JSON Parser: A zero-dependencies lazy parser to read JSON of any dimension and from any source in a memory-efficient way
- JsonMachine: An efficient, easy-to-use, and fast PHP JSON stream parser
- PdfParser: A standalone PHP library that provides various tools to extract data from a PDF file
- EmailReplyParser: A PHP library for parsing plain text email content
- CommonMark PHP: A highly extensible PHP Markdown parser that fully supports the CommonMark and GFM specs
- PHP Markdown: A parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber
- Parsedown: A better Markdown parser in PHP
- Dallgoot: YAML library for PHP: A PHP library that loads and parses YAML files into their equivalent PHP data types
- PHP-SQL-Parser: A pure PHP, non-validating SQL parser with a focus on the MySQL dialect of SQL
- SQL Parser: A validating SQL lexer and parser with a focus on the MySQL dialect
- SimpleXLSX: A PHP library to parse and retrieve data from Excel XLSX files
- PHP Domain Parser: Public suffix list based domain parsing implemented in PHP
- RSS & Atom Feeds for PHP: A small and easy-to-use library for consuming RSS and Atom feeds
- PHP CSS Parser: A parser for CSS files written in PHP. It allows extraction of CSS files into a data structure, manipulation of that structure, and output as (optimized) CSS
- SitemapParser: An XML sitemap parser class compliant with the Sitemaps.org protocol
- robots-txt-parser: A PHP class for parsing all directives from robots.txt files according to the specifications
- Crawler: A library for rapid (web) crawler and scraper development
- Roach: A complete web scraping toolkit for PHP
- PHP-Spider: A configurable and extensible PHP web spider
- Embed: A PHP library to get info from any web service or page
- PHPScraper: A versatile web-utility for PHP
- Bright Data's proxy services: A proxy network with over 72 million IPs offering premium residential, datacenter, mobile, and ISP proxies. Supports state, country, ZIP, and ASN level targeting across 195 countries. Works with any HTTP client or scraping library [Bright Data's solution]
- CAPTCHA Solver: A rapid and automated CAPTCHA solver that can solve challenges from reCAPTCHA, hCaptcha, px_captcha, SimpleCaptcha, GeeTest CAPTCHA, and more [Bright Data's solution]
- PHP Module for 2Captcha API: A PHP package for easy integration with the API of the 2Captcha CAPTCHA-solving service to bypass reCAPTCHA, hCaptcha, FunCaptcha, GeeTest, and other CAPTCHAs
- captcha-solver-php: An easy PHP implementation by Metabypass for solving any type of CAPTCHA
- Panther: A browser testing and web crawling library for PHP and Symfony
- php-webdriver: A PHP client for Selenium/WebDriver protocol
- chrome-php/chrome: A library to instrument headless chrome/chromium instances from PHP
- Mink: A PHP web browser emulator abstraction
- PHP JSON library: A PHP simple library for managing JSON files
- CSV: A library to ease parsing, writing and filtering CSV in PHP
- mPDF: A PHP library generating PDF files from UTF-8 encoded HTML
- PhpSpreadsheet: A pure PHP library for reading and writing spreadsheet files
- PHPWord: A pure PHP library for reading and writing word processing documents
- PHPPowerPoint: A pure PHP library for reading and writing presentation documents
- Stringy: A PHP string manipulation library with multibyte support
- ANSI to HTML5: An ANSI to HTML5 converter
- Brick\DateTime: A date and time library for PHP
- Brick\Money: A money and currency library for PHP
- PHP Prices: A simple PHP library for complex monetary prices management
- LibPhoneNumber for PHP: A PHP version of Google's phone number handling library
- Brick\PhoneNumber: A phone number library for PHP
- PhpUnitsOfMeasure: A library for handling physical quantities and the units of measure in which they're represented
- Urlify: A fast PHP slug generator and transliteration library that converts non-ASCII characters for use in URLs
- Slugify: A PHP library to convert a string to a slug. Includes integrations for Symfony, Silex, Laravel, Zend Framework 2, Twig, Nette and Latte
- Purl: A simple object-oriented URL manipulation library for PHP 7.2+
- Uri: A PHP package that provides simple and intuitive classes to manage URIs in PHP
- Url: A Swiss Army knife for URLs
- amphp/parallel: An advanced parallelization library for PHP, enabling efficient multitasking, optimizing resource use, and application responsiveness through multiple CPU threads
- React: A library for event-driven, non-blocking I/O with PHP
- Evenement: A very simple event dispatching library for PHP
- Event: An event library with a focus on domain events
- Salt: Software to automate the management and configuration of any infrastructure or application at scale
- Puppet: A server automation framework and application
- PHP Cron Scheduler: A PHP cron job scheduler
- Crunz: A PHP-based job scheduler
- HTTP Client: PHP cURL, Guzzle, or Symfony's HttpClient
- HTML Parser: Simple Html Dom Parser for PHP, DomCrawler, HTML5-PHP, or DiDOM
- Crawling or Scraping Framework: Crawler, Roach, or PHP-Spider
- Browser Automation: Panther or php-webdriver
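
To illustrate the parsing half of such a stack, here is a minimal sketch that uses only the built-in DOM extension listed above. The `extractLinks` helper and the sample markup are hypothetical and exist only for this example. In a real scraper the HTML string would come from an HTTP client such as PHP cURL or Guzzle, and a third-party parser from this list might be a better fit for messy pages.

```php
<?php
declare(strict_types=1);

/**
 * Extract link text and href pairs from an HTML string.
 * Hypothetical helper: relies only on DOMDocument and DOMXPath
 * from PHP's built-in DOM extension.
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid, so buffer libxml warnings
    // instead of letting them be emitted while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    // Select only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $node) {
        $links[] = [
            'text' => trim($node->textContent),
            'href' => $node->getAttribute('href'),
        ];
    }

    return $links;
}

// Sample markup (an assumption for this sketch) standing in for a fetched page.
$html = <<<HTML
<!DOCTYPE html>
<html><body>
  <ul>
    <li><a href="/products/1">First product</a></li>
    <li><a href="/products/2"> Second product </a></li>
    <li><a>No link target</a></li>
  </ul>
</body></html>
HTML;

$links = extractLinks($html);

foreach ($links as $link) {
    echo $link['text'], ' -> ', $link['href'], PHP_EOL;
}
```

Keeping the fetch step and the parse step separate like this makes it straightforward to swap either one for another library from the list without touching the other.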