
Module 'scrapy_selenium' doesn't define any object named 'SeleniumDownloadHandler' #3

Open
tumregels opened this issue Mar 15, 2020 · 8 comments


@tumregels

In the docs you mention:

# You need also to change the default download handlers, like so:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_selenium.SeleniumDownloadHandler",
    "https": "scrapy_selenium.SeleniumDownloadHandler",
}

but the plugin does not define any such SeleniumDownloadHandler.

@tumregels

tumregels commented Mar 16, 2020

You probably forgot to update the docs. I also tried:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_headless.HeadlessDownloadHandler",
    "https": "scrapy_headless.HeadlessDownloadHandler",
}

but it still fails:

Traceback (most recent call last):
  File "venv/scrapex/src/scrapy-selenium/scrapy_headless/downloader.py", line 82, in get_driver
    driver = self._data.driver
AttributeError: '_thread._local' object has no attribute 'driver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "venv/scrapex/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "venv/scrapex/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "venv/scrapex/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "venv/scrapex/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "venv/scrapex/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "venv/scrapex/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "venv/scrapex/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "venv/scrapex/src/scrapy-selenium/scrapy_headless/downloader.py", line 65, in process_request
    driver = self.get_driver(spider)
  File "venv/scrapex/src/scrapy-selenium/scrapy_headless/downloader.py", line 85, in get_driver
    command_executor=self.grid_url, desired_capabilities=self.capabilities
  File "venv/scrapex/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "venv/scrapex/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "venv/scrapex/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "venv/scrapex/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 208, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: <!DOCTYPE html>
...
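
For context, the AttributeError at the top of the traceback is expected behaviour: the handler keeps one WebDriver per worker thread in a threading.local() and lazily creates it on first access, so the real failure is the WebDriverException from the grid further down. A minimal sketch of that lazy per-thread pattern, with illustrative names (not the plugin's actual API) and a plain object standing in for a real WebDriver:

```python
import threading


class DriverPool:
    """Sketch of a lazy per-thread driver cache; names are illustrative."""

    def __init__(self):
        self._data = threading.local()  # separate attribute storage per thread

    def get_driver(self):
        try:
            # First access in a given thread raises AttributeError,
            # exactly as seen at the top of the traceback above.
            return self._data.driver
        except AttributeError:
            # Stand-in for creating a remote WebDriver against the grid.
            self._data.driver = object()
            return self._data.driver


pool = DriverPool()
first = pool.get_driver()
# Subsequent calls in the same thread reuse the cached driver.
assert pool.get_driver() is first
```

In the real handler, the except branch is where the remote WebDriver session is opened against SELENIUM_GRID_URL, which is why a misconfigured grid surfaces as a WebDriverException during that exception handling.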

@Mikhail010

@tumregels Did you find a solution for this?

@tumregels

@Mikhail010 I tried my best but failed.

@CatSirSir

Excuse me, I don't want to use Splash, and I found this plugin. Does it work well? I see the latest commit is one year ago, so I'm a little afraid to use it.

@Mikhail010

@tumregels I got it working with your suggestion, setting everything inside the spider class as custom settings:

class MySpider(scrapy.Spider):
    name = 'myspider'

    custom_settings = {
        'SELENIUM_GRID_URL': 'http://127.0.0.1:4444/wd/hub',  # Example for local grid with docker-compose
        'SELENIUM_NODES': 1,  # Number of nodes (browsers) you are running on your grid
        'SELENIUM_CAPABILITIES': {
            "browserName": "chrome",
            "version": "",
            "platform": "ANY",
            "acceptInsecureCerts": True
        },
        'DOWNLOAD_HANDLERS': {
            "http": "scrapy_headless.HeadlessDownloadHandler",
            "https": "scrapy_headless.HeadlessDownloadHandler",
        },
        'SELENIUM_PROXY': 'http://docker.for.mac.host.internal:24000'
    }

...

@Mikhail010

@CatSirSir I struggled to get it working, but since then it has been working fine. Take into account that I only started using it last week, so I cannot give you a broad opinion. If I were you I would give it a try. I also use Splash, but it gives me trouble with Angular sites and with sites that can detect headless browsers.

@BruceDone

BruceDone commented Apr 28, 2021

@Mikhail010 Hi, I fixed it with PR #6.

@kuzovkov

kuzovkov commented Oct 13, 2021

Thanks @Mikhail010

I got it working with these configs.

docker-compose.yml:

services:
  selenium-hub:
    image: selenium/hub
    networks:
      - back
    ports:
      - 4444:4444

  chrome:
    image: selenium/node-chrome
    links:
      - selenium-hub:hub
    environment:
      - HUB_PORT_4444_TCP_ADDR=selenium-hub:4444/grid/register/
      - GRID_TIMEOUT=180 # Default timeout is 30s, which might be low for Selenium
    volumes:
      - /dev/shm:/dev/shm
    networks:
      - back


networks:
  back:
    driver: bridge

In spider:

import scrapy

from scrapy_headless import HeadlessRequest
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


class TestSpider(scrapy.Spider):
    name = 'test'
    custom_settings = {
        'SELENIUM_GRID_URL': 'http://selenium-hub:4444/wd/hub',  # Example for local grid with docker-compose
        'SELENIUM_NODES': 1,  # Number of nodes(browsers) you are running on your grid
        'SELENIUM_CAPABILITIES': DesiredCapabilities.CHROME,
        'DOWNLOAD_HANDLERS': {
            "http": "scrapy_headless.HeadlessDownloadHandler",
            "https": "scrapy_headless.HeadlessDownloadHandler",
        }
    }
