
Bug with script.string in user.py leaves the MySQL table user_relation permanently empty #201

Open
myrainbowandsky opened this issue Mar 26, 2020 · 0 comments


Error output

[2020-03-25 19:07:12,388: ERROR/ForkPoolWorker-1] Task tasks.user.crawl_follower_fans[23f3c1fd-fc6e-4c5b-b0cc-5d5c6a9ad068] raised unexpected: TypeError('expected string or bytes-like object',)
Traceback (most recent call last):
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/wentao/programming/weibospider/tasks/user.py", line 19, in crawl_follower_fans
    rs = get_fans_or_followers_ids(uid, 1, 1)
  File "/home/wentao/programming/weibospider/page_get/user.py", line 159, in get_fans_or_followers_ids
    urls_length = public.get_max_crawl_pages(page)
  File "/home/wentao/programming/weibospider/page_parse/user/public.py", line 223, in get_max_crawl_pages
    m = re.search(pattern, script.string)
  File "/usr/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

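There is a bug with script.string in user.py.
The type of script.string is unpredictable here: sometimes it is NoneType, sometimes <class 'bs4.element.NavigableString'>.
In either case I could not get a match out of it with the re module.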
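
A small check of the two cases described above, as a sketch that assumes only that bs4 is installed. The HTML and the `FM\.view` pattern are made up for illustration, since the issue does not show the real page or the real value of `pattern`. In bs4, `.string` is None when a tag does not have exactly one child (for example an external `<script src=...>` with no body), and NavigableString is a subclass of str. That suggests the None case alone is what produces the TypeError in the traceback.

```python
import re
from bs4 import BeautifulSoup

# Made-up page: one external script (no body) and one inline script.
html = (
    '<script src="https://example.com/a.js"></script>'
    '<script>FM.view({"ns": "pl.content.followTab.index"})</script>'
)
# Assumed pattern; the issue does not show the real one.
pattern = re.compile(r'FM\.view\((.*)\)')

soup = BeautifulSoup(html, 'html.parser')
outcomes = []
for script in soup.find_all('script'):
    text = script.string
    try:
        m = pattern.search(text)  # raises TypeError when text is None
        outcomes.append((isinstance(text, str), m.group(1)))
    except TypeError:
        outcomes.append((isinstance(text, str), 'TypeError'))

# Each entry: (is script.string a str?, regex result or 'TypeError')
print(outcomes)
```

If this holds for the real page as well, skipping scripts whose `.string` is None should be enough to get past the crash.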

    for script in scripts:
        # Debug output added while investigating
        # print('i am in ' + dir_path, 'script is ' + script)
        # print('script.string:', script.string)

        print('type pattern', type(pattern))
        print('pattern', pattern)
        print('type:', type(script.string))

        # Raises TypeError whenever script.string is None
        m = re.search(pattern, script.string)

        if m and 'pl.content.followTab.index' in script.string:
            all_info = m.group(1)
            cont = json.loads(all_info).get('html', '')
            soup = BeautifulSoup(cont, 'html.parser')
            pattern = 'uid=(.*?)&'

            if 'pageList' in cont:
                urls2 = soup.find(attrs={'node-type': 'pageList'}).find_all(attrs={
                    'class': 'page S_txt1', 'bpfilter': 'page'})
                length += len(urls2)
    return length
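
One possible way to guard the loop above, offered as a sketch rather than the project's actual fix. The helper name `find_follow_tab_payload`, the constant `FM_VIEW`, and its regex are hypothetical; the issue does not show the original pattern. The helper reads only the `.string` attribute, so it is exercised here with stand-in objects instead of real bs4 tags.

```python
import re
from types import SimpleNamespace

# Assumed pattern; kept in its own constant so nothing inside the loop
# can overwrite it.
FM_VIEW = re.compile(r'FM\.view\((.*)\)')


def find_follow_tab_payload(scripts):
    """Return the first FM.view(...) payload for the follow tab, or None."""
    for script in scripts:
        text = script.string
        if not text:
            # None (empty or multi-child <script>) or empty string: skip it.
            continue
        m = FM_VIEW.search(text)
        if m and 'pl.content.followTab.index' in text:
            return m.group(1)
    return None


# Stand-ins for bs4 Tag objects; only the .string attribute is used.
scripts = [
    SimpleNamespace(string=None),
    SimpleNamespace(
        string='FM.view({"ns": "pl.content.followTab.index", "html": ""})'
    ),
]
payload = find_follow_tab_payload(scripts)
print(payload)
```

The separate constant is deliberate: the loop in the report assigns `pattern = 'uid=(.*?)&'` after the first match, so any later iteration would search with a different expression.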


@myrainbowandsky changed the title from "The user_relation table in the MySQL database is always empty. What is wrong?" to "Bug with script.string in user.py leaves the user_relation table in the MySQL database always empty" Mar 26, 2020
@myrainbowandsky changed the title from "Bug with script.string in user.py leaves the user_relation table in the MySQL database always empty" to "Bug with script.string in user.py leaves the MySQL table user_relation permanently empty" Mar 26, 2020