Add support for Rawkuma #28
base: develop
Thanks for the contrib. I took a pass over the code and this looks fine. I'm not actually familiar with Rawkuma, so I'll trust that the parsing is fine.
proxy/sources/rawkuma.py
Outdated
try:
    title = soup.title.text.split('– Rawkuma')[0].strip()
except AttributeError:
    return None
try:
    author = "None"
    artist = ""
    for element in soup.select('div.infox > div > div.fmed > b'):
        if element.text == 'Author':
            author = element.next_sibling.next_sibling.text.strip() # Need two next_siblings because one gives the '\n' between elements
        elif element.text == 'Artist':
            artist = element.next_sibling.next_sibling.text.strip()

    if not artist: artist = author
except AttributeError:
    author = "None"
try:
    description = ""
    paragraphs = soup.select_one('div.infox > div > div[itemprop="description"]').descendants
    for element in paragraphs:
        if type(element) == NavigableString and element.parent.name != 'a':
            description += str(element)
except AttributeError:
    description = "No description."
nit: you can probably simplify these statements by declaring the defaults up front rather than setting them in the except handlers, then having the try blocks only attempt to parse more specific metadata from bs4. That would make this a bit easier to follow as well.
How would this work without removing the try-except blocks?
You can have both: the try-except block catches the AttributeError, and if the parse fails the assignment never happens, so the default stays in place. E.g.:
title = "Default title"
description = "Default description"
try:
    title = "Set title"
except AttributeError:
    pass
try:
    def foo():
        raise AttributeError("foo")
    description = foo()
except AttributeError:
    pass
# Expected result:
# Set title Default description
print(title, description)
Alternatively, if you wanted to be creative you can have a method that takes a lambda or function and a default param so you can do something like:
# `lambda` is a reserved word in Python, so the callable parameter is named `parse` here.
def try_parse(parse, default):
    try:
        return parse()
    except AttributeError:
        return default
title = try_parse(lambda: soup.title.text.split('– Rawkuma')[0].strip(), "Default title")
description = try_parse(..., "Default description")
# etc
This is a weakly held opinion though, so feel free to ignore.
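As a runnable illustration of the helper suggested above, here is a minimal sketch. The `SimpleNamespace` objects only stand in for a parsed BeautifulSoup document and are purely hypothetical, as are the names `parse` and `parse_title`; the real proxy would pass lambdas over `soup`.

```python
from types import SimpleNamespace


def try_parse(parse, default):
    """Return parse(), or default if the lookup hits a missing attribute."""
    try:
        return parse()
    except AttributeError:
        return default


# Hypothetical stand-ins for a parsed page: one with a <title>, one without.
page_with_title = SimpleNamespace(title=SimpleNamespace(text="Some Series – Rawkuma"))
page_without_title = SimpleNamespace(title=None)


def parse_title(soup):
    # The lambda defers the attribute chain so try_parse can catch its failure.
    return try_parse(
        lambda: soup.title.text.split("– Rawkuma")[0].strip(),
        "Default title",
    )


print(parse_title(page_with_title))
print(parse_title(page_without_title))
```

Catching only `AttributeError` keeps unrelated bugs visible; a bare `except` would also swallow things like `KeyboardInterrupt`.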
static_global/js/main.js
Outdated
case /rawkuma\.com/.test(text):
    result = /rawkuma\.com\/(manga\/)?[A-Za-z0-9-]+/i.exec(text);
    if(!result) return message('Reader could not understand the given link.', 1);
    result = '/rk/' + text;
    break;
nit: can you change the indentation to use tabs here? I think the original file used tabs so it's probably best if we keep it consistent.
proxy/sources/rawkuma.py
Outdated
r = re.compile(r'"images"\s?:\s?\[[^]]*\]')
m = re.search(r, data)
ooc: any reason you chose to compile the regexes here but not in the other methods where you used regexes?
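For context on the question above: Python's `re` module caches recently used patterns, so passing a pattern string to `re.search` and using a precompiled pattern give the same result. Compiling mainly pays off when the pattern is defined once at module level and reused. A minimal sketch, assuming the page embeds the image list as a JSON-style `"images"` array (the sample `data` string and the names `IMAGES_RE` and `extract_images` are made up for illustration):

```python
import json
import re

# Compiled once at import time and reused by every call.
IMAGES_RE = re.compile(r'"images"\s?:\s?\[[^]]*\]')


def extract_images(data):
    """Return the list under the "images" key, or an empty list if absent."""
    match = IMAGES_RE.search(data)
    if match is None:
        return []
    # Wrap the matched '"images": [...]' fragment in braces to form valid JSON.
    return json.loads("{" + match.group(0) + "}")["images"]


# Hypothetical page payload, not taken from the real site.
data = 'run({"sources": [{"images": ["https://example.com/1.jpg", "https://example.com/2.jpg"]}]})'
print(extract_images(data))
```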
proxy/sources/rawkuma.py
Outdated
def rk_scrape_common(self, meta_id):
    series_url = self.get_series_url(meta_id)
    resp = get_wrapper(series_url)
    if resp.status_code == 200:
(I realize that this is cargo-culted from the other proxies, so feel free to ignore.) You could flatten the indentation of these guards by checking the inverse condition and returning early when it's true. E.g.:
if resp.status_code != 200:
    return
It's just a bit cleaner this way.
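A minimal sketch of that guard-clause shape. `FakeResponse` and `scrape_title` are hypothetical stand-ins (the real method calls `get_wrapper` and parses the page); only the control flow is the point here.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    text: str = ""


def scrape_title(resp):
    # Guard clause: bail out early so the main path stays unindented.
    if resp.status_code != 200:
        return None
    return resp.text.strip()


print(scrape_title(FakeResponse(200, "  Some Series  ")))
print(scrape_title(FakeResponse(404)))
```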
Thanks for the feedback. A lot of this is because I copied from other proxies as a starting point and didn't touch anything I didn't need to. I can definitely clean it up a bit.
Yeah I realized, there's definitely a lot of code here that I recognize from years ago. It's not great code though, so it'd be nice if we can clean it up going forward. I'm fine with pushback if you're good with this as-is though.
Adds a proxy for Rawkuma and adds logic to main page to recognize Rawkuma links.