
Add support for Rawkuma #28

Open

wants to merge 3 commits into base: develop
Conversation

plax-00 (Contributor) commented Dec 12, 2023

Adds a proxy for Rawkuma and adds logic to main page to recognize Rawkuma links.

funkyhippo (Collaborator) left a comment

Thanks for the contrib. I took a pass over the code and it looks fine. I'm not actually familiar with Rawkuma, so I'll trust that the parsing is correct.

Comment on lines 63 to 86
try:
    title = soup.title.text.split('– Rawkuma')[0].strip()
except AttributeError:
    return None
try:
    author = "None"
    artist = ""
    for element in soup.select('div.infox > div > div.fmed > b'):
        if element.text == 'Author':
            author = element.next_sibling.next_sibling.text.strip()  # Need two next_siblings because one gives the '\n' between elements
        elif element.text == 'Artist':
            artist = element.next_sibling.next_sibling.text.strip()

    if not artist: artist = author
except AttributeError:
    author = "None"
try:
    description = ""
    paragraphs = soup.select_one('div.infox > div > div[itemprop="description"]').descendants
    for element in paragraphs:
        if type(element) == NavigableString and element.parent.name != 'a':
            description += str(element)
except AttributeError:
    description = "No description."
funkyhippo (Collaborator)

nit: you can probably simplify these statements by declaring the defaults up-front rather than setting them in the except handlers, and then having the try blocks parse the more specific metadata from bs4. That would make this a bit easier to follow as well.

plax-00 (Contributor, author)

How would this work without removing the try-except blocks?

funkyhippo (Collaborator)

You can have both: the try-except block will catch the AttributeError and leave the default in place if the assignment fails, e.g.:

title = "Default title"
description = "Default description"

try:
    title = "Set title"
except AttributeError:
    pass

try:
    def foo():
        raise AttributeError("foo")
    description = foo()
except AttributeError:
    pass

# Expected result:
# Set title Default description
print(title, description)

Alternatively, if you wanted to be creative, you could have a helper that takes a function and a default param, so you can do something like:

def try_parse(fn, default):  # `lambda` is a reserved word, so the param needs another name
    try:
        return fn()
    except AttributeError:
        return default

title = try_parse(lambda: soup.title.text.split('– Rawkuma')[0].strip(), "Default title")
description = try_parse(..., "Default description")
# etc

This is a weakly held opinion though, so feel free to ignore.
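To make the suggestion concrete, here is a rough sketch of the block from lines 63 to 86 restructured that way (assuming the same soup, NavigableString, and enclosing-method context as the diff):

# Defaults declared up-front; the try blocks only parse, never assign fallbacks
author = "None"
artist = ""
description = "No description."

try:
    title = soup.title.text.split('– Rawkuma')[0].strip()
except AttributeError:
    return None  # preserve the original early exit when no title is found

try:
    for element in soup.select('div.infox > div > div.fmed > b'):
        if element.text == 'Author':
            # Two next_siblings: the first is the '\n' between elements
            author = element.next_sibling.next_sibling.text.strip()
        elif element.text == 'Artist':
            artist = element.next_sibling.next_sibling.text.strip()
except AttributeError:
    pass
if not artist:
    artist = author

desc_parts = []
try:
    for element in soup.select_one('div.infox > div > div[itemprop="description"]').descendants:
        if isinstance(element, NavigableString) and element.parent.name != 'a':
            desc_parts.append(str(element))
except AttributeError:
    pass
if desc_parts:
    description = ''.join(desc_parts)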

Comment on lines 66 to 70
case /rawkuma\.com/.test(text):
    result = /rawkuma\.com\/(manga\/)?[A-Za-z0-9-]+/i.exec(text);
    if(!result) return message('Reader could not understand the given link.', 1);
    result = '/rk/' + text;
    break;
funkyhippo (Collaborator)

nit: can you change the indentation to use tabs here? I think the original file used tabs so it's probably best if we keep it consistent.

Comment on lines 168 to 169
r = re.compile(r'"images"\s?:\s?\[[^]]*\]')
m = re.search(r, data)
funkyhippo (Collaborator)

ooc: any reason you chose to compile the regexes here but not in the other methods where you used regexes?
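For reference, Python's re module caches the patterns it compiles internally, so calling re.search with a pattern string and searching with a pre-compiled pattern behave the same for a handful of patterns; re.compile mainly pays off when a pattern is reused many times or benefits from having a name. A minimal sketch (the data value here is hypothetical):

import re

data = '... "images" : ["1.jpg", "2.jpg"] ...'  # hypothetical chapter payload

# Pre-compiled form, as in the diff
r = re.compile(r'"images"\s?:\s?\[[^]]*\]')
m1 = r.search(data)

# Module-level form; re caches the compiled pattern behind the scenes
m2 = re.search(r'"images"\s?:\s?\[[^]]*\]', data)

assert m1.group() == m2.group()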

def rk_scrape_common(self, meta_id):
    series_url = self.get_series_url(meta_id)
    resp = get_wrapper(series_url)
    if resp.status_code == 200:
funkyhippo (Collaborator)

(I realize that this is cargo-culted from the other proxies, so feel free to ignore) but you could simplify the indentation of these guards by checking the inverse and returning early if it's true. E.g.:

if resp.status_code != 200:
  return

It's just a bit cleaner this way.
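Applied to the excerpt above, the early return might look like this (a sketch; everything after the status check is assumed):

def rk_scrape_common(self, meta_id):
    series_url = self.get_series_url(meta_id)
    resp = get_wrapper(series_url)
    # Guard clause: bail out early so the happy path stays un-indented
    if resp.status_code != 200:
        return None
    # ... parse resp as before, one indentation level shallower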

plax-00 (Contributor, author) commented Dec 23, 2023

Thanks for the feedback. A lot of this is because I copied from other proxies as a starting point and didn't touch anything I didn't need to. I can definitely clean it up a bit.

funkyhippo (Collaborator)
Yeah, I realized; there's definitely a lot of code here that I recognize from years ago. It's not great code, though, so it'd be nice if we can clean it up going forward. I'm fine with pushback if you'd rather keep this as-is, though.
