This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Don't parse dates with more than 4 digits for the year #556

Open
wants to merge 2 commits into
base: master
from

heinrich5991

The regex was broken before, using `(?!…)` instead of `(?<=…)`.
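
To illustrate why swapping the two assertions matters, here is a minimal sketch. The patterns are simplified stand-ins chosen for illustration (an assumption, not the project's actual regex): a negative lookahead placed in front of a digit run checks the wrong side of the position and ends up filtering nothing, while a positive lookbehind actually constrains what precedes the match.

```python
import re

# Simplified stand-in patterns (assumptions, not the project's actual regex).
# Negative lookahead: only asserts that the NEXT character is not "_",
# which is always true right before a digit, so it filters nothing.
broken = re.compile(r'(?!_)[0-9]{4}')
# Positive lookbehind: requires that the PREVIOUS character is "_".
fixed = re.compile(r'(?<=_)[0-9]{4}')

text = "120190_2021"
print(broken.search(text).group())  # 1201
print(fixed.search(text).group())   # 2021
```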
skius previously approved these changes Aug 19, 2019
@MasterofJOKers
Contributor

Why do we need the lookbehind and lookahead? Can't we get away with something like this?

import re

# Delimiters (word boundary, "_" or "-") sit outside the capturing group,
# so group 1 holds only the date itself.
r = re.compile(r'(?:\b|[_-])('
               r'(?:[0-9]{1,2}[./-][0-9]{1,2}[./-](?:[0-9]{4}|[0-9]{2}))|'
               r'(?:(?:[0-9]{4}|[0-9]{2})[./-][0-9]{1,2}[./-][0-9]{1,2})|'
               r'(?:[0-9]{1,2}\. +[^\W\d_]{3,9} (?:[0-9]{4}|[0-9]{2}))|'
               r'(?:[^\W\d_]{3,9}(?: [0-9]{1,2},)? [0-9]{4})'
               r')(?:\b|[_-])')

In some manual testing, it seems to match everything matched in the unit tests. We can then use `m.group(1)` for the date part of the matched string.
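
As a rough sketch of how `m.group(1)` could be used with the regex proposed above: the helper name `find_date` and the sample strings are hypothetical and only serve the illustration. Because the delimiters are consumed outside the capturing group, group 1 contains just the date, and a year with more than four digits finds no valid boundary and is rejected.

```python
import re

# Regex copied from the comment above; delimiters are matched outside group 1.
DATE_REGEX = re.compile(
    r'(?:\b|[_-])('
    r'(?:[0-9]{1,2}[./-][0-9]{1,2}[./-](?:[0-9]{4}|[0-9]{2}))|'
    r'(?:(?:[0-9]{4}|[0-9]{2})[./-][0-9]{1,2}[./-][0-9]{1,2})|'
    r'(?:[0-9]{1,2}\. +[^\W\d_]{3,9} (?:[0-9]{4}|[0-9]{2}))|'
    r'(?:[^\W\d_]{3,9}(?: [0-9]{1,2},)? [0-9]{4})'
    r')(?:\b|[_-])'
)


def find_date(text):
    """Hypothetical helper: return the first date-like substring, or None."""
    m = DATE_REGEX.search(text)
    return m.group(1) if m else None


print(find_date("scan_2019-08-19_final.pdf"))  # 2019-08-19
print(find_date("Aug 19, 2019"))               # Aug 19, 2019
print(find_date("19. August 2019"))            # 19. August 2019
print(find_date("19.08.20190"))                # None (five-digit year)
```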

👍 for the additional tests.

@heinrich5991
Author

Updated with the suggestion to not use lookahead/lookbehind.

@heinrich5991
Author

Removed all the superfluous (?:).
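
For context on why such groups can be dropped: `|` already has the lowest precedence inside an enclosing group, so wrapping each branch in `(?:…)` does not change what is matched. A small sketch with a hypothetical two-branch pattern (reduced from the year alternation discussed in this thread, not the final regex of this PR):

```python
import re

# Hypothetical reduced pattern: each branch wrapped in its own (?:...) group.
wrapped = re.compile(r'((?:[0-9]{4})|(?:[0-9]{2}))-')
# Same pattern with the per-branch groups removed.
plain = re.compile(r'([0-9]{4}|[0-9]{2})-')

for sample in ("2019-", "19-", "x-"):
    a, b = wrapped.match(sample), plain.match(sample)
    # Both patterns capture the same text (or both fail to match).
    print(sample, a.group(1) if a else None, b.group(1) if b else None)
```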

@MasterofJOKers
Contributor

> Removed all the superfluous (?:).

Great, could you also remove the superfluous `\` in the `[]` while you're at it?
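
A quick way to check that such backslashes are redundant: inside a character class, `.` and `/` are already literal, and `-` is literal when it comes last. The escaped variant below is an assumed example of what an over-escaped class might look like, not a quote from the diff.

```python
import re
import string

# Assumed over-escaped form versus the plain form of the same class.
escaped = re.compile(r'[\.\/\-]')
plain = re.compile(r'[./-]')

# Collect every printable ASCII character each class accepts.
matched_by_escaped = {c for c in string.printable if escaped.fullmatch(c)}
matched_by_plain = {c for c in string.printable if plain.fullmatch(c)}

print(sorted(matched_by_plain))                # ['-', '.', '/']
print(matched_by_escaped == matched_by_plain)  # True
```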

MasterofJOKers previously approved these changes Nov 2, 2019
@heinrich5991
Author

Removed the superfluous backslashes in the regex.

3 participants