normalize_address_record() raises unparseable address error when using full street directionals #31
Comments
I have a similar issue with this address: 1345 Towne Lake Hills South Drive, Woodstock, GA, 30189 |
Unfortunately, this is an issue with the
(OrderedDict([('AddressNumber', '38350'),
|
@fablet, I appreciate your input. Like you, I also encountered the parsing error using the Parserator API at https://parserator.datamade.us/usaddress. However, I've successfully used
import usaddress
address = "38350 40TH ST EAST 100 PALMDALE CA 93552"
print(usaddress.parse(address))
# [('38350', 'AddressNumber'), ('40TH', 'StreetName'), ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'), ('100', 'StreetName'), ('PALMDALE', 'PlaceName'), ('CA', 'StateName'), ('93552', 'ZipCode')]
It seems the latest version of usaddress might have resolved this pre- vs. post-directional issue; however, I'm uncertain which usaddress version the Parserator API uses. Unfortunately,
import usaddress
address = "38350 40TH ST EAST 100 PALMDALE CA 93552"
usaddress.tag(address)
# Traceback (most recent call last):
# File "/home/user/usaddress_parse_error/usaddress_parse_error.py", line 5, in <module>
# usaddress.tag(address)
# File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/usaddress/__init__.py", line 177, in tag
# raise RepeatedLabelError(address_string, parse(address_string),
# usaddress.RepeatedLabelError:
# ERROR: Unable to tag this string because more than one area of the string has the same label
# ORIGINAL STRING: 38350 40TH ST EAST 100 PALMDALE CA 93552
# PARSED TOKENS: [('38350', 'AddressNumber'), ('40TH', 'StreetName'), ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'), ('100', 'StreetName'), ('PALMDALE', 'PlaceName'), ('CA', 'StateName'), ('93552', 'ZipCode')]
# UNCERTAIN LABEL: StreetName
# When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
# To report an error in labeling a valid name, open an issue at https://github.com/datamade/usaddress/issues/new - it'll help us continue to improve probablepeople!
# For more information, see the documentation at https://usaddress.readthedocs.io/
So it seems that we are trading one parsing error for another. That being said, the newest version of
Do you know if there are plans to update usaddress-scourgify's dependency on usaddress from 0.5.9 to 0.5.10 in the near future? I'm hoping that this would avoid the error I'm seeing with
Thank you again for assisting with this issue. |
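For anyone following along, here is a minimal sketch of why the parse above gets rejected when tagging: the `StreetName` label shows up in two separate areas of the string (`40TH` and `100`). The helper `repeated_labels` is a hypothetical name for illustration only and is not how usaddress implements its check internally; it just works on the parsed tokens copied from the traceback.

```python
from itertools import groupby

def repeated_labels(parsed_tokens):
    """Return labels that appear in more than one separate run of tokens.

    Hypothetical helper for illustration; not part of usaddress.
    """
    # Collapse consecutive tokens that share a label into a single run.
    runs = [label for label, _ in groupby(label for _, label in parsed_tokens)]
    seen, repeated = set(), []
    for label in runs:
        # A label seen in an earlier run and now again is "repeated".
        if label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
    return repeated

# Parsed tokens copied from the RepeatedLabelError output above.
parsed = [
    ('38350', 'AddressNumber'), ('40TH', 'StreetName'),
    ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'),
    ('100', 'StreetName'), ('PALMDALE', 'PlaceName'),
    ('CA', 'StateName'), ('93552', 'ZipCode'),
]
print(repeated_labels(parsed))
```

This flags `StreetName`, which matches the UNCERTAIN LABEL line in the error.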
Ok, I just tried forking this repo and updating its usaddress dependency to 0.5.10. Unfortunately, this did not resolve my issues:
from scourgify import normalize_address_record
address = "38350 40TH ST EAST 100 PALMDALE CA 93552"
normalize_address_record(address)
# Traceback (most recent call last):
# File "/home/user/usaddress_parse_error/usaddress_parse_error.py", line 5, in <module>
# normalize_address_record(address)
# File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/scourgify/normalize.py", line 159, in normalize_address_record
# return normalize_addr_str(
# File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/scourgify/normalize.py", line 267, in normalize_addr_str
# raise UnParseableAddressError(None, None, addr_rec)
# scourgify.exceptions.UnParseableAddressError: UNPARSEABLE ADDRESS: Unable to break this address into its component parts, OrderedDict([('address_line_1', '38350 40TH ST EAST 100 PALMDALE CA 93552'), ('address_line_2', None), ('city', None), ('state', None), ('postal_code', None)])
It probably makes the most sense to open a new issue within the usaddress repo and try to address the error with |
@fablet, I've opened this issue to address the root of the problem. Thanks again for the support. |
The below example raises an unparseable address error:
Abbreviating the street directional value (changing EAST to E) avoids this error and produces the expected results:
Is it possible to look into this and ensure that full directional names do not raise unparseable address errors? The USPS prefers abbreviated directionals, but still considers full names acceptable.
Please let me know if you have any questions about this. Thank you in advance for your help troubleshooting this!
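In the meantime, one possible stopgap is to abbreviate full directionals before handing the string to the normalizer. The sketch below is only an illustration: `abbreviate_directionals` is a hypothetical helper, not part of usaddress-scourgify or usaddress, and it assumes (as described above) that abbreviated directionals parse correctly. It is deliberately blunt, so it would also shorten directional words inside street or city names (e.g. "West Palm Beach"), which may not be what you want.

```python
import re

# Full directional words mapped to their standard abbreviations.
DIRECTIONALS = {
    "NORTHEAST": "NE", "NORTHWEST": "NW",
    "SOUTHEAST": "SE", "SOUTHWEST": "SW",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}

# Match any directional as a whole word, ignoring case.
_DIRECTIONAL_RE = re.compile(
    r"\b(" + "|".join(DIRECTIONALS) + r")\b", re.IGNORECASE
)

def abbreviate_directionals(address):
    """Replace full directional words with abbreviations (hypothetical helper)."""
    return _DIRECTIONAL_RE.sub(
        lambda match: DIRECTIONALS[match.group(1).upper()], address
    )

print(abbreviate_directionals("38350 40TH ST EAST 100 PALMDALE CA 93552"))
# 38350 40TH ST E 100 PALMDALE CA 93552
```

The returned string could then be passed to `normalize_address_record()` in place of the original address.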