`normalize_address_record()` raises unparseable address error when using full street directionals #31

philiporlando · 2023-07-31T23:15:41Z

The example below raises an unparseable address error:

from scourgify import normalize_address_record

address = "38350 40TH ST EAST 100 PALMDALE CA 93552"

normalize_address_record(address)
# scourgify.exceptions.UnParseableAddressError: UNPARSEABLE ADDRESS: Unable to break this address into its component parts, OrderedDict([('address_line_1', '38350 40TH ST EAST 100 PALMDALE CA 93552'), ('address_line_2', None), ('city', None), ('state', None), ('postal_code', None)])

Abbreviating the street directional value (changing EAST to E) avoids this error and produces the expected results:

from scourgify import normalize_address_record

address = "38350 40TH ST E 100 PALMDALE CA 93552"

normalize_address_record(address)
# OrderedDict([('address_line_1', '38350 40TH ST E'), ('address_line_2', 'UNIT 100'), ('city', 'PALMDALE'), ('state', 'CA'), ('postal_code', '93552')])

Is it possible to look into this and ensure that full directional names do not raise unparseable address errors? The USPS prefers abbreviated directionals, but still considers full names acceptable.

Please let me know if you have any questions about this. Thank you in advance for your help troubleshooting this!
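As a possible stopgap in the meantime, spelled-out directionals could be abbreviated before the address is handed to `normalize_address_record()`. The sketch below is only an illustration under stated assumptions: `abbreviate_directionals` and `DIRECTIONALS` are hypothetical names, not part of scourgify, and the blanket replacement would also rewrite a directional word that belongs to a street or city name (for example "NORTH ST" or "EAST PALO ALTO").

```python
import re

# Hypothetical helper table, not part of scourgify: spelled-out
# directionals mapped to their USPS abbreviations.
DIRECTIONALS = {
    "NORTHEAST": "NE",
    "NORTHWEST": "NW",
    "SOUTHEAST": "SE",
    "SOUTHWEST": "SW",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
}

# Match whole words only, case-insensitively.
_DIRECTIONAL_RE = re.compile(
    r"\b(" + "|".join(DIRECTIONALS) + r")\b", re.IGNORECASE
)


def abbreviate_directionals(address: str) -> str:
    """Replace every spelled-out directional with its abbreviation.

    Caveat: this is a blunt rewrite. It cannot tell a post-directional
    from a street or city name that happens to be a directional word.
    """
    return _DIRECTIONAL_RE.sub(
        lambda match: DIRECTIONALS[match.group(1).upper()], address
    )


address = "38350 40TH ST EAST 100 PALMDALE CA 93552"
cleaned = abbreviate_directionals(address)
print(cleaned)

# Assuming scourgify is installed, the cleaned string could then be passed on:
# from scourgify import normalize_address_record
# normalize_address_record(cleaned)
```

The cleaned string matches the abbreviated form shown above, which is the one that parses successfully.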


zak-flex · 2023-08-29T18:39:34Z

I have a similar issue with this address: 1345 Towne Lake Hills South Drive, Woodstock, GA, 30189
This variation is parseable: 1345 Towne Lake Hills S Dr, Woodstock, GA, 30189

fablet · 2023-12-14T22:18:00Z

Unfortunately, this is an issue with the usaddress package. You can check tagging behaviors in their UI: https://parserator.datamade.us/usaddress/
The usaddress.tag results are this:

PARSED TOKENS:    [('38350', 'AddressNumber'), ('40TH', 'StreetName'), ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'), ('100', 'StreetName'), ('PALMDALE', 'PlaceName'), ('CA', 'StateName'), ('93552', 'ZipCode')]
UNCERTAIN LABEL:  StreetName

You can see usaddress is incorrectly identifying the post-directional as a pre-directional, which is causing it to identify the street name a second time.

VS `38350 40TH ST E 100 PALMDALE CA 93552`

(OrderedDict([('AddressNumber', '38350'),
('StreetName', '40TH'),
('StreetNamePostType', 'ST'),
('StreetNamePostDirectional', 'E'),
('OccupancyIdentifier', '100'),
('PlaceName', 'PALMDALE'),
('StateName', 'CA'),
('ZipCode', '93552')]),
'Street Address')
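To illustrate why a repeated `StreetName` label is fatal for tagging, here is a minimal sketch of the collapsing step. It is not usaddress's actual implementation: `collapse_tokens` is a hypothetical function, and it raises a plain `ValueError` where usaddress raises its own error. The first token list is copied from the parse output above; the second is reconstructed from the tagged result for the abbreviated address, assuming one token per label.

```python
from collections import OrderedDict


def collapse_tokens(parsed):
    """Merge (token, label) pairs into one value per label.

    Adjacent tokens that share a label are joined with a space. A label
    that reappears after a different label has intervened cannot be
    merged unambiguously, so it is treated as an error.
    """
    tagged = OrderedDict()
    previous_label = None
    for token, label in parsed:
        if label == previous_label:
            tagged[label] += " " + token
        elif label in tagged:
            raise ValueError(f"repeated label: {label}")
        else:
            tagged[label] = token
        previous_label = label
    return tagged


# Token labels for "38350 40TH ST EAST 100 PALMDALE CA 93552".
FULL_TOKENS = [
    ("38350", "AddressNumber"), ("40TH", "StreetName"),
    ("ST", "StreetNamePostType"), ("EAST", "StreetNamePreDirectional"),
    ("100", "StreetName"), ("PALMDALE", "PlaceName"),
    ("CA", "StateName"), ("93552", "ZipCode"),
]

# Token labels for "38350 40TH ST E 100 PALMDALE CA 93552".
ABBREVIATED_TOKENS = [
    ("38350", "AddressNumber"), ("40TH", "StreetName"),
    ("ST", "StreetNamePostType"), ("E", "StreetNamePostDirectional"),
    ("100", "OccupancyIdentifier"), ("PALMDALE", "PlaceName"),
    ("CA", "StateName"), ("93552", "ZipCode"),
]

for tokens in (FULL_TOKENS, ABBREVIATED_TOKENS):
    try:
        print(dict(collapse_tokens(tokens)))
    except ValueError as error:
        print("cannot tag:", error)
```

In the full-directional case, `100` is labeled `StreetName` after `StreetNamePostType` and `StreetNamePreDirectional` have intervened, so the two `StreetName` tokens cannot be merged into one field.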

This issue needs to be resubmitted to that package: https://github.com/datamade/usaddress/issues

philiporlando · 2023-12-30T22:30:14Z

@fablet, I appreciate your input. Like you, I also encountered the parsing error using the Parserator API at https://parserator.datamade.us/usaddress.

However, I've successfully used usaddress.parse() with the address "38350 40TH ST EAST 100 PALMDALE CA 93552" with usaddress version 0.5.10:

import usaddress

address = "38350 40TH ST EAST 100 PALMDALE CA 93552"

print(usaddress.parse(address))

# [('38350', 'AddressNumber'), ('40TH', 'StreetName'), ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'), ('100', 'StreetName'), ('PALMDALE', 'PlaceName'), ('CA', 'StateName'), ('93552', 'ZipCode')]

It seems the latest version of usaddress might have resolved this pre- vs post-directional issue. However, I'm uncertain which usaddress version the Parserator API uses.

Unfortunately, usaddress.tag() now raises a duplicate street name error when using the latest version:

import usaddress

address = "38350 40TH ST EAST 100 PALMDALE CA 93552"

usaddress.tag(address)

# Traceback (most recent call last):
#   File "/home/user/usaddress_parse_error/usaddress_parse_error.py", line 5, in <module>
#     usaddress.tag(address)
#   File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/usaddress/__init__.py", line 177, in tag
#     raise RepeatedLabelError(address_string, parse(address_string),
# usaddress.RepeatedLabelError: 
# ERROR: Unable to tag this string because more than one area of the string has the same label

# ORIGINAL STRING:  38350 40TH ST EAST 100 PALMDALE CA 93552
# PARSED TOKENS:    [('38350', 'AddressNumber'), ('40TH', 'StreetName'), ('ST', 'StreetNamePostType'), ('EAST', 'StreetNamePreDirectional'), ('100', 'StreetName'), ('PALMDALE', 'PlaceName'), ('CA', 'StateName'), ('93552', 'ZipCode')]
# UNCERTAIN LABEL:  StreetName

# When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

# To report an error in labeling a valid name, open an issue at https://github.com/datamade/usaddress/issues/new - it'll help us continue to improve probablepeople!

# For more information, see the documentation at https://usaddress.readthedocs.io/

So it seems that we are trading one parsing error for another. That being said, the newest version of usaddress.parse() is working for me, which is the function that I need for my business case.

Do you know if there are plans to update usaddress-scourgify's dependency on usaddress from 0.5.9 to 0.5.10 in the near future? I'm hoping that this would avoid the error I'm seeing with normalize_address_record().

Thank you again for assisting with this issue.

philiporlando · 2023-12-30T23:17:41Z

Ok, I just tried forking this repo and updating its usaddress dependency to 0.5.10.

Unfortunately, this did not resolve my issues:

from scourgify import normalize_address_record

address = "38350 40TH ST EAST 100 PALMDALE CA 93552"

normalize_address_record(address)

# Traceback (most recent call last):
#   File "/home/user/usaddress_parse_error/usaddress_parse_error.py", line 5, in <module>
#     normalize_address_record(address)
#   File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/scourgify/normalize.py", line 159, in normalize_address_record
#     return normalize_addr_str(
#   File "/home/user/.cache/pypoetry/virtualenvs/usaddress-parse-error-aadNbsKj-py3.10/lib/python3.10/site-packages/scourgify/normalize.py", line 267, in normalize_addr_str
#     raise UnParseableAddressError(None, None, addr_rec)
# scourgify.exceptions.UnParseableAddressError: UNPARSEABLE ADDRESS: Unable to break this address into its component parts, OrderedDict([('address_line_1', '38350 40TH ST EAST 100 PALMDALE CA 93552'), ('address_line_2', None), ('city', None), ('state', None), ('postal_code', None)])

It probably makes the most sense to open a new issue within the usaddress repo and try to address the error with usaddress.tag().

philiporlando · 2023-12-30T23:27:20Z

@fablet, I've opened an issue in the usaddress repo (datamade/usaddress#359) to address the root of the problem. Thanks again for the support.

philiporlando mentioned this issue Dec 30, 2023

ERROR: Unable to tag this string because more than one area of the string has the same label datamade/usaddress#359

Open

philiporlando closed this as completed Dec 30, 2023
