How would you parse this line? (bol.com) #359
Comments
I'm facing the same issue: the product's description is spread over 2 rows, but there's nothing on the row between these two. How can you capture the description in this case?
I think the problem is that the parser doesn't support line breaks in the regex. It would be a great start to at least have the first line of the description. If somebody knows a workaround or fix, it's highly appreciated.
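To illustrate the suspected limitation, here is a minimal sketch in plain Python `re`. It does not use this project's API. The sample text, the field names, and the assumption that lines are matched one at a time are all hypothetical. It shows that a pattern containing `\n` can only match against the whole text, while a single-row pattern still recovers the first line of the description.

```python
import re

# Hypothetical extracted text: the description wraps onto a second row.
text = (
    "1  Wireless keyboard with  19,99  19,99\n"
    "   numeric keypad\n"
)

# Two-row pattern: amounts on the first row, rest of the description on the next.
two_rows = re.compile(
    r"(?P<qty>\d+)\s+(?P<desc>.+?)\s+(?P<price>\d+,\d{2})\s+(?P<total>\d+,\d{2})\n"
    r"\s+(?P<desc_rest>\S.*)"
)

# Single-row pattern: only the first line of the description.
one_row = re.compile(
    r"(?P<qty>\d+)\s+(?P<desc>.+?)\s+(?P<price>\d+,\d{2})\s+(?P<total>\d+,\d{2})$"
)

rows = text.split("\n")

# Matching one row at a time can never satisfy the "\n" in the two-row pattern.
per_row_hits = [row for row in rows if two_rows.search(row)]

# Matching against the whole text can.
whole = two_rows.search(text)

# The single-row pattern works per row and yields the first description line.
first = one_row.search(rows[0])

print(per_row_hits)                                    # no per-row matches
print(whole.group("desc"), "|", whole.group("desc_rest"))
print(first.group("desc"))
```

If the parser really does match row by row, the single-row pattern is the part that can work today, and the continuation row needs separate handling.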
Struggling with the same issue on AliExpress invoices.
This issue has not been solved yet. The code you're asking for is in the first post; otherwise it's just a default template. As stated earlier, it should start to work once line breaks are supported, but someone has to implement that.
Just tested the code with the description on regex101.com. (This is where I'm at on AliExpress invoices, 80/20 rule.) I'm running into the limitations of the debug website. I will look into this when I have access to an install, as the module handles multi-line text differently.
Have you tried replacing all the line breaks? I've had some luck with that on gas station invoices. The parser spreads the actual description over multiple lines, so the output looks like:
Which makes it impossible to extract:
The replacement of line breaks made it go on my invoices to something like:
I used this code to replace the line breaks.
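As a rough illustration of the idea (not the commenter's actual code, and not this project's template syntax), collapsing line breaks in extracted text could look like this in plain Python. The sample text and the helper name `collapse_line_breaks` are made up for the example.

```python
import re

# Hypothetical extracted text with a description wrapped over two rows.
text = "Diesel premium\n   fuel 40,00 L\nTotal 65,20\n"


def collapse_line_breaks(s: str) -> str:
    """Replace each line break (plus surrounding spaces/tabs) with one space."""
    return re.sub(r"[ \t]*\r?\n[ \t]*", " ", s).strip()


flat = collapse_line_breaks(text)

# With everything on one line, a single-line pattern can span the old break.
m = re.search(r"(?P<desc>.+?)\s+(?P<litres>\d+,\d{2}) L", flat)

print(flat)
print(m.group("desc"), "|", m.group("litres"))
```

The trade-off is that the whole document becomes one line, so patterns that rely on line starts or ends stop being meaningful.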
Forget my previous statement about removing line breaks. Try something like:
or
It might still need some work on the description part.
Like I said, the technical code is at the top: the extracted string from the PDF first, my attempt at a regex second. My regex for multiple lines works fine; it's just that this program apparently can't deal with such a regex. The solution is not in the template, it's in fixing the source code.
Sorry, but without the template and input file, I am unable to help, just to be clear.
As weird as it may sound, from my experience working with this module, the debug window shows different strings. Example of multi-line extraction:
Oddly, with your regex code I do get pattern errors.
There is really no easy or clean way to parse such lines. The problem is the vertical alignment of table cell content. Ideally we should ask
Template:
The mess above is the line. I can grab everything except the product description using this:
I tried capturing the first line of the description using this:
However, then no lines are found at all.
Is it possible to capture the description and amounts at the same time? Or how would I approach this situation?
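One possible approach outside the template, sketched in plain Python: treat the row that holds the amounts as an anchor and take the rows directly above and below it as description parts. This assumes a layout where the amounts are vertically centred between two description rows, as some comments describe. The sample rows, the field names, and the `parse_items` helper are hypothetical and not part of this project.

```python
import re

# Hypothetical layout: amounts sit between two description rows.
rows = [
    "Some long product title that",
    "        2      9,99     19,98",
    "continues on a second row",
]

# A row that contains only quantity, unit price and total.
amounts = re.compile(
    r"^\s*(?P<qty>\d+)\s+(?P<price>\d+,\d{2})\s+(?P<total>\d+,\d{2})\s*$"
)


def parse_items(lines):
    """Pair each amounts row with its neighbouring description rows."""
    items = []
    for i, line in enumerate(lines):
        m = amounts.match(line)
        if not m:
            continue
        parts = []
        # Row above, if it exists and is not itself an amounts row.
        if i > 0 and not amounts.match(lines[i - 1]):
            parts.append(lines[i - 1].strip())
        # Row below, under the same condition.
        if i + 1 < len(lines) and not amounts.match(lines[i + 1]):
            parts.append(lines[i + 1].strip())
        items.append({**m.groupdict(), "desc": " ".join(parts)})
    return items


items = parse_items(rows)
print(items)
```

With several items in a row, the line below one item and the line above the next can be confused, so a real invoice would likely need an extra rule (for example a blank row between items) to tell them apart.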