Calculating text width and incorrect result array paths #619
It is usually good practice to create one issue per problem; otherwise it is harder to provide help. (1) + (2) are important to fix, but it seems it's PDF-dependent and not shown in general.
@k00ni: Noted regarding splitting issues. I'll make sure I stick with that approach in future. Thanks for the steer.
1. problem
It's because we use
2. problem
Your code:
$pdf = $parser->parseFile($pdfFile);
$data = $pdf->getPages()[0]->getDataTm();
$fonts = $pdf->getFonts();
$font_id = $data[0][2]; //R7
$font = $fonts[$font_id]; // <------- line 310
Without the PDF I can't really help here. Maybe
3. problem
I created a hotfix to avoid the warning, but I could not find the origin of the error in time. #645 contains a hotfix which should suppress the warning when the Widths key is not set.
4. problem
Please open a new issue for that and provide an example PDF. @GreyWyvern revamped a large part of the library in #634; try whether that helps you.
I will close this, because most of your problems should be solved (to some extent). For problem 4, open a new issue. In case I forgot something, don't hesitate to comment here.
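For illustration only, here is a minimal sketch of the kind of guard described for #645; it is not the code from that PR. It assumes a simple (single-byte) font given as a plain PHP array with `FirstChar` and `Widths` entries, and returns `null` instead of raising "Undefined array key" when `Widths` is absent. The helper name `simpleFontTextWidth` and the sample width value are hypothetical.

```php
<?php
/**
 * Hypothetical helper, not the pdfparser implementation: width of a string in a
 * simple (single-byte) font, given the font dictionary as a plain array.
 * Returns null when /Widths or /FirstChar is absent instead of raising a warning;
 * bytes outside the Widths range are collected in $missing.
 */
function simpleFontTextWidth(array $fontDict, string $text, float $fontSize, array &$missing = []): ?float
{
    if (!isset($fontDict['Widths'], $fontDict['FirstChar'])) {
        return null; // e.g. a Type0 font, whose widths live in its descendant font
    }
    $widths = $fontDict['Widths'];
    $first = (int) $fontDict['FirstChar'];
    $total = 0.0;
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $index = ord($text[$i]) - $first;
        if (isset($widths[$index])) {
            $total += $widths[$index]; // glyph width in 1/1000 em
        } else {
            $missing[] = $text[$i];
        }
    }
    return $total * $fontSize / 1000.0; // convert to text-space units
}

// F2 in the page dump covers only code 32 (space); the width 278 is made up.
$f2 = ['FirstChar' => 32, 'LastChar' => 32, 'Widths' => [278]];
$f1 = ['Subtype' => 'Type0', 'Encoding' => 'Identity-H'];

$missing = [];
$spaceWidth = simpleFontTextWidth($f2, '  A', 10.0, $missing);
$type0Width = simpleFontTextWidth($f1, 'Hi', 10.0);
```

Returning `null` (rather than `0.0`) keeps "no width information" distinguishable from "zero-width text", which a caller may want to handle differently.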
Hey there
Thanks so much for writing this wrapper. It's probably the only decent one out there for PDF parsing.
I've come across a few discrepancies between the README/usage docs and the actual behaviour of the project.
First, I've been trying to work out how to get the width and height of the bounding box for text (not the page width/height).
I came across your `calculateTextWidth` function and tried it both per your Usage instructions and my own way, but neither works.
Btw, my pdfFile is a standard PDF created from an MS Word .docx (text-based) using Word's Save As PDF. It consists of only 4 lines of text. A very basic, industry-standard PDF.
I've also configured the parser:
use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;
$config = new Config();
$config->setDataTmFontInfoHasToBeIncluded(true);
$parser = new Parser([], $config);
/////////////////////////////////////////////////////////////////////////////////////////////////////////////
////1. Following your usage approach (https://github.com/smalot/pdfparser/blob/master/doc/Usage.md), I get this error:
$pdf = $parser->parseFile($pdfFile);
$data = $pdf->getPages()[0]->getDataTm();
$text = $data[0][1];
$font = reset($pdf->getFonts()); // <------- line 299
$width = $font->calculateTextWidth($text);
A PHP Error was encountered
Severity: Notice
Message: Only variables should be passed by reference
Line Number: 299
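The notice comes from `reset()`, which takes its argument by reference, so it should not be given the return value of a method call directly. A minimal sketch of the workaround, using a hypothetical `getFontsStub()` in place of `$pdf->getFonts()` (and made-up array keys) so the snippet runs on its own:

```php
<?php
// Hypothetical stand-in for $pdf->getFonts(): returns an array by value.
function getFontsStub(): array
{
    return ['7_0' => 'ArialMT (Type0)', '9_0' => 'ArialMT (TrueType)'];
}

// reset($pdf->getFonts()) triggers the notice because reset() needs a variable
// (it moves the array's internal pointer). Store the result first, then reset().
$fonts = getFontsStub();
$font = reset($fonts); // first value, or false for an empty array

// Alternative that leaves the internal pointer alone (PHP 7.3+).
$firstKey = array_key_first($fonts);
$sameFont = $firstKey === null ? null : $fonts[$firstKey];
```

Note that "the first font of the document" is not necessarily the font a given text run uses, so this only fixes the notice, not which font is picked.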
/////////////////////////////////////////////////////////////////////////////////////////////////////////////
////2. Following your usage approach (https://github.com/smalot/pdfparser/blob/master/doc/Usage.md), I get this error:
$pdf = $parser->parseFile($pdfFile);
$data = $pdf->getPages()[0]->getDataTm();
$fonts = $pdf->getFonts();
$font_id = $data[0][2]; //R7
$font = $fonts[$font_id]; // <------- line 310
$text = $data[0][1];
$width = $font->calculateTextWidth($text);
A PHP Error was encountered
Severity: Warning
Message: Undefined array key "F1"
Line Number: 310
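A plausible reading of this warning: the font id returned by `getDataTm()` (here `F1`) is a resource name from the page's `/Resources/Font` dictionary, while the document-wide font list appears to be keyed by something else (for example an internal object id). That is an assumption based on the warning and on problem 3 getting past this lookup by using `$pdf->getPages()[0]->getFonts()`. The sketch below uses made-up arrays to show the mismatch and a lookup that does not raise a warning; `findFont()` is a hypothetical helper.

```php
<?php
// Made-up arrays shaped like the two font lists (not real parser output):
// a document-wide list keyed by an internal id, and a page-level map keyed by
// the resource names (F1, F2) that appear in the page's content stream.
$documentFonts = ['7_0' => 'ArialMT Type0', '12_0' => 'ArialMT TrueType'];
$pageFonts = ['F1' => 'ArialMT Type0', 'F2' => 'ArialMT TrueType'];

// Hypothetical helper: look a font up without raising "Undefined array key".
function findFont(array $fonts, string $fontId): ?string
{
    return $fonts[$fontId] ?? null;
}

$fontId = 'F1'; // what $data[0][2] holds in this issue
$fromDocument = findFont($documentFonts, $fontId); // null: the keys differ
$fromPage = findFont($pageFonts, $fontId);         // found via the resource name
```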
/////////////////////////////////////////////////////////////////////////////////////////////////////////////
////3. When I adjust the approach to the arrays I'm actually getting out of the parser (taking the fonts from the page rather than the document), I get:
$pdf = $parser->parseFile($pdfFile);
$data = $pdf->getPages()[0]->getDataTm();
$fonts = $pdf->getPages()[0]->getFonts();
$font_id = $data[0][2]; //R7
$font = $fonts[$font_id];
$text = $data[0][1];
$width = $font->calculateTextWidth($text);
A PHP Error was encountered
Severity: Warning
Message: Undefined array key "Widths"
Filename: PdfParser/Font.php
Line Number: 279
Backtrace:
File: C:\xampp81\php\vendor\smalot\pdfparser\src\Smalot\PdfParser\Font.php
Line: 279
Function: _error_handler
File: C:\xampp81\htdocs\villg.life\application\views\tabsSecurity\listImage.php
Line: 286
Function: calculateTextWidth
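In the page details further down, `F1` is a `Type0` font with `Identity-H` encoding. In the PDF specification, composite (Type0) fonts have no `/Widths` entry of their own; glyph widths sit in the descendant CIDFont's `/W` array, with `/DW` (default 1000) for CIDs not listed. That would explain the missing `Widths` key. The sketch below expands a `/W` array and sums widths for CIDs that are assumed to be already decoded from the 2-byte codes; the function names and sample values are hypothetical, and character spacing, word spacing and horizontal scaling are ignored.

```php
<?php
/**
 * Expand a CIDFont /W array into a [cid => width] map.
 * The PDF spec allows two entry forms:
 *   c [w1 w2 ...]      consecutive widths starting at CID c
 *   c_first c_last w   the same width w for every CID in the range
 */
function expandCidWidths(array $w): array
{
    $map = [];
    $i = 0;
    $n = count($w);
    while ($i < $n) {
        $first = (int) $w[$i];
        if (is_array($w[$i + 1])) {
            foreach ($w[$i + 1] as $offset => $width) {
                $map[$first + $offset] = (float) $width;
            }
            $i += 2;
        } else {
            $last = (int) $w[$i + 1];
            $width = (float) $w[$i + 2];
            for ($cid = $first; $cid <= $last; $cid++) {
                $map[$cid] = $width;
            }
            $i += 3;
        }
    }
    return $map;
}

/**
 * Advance width of a run of CIDs in text-space units: summed glyph widths
 * (1/1000 em) times font size / 1000. CIDs absent from /W use $defaultWidth (/DW).
 */
function cidTextWidth(array $cids, array $widthMap, float $fontSize, float $defaultWidth = 1000.0): float
{
    $total = 0.0;
    foreach ($cids as $cid) {
        $total += $widthMap[$cid] ?? $defaultWidth;
    }
    return $total * $fontSize / 1000.0;
}

// Hypothetical /W array and CIDs, not taken from the reporter's PDF.
$map = expandCidWidths([3, [278, 333], 10, 12, 556]);
$runWidth = cidTextWidth([3, 4, 11, 99], $map, 12.0);
```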
/////////////////////////////////////////////////////////////////////////////////////////////////////////////
When I call `->getDetails()` manually on the page (the same way your `calculateTextWidth(string $text, array &$missing = null)` function does at https://github.com/smalot/pdfparser/blob/master/src/Smalot/PdfParser/Font.php), I get the following array, which contains page-related data rather than the bounding box for the text snippet:
Array (
[Type] => Page
[Parent] => Array (
[Type] => Pages
[Count] => 1
)
[Resources] => Array (
[ExtGState] => Array (
[GS5] => Array ( [Type] => ExtGState [BM] => Normal [ca] => 1 )
[GS11] => Array ( [Type] => ExtGState [BM] => Normal [CA] => 1 )
)
[Font] => Array (
[F1] => Array ( [Name] => ArialMT [Type] => Type0 [Encoding] => Identity-H [Subtype] => Type0 [BaseFont] => ArialMT)
[F2] => Array ( [Name] => ArialMT [Type] => TrueType [Encoding] => WinAnsiEncoding [Subtype] => TrueType [BaseFont] => ArialMT [FirstChar] => 32 [LastChar] => 32 ) )
[ProcSet] => Array ([0] => PDF [1] => Text [2] => ImageB [3] => ImageC [4] => ImageI )
)
[MediaBox] => Array ( [0] => 0 [1] => 0 [2] => 595.32 [3] => 841.92 )
[Contents] => Array ( [Filter] => FlateDecode [Length] => 683 )
[Group] => Array ( [Type] => Group [S] => Transparency [CS] => DeviceRGB )
[Tabs] => S
[StructParents] => 0
)
I'm not sure what's going on.
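Neither `calculateTextWidth()` nor `getDetails()` returns a bounding box directly. A rough box can be derived from the text matrix (the six numbers that appear at index 0 of each `getDataTm()` entry), the run's advance width and the font size. A minimal sketch, assuming an axis-aligned result is enough, ignoring the current transformation matrix, and taking ascent/descent values (in 1/1000 em, e.g. from a font descriptor) as inputs; `textRunBBox()` and the sample values are hypothetical:

```php
<?php
/**
 * Rough axis-aligned bounding box of one text run in the space the text matrix
 * maps into. The CTM and Tc/Tw/Tz are ignored (assumption).
 *
 * $tm       six numbers [a, b, c, d, e, f] as in a PDF Tm operator
 * $width    advance width in text space (glyph widths x fontSize / 1000)
 * $ascent, $descent  glyph-space values in 1/1000 em
 */
function textRunBBox(array $tm, float $width, float $fontSize, float $ascent, float $descent): array
{
    [$a, $b, $c, $d, $e, $f] = array_map('floatval', $tm);
    $yLow = $descent * $fontSize / 1000.0;
    $yHigh = $ascent * $fontSize / 1000.0;
    $corners = [[0.0, $yLow], [$width, $yLow], [$width, $yHigh], [0.0, $yHigh]];
    $xs = [];
    $ys = [];
    foreach ($corners as [$x, $y]) {
        // PDF maps (x, y) through [a b c d e f] as x' = a*x + c*y + e, y' = b*x + d*y + f
        $xs[] = $a * $x + $c * $y + $e;
        $ys[] = $b * $x + $d * $y + $f;
    }
    return [
        'x' => min($xs),
        'y' => min($ys),
        'width' => max($xs) - min($xs),
        'height' => max($ys) - min($ys),
    ];
}

// Hypothetical run: unrotated, starting at (72, 700), 11 pt, 50 units wide.
$box = textRunBBox(['1', '0', '0', '1', '72', '700'], 50.0, 11.0, 900.0, -200.0);
```

Transforming all four corners (rather than only the origin) keeps the result meaningful when the matrix includes rotation or skew, at the cost of returning the enclosing axis-aligned box.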
Also, I've looked through the code, and there seem to be a few useful functions that aren't documented in the README. Is that the case, or am I overthinking it?