What's Changed
- fix(pdf_parse): fill text content into spans before discarded_block recognition, fixing empty text blocks in discarded_block, by @myhloli in #1082
- refactor(txt_spans_extract_v2): optimize span processing and OCR logic by @myhloli in #1086
- feat(ocr): filter out low confidence ocr results by @myhloli in #1088
- feat(pdf_parse): add OCR score to span data by @myhloli in #1089
- fix: test_rag by @icecraft in #1105
- perf(image_processing): reduce maximum image size for analysis by @myhloli in #1106
- fix: test_tools unittest by @icecraft in #1104
- refactor(libs): remove unused imports and functions by @myhloli in #1112
- feat: add S3 read/write example by @icecraft in #1117
Full Changelog: magic_pdf-0.10.1-released...magic_pdf-0.10.2-released