Release magic_pdf-0.10.2-released · opendatalab/MinerU

What's Changed

fix(pdf_parse): fill text content into spans before discarded_block recognition, fixing empty text blocks in discarded_block by @myhloli in #1082
refactor(txt_spans_extract_v2): optimize span processing and OCR logic by @myhloli in #1086
feat(ocr): filter out low confidence ocr results by @myhloli in #1088
feat(pdf_parse): add OCR score to span data by @myhloli in #1089
fix: test_rag by @icecraft in #1105
perf(image_processing): reduce maximum image size for analysis by @myhloli in #1106
fix: test_tools unittest by @icecraft in #1104
refactor(libs): remove unused imports and functions by @myhloli in #1112
feat: add S3 read/write example by @icecraft in #1117

Full Changelog: magic_pdf-0.10.1-released...magic_pdf-0.10.2-released