v2 - Support for JBIG2 (port of https://github.com/apache/pdfbox-jbig2) #631

BobLd · 2023-05-26T13:33:24Z

@EliotJones, I took @kasperdaff commit and added the changes from your comments in a 2nd commit.

I've also added the JBig2 notice inside the current NOTICE file.

As a side note, it seems some part of the code will be useful for JPX and DTC decode.

Also, to keep it clean, I'd would really prefer the original PR from @kasperdaff being accepted and merge instead of merging this PR (some conflicts will need to be resolved though). If the original PR is merge, we can just then cherry peek my changes and merge them.

…and replace System.Drawing.rectangle by Jbig2Rectangle and update NOTICE with JBig2

EliotJones · 2023-06-04T10:22:25Z

Thanks @BobLd, I'm thinking of stabilising the current master in order to release 0.1.8 before merging this. Once I've addressed a few remaining issues for the 0.1.8 version I think I'll merge this one directly since GitHub preserves @kasperdaff s authorship in the author field of the source commits?

BobLd · 2023-06-04T10:32:59Z

@EliotJones yes makes sense, sounds good to me

EliotJones

Unfortunately, as with PDF2.0 (historically), the deeply irritating ISO organization yet again paywalls the spec, this seems to be the closest to a freely available JBIG2 spec if you don't have $100s to give to those parasites: https://www.itu.int/rec/T-REC-T.88-201808-I/en

I have reviewed the code, a lot of it is a direct 1:1 port of https://github.com/apache/pdfbox-jbig2/blob/master/LICENSE.txt as mentioned in the original PR. Since I don't have book learning about copyright I'm not sure the position with respect to the NOTICES.txt vs copyright headers being in every file. Since the source repository uses the NOTICES.txt file in the same way we might be fine here, but I'm going to open a StackExchange question to confirm the right approach here (no doubt the question will be summarily closed).

If there was more work to create our implementation here I'd feel more comfortable but it is such a huge amount of code for such a rare/pointless image format (edit: actually since the common usage was document scans potentially not so pointless...) so I'm conflicted. It's also a huge code-debt to take on.

@BobLd I'd like to understand what is needed for continuing work on your image rendering branch. How much of this is blocking for further work on image rendering. I have some associated JPG related code here https://github.com/EliotJones/BigGustave/tree/add-jpg-support/src/BigGustave/Jpgs for things like Huffman trees, not sure if we could use that instead to unblock DCT decode? (Assuming we just skip JBIG2 images for now)

While I'm not opposed to getting this in I think I'd need to work on it in this branch to get it to a state I'm happier with from a code-ownership perspective. Especially since JBIG parsing comes with some (recent) vulnerabilities https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

Let me know what you think

(please take my slightly annoyed tone as residual annoyance at ISO, they ruin my day every time, this work is great and thank you both for your efforts here)

BobLd · 2023-06-07T22:14:41Z

thanks a lot @EliotJones for the response.

Unfortunately, as with PDF2.0 (historically), the deeply irritating ISO organization yet again paywalls the spec, this seems to be the closest to a freely available JBIG2 spec if you don't have $100s to give to those parasites: https://www.itu.int/rec/T-REC-T.88-201808-I/en

I've not reviewed the whole logic of the initial commit and just saw that it was closely following the java version so was confident it wored. Test also looked quite complete. Thanks for sending over the link, I'll have a look. I've only looking at how DCT decoding would work and saw Huffman tables involved, I thought it would be useful to not re-invent the weel

I have reviewed the code, a lot of it is a direct 1:1 port of https://github.com/apache/pdfbox-jbig2/blob/master/LICENSE.txt as mentioned in the original PR. Since I don't have book learning about copyright I'm not sure the position with respect to the NOTICES.txt vs copyright headers being in every file. Since the source repository uses the NOTICES.txt file in the same way we might be fine here, but I'm going to open a StackExchange question to confirm the right approach here (no doubt the question will be summarily closed).

I agree and would be very curious about the StackExchange answer 😄

If there was more work to create our implementation here I'd feel more comfortable but it is such a huge amount of code for such a rare/pointless image format (edit: actually since the common usage was document scans potentially not so pointless...) so I'm conflicted. It's also a huge code-debt to take on.

Makes a lot of sense too, I don't think there's a real rush to merge that. I think it's important to be confident with what's merged into the code

@BobLd I'd like to understand what is needed for continuing work on your image rendering branch. How much of this is blocking for further work on image rendering. I have some associated JPG related code here https://github.com/EliotJones/BigGustave/tree/add-jpg-support/src/BigGustave/Jpgs for things like Huffman trees, not sure if we could use that instead to unblock DCT decode? (Assuming we just skip JBIG2 images for now)

Image rendering is not a blocking point so far, I'll create an issue soon to explain how I view the next steps for image rendering and possible solutions. I will definitely check your repos as it will certainly be useful for DCT

While I'm not opposed to getting this in I think I'd need to work on it in this branch to get it to a state I'm happier with from a code-ownership perspective. Especially since JBIG parsing comes with some (recent) vulnerabilities https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

As above, important to be confident with what's merged so no rush. And we can even find another way to process JBIG images down the line. I was very amazed to read these zero click vulnerabilities are linked to JBIG, fascinating read - thanks for that!

(please take my slightly annoyed tone as residual annoyance at ISO, they ruin my day every time, this work is great and thank you both for your efforts here)

I'm as annoyed with these ISO paywal haha, no problem! Seems JPX2000 is also behind paywall by the way...

EliotJones · 2023-06-13T21:07:11Z

@BobLd from this answer I think we're ok with the Notices attribution only and the license matching the source project. https://opensource.stackexchange.com/questions/14050/notices-txt-or-copyright-file-header-when-porting-code

Interestingly another JBIG post made the rounds today https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning though less to do with JBIG being tricky and more the wrong usage of it.

BobLd · 2023-06-17T11:06:26Z

@EliotJones thanks for taking care of getting an answer for the NOTICE question, good to know it's straightforward.

Regarding JBIG, I've read somewhere that one should never use it to compress images with text (even if it seems it was originally created for that) exactly for the issue discussed in the article... crazy

sbruyere · 2023-11-13T23:19:22Z

Hello,

I've been following the progress of the PR for integrating JBIG2 support into PdfPig, and I appreciate the effort to port the Java pdfbox-jbig2 project for this purpose. It's an exciting development for the .NET community working with PDFs and JBIG2.

I'd like to suggest a slightly different approach that might bring additional benefits both to the PdfPig project and the wider .NET community. Given the specialized nature of JBIG2 support, it could be beneficial to develop this feature first as a standalone package or library, targeting .NET Standard/Core.

Advantages of creating a separate Package/Library:

Wider Applicability: As a separate library, JBIG2 support could be used by other projects beyond PdfPig, fostering broader adoption and contribution across the .NET community.
Focused Development and Testing: An independent library allows for dedicated development and testing of JBIG2 features, improving the overall quality and performance.
Ease of Distribution via NuGet: Distributing this as a standalone package on NuGet would make it more accessible and encourage its use within the .NET ecosystem.
License Compliance Checks: Developing this feature separately initially allows for a thorough review of licensing, ensuring compliance and mitigating potential legal issues for PdfPig.
Future Integration Possibilities: Once the library is stable and its licensing is clear, it could be seamlessly integrated into PdfPig as a NuGet dependency. This ensures the addition of a well-supported, community-approved feature to PdfPig.

Creating a standalone library for JBIG2 aligns with the principles of open-source development and can stimulate more community involvement. It also ensures independent evolution of this feature, benefiting a broader audience within the .NET community while maintaining the focus and integrity of the PdfPig project.

mvantzet · 2023-12-19T10:34:08Z

I ran into JBIG2 related issues today and ended up here.
@EliotJones / @BobLd: Any thoughts about the suggestion from @sbruyere?

EliotJones · 2024-01-07T16:05:59Z

I'm not opposed to a separate library for this. Though I generally aim to keep PdfPig dependency free (outside of Microsoft/.NET dependencies) this is somewhere where code re-use makes sense and the cost of ownership for PdfPig is high for implementation.

BobLd · 2024-01-11T16:20:06Z

I kind of like the approach to use external existing libraries to handle missing filters and color space (ICC color) as these represent a non-negligible amount of work.

Regarding using external libraries (with their own licenses), I think we can start with the following:

DCT filter: https://github.com/yigolden/JpegLibrary/tree/main (MIT) - I already have a working example with this repos
JPX filter: https://github.com/cureos/csj2k (~~BSD~~) EDIT: I did a quick test and it seems to work almost out of the box
JBIG2 filter: Use this PR or something similar as a base
ICC profile handling: https://github.com/cureos/csj2k/tree/master/CSJ2K/Icc - For later on I think, haven't tested but seems to be suitable

Now the question is how to actually make it possible to use these NuGet packages from the code, i.e. adding these filters/color space (coming from different NuGet packages) in external code. I'm not sure what is the best approach to take.

@EliotJones let me know what you think

iamcarbon · 2024-03-14T02:28:09Z

What about just exposing opaque bytes for non-supported images, and allow the user to take a dependency on their own decoder of choice?

BobLd · 2024-03-14T08:08:50Z

@iamcarbon not sure what you mean here, I might be missing sthing though

Pdfpig already exposes the bytes, even without decoding. Also keep in mind that once these bytes are decoded, pdfpig applies the color space.

If you want to do a proof of concept of your idea, I'm happy to look into it

sbruyere · 2024-05-27T20:35:52Z

By reading the code I just realized the solution is almost already in our hands...

A third solution could be to implement a "plug-in architecture" within PdfPig. This would allow JBIG2 support to be added as an optional "plug-in" rather than a built-in feature. And this plug in architecture almost already exist thanks to existing interfaces.

This approach combines the benefits of modularity and ease of integration. Users who need JBIG2 support can add the plug-in, while those who don't need it can keep their deployments lightweight. This also allows for independent updates / libraries and maintenance of the JBIG2 component (and possibly other filters).

Explanation:

Filter implements the public interface IFilter ;
Filter are managed by a filter provider (a class that implements the public interface IFilterProvider), the default one is DefaultFilterProvider.

Why not just adding the possibility to add custom filters to IFilterProvider with a new void AddFilter(IFilter filter); method ?

With that, anyone could be able to add extended filter or their own version of an existing filter, including JBIG2.

What do you think ? @BobLd @EliotJones

BobLd · 2024-05-28T17:37:52Z

@sbruyere makes a lot of sense to me, I just it would make more sense to do that at filter provider level, I.e.

void AddFilterProvider(IFilterProvider filterProvider);

sbruyere · 2024-05-31T10:43:33Z

@sbruyere makes a lot of sense to me, I just it would make more sense to do that at filter provider level, I.e.

void AddFilterProvider(IFilterProvider filterProvider);

You mean having more than one filter provider for processing or being able to replace / set the current filter provider ?

If you have multiple filter provider, this could lead to having more than one filter for the same kind of data, not sure it's the expected behavior, and it will make far more complexe from which filter provider you have to get the output data.
If you means to set / replace the current filter provider, there is another issue: default common filters that are defined in DefaultFilterProvider are set "internals", which mean we canno't use them in a custom defined Filter Provider:
internal class Ascii85Filter : IFilter; however it could be interesting to change their visibilities to public for some reasons.

In my opinion, the simplest way is to have the ability to add custom filters to existing (and / or custom) filter providers. This way it will be easy to add or replace a filter to the default filter provider which already define all others default filters like Ascii85Filter.

BobLd · 2024-06-03T22:05:02Z

@sbruyere I think I misunderstood your first comment. Happy for you to do a proof of concept PR (no need to implement the actual filters) to demonstrate your approach, we can then use that as a base for discussion

sbruyere · 2024-06-07T07:33:33Z

@sbruyere I think I misunderstood your first comment. Happy for you to do a proof of concept PR (no need to implement the actual filters) to demonstrate your approach, we can then use that as a base for discussion

It's on my plan indeed, I'll possibly submit a PR before end of month. Thanks for the feedback.

BobLd · 2024-07-21T11:31:39Z

@sbruyere do you still want to do a PoC PR? I might start working on that and ask for your opinion. Let me know

BobLd · 2024-10-18T19:13:26Z

@sbruyere I've updated PdfPig to allow for external filters. I've implemented the DCT filter in a different package https://github.com/BobLd/UglyToad.PdfPig.Filters.Dct.JpegLibrary - still early stage.

@mvantzet I'm planning to release NuGet packages for JBIG2 and JPX shortly

Any feedback welcome

kasperdaff and others added 2 commits May 26, 2023 14:02

Support for JBIG2 (port of Apache's pdfbox-jbig2)

301fdb8

Update with comments @EliotJones, rename Bitmap(s) to Jbig2Bitmap(s) …

1132cfc

…and replace System.Drawing.rectangle by Jbig2Rectangle and update NOTICE with JBig2

BobLd requested a review from EliotJones May 26, 2023 13:37

EliotJones reviewed Jun 6, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2 - Support for JBIG2 (port of https://github.com/apache/pdfbox-jbig2) #631

v2 - Support for JBIG2 (port of https://github.com/apache/pdfbox-jbig2) #631

BobLd commented May 26, 2023 •

edited

Loading

EliotJones commented Jun 4, 2023

BobLd commented Jun 4, 2023

EliotJones left a comment •

edited

Loading

BobLd commented Jun 7, 2023 •

edited

Loading

EliotJones commented Jun 13, 2023

BobLd commented Jun 17, 2023

sbruyere commented Nov 13, 2023 •

edited

Loading

mvantzet commented Dec 19, 2023

EliotJones commented Jan 7, 2024

BobLd commented Jan 11, 2024 •

edited

Loading

iamcarbon commented Mar 14, 2024

BobLd commented Mar 14, 2024

sbruyere commented May 27, 2024 •

edited

Loading

BobLd commented May 28, 2024

sbruyere commented May 31, 2024

BobLd commented Jun 3, 2024

sbruyere commented Jun 7, 2024

BobLd commented Jul 21, 2024

BobLd commented Oct 18, 2024 •

edited

Loading

v2 - Support for JBIG2 (port of https://github.com/apache/pdfbox-jbig2) #631

Are you sure you want to change the base?

v2 - Support for JBIG2 (port of https://github.com/apache/pdfbox-jbig2) #631

Conversation

BobLd commented May 26, 2023 • edited Loading

EliotJones commented Jun 4, 2023

BobLd commented Jun 4, 2023

EliotJones left a comment • edited Loading

Choose a reason for hiding this comment

BobLd commented Jun 7, 2023 • edited Loading

EliotJones commented Jun 13, 2023

BobLd commented Jun 17, 2023

sbruyere commented Nov 13, 2023 • edited Loading

mvantzet commented Dec 19, 2023

EliotJones commented Jan 7, 2024

BobLd commented Jan 11, 2024 • edited Loading

iamcarbon commented Mar 14, 2024

BobLd commented Mar 14, 2024

sbruyere commented May 27, 2024 • edited Loading

BobLd commented May 28, 2024

sbruyere commented May 31, 2024

BobLd commented Jun 3, 2024

sbruyere commented Jun 7, 2024

BobLd commented Jul 21, 2024

BobLd commented Oct 18, 2024 • edited Loading

BobLd commented May 26, 2023 •

edited

Loading

EliotJones left a comment •

edited

Loading

BobLd commented Jun 7, 2023 •

edited

Loading

sbruyere commented Nov 13, 2023 •

edited

Loading

BobLd commented Jan 11, 2024 •

edited

Loading

sbruyere commented May 27, 2024 •

edited

Loading

BobLd commented Oct 18, 2024 •

edited

Loading