Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue w/ new Text instances #9

Open
hvr opened this issue Dec 8, 2019 · 8 comments
Open

Performance issue w/ new Text instances #9

hvr opened this issue Dec 8, 2019 · 8 comments
Labels
bug Something isn't working performance

Comments

@hvr
Copy link
Collaborator

hvr commented Dec 8, 2019

When I tried benchmarking, I noticed a strange and significant performance-penalty effect that is exposed by the Text support recently merged into regex-tdfa. (The String and ByteString instances don't exhibit this performance effect)

It turns out that the regex-tdfa-text package already suffered from this but nobody seems to have reported it yet and it's probably not too apparent in simple regex-matching applications. Since the Text support is fundamentally more or less the same code as the ByteString implementation, I'm suspecting a weird interaction with the fusion-rules in text.

TODO: Provide repro-case

@hvr hvr added the bug Something isn't working label Dec 8, 2019
@Xitian9
Copy link

Xitian9 commented Aug 15, 2020

Knowing nothing about what causes this issue, something to note is that regex-tdfa supplies SPECIALIZE pragmas for String, Seq Char, and both strict and lazy ByteString. Notably, it does not provide SPECIALIZE pragmas for strict or lazy Text.

With that it mind, do you still see this issue after using the code in PR #8?

@hvr
Copy link
Collaborator Author

hvr commented Aug 15, 2020 via email

@Xitian9
Copy link

Xitian9 commented Aug 15, 2020

Hmm, that looks sticky. I have no particular knowledge that makes it likely that I can solve this problem, but if I have time I might have a poke around.

@gwern
Copy link

gwern commented Apr 26, 2022

Is there any update on this? Or easy way I could help?

Background: I originally used text-tdfa on gwern.net for regexp tests & rewrites back in 2019 or so with the Text support, but it was so slow I moved to the basic Posix regex library; unfortunately, over the past 2 years I've been running into ever more segfaults and 'strange closures' (GHC would even segfault while compiling the code!), which generally seemed to be related to the regular regexes, which have a long trail of weird issues being reported over the years. Moving back to tdfa (with the regex-compat-tdfa compat layer for the rewrites) mostly seems to resolve the constant segfaulting, but also slows down site compile time by like 2x (from ~1.5h to >3h). Before I spend a lot of time optimizing by hand, it'd be nice if text-tdfa fixed its lowhanging fruit.

@andreasabel
Copy link
Member

@gwern : I am not planning on working on this, but a high-quality PR is definitely welcome.
My main concern is to keep regex-tdfa working stably in an evolving Haskell ecosystem. But improvements are welcome...

@Xitian9
Copy link

Xitian9 commented Apr 26, 2022

One thing to note is that there are some pretty big changes in the text package which will be landing soon: haskell/text#365, haskell/text#348. The second in particular might have a large effect on any work here. Maybe disabling those rules will solve the problem? Maybe it will not, but will defeat any solution which is derived in the meantime. I think looking at the effect of those would be a good first thing to do.

@gwern
Copy link

gwern commented Apr 26, 2022

Have those already been released? I see on Hackage the last text upload date is 2021-12-24 but the UTF8 change was merged 2021-09-08 and the implicit-fusion disabling 2021-06-21, if I'm reading these bug reports correctly, so shouldn't both of them already be in live text, not just dev HEAD?

@Xitian9
Copy link

Xitian9 commented Apr 27, 2022

It's complicated slightly by the fact that text is a boot package, and so is bundled with ghc. You can access it now on hackage, but it won't be the default version you get when you compile (without some extra contortions) until ghc-9.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance
Projects
None yet
Development

No branches or pull requests

4 participants