The YoutubeEmbed Plugin #7

Closed
097115 opened this issue Feb 14, 2023 · 27 comments
Labels
upstream upstream issue

Comments

@097115

097115 commented Feb 14, 2023

E.g., let's say we have an embedded YouTube iframe. Then this command

echo '<p>Lorem Ipsum:</p><p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' | html2md -i

...will return just:

Lorem Ipsum:

Maybe it could catch the iframe's src and return that instead of the iframe? Like:

Lorem Ipsum:
https://www.youtube.com/embed/PifPVQOFyZI
@suntong
Owner

suntong commented May 1, 2023

Sorry, I had the wrong notification setting and only noticed your issue just now.

Extracting attribute values using CSS selectors is currently impossible:
https://stackoverflow.com/questions/1972428/how-to-extract-attribute-values-using-css-selectors

Will think about it...

@suntong
Owner

suntong commented May 1, 2023

> ...will return just:
>
> Lorem Ipsum:

Hmm... this is what html2md is supposed to do: get the HTML text, not HTML attributes. So I'll close this instead, as out of scope for this tool.

@suntong suntong closed this as completed May 1, 2023
@suntong
Owner

suntong commented May 1, 2023

That being said, I'll try to add the feature to
https://github.com/suntong/cascadia
as it is not supported there as of now...

@suntong
Owner

suntong commented May 1, 2023

> That being said, I'll try to add the feature to https://github.com/suntong/cascadia as it is not supported there as of now...

Ahh, it turns out that it is supported now. See https://github.com/suntong/cascadia/wiki#attribute-selection

echo '<p>Lorem Ipsum:</p><p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' | cascadia -o -i -c 'p' -p 'src=attr[src]:iframe'
src

https://www.youtube.com/embed/PifPVQOFyZI

@097115
Author

097115 commented May 2, 2023

@suntong, thank you for your comments!

However, my point was to extract BOTH the text AND the YouTube iframe's source :) (to make some random web page actually readable in the terminal).

With cascadia, however, I can't seem to find a way to keep BOTH the common <p> entries and the iframe's src: the example you showed above just skips the common text (Lorem ipsum in this example), and if I try something like

echo '<p>Lorem Ipsum:</p>
      <p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' \
| cascadia -o -i -c 'p' -c 'src=attr[src]:iframe'

then it doesn't strip the HTML :)

But maybe there's some workaround to keep both the text and the iframe's source?

Thanks again :)

@097115
Author

097115 commented May 2, 2023

@suntong

> for being out of the scope of this tool.

On second thought, please consider this example:

echo '<p>Lorem Ipsum:</p>
      <p><img src="https://some.picture/url.jpg"></p>' | html2md -i

which will produce the following output:

Lorem Ipsum:

![](https://some.picture/url.jpg)

So, if html2md considers HTML attributes for img, then why shouldn't it consider those in other cases? :)

Those src values are part of the content: they connect the page being processed with other resources, so stripping them simply breaks the narrative, no?

What do you think?

@suntong
Owner

suntong commented May 2, 2023

> However, my point was to extract BOTH the text AND the YouTube's iframe's source

In that case, maybe take a look at --plugin-youtube, the YoutubeEmbed plugin from the html2md lib.

@suntong suntong reopened this May 2, 2023
@097115
Author

097115 commented May 3, 2023

> In that case, maybe take a look at the --plugin-youtube

Maybe I'm doing something wrong, but I added it to the sample command and it changes nothing for me?

@suntong
Owner

suntong commented May 3, 2023

> it changes nothing for me?

Ah, yeah, me neither, and I tried the input from
https://github.com/JohannesKaufmann/html-to-markdown/blob/8eb812b8869e447bb712afdb05d8627da816ecae/testdata/TestRealWorld/golang.org/input.html#L162-L167
and it doesn't work either.

And it turns out to be caused by #1:

> All Plugins that are still named with EXPERIMENTAL_ in upstream are commented out at the moment.

I don't have time to look into it further now.
Maybe in a month or two...

@097115
Author

097115 commented May 3, 2023

> I don't have time to look into it further now. Maybe in a month or two...

No problem, it's nothing urgent. Thank you for your interest anyway :)

@suntong
Owner

suntong commented May 3, 2023

Hmm, please

  • report the issue upstream, and
  • discuss in detail with the upstream maintainer what you want out of the YoutubeEmbed plugin
  • if AOK, ask him to remove the EXPERIMENTAL prefix. If you have no way to debug the YoutubeEmbed plugin yourself, you may want to ask him to remove the EXPERIMENTAL prefix first, and tell me.

thanks

@suntong suntong changed the title from "Skips iframes" to "The YoutubeEmbed Plugin" May 3, 2023
@097115
Author

097115 commented May 4, 2023

> the YoutubeEmbed Plugin

The funny thing is, it's not even a function. I mean, I'm not a programmer at all, so I'm probably missing the elephant in the room, but every other plugin you do map in prop_html2md.go, so what do you do with this one?

@suntong
Owner

suntong commented May 4, 2023

> and what do you do with this?

That's what I meant by "ask him to remove the EXPERIMENTAL". I.e., change the var EXPERIMENTALYoutubeEmbed to var YoutubeEmbed.

Just tell him that you're not a programmer at all, and you have no way to debug the YoutubeEmbed Plugin yourself other than through html2md. Hope that he'd agree with it. But honestly, if you don't know the

go install github.com/suntong/html2md@latest

command to build html2md yourself, you might find it very difficult to discuss in detail with him what you want out of the plugin, and may have to accept whatever the plugin does as-is.

@suntong
Owner

suntong commented May 4, 2023

> The funny thing is, it's not even a function.

Ahh, I know what you meant now. Yes, I agree with you, I have no idea how to use it unless it is a function. OK. let me talk to him instead.

@097115
Author

097115 commented Dec 29, 2023

So, coming back to this issue: it seems this was fixed upstream, and you have also released a new version :)

However, the example mentioned above:

echo '<p>Lorem Ipsum:</p><p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' | html2md -i --plugin-youtube

still doesn't work for me. Is it the same for you (or have you managed to extract YouTube links from any actual page), or am I missing something?

Thank you :)

@suntong
Owner

suntong commented Dec 29, 2023

Per JohannesKaufmann/html-to-markdown#65, there is a new way of doing it, using iframe plugins, but I haven't gotten around to checking it yet. Have you made any progress using the new version?

@097115
Author

097115 commented Dec 30, 2023

Well, from a user's perspective, nothing has changed, has it?

In order to call the YoutubeEmbed() function from plugin/iframe_youtube.go, I still have to add the --plugin-youtube argument to my html2md call, right?

@suntong
Owner

suntong commented Dec 30, 2023

Well, if it works it works, and if it doesn't it doesn't, as "nothing has changed" in this project yet.

The key is to try to make it work when it doesn't. However, I don't have time to look into it (or your other issue, #19) now or in the near future, about 3 to 5 months. I was supposed to be on holiday and relaxing since the 22nd, but in reality I have been working day and night, past midnight each day, and even Christmas and Boxing Day were no exception.

I encourage you to reach out to upstream and try to figure out how it is supposed to work in the meantime.

@097115
Author

097115 commented Dec 30, 2023

So, indeed try to relax, and sorry for bothering you :)

Have a good time, we'll come back to this :)

@097115
Author

097115 commented Feb 9, 2024

@suntong, I hope you are doing well!

Speaking of YouTube support, it looks like when releasing the last version, you simply forgot to enable it here:

// if rootArgv.PluginYoutubeEmbed {

(Same goes for PluginVimeoEmbed, but unlike PluginYoutubeEmbed it takes a parameter, which wasn't exactly obvious to me, and since the last time I saw an embedded Vimeo video was years and years ago, I didn't really bother :)

So, TL;DR: after un-commenting the aforesaid lines, it seems to work:

echo '<p>Lorem Ipsum:</p><p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' | ./h2m -i --plugin-youtube
Lorem Ipsum:

[![YouTube video player](https://img.youtube.com/vi/PifPVQOFyZI/0.jpg)](https://www.youtube.com/watch?v=PifPVQOFyZI)

I will leave the issue open for now but feel free to close it, since the problem indeed was solved upstream.
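
The output above has a recognisable shape: the video ID from the embed URL is reused in a thumbnail URL and in a watch URL. A minimal Go sketch of that mapping follows. It is inferred only from the sample output in this comment, not taken from the upstream plugin's source, and the function name `embedToMarkdown` is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// embedToMarkdown (hypothetical helper) converts a YouTube embed URL and a
// title into a Markdown linked thumbnail. It reports false when the URL does
// not look like https://www.youtube.com/embed/<id>.
func embedToMarkdown(src, title string) (string, bool) {
	u, err := url.Parse(src)
	if err != nil {
		return "", false
	}
	host := strings.ToLower(u.Hostname())
	if host != "youtube.com" && !strings.HasSuffix(host, ".youtube.com") {
		return "", false
	}
	const prefix = "/embed/"
	if !strings.HasPrefix(u.Path, prefix) {
		return "", false
	}
	// The path segment after /embed/ is treated as the video ID.
	id := strings.TrimPrefix(u.Path, prefix)
	if id == "" || strings.Contains(id, "/") {
		return "", false
	}
	md := fmt.Sprintf(
		"[![%s](https://img.youtube.com/vi/%s/0.jpg)](https://www.youtube.com/watch?v=%s)",
		title, id, id,
	)
	return md, true
}

func main() {
	md, ok := embedToMarkdown("https://www.youtube.com/embed/PifPVQOFyZI", "YouTube video player")
	if !ok {
		fmt.Println("not a YouTube embed URL")
		return
	}
	fmt.Println(md)
}
```

Returning a boolean lets a caller fall back to default handling for iframes that are not YouTube embeds.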

@suntong
Owner

suntong commented Feb 10, 2024

> after un-commenting the aforesaid lines, it seems to work

html2md/prop_html2md.go

Lines 128 to 133 in 423fe1c

if rootArgv.PluginVimeoEmbed {
	conv.Use(plugin.VimeoEmbed())
}
if rootArgv.PluginYoutubeEmbed {
	conv.Use(plugin.YoutubeEmbed())
}

So I've just un-commented the aforesaid lines, but it's not enough for me, I'm getting

undefined: plugin.VimeoEmbed and undefined: plugin.YoutubeEmbed

How did you make it work please?

@097115
Author

097115 commented Feb 11, 2024

Donno, something on your side, I'd say :)

I simply clone this repo, disable Vimeo (since, as I said, it throws an error), then run

  • go mod init html2md_main.go
  • go mod tidy
  • go build -o html2md

And that's it.

@suntong
Owner

suntong commented Feb 13, 2024 via email

@suntong
Owner

suntong commented Feb 13, 2024 via email

@suntong
Owner

suntong commented May 24, 2024

html-to-markdown Release v1.6.0 is out, with an update to the YoutubeEmbed plugin.

I'll find some time to work on this, to catch up with upstream...

@suntong
Owner

suntong commented May 25, 2024

Oh, "disable Vimeo", ok, will try ...

Duh! It's fixed already!! https://github.com/suntong/html2md/releases/tag/v1.5.0

@suntong suntong closed this as completed May 25, 2024
@097115
Author

097115 commented May 25, 2024

@suntong, great work! Thanks :)
