
Add DOMParser option to parse declarative shadow DOM #8759

Closed
mfreed7 opened this issue Jan 19, 2023 · 53 comments
Labels
addition/proposal New features or enhancements topic: shadow Relates to shadow trees (as defined in DOM)

Comments

@mfreed7
Contributor

mfreed7 commented Jan 19, 2023

I'm splitting this out from the discussion that starts on this comment #5465 (comment) and then jumps to overall issue comments starting here #5465 (comment). This was also previously discussed at length starting at this comment whatwg/dom#912 (comment). That issue has much more context on why there needs to be an opt-in for DSD at all.

The current spec PR adds DOMParser.parseFromString(html, "text/html", {declarativeShadowRoots: true}), which is the only API that lets you imperatively parse HTML that contains declarative shadow roots.
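A minimal sketch of that shape, assuming the option name from the spec PR and the current shadowrootmode attribute spelling (whether this option ships at all is the subject of this issue):

const html = `
  <div id="host">
    <template shadowrootmode="open">
      <slot></slot>
    </template>
  </div>`;

const parser = new DOMParser();
// Without the opt-in, the template stays an inert <template> element.
const doc = parser.parseFromString(html, "text/html", { declarativeShadowRoots: true });
console.log(doc.getElementById("host").shadowRoot); // a ShadowRoot when the opt-in is honored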

There is a question about whether we need to add this, and whether instead we should make Sanitizer's setHTML() the only DSD-aware parser API. That requires some changes to the Sanitizer (#8627), in particular giving setHTML() a sanitizer: "unsafe-none" argument that bypasses all sanitization.

@annevk annevk transferred this issue from whatwg/dom Jan 20, 2023
@annevk annevk added addition/proposal New features or enhancements topic: shadow Relates to shadow trees (as defined in DOM) labels Jan 20, 2023
@annevk
Member

annevk commented Jan 20, 2023

To reply to #5465 (comment). I'm not sure I understand. By that logic, we'd need to extend XMLHttpRequest, innerHTML, and various other APIs that would end up prohibiting declarative shadow trees as well. (And yes, if <foobar> had the same issues as declarative shadow trees we should obviously prohibit it by default and make enabling it possible in newish APIs only.)


setHTML() (and proposed friends) is going to be the new HTML parser entry point, replacing innerHTML (and friends). It's intentionally named in a generic fashion so it can be that. It's safe-by-default, but if there are cases where declarative shadow trees need some kind of unsafe entry point we ought to provide for that as well. Either as unsafeSetHTML() or through an argument. How is it not beneficial to provide web developers with a unified API for HTML parsing?

@mfreed7
Contributor Author

mfreed7 commented Jan 23, 2023

To reply to #5465 (comment). I'm not sure I understand. By that logic, we'd need to extend XMLHttpRequest, innerHTML, and various other APIs that would end up prohibiting declarative shadow trees as well. (And yes, if <foobar> had the same issues as declarative shadow trees we should obviously prohibit it by default and make enabling it possible in newish APIs only.)

Right. We would. See #6185 for context, but conceptually, the on* event handlers for any new HTML element present an XSS risk for (very) poorly configured sanitizers today, and so by the same logic as for DSD, must be prohibited from use with all parser endpoints without an opt-in. And I'm therefore wondering if you're supportive of only supporting parsing of any new HTML element via setHTML() and friends.

setHTML() (and proposed friends) is going to be the new HTML parser entry point, replacing innerHTML (and friends). It's intentionally named in a generic fashion so it can be that. It's safe-by-default, but if there are cases where declarative shadow trees need some kind of unsafe entry point we ought to provide for that as well. Either as unsafeSetHTML() or through an argument. How is it not beneficial to provide web developers with a unified API for HTML parsing?

I guess I just don't understand the logic of that yet. Is there additional conversation that I can refer to about this decision? For the non-sanitized parsing use case (which is by far the majority of synchronous parser usage on the open Web), I don't see the point of trying to move all parsing to yet another parser entrypoint. I do see the point of creating a new "always safe" sanitized parser API that is guaranteed to produce safe DOM from unsafe HTML. Making that API a Swiss army knife that also produces unsafe DOM from unsafe HTML - that part I don't quite understand yet.

@annevk
Member

annevk commented Jan 25, 2023

I'm not sure I follow the thought experiment which makes it hard to comment. In general it doesn't seem like a good idea for new HTML elements to require opt-in all over the place.

I don't see the point of trying to move all parsing to yet another parser entry point.

What other entry point that's also ergonomic did you have in mind?

@Jamesernator

setHTML() (and proposed friends) is going to be the new HTML parser entry point, replacing innerHTML (and friends). It's intentionally named in a generic fashion so it can be that.

One limiting thing about this is that setHTML presumes the content is specifically an HTML fragment rather than a full document, i.e. a full document with a doctype, root element, etc. does not make sense for setHTML.

How is it not beneficial to provide web developers with a unified API for HTML parsing?

Encouraging general parsing to go through Sanitizer seems highly reasonable, so it would make sense if Sanitizer provided a parseDocument method to replace the capability of DOMParser.
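A sketch of the suggested shape, assuming a hypothetical parseDocument method on Sanitizer (not part of any spec; the name is only illustrative):

// Hypothetical, for illustration only: a Sanitizer-owned whole-document parser.
const sanitizer = new Sanitizer();
const doc = sanitizer.parseDocument(
  "<!DOCTYPE html><html><body><p>a full document, not a fragment</p></body></html>"
);
// doc would be a sanitized Document, with doctype and root element intact,
// covering the DOMParser use case.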

@annevk
Member

annevk commented Jan 26, 2023

I think for parsing a whole document we need a streaming API, as we have oft discussed but thus far haven't made much progress on. The other thing that's needed for that is use cases, as those are quite a bit less fleshed out.

@mfreed7 mfreed7 added the agenda+ To be discussed at a triage meeting label Feb 7, 2023
@mfreed7
Contributor Author

mfreed7 commented Feb 7, 2023

Agenda+ to discuss on Thursday. A preview of what I'd like to discuss:

  • This question (whether to add DSD capability to DOMParser) seems to hinge on whether there will be a setHTML() functionality that a) parses DSD, and b) allows "unsafe-dont-sanitize" behavior. This is primarily being discussed in #185 (and a bit in #184).
  • There seems to be some uncertainty about the "grand plan" for parser entrypoints vis-a-vis these new APIs. E.g. there are requests for other new parser entrypoints, as discussed on #184. @annevk I think it would help move the conversation forward if you could put together a document/comment/something with more of your vision for these APIs.

Overall, if all of the above questions were resolved, I would feel much better about not adding a DOMParser entrypoint for DSD. So I think we should discuss those first and try to come to some agreements.

@otherdaniel @jyasskin @mozfreddyb

@annevk
Member

annevk commented Feb 8, 2023

I think based on your input (in particular with regards to 2 below) the current idea is as follows:

  1. We add six safe-by-default HTML tree mutation methods for parity with existing (HTML) tree mutation features. Proposed names are in "positioned counterparts of setHTML?" WICG/sanitizer-api#184 (comment).
  2. All of them support declarative shadow trees by default, just like the HTML parser does. (This is fine due to the next point.)
  3. To guarantee safe-by-default all of them will use the default sanitization behavior, as currently being worked on in the Sanitizer API repository. That's always in effect as long as only a single argument is passed.
  4. Within the realms of safe-by-default that sanitization behavior will be configurable, as currently being worked on in the Sanitizer API repository.
  5. The sanitizer can also be disabled through an explicit opt-out that indicates you are entering unsafe-please-know-what-you-are-doing territory.
  6. Potentially: a CSP flag or equivalent through which disabling the sanitizer or using the legacy HTML tree mutation methods such as innerHTML ends up throwing.
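A rough sketch of what points 1-5 could look like in practice; the method name, the sanitizer option, and the "unsafe-none" value are illustrative, taken from the linked discussions, and not settled API:

// 1 & 3: safe by default; declarative shadow trees parse, default sanitization applies.
element.setHTML(untrustedMarkup);

// 4: still safe by default, but with a configured sanitizer.
element.setHTML(untrustedMarkup, { sanitizer: new Sanitizer({ allowElements: ["p", "template"] }) });

// 5: explicit opt-out, entering unsafe territory.
element.setHTML(trustedMarkup, { sanitizer: "unsafe-none" });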

cc @rniwa @smaug----

@pygy

pygy commented Feb 8, 2023

Are these methods intended to work in SVG and MathML as well? If so, setMarkup etc. would be better semantically, if a bit more verbose. svgElement.innerHTML always irks me...

@annevk
Member

annevk commented Feb 8, 2023

It's good that you bring that up. These methods would be available where their non-HTML counterparts are available. Now one thing that the innerHTML setter does is branch on the document's type ("xml" or "html") and use a different fragment parser based on that.

I think for these methods it would be preferable if we always used the HTML fragment parser. The XML parser is contextless, so it doesn't benefit as much from having these kinds of positional methods. And a document-based mode switch is also not a great API. Based on that I think the names are still good.

@pygy

pygy commented Feb 8, 2023

Few people know about these parser subtleties...

MathML and SVG are not HTML though, even when embedded. They are distinct languages and their DOM interface differs from HTML, even when they've been parsed by the HTML parser...

Equating HTML with "any markup parsed as HTML" seems like a loss of precision that could be avoided.

That being said, the URI/URL/URN distinction was also meaningful but ignored in the wild (except for data: URIs, which had some traction)... and thus URIs and URNs were retired.

A broader survey may be a good thing before settling on these names.

@mfreed7
Contributor Author

mfreed7 commented Feb 8, 2023

From a purely-DSD point of view, a single setHTML() method would meet the developer need, provided that it:

  1. parses declarative shadow DOM by default, and
  2. allows either safe/sanitized or unsafe/unsanitized behavior.

The above aligns with your vision as you've described it. However, it sounds like part (2) is not acceptable to some of the Sanitizer folks (see comments on WICG/sanitizer-api#185), since it eliminates the benefits of having a "guaranteed safe" output. I don't want to speak for that group, but I do see their points.

If, due to that discussion, setHTML() needs to be broken into two methods like setHTMLAlwaysSafe() and setHTMLUnsafe(), then I do not think we should build setHTMLUnsafe(). It would be almost exactly like innerHTML, but subtly different, and this would be actively bad for developers because it's confusing and unclear which one to use. And adding CSP to any situation does not improve confusion levels.

@annevk
Member

annevk commented Feb 9, 2023

Could you explain their points? They are not clear to me.

Anyway, assuming there is some compelling reason to duplicate the set of methods rather than have the sanitize member accept "unsafe-none", setUnsafeHTML() would essentially be innerHTML, but with declarative shadow root support, right? Still seems like we'd have a reasonably clear migration story for web developers.

I'm not sure what your jab at CSP is supposed to mean or how it relates to this.

@pygy

pygy commented Feb 9, 2023

Having slept on it I think I see your point re. naming.

.setHTML/.setUnsafeHTML has the advantage of making code review and static code analysis easier.

Regardless of the details of unsafe parsing, please don't delay the specification and implementation of the safe subset which provides clear benefits in terms of functionality and safety.

@otherdaniel
Contributor

Anyway, assuming there is some compelling reason to duplicate the set of methods rather than have the sanitize member accept "unsafe-none", [...]

I've explained this aspect in more detail here, where this was raised separately.

@mozfreddyb
Contributor

I've also commented in the issue that @otherdaniel mentioned about the safety expectations from setHTML(). (Oh no. Forked threads!)

Something that I didn't note was that if we want to tell people to "Use setHTML() instead of innerHTML= and you fixed XSS" then defaulting the new method to allow shadow roots will be quite a surprise. In terms of authoring expectations, I'd lean towards making that opt-in rather than opt-out.

@zcorpan
Member

zcorpan commented Feb 9, 2023

@annevk

It's good that you bring that up. These methods would be available where their non-HTML counterparts are available. Now one thing that the innerHTML setter does is branch on the document's type ("xml" or "html") and use a different fragment parser based on that.

I think for these methods it would be preferable if we always used the HTML fragment parser. The XML parser is contextless, so it doesn't benefit as much from having these kinds of positional methods. And a document-based mode switch is also not a great API. Based on that I think the names are still good.

I think mode switches based on the document's type are not great. If we want to have an XML equivalent, it could be a separate method, e.g. setXML(), which always parses as XML.

For serialization, we don't have an HTML-only way to serialize, as far as I know. The innerHTML getter depends on the document. XMLSerializer always serializes as XML. Maybe this is a separate issue, though.

@otherdaniel
Contributor

Something that I didn't note was that if we want to tell people to "Use setHTML() instead of innerHTML= and you fixed XSS" then defaulting the new method to allow shadow roots will be quite a surprise. In terms of authoring expectations, I'd lean towards making that opt-in rather than opt-out.

Is this based on some security requirement for the Sanitizer, or merely on a "least surprise" principle when trying to substitute setHTML for innerHTML? I don't think a new setHTML API should be constrained by a legacy API.

I think sanitizing shadow roots is easily resolved; and I thought the Sanitizer requirements were based on the no-XSS threat model.

@past past added agenda+ To be discussed at a triage meeting and removed agenda+ To be discussed at a triage meeting labels Feb 9, 2023
@mfreed7
Contributor Author

mfreed7 commented Feb 21, 2023

Something that I didn't note was that if we want to tell people to "Use setHTML() instead of innerHTML= and you fixed XSS" then defaulting the new method to allow shadow roots will be quite a surprise. In terms of authoring expectations, I'd lean towards making that opt-in rather than opt-out.

Is this based on some security requirement for the Sanitizer, or merely on a "least surprise" principle when trying to substitute setHTML for innerHTML? I don't think a new setHTML API should be constrained by a legacy API.

I think sanitizing shadow roots is easily resolved; and I thought the Sanitizer requirements were based on the no-XSS threat model.

I'd agree that this new API shouldn't be constrained by legacy. And in that case, the most surprising thing would be if it didn't parse DSD. Fundamentally, DSD doesn't introduce any new XSS vectors, right? It's just another container that needs to be examined for XSS-containing content.

@mfreed7
Contributor Author

mfreed7 commented Mar 2, 2023

That would argue for changing XMLHttpRequest as well.

My original proposal added an opt-in to XHR also. This was discussed, but since there's not a great place to put the opt-in option, it was vetoed.

I still think it's reasonable to wait with this for v0 until we see more of a concrete demand.

Ok. Can you comment on what form of demand would convince you?

@mfreed7
Contributor Author

mfreed7 commented Mar 9, 2023

From the March 9 spec triage meeting, here are the notes:


  1. Carryovers from last time
    1. [Mason] Add DOMParser option to parse declarative shadow DOM
      1. Both Mozilla and Chromium representatives believe adding such an option to DOMParser is a reasonable path forward for the whole-document case.
      2. No need to support XHR since it's a legacy API. DOMParser has no replacement now.
  2. New topics
    1. [Connected to DOMParser discussion] setHTML: entry point to the sanitizer, or a generic HTML setter?
      1. Question: are we OK adding a sanitizing method where sanitizing cannot be turned off?
        1. Daniel: this is very important
        2. Olli: no strong opinions. Simon: OK with this. Freddy seemed to be warming up to this idea.
      2. Question: do we want to add a non-sanitizing method?
        1. Mason: This is necessary for key DSD use cases, like script-containing custom elements.
        2. Olli: This is also important for performance.
      3. Question: do we want to add the variant methods (e.g. [unsanitized, sanitized] x [prepend|append|set] x [XML|HTML])?
        1. Mason/Daniel: the [unsanitized] branch of this seems like adding more dangerous things, but OK to compromise if other people really want them
        2. Some discussion of using options bags to reduce the number of methods, but nobody really in favor.
        3. Simon: we can defer the [XML] branch, especially since we don't know how DSD will work there anyway.
      4. Future question: how do we support sanitization in a future streaming HTML parser API? (First we need to design the streaming HTML parser API...)
        1. Simon: This would need fundamental changes to the sanitizer spec, working on the tree builder instead.
        2. Daniel: This seems doable, even if a lot of work. Pointers from Simon appreciated.
      5. Discussion will continue async!!

Action Items

  1. Simon to open an issue about streaming sanitized parsing, with pointers to help Daniel understand how we could build that in the future.

@mfreed7
Contributor Author

mfreed7 commented Mar 9, 2023

Per the above notes, we had fairly general agreement in the room that it was "ok" and likely the right path forward to extend DOMParser with an opt-in for DSD, especially given that there is no alternative. Based on that, I'm wondering if we are ok resolving this issue and landing the spec PR?


There was also some good discussion in the "new parser entrypoints" category, which mostly relates to WICG/sanitizer-api#184. My take was that there was rough consensus on these:

  1. If both "safe" and "unsafe" parsing is to be added, it needs to be in separate methods, rather than configurable via an optional argument.
  2. Generally, there is a need for "unsafe" parsing for multiple reasons.
  3. There is probably appetite to add "variant" methods like prepend/append/set, but they should be individual methods for ergonomics, rather than all done through one function with an options bag.
  4. We don't need to add XML variants for now.
  5. Streaming variants are an open question.

@zcorpan
Member

zcorpan commented Mar 10, 2023

  1. Simon to open an issue about streaming sanitized parsing, with pointers to help Daniel understand how we could build that in the future.

WICG/sanitizer-api#190

@annevk
Member

annevk commented Mar 15, 2023

Can you comment on what form of demand would convince you?

E.g., userland workarounds/packages/libraries, highly-upvoted Stack Overflow questions.

@annevk
Member

annevk commented Mar 22, 2023

Question: are we OK adding a sanitizing method where sanitizing cannot be turned off?

Could someone give more context about this? Previously I've understood this to be about static analysis where you also have to trust the web developer. But if you have to trust the web developer, can't the static analysis require particular things about the syntax being used? I have a hard time wrapping my head around this requirement.

To add some more context this would be the first time that I know of where static analysis (plus trusting the web developer to not try to fool the static analysis) forms the basis of web platform API design. I think that needs a fair bit of justification.

@otherdaniel
Contributor

Question: are we OK adding a sanitizing method where sanitizing cannot be turned off?

Could someone give more context about this? Previously I've understood this to be about static analysis where you also have to trust the web developer. But if you have to trust the web developer, can't the static analysis require particular things about the syntax being used? I have a hard time wrapping my head around this requirement.

To add some more context this would be the first time that I know of where static analysis (plus trusting the web developer to not try to fool the static analysis) forms the basis of web platform API design. I think that needs a fair bit of justification.

From the Sanitizer & security perspective, having a method that actually sanitizes is a requirement in its own right. A security function should not have a built-in footgun with which it disables itself. The Sanitizer API group has, for more than two years, held to the security principle that the Sanitizer API should defend a baseline security guarantee, regardless of how it is configured.

From a more general perspective, we find it questionable to add new methods that add to the XSS surface, and that seem to add little over the legacy methods they replace. XSS is a problem which has plagued developers for decades and we shouldn't lightly add new XSS surfaces to the platform. The legacy use cases seem to be adequately handled by the legacy APIs.

The extensible web manifesto calls for new low-level primitives that can explain higher-level capabilities, which also suggests that we should expose sanitization as a primitive, and not only as part of a larger function.

 

On static analysis: I'm not sure what "static analysis where you also have to trust the web developer" means.

We (and many others) use static analysis as part of our code presubmit checks. An analysis that checks whether certain methods are called is much easier to do than one which has to establish the possible runtime values of a variable. It's also fairly easy to discover all constructor invocations and to either allow-list them based on some criteria, or to manually inspect them and have the static analysis confirm that there are only a set of known instances. A sanitizer entry point that optionally doesn't sanitize, based on an options bag, makes those analyses a lot harder, since the set of runtime values of an options bag isn't easy to determine statically.

What you can demand of the developer depends a lot on the project state. If you're starting a project from scratch, or the project is small enough that a rewrite is feasible, then you can quite easily make demands on how to use APIs, and then use static analysis to enforce this usage. (E.g., only pass constants to a sanitizer options bag.) That makes things easy. If the project has a large existing codebase, which e.g. occurs when we acquire a company, or use a 3rd-party library, or otherwise try to upgrade an existing codebase, then you'll have to deal with whatever is there, and try to iteratively move the codebase to the desired goal. If in these cases the sanitizer calls are wrapped in code that dynamically constructs the options, or passes around closures or function references that sanitize, then the analysis has to pessimize any such call, which usually makes the analysis useless. I think this case of a pre-existing codebase is the more common case, especially outside of large companies like ours.

 

IMHO, the by far easiest way to resolve all of this would be to disentangle these requirements: give the DOMParser a 'DSD' option; give the Sanitizer the sanitization method back; and then discuss separately what merits setHTML brings to the developer community, and which API shape would follow from that. That matches established design principles, like the extensible web manifesto, and established user demand, like every HTML sanitizer I know of. It seems the current discussion presumes setHTML as the sole solution to all of these problems, for reasons that elude me.

@annevk
Member

annevk commented Mar 23, 2023

That's disappointing to hear. The reasons we settled on setHTML() (and friends) are that the HTML parser requires context, innerHTML is inadequate for declarative shadow trees, it is easily abused with +=, etc.

If you look at this from the extensible web perspective you'd end up with two new operations:

  • parseIntoDocumentFragment(Element, DOMString) -> DocumentFragment
  • sanitize(Node)

Only the latter would be under the control of the sanitizer. I proposed going in that direction, but you were against that too.
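A sketch of how those two primitives could compose to explain a safe setHTML-style method; both operations are hypothetical, named as in the list above:

// parseIntoDocumentFragment(contextElement, markup) -> DocumentFragment  (hypothetical)
// sanitize(node)                                                         (hypothetical)
function explainedSetHTML(element, markup) {
  const fragment = parseIntoDocumentFragment(element, markup); // context-aware HTML parse, DSD included
  sanitize(fragment);                                          // remove XSS-capable content in place
  element.replaceChildren(fragment);
}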

@jyasskin
Member

I'd like to remind everyone of the "No feigning surprise" guideline from the Recurse Center. We're all trying to get to a reasonable design here, and I think we're having trouble at least partly because the design constraints and rejected designs aren't written down in an explainer anywhere. Instead, they're scattered across several issues on several repositories, so we all wind up forgetting that the question we're about to ask or objection we're about to raise has already been answered. And we feel like we've answered a question when the answer we're thinking of from some other thread doesn't actually quite answer it.

@pygy

pygy commented Mar 24, 2023

Re. static analysis, why is it harder to reject a non-static option bag than non-static method invocation?

Twice bad:

const bypass = "potentially" + "dangerous"
const options = {unsafe: true}
element[bypass](options)

Twice safe:

element.potentiallyUnsafe({...whatever, unsafe: false})

The analysis is marginally more complex, but this is a one-time cost when writing the validation logic.

The options bag must be designed with that scenario in mind, of course.

@koto
Contributor

koto commented Mar 28, 2023

Re. static analysis, why is it harder to reject a non-static option bag than non-static method invocation?

It's only marginally harder to write such rejection logic, yes. But the point is that any rejection logic catches both vulnerabilities and benign code, and we think it's best if it can precisely target vulnerabilities. There is usually nothing wrong with passing dynamic values to functions unless one of the function variants has different capabilities, in which case the security analysis (automatic or manual) needs to cover all forms that can potentially trigger this capability.

If the code that is vulnerable looks similar (or even identical) to one that is not:

  • it's harder to teach developers to code securely
  • it's less obvious to the developer that they are introducing a vulnerability
  • you miss vulnerabilities during manual code review
  • you need to deal with a large number of false-positive findings from static analysis

All of these are ongoing costs for your own code, but the effect explodes with adding dependencies (think: reviewing 100 libraries that liberally use innerHTML because it was the idiomatic way to parse & insert strings into DOM).

There will always be odd cases that still allow you to "bypass" static analysis (e.g. eval, the dynamic examples you pointed out, or prototype pollution). The point, however, is that API choices affect how a vulnerability appears in the code - I think it's better if it stands out, for the aforementioned reasons.

@annevk
Member

annevk commented Mar 29, 2023

Well it does stand out. I guess the question and point of contention is whether it's our job to ensure that it still stands out when someone creates an abstraction or if that is the responsibility of the person creating the abstraction.

@koto
Contributor

koto commented Mar 29, 2023

I agree. Passing dynamic arguments to API functions hardly counts as abstraction though. I think it's beneficial if different security capabilities are not coupled within one API function (e.g. it's clear what .textContent does and what .innerHTML does, and why both exist).

@annevk
Member

annevk commented Mar 29, 2023

I think what I don't understand is that if you have to do static analysis anyway and web developers are expected to cooperate, why the web developers wishing to perform static analysis cannot have a safe wrapper and forbid the unsafe variant. Instead this seems to impose certain static analysis requirements on all web platform API designers.

E.g., is window.open(url, "_blank", "noopener") acceptable given these requirements? Presumably it would have had to be window.noopenerOpen(url) so you can forbid the unsafe one more easily.

@koto
Contributor

koto commented Mar 29, 2023

It's not about static analysis alone, but rather about modelling the API so that it is easy to reason about (for the author, for the manual reviewer, for trainers, for integrators, for dynamic scanners, linters, etc.). Here, for example, I don't think relying on a safe wrapper scales well, given that web applications use 3p dependencies.

Could window.open be better? I guess; Element.setAttribute is also causing a lot of headaches for our security team (script.src can cause XSS, but most other usages do not). But we have to stay practical, hence the opinion that for a new function that is modelled to be endemic, replaces a very well-known XSS sink, and can trigger the worst kind of web application vulnerability, the bar is a bit higher.

@annevk
Member

annevk commented Mar 29, 2023

If you use a dependency outside of your control you also cannot rely on static analysis. In that case you would need some kind of runtime enforcement of your policies, especially as this is not the only API that can do this kind of thing.

Forever doubling the number of parser methods over this seems undesirable. Especially as the proposed API shape seems quite clear and in most use cases should not be hard to evaluate. When a second argument is passed, more scrutiny is needed. (And when it becomes harder, static analysis and other tools can help, just as they can with existing APIs.)

@jyasskin
Member

I'm worried about an API shape where

  • setHTML(string) is safe,
  • setHTML(string, {sanitizer: "unsafe-none"}) is unsafe,
  • setHTML(string, {sanitizer: computedSanitizer}) might or might not be safe: I find @koto's argument convincing that this is hard for static analysis tools to check, and that even in third-party codebases, it's easier to check for uses of setUnsanitizedHTML(string) and less breaking to ban uses of element[computedField](string).
  • setHTML(string, {newOption: itsValue}) is safe but might be banned by lint checks that took the shortcut of insisting on the single-argument form for safety: I'm really worried about this one, because it sacrifices the future-compatibility we normally get from using an options bag.

I'm not so worried about doubling the number of parsing methods over this. We might not even need the beforeUnsanitizedHTML(), etc. variants, since we don't want XSS vulnerabilities to be common. Those could wait until framework authors actually ask for them. But even if they're necessary, as @domenic argued in w3ctag/design-principles#426 (comment), new function names aren't obviously worse than namespaced groups of functions, and I think that's equally true when the namespacing is done with an options bag.

Apologies if I've misunderstood the API shape @annevk has in mind; I couldn't find anywhere it was written down.

@keithamus
Contributor

If the concern is around safety aspects of setHTML, can the feature set be determined by CSP headers? require-trusted-types-for could be extended to ensure that setHTML is passed a sanitizer that correctly returns a trusted type, no?

@jyasskin
Member

@annevk suggested the CSP option in #8759 (comment), and I'd have sworn that @koto answered the suggestion somewhere, but I can't find that answer. My understanding, which the security folks might correct, is that CSP is great for enforcing that a policy is followed (so it'd be good to also have that check), but it's not great for either adopting that policy incrementally or for preventing regressions without user-visible breakage. If presubmit checks can run static analysis to identify regressions before they're committed, we can migrate a single component at a time and avoid failures in production after the migration's finished.

@annevk
Member

annevk commented Mar 31, 2023

@jyasskin isn't that what report-only is for?

@jyasskin
Member

@annevk, say you're migrating a large app with a bunch of components. You turn on report-only and get a bunch of reports, and start fixing them. When you finish a component, you want to make sure it doesn't regress. You can't turn on CSP enforcement for any particular page, though, because they all include other components that aren't finished migrating yet. So CSP by itself doesn't let you ratchet forward compliance. Static analysis does, since you can turn it on a directory at a time.

@keithamus
Contributor

I just want to clarify your concern by attempting to summarise:

  • CSP is great for enforcing policy constraints and preventing regressions but...
  • CSP is difficult/impractical to partially enforce, and so all callsites must comply before turning on, which means...
  • One must rely on static analysis in the interim, which allows for enforcement of first-party code but...
  • It leaves the gap open for third party dependencies which can't reasonably participate in static analysis.
  • Additionally static analysis may not be robust enough depending on how the API is designed.

@koto
Contributor

koto commented Apr 3, 2023

I think it's the right summary when it comes to the static analysis angle; my concerns about the API shape are however a bit more generic. I find it beneficial if distinguishing obviously XSSy string->DOM calls from obviously safe ones is possible with little grey area in between - for browsers, humans, and all kinds of tools. The additional cognitive load of an extra function name seems better than the one introduced by an options bag field.

@mfreed7
Contributor Author

mfreed7 commented Aug 21, 2023

For visibility, I just want to mention that this issue is (in my opinion) closed by #9538. That is a PR that describes two new parsing APIs, Element.setHTMLUnsafe() and Document.parseHTMLUnsafe(), both of which parse declarative shadow DOM content natively. They don't suffer from the compat issues that cause problems for DSD content with innerHTML or DOMParser, and they're a future direction for the platform for parsing and sanitization.
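A minimal sketch of the two entry points from that PR:

// Fragment parsing with declarative shadow DOM support (unlike innerHTML):
const host = document.createElement("div");
host.setHTMLUnsafe(
  '<my-card><template shadowrootmode="open"><slot></slot></template></my-card>'
);
console.log(host.querySelector("my-card").shadowRoot); // a ShadowRoot

// Whole-document parsing with declarative shadow DOM support (unlike DOMParser):
const doc = Document.parseHTMLUnsafe(
  '<div><template shadowrootmode="open"><slot></slot></template></div>'
);
console.log(doc.querySelector("div").shadowRoot); // also a ShadowRoot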
