Add DOMParser option to parse declarative shadow DOM #8759
To reply to #5465 (comment): I'm not sure I understand. By that logic, we'd need to extend
Right. We would. See #6185 for context, but conceptually, the
I guess I just don't understand the logic of that yet. Is there additional conversation that I can refer to about this decision? For the non-sanitized parsing use case (which is by far the majority of synchronous parser usage on the open Web), I don't see the point of trying to move all parsing to yet another parser entrypoint. I do see the point of creating a new "always safe" sanitized parser API that is guaranteed to produce safe DOM from unsafe HTML. Making that API a Swiss army knife that also produces unsafe DOM from unsafe HTML - that part I don't quite understand yet.
I'm not sure I follow the thought experiment which makes it hard to comment. In general it doesn't seem like a good idea for new HTML elements to require opt-in all over the place.
What other entry point that's also ergonomic did you have in mind?
One limiting thing about this is that
Encouraging general parsing to go through
I think for parsing a whole document we need a streaming API as we have oft-discussed, but thus far haven't made much progress on. The other thing that's needed for that is use cases, as those are quite a bit less fleshed out.
Agenda+ to discuss on Thursday. A preview of what I'd like to discuss:
Overall, if all of the above questions were resolved, I would feel much better about not adding a
I think based on your input (in particular with regards to 2 below) the current idea is as follows:
cc @rniwa @smaug----
Are these methods intended to work in SVG and MathML as well? If so
It's good that you bring that up. These methods would be available where their non-HTML counterparts are available. Now one thing that the

I think for these methods it would be preferable if we always used the HTML fragment parser. The XML parser is contextless, so it doesn't benefit as much from having these kinds of positional methods. And a document-based mode switch is also not a great API. Based on that I think the names are still good.
Few people know about these parser subtleties... MathML and SVG are not HTML though, even when embedded. They are distinct languages and their DOM interfaces differ from HTML's, even when they've been parsed by the HTML parser... Equating HTML with "any markup parsed as HTML" seems like a loss of precision that could be avoided. That being said, the URI/URL/URN distinction was also meaningful but ignored in the wild (except for data: URIs, which had some traction)... and thus URIs and URNs were retired. A broader survey may be a good thing before settling on these names.
From a purely-DSD point of view, a single
The above aligns with your vision as you've described it. However, it sounds like part (2) is not acceptable to some of the Sanitizer folks (see comments on WICG/sanitizer-api#185), since it eliminates the benefits of having a "guaranteed safe" output. I don't want to speak for that group, but I do see their points. If, due to that discussion,
Could you explain their points? They are not clear to me. Anyway, assuming there is some compelling reason to duplicate the set of methods rather than have the

I'm not sure what your jab at CSP is supposed to mean or how it relates to this.
Having slept on it, I think I see your point re. naming.
Regardless of the details of unsafe parsing, please don't delay the specification and implementation of the safe subset, which provides clear benefits in terms of functionality and safety.
I've explained this aspect in more detail here, where this was raised separately.
I've also commented in the issue that @otherdaniel mentioned about the safety expectations from

Something that I didn't note was that if we want to tell people to "Use
I think mode switches based on the document's type are not great. If we want to have an XML equivalent, it could be a separate method, e.g.

For serialization, we don't have an HTML-only way to serialize, as far as I know. The
Is this based on some security requirement for the Sanitizer, or merely on a "least surprise" principle when trying to substitute setHTML for innerHTML? I don't think a new setHTML API should be constrained by a legacy API. I think sanitizing shadow roots is easily resolved, and I thought the Sanitizer requirements were based on the no-XSS threat model.
I'd agree that this new API shouldn't be constrained by legacy. And in that case, the most surprising thing would be if it didn't parse DSD. Fundamentally, DSD doesn't introduce any new XSS vectors, right? It's just another container that needs to be examined for XSS-containing content.
My original proposal added an opt-in to XHR also. This was discussed, but since there's not a great place to put the opt-in option, it was vetoed.
Ok. Can you comment on what form of demand would convince you?
From the March 9 spec triage meeting, here are the notes:
Action Items
Per the above notes, we had fairly general agreement in the room that it was "ok" and likely the right path forward to extend

There was also some good discussion in the "new parser entrypoints" category, which mostly relates to WICG/sanitizer-api#184. My take was that there was rough consensus on these:
E.g., userland workarounds/packages/libraries, highly-upvoted Stack Overflow questions.
Could someone give more context about this? Previously I've understood this to be about static analysis where you also have to trust the web developer. But if you have to trust the web developer, can't the static analysis require particular things about the syntax being used? I have a hard time wrapping my head around this requirement. To add some more context, this would be the first time that I know of where static analysis (plus trusting the web developer to not try to fool the static analysis) forms the basis of web platform API design. I think that needs a fair bit of justification.
From the Sanitizer & security perspective, having a method that actually sanitizes is a requirement in its own right. A security function should not have a built-in footgun with which it disables itself. The Sanitizer API group has, for more than two years, held to the security principle that the Sanitizer API should defend a baseline security guarantee, regardless of how it is configured.

From a more general perspective, we find it questionable to add new methods that add to the XSS surface, and that seem to add little over the legacy methods they replace. XSS is a problem which has plagued developers for decades and we shouldn't lightly add new XSS surfaces to the platform. The legacy use cases seem to be adequately handled by the legacy APIs.

The extensible web manifesto calls for new low-level primitives that can explain higher-level capabilities, which also suggests that we should expose sanitization as a primitive, and not only as part of a larger function.
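To illustrate the "sanitization as a primitive" point in code (a hedged sketch only: `Sanitizer`, `sanitize()`, and `setHTML()` follow early WICG Sanitizer API drafts and this thread, and `untrustedMarkup`/`container` are assumed variables; none of this is final spec text):

```js
// (a) Sanitization as a standalone, low-level primitive: parsing and
// sanitizing are separate, independently auditable steps.
const sanitizer = new Sanitizer();
const fragment = sanitizer.sanitize(untrustedMarkup); // sanitized DocumentFragment
container.replaceChildren(fragment);

// (b) Sanitization only as part of a larger parse-and-insert function,
// which is the shape this comment pushes back on as the sole entry point.
container.setHTML(untrustedMarkup, { sanitizer });
```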
On static analysis: I'm not sure what "static analysis where you also have to trust the web developer" means. We (and many others) use static analysis as part of our code presubmit checks. An analysis that checks whether certain methods are called is much easier to do than one which has to establish the possible runtime values of a variable. It's also fairly easy to discover all constructor invocations and to either allow-list them based on some criteria, or to manually inspect them and have the static analysis confirm that there are only a set of known instances. A sanitizer entry point that optionally doesn't sanitize, based on an options bag, makes those analyses a lot harder, since establishing the set of runtime values of an options bag isn't easy to determine statically.

What you can demand of the developer depends a lot on the project state. If you're starting a project from scratch, or the project is small enough that a rewrite is feasible, then you can quite easily make demands on how to use APIs, and then use static analysis to enforce this usage (e.g., only pass constants to a sanitizer options bag). That makes things easy. If the project has a large existing codebase, which e.g. occurs when we acquire a company, use a 3rd-party library, or otherwise try to upgrade an existing codebase, then you'll have to deal with whatever is there, and try to iteratively move the codebase to the desired goal. If in these cases the sanitizer calls are wrapped in code that dynamically constructs the options, or passes around closures or function references that sanitize, then the analysis has to pessimize any such call, which usually makes the analysis useless. I think this case of a pre-existing codebase is the more common one, especially outside of large companies like ours.
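To make the presubmit-check argument concrete, here is a hypothetical ESLint-style rule sketch (the `setHTML` name and its options-bag shape are assumptions taken from this thread, not the actual Sanitizer API): it can approve calls whose options are a plain literal, but must flag anything dynamic, because the runtime value cannot be established statically.

```js
// Hypothetical lint rule: allow element.setHTML(value) with no options or a
// static object literal; report anything whose options cannot be checked
// statically (variables, spreads, computed keys, non-literal values).
module.exports = {
  meta: { type: "problem" },
  create(context) {
    return {
      CallExpression(node) {
        const callee = node.callee;
        if (
          callee.type !== "MemberExpression" ||
          callee.property.type !== "Identifier" ||
          callee.property.name !== "setHTML"
        ) {
          return;
        }
        const options = node.arguments[1];
        if (!options) return; // no options bag: defaults apply, nothing to flag
        const isStaticLiteral =
          options.type === "ObjectExpression" &&
          options.properties.every(
            (prop) =>
              prop.type === "Property" &&
              !prop.computed &&
              prop.value.type === "Literal"
          );
        if (!isStaticLiteral) {
          context.report({
            node,
            message:
              "setHTML() options must be a static object literal so they can be reviewed statically.",
          });
        }
      },
    };
  },
};
```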
IMHO, by far the easiest way to resolve all of this would be to disentangle these requirements: Give the DOMParser a 'DSD' option; give the Sanitizer the sanitization method back; and then discuss separately what merits setHTML brings to the developer community, and which API shape would follow from that. That matches established design principles, like the extensible web manifesto, and established user demand, like every HTML sanitizer I know of. It seems the current discussion presumes
That's disappointing to hear. The reason we settled on

If you look at this from the extensible web perspective you'd end up with two new operations:
Only the latter under the control of the sanitizer. I proposed going in that direction, but you were against that too.
I'd like to remind everyone of the "No feigning surprise" guideline from the Recurse Center. We're all trying to get to a reasonable design here, and I think we're having trouble at least partly because the design constraints and rejected designs aren't written down in an explainer anywhere. Instead, they're scattered across several issues on several repositories, so we all wind up forgetting that the question we're about to ask or objection we're about to raise has already been answered. And we feel like we've answered a question when the answer we're thinking of from some other thread doesn't actually quite answer it.
Re. static analysis, why is it harder to reject a non-static option bag than a non-static method invocation?

Twice bad:

const bypass = "potentially" + "dangerous"
const options = {unsafe: true}
element[bypass](options)

Twice safe:

element.potentiallyUnsafe({...whatever, unsafe: false})

The analysis is marginally more complex, but this is a one-time cost when writing the validation logic. The option bag must be designed with that scenario in mind, of course.
It's only marginally harder to write such rejection logic, yes. But the point is that any rejection logic catches both vulnerabilities and benign code, and we think it's best if it can precisely target vulnerabilities. There is usually nothing wrong with passing dynamic values to functions unless one of the function variants has different capabilities, in which case the security analysis (automatic or manual) needs to cover all forms that can potentially trigger this capability. If the code that is vulnerable looks similar (or even identical) to one that is not:
All of these are ongoing costs for your own code, but the effect explodes with adding dependencies (think: reviewing 100 libraries that liberally use

There will always be odd cases that still allow you to "bypass" static analysis (e.g. eval, the dynamic examples you pointed out, or prototype pollution); the point, however, is that API choices affect how a vulnerability appears in the code - I think it's better if it stands out, for the aforementioned reasons.
Well, it does stand out. I guess the question and point of contention is whether it's our job to ensure that it still stands out when someone creates an abstraction, or if that is the responsibility of the person creating the abstraction.
I agree. Passing dynamic arguments to API functions hardly counts as abstraction though. I think it's beneficial if different security capabilities are not coupled within one API function (e.g. it's clear what
I think what I don't understand is that if you have to do static analysis anyway and web developers are expected to cooperate, why the web developers wishing to perform static analysis cannot have a safe wrapper and forbid the unsafe variant. Instead this seems to impose certain static analysis requirements on all web platform API designers. E.g., is
It's not about static analysis alone, but rather about modelling the API in a way that makes it easy to reason about (for the author, for the manual reviewer, for trainers, for integrators, for dynamic scanners, linters, etc.). Here, for example, I don't think relying on a safe wrapper scales well, given that web applications use 3p dependencies. Could
If you use a dependency outside of your control you also cannot rely on static analysis. In that case you would need some kind of runtime enforcement of your policies, especially as this is not the only API that can do this kind of thing. Forever doubling the number of parser methods over this seems undesirable, especially as the proposed API shape seems quite clear and in most use cases should not be hard to evaluate. When a second argument is passed, more scrutiny is needed. (And when it becomes harder, static analysis and other tools can help, just as they can with existing APIs.)
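For readers joining the thread at this point, the two shapes being weighed look roughly like this (the options-bag form and the "unsafe-none" value come from the issue description quoted at the bottom of this thread; `setHTMLUnsafe` is purely an illustrative name for the separate-method alternative):

```js
// Shape A: one method, sanitizing by default, with an opt-out in the options bag.
element.setHTML(markup);                               // sanitized
element.setHTML(markup, { sanitizer: "unsafe-none" }); // bypasses sanitization

// Shape B: two methods, so an unsafe call stands out by name alone.
element.setHTML(markup);       // always sanitizes
element.setHTMLUnsafe(markup); // never sanitizes (illustrative name only)
```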
I'm worried about an API shape where
I'm not so worried about doubling the number of parsing methods over this. We might not even need the

Apologies if I've misunderstood the API shape @annevk has in mind; I couldn't find anywhere it was written down.
If the concern is around safety aspects of
@annevk suggested the CSP option in #8759 (comment), and I'd have sworn that @koto answered the suggestion somewhere, but I can't find that answer. My understanding, which the security folks might correct, is that CSP is great for enforcing that a policy is followed (so it'd be good to also have that check), but it's not great for either adopting that policy incrementally or for preventing regressions without user-visible breakage. If presubmit checks can run static analysis to identify regressions before they're committed, we can migrate a single component at a time and avoid failures in production after the migration's finished.
@jyasskin, isn't that what report-only is for?
@annevk, say you're migrating a large app with a bunch of components. You turn on
I just want to clarify your concern by attempting to summarise:
I think it's the right summary when it comes to the static analysis angle; my concerns about the API shape are, however, a bit more generic. I find it beneficial if distinguishing obviously XSSy string->DOM calls from obviously safe ones were possible with little grey area in between - for browsers, humans, and all kinds of tools. The additional cognitive load of an extra function name seems better than one introduced by the option bag field.
For visibility, I just want to mention that this issue is (in my opinion) closed by #9538. That is a PR that describes two new parsing APIs,
I'm splitting this out from the discussion that starts on this comment #5465 (comment) and then jumps to overall issue comments starting here #5465 (comment). This was also previously discussed at length starting at this comment whatwg/dom#912 (comment). That issue has much more context on why there needs to be an opt-in for DSD at all.
The current spec PR has

DOMParser.parseFromString(html, "text/html", {declarativeShadowRoots: true})

which is the only API that lets you imperatively parse HTML that contains declarative shadow roots.

There is a question about whether we need to add this, and whether instead we should make Sanitizer's setHTML() the only DSD-aware parser API. That requires some changes to the Sanitizer (#8627), in particular giving setHTML() a sanitizer: "unsafe-none" argument that bypasses all sanitization.
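For context, a minimal sketch of what that option enables (assuming the `declarativeShadowRoots` option named above and the `shadowrootmode` attribute spelling; both were still in flux at the time this issue was filed):

```js
// Markup containing a declarative shadow root.
const html = `
  <my-card>
    <template shadowrootmode="open">
      <slot></slot>
    </template>
  </my-card>`;

// Without the option, the template remains an inert <template> element.
const plain = new DOMParser().parseFromString(html, "text/html");
console.log(plain.querySelector("my-card").shadowRoot); // null

// With the option, the parser attaches an open shadow root to <my-card>.
const withShadow = new DOMParser().parseFromString(html, "text/html", {
  declarativeShadowRoots: true,
});
console.log(withShadow.querySelector("my-card").shadowRoot); // ShadowRoot
```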