Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify-able config. #237

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

otherdaniel
Copy link
Collaborator

@otherdaniel otherdaniel commented Aug 13, 2024

[This isn't quite ready yet. Using a PR for tool support and discussion.]

This re-writes the config logic, based on discussions in the recent meetings, and with the intent to address issues #228, #229, and #233.

The corner stones are:

  • The Sanitizer (and thus its config) become mutable. It has methods to add stuff to any of the allow or deny lists.
  • The actual config processing is now defined in terms of those methods.
  • There is only one "safe-ify" operation, and it gets called once (for safe methods). This is the only different between safe and unsafe versions.
    • (Well, this is only a half-truth. The handling of javascript:-navigation URLs remains an odd-ball that is magically different between "safe" and "unsafe" versions.)
  • The configs have an "other" setting, which allows them to express "everything" and "nothing" filters, which in turn allows us to treat the unsafe defaults as just another regular case.
  • The entire "known" logic is gone. But in turn, the "baseline" is now back, and is a deny-list.

Preview | Diff

@mozfreddyb
Copy link
Collaborator

We discussed this in a synchronous meetings. I'm copying the notes verbatim so we can continue async.

  • Document.parseHTML / Document.parseHTMLUnsafe - How does it handle safety?
  • We should not need otherMarkup
  • get/getUnsafe should not be a method
  • Some method for turning unsafe into safe
  • Don’t want to call it “safe”, because we don’t know what safe is
  • Eg. “WithBaseline” (baseline is applied to config)
  • Element vs allowAttribute?
  • setComment why set-method instead of an attribute-setter?
  • Should we use allow instead of allowElement (or element)
  • Anne: How to undo replaceWithChildren
  • Tom: element() currently removes from the replaceWithChildren list
  • Guillaume: “Other name for something captured by allow or block list”
  • We have a baseline know
  • Annevk: baseline remove all unsafe stuff
  • Default “safe list” of allowed elements
  • Annevk: Expose the default list with a static method
  • Anne: Name the method removeUnsafe to make a config safe (instead of safeify, because we can’t easily guarantee safeness for all use cases)
  • It shouldn't be replaceWithChildren but replaceWithChildrenElements (as per earlier group consensus)
  • Anne: Rename methods to addElement, addRemoveElement, addReplaceWithChildrenElement, etc. (prefix with add to make it clear there is a modification of the instance)
  • No group concensus for the add prefix. We need to discuss this further.
  • Tom: Make setComment a property like comment
  • Anne: This would imply introspection for the properties and not for other like elements, removeElements etc.
  • Anne: Remove getUnsafe, instead a method that makes sanitizer safe (removeUnsafe)
  • Guillaume: get() returns whatever the current state is
  • Anne: Could we use toJSON instead of get?
  • Guillaume/Frederik: We need to worry about having functions that are not JSON compatible in the future (e.g. if we add a fallback feature in v2)
  • all: Let’s keep get then.

@mozfreddyb mozfreddyb self-requested a review August 21, 2024 08:48
Copy link
Collaborator

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great progress so far. We have some nits (and some larger questions) in the review notes, please see above.

@otherdaniel
Copy link
Collaborator Author

  • Added issue links to issues for naming and other markup
  • Added "unsafe" handling to parseHTMLUnsafe.
  • Replaced getUnsafe query method with removeUnsafe modifier.
  • Naming mostly deferred. But made sure all method names start with a verb.
  • Finished handling of element-dependent attributes.
  • Various editorial / link-ification improvements.

index.bs Outdated
Comment on lines 262 to 272
ISSUE(238): Final naming TBD.

ISSUE(240): "other markup" TBD.

ISSUE: Should these be setter methods -- particularly the setXXX(boolean) --
or setters or properties or somesuch?

ISSUE: Should the modifier methods return a reference to [=this=], so that you
can 'chain' methods?
(e.g. `sanitizer.allowElement("a").allowElement("span")`).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file issues for those that are only in the file

index.bs Outdated Show resolved Hide resolved
index.bs Outdated
@@ -297,6 +418,7 @@ dictionary SanitizerConfig {
};
</pre>

ISSUE: Sould members be required, or have declared defaults?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same as above, pls file ISSUE for those that area only in the document)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (Also removed issues that had been resolved, like this one, or otherMarkup.

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo naming the general setup here looks really solid to me. I mainly have editorial concerns. Thanks a lot for your work on this!

index.bs Outdated
<div algorithm>
The <dfn for="Sanitizer" export>allowElement</dfn>(|element|) method steps are:

1. Let |name| be the result of [=canonicalize a sanitizer name=] |element| with [=HTML namespace=] as the default namespace.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be cleaner to canonicalize the entire element.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Canonicalize a sanitizer name is used for both elements and attributes. I could add a "canonicalize an elemnt" and a "canonicale an attribute", both of which would then canonicalize their name, but it seems like that merely adds an indirection. (At least as no other canonicalization steps are performed.)

[Leaving open for now.]

index.bs Outdated
The <dfn for="Sanitizer" export>allowElement</dfn>(|element|) method steps are:

1. Let |name| be the result of [=canonicalize a sanitizer name=] |element| with [=HTML namespace=] as the default namespace.
1. [=list/Remove=] |name| from [=this=]'s [=internal slot=]'s {{SanitizerConfig/elements}}.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording needs to be more elaborate to make it clear how this removing actually works. Keep in mind that there's no identity match here and also that the entry in elements may contain other keys. (This applies quite a few times in this PR and maybe also in the existing draft text.)

I think it's also somewhat confusing that the internal slot holds the same dictionary as is used for input, although I guess it can work here. But then you need to address them as dictionary members so it would have to be

[=internal slot=]["{{SanitizerConfig/elements}}"] I think.

It also seems weird to call this "internal slot". Typically we would call it "configuration" or some such.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a [=SanitizerConfig/remove=], to explain removal of a name structure.

Removed "internal slot" in favour of configuration; and systematically added dictionary syntax to reference its members.

(I think I cargo-culted the "internal slot" from ECMAScript, which defines an "internal slot" as a hook that the embedder can use to attach data to a script object. I agree that doesn't actually make any sense over here.)

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated
{{SanitizerConfig/replaceWithChildrenElements}}, or
{{SanitizerConfig/attributes}}.
1. Let |result| be a copy of |config|.
1. [=list/For each=] |elem| in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes you use elem, sometimes, item, and sometimes element. I would settle on element for all of these. I'd generally prefer less abbreviations in prose, so also attribute and configuration instead of attr and config.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly done. I consistently use element, attribute, and configuration now. I still use item now and then, but only for contexts where it could be either an element or an attribute. Happy to rename that, but I couldn't really come up with a nice name for element_or_attribute.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@otherdaniel otherdaniel mentioned this pull request Nov 26, 2024
…lot, in favour of an associated config object. Use long names. Define list removeal.
@mozfreddyb
Copy link
Collaborator

copying over from #238

change

  • replaceWithChildrenElements to replaceElementWithChildren()
  • setComments() to plural.

keep:

  • removeElement()
  • allowElement()
  • removeAttribute()
  • allowAttribute()
  • setDataAttributes()

@otherdaniel
Copy link
Collaborator Author

Updated the PR with feedback from today's meeting:

  • Adopted the updated naming.
  • Introduced canonicalization methods for element with attributes; elements; and attributes. Rewrote the caller algorithms so that they canonicalize in the first step, and then only use the canonicalized value.

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I would be okay with landing this to get the public draft in a better state, but we'll need follow-up issues for setHTML() defaulting and such (and any comments in this PR that you decide to postpone addressing).

@@ -217,48 +227,86 @@ dictionary SetHTMLOptions {
</pre>

The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same config can be used with both safe or unsafe methods. The intent is
The same configuration can be used with both safe or unsafe methods, where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The same configuration can be used with both safe or unsafe methods, where
The same configuration can be used with both safe and unsafe methods, where

Perhaps we should add quotation marks around "safe"?

@@ -217,48 +227,86 @@ dictionary SetHTMLOptions {
</pre>

The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same config can be used with both safe or unsafe methods. The intent is
The same configuration can be used with both safe or unsafe methods, where
the safe methods perform an implicit {{removeUnsafe}} operation. The intent is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the safe methods perform an implicit {{removeUnsafe}} operation. The intent is
the safe methods perform an implicit {{removeUnsafe}} operation on the passed in configuration and have a default configuration when none is passed. The intent is


<pre class=idl>
[Exposed=(Window,Worker)]
interface Sanitizer {
constructor(optional SanitizerConfig config = {});
constructor(optional SanitizerConfig configuration = {});

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a TODO here for [NewObject] static Sanitizer getDefault()? (I suppose it could also return a SanitizerConfig, but an instance seems better.)


<div algorithm>
The <dfn for="Sanitizer" export>replaceElementWithChildren</dfn>(|element|) method steps are to [=replace an element=] with |element| and [=this=]'s [=Sanitizer/configuration=].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also call the internal operation "replace an element with children"?

|config| and |safe|.
Then return |options|["`sanitizer`"].
1. [=Assert=]: |options|["`sanitizer`"] is a [=dictionary=].
1. Return new {{Sanitizer}}(|options|["`sanitizer`"]).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public API.

1. Let |configuration| be the value of |sanitizer|'s [=Sanitizer/configuration=].
1. If |safe| is true, then set |configuration| to the result of calling [=remove unsafe=] on |configuration|.
1. Call [=sanitize core=] on |node|, |configuration|, and |safe| (as value for
handling javascript navigation urls).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this parenthetical or switch to named parameters as outlined in the Infra standard.

<div algorithm>
The <dfn>sanitize core</dfn> operation,
using a {{ParentNode}} |node|, a {{SanitizerConfig}} |configuration|, and a
[=boolean=] |handle javascript navigation urls|, iterates over the DOM tree
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually better to camelCase variable names as otherwise it's unclear if they are separate variables or one long one.

</div>

<div algorithm>
To <dfn for="Sanitizer">allow comments</dfn> with |allow| on a {{SanitizerConfig}} |configuration|, do:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set comments? and set data attributes below?

ISSUE(233): Determine if this actually holds.


The <dfn>built-in unsafe default config</dfn> is meant to allow anything.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this?

{ name: "script", namespace: "http://www.w3.org/2000/svg" }
],
removeAttributes: [....],
comments: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments here isn't needed.

I suspect that we simply want to inline what this is doing in "remove unsafe", but that can probably wait. (As otherwise we'd have to define this microsyntax and such.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants