Any data field in the database whose content matches the regex '&.+$' breaks the output of all XSAMS files. This regex matches all HTML entities, e.g. the entity for ä. When such content is written to the output, it is not escaped, which breaks most browsers and the validator (unless it happens to be a valid HTML entity). Browsers expect a semicolon as the sixth character after the ampersand.
The url field of this scan contains the following characters (within the link):
52fed736-74fc-11e2-9a8e-00000aacb35f&acdnat=1360663964_abbc8fd43c6ff547c477bb7648e5250d
Since this is a rather common pattern for URLs, this is a problem.
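The failure can be reproduced with a minimal sketch using Python's standard XML parser. The `<url>` element name is a hypothetical stand-in and is not taken from the XSAMS schema; only the query-string fragment comes from the report above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Query-string fragment from the url field quoted above.
fragment = (
    "52fed736-74fc-11e2-9a8e-00000aacb35f"
    "&acdnat=1360663964_abbc8fd43c6ff547c477bb7648e5250d"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <url> is a hypothetical element name used only for this illustration.
raw_doc = "<url>" + fragment + "</url>"
escaped_doc = "<url>" + escape(fragment) + "</url>"

# The raw '&' opens an entity reference that is never closed by ';'.
print(is_well_formed(raw_doc))
# With '&' written as '&amp;' the same content parses.
print(is_well_formed(escaped_doc))
```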
This is indeed a problem, and related to #83. However, the NodeSoftware cannot know whether the database content is already escaped, and we certainly do not want to escape twice. Therefore the node itself needs to make sure it does not deliver content that breaks validation. This can be done either in the database itself (make an escaped copy of the column in question) or in models.py, with a small method that applies the escape function to the field.
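The models.py variant might look roughly like the sketch below. The `Source` class here is a plain-Python stand-in for a node's model class, and the `url_escaped` method name is an assumption made for illustration; a real node would add such a method to its own model and reference it wherever the raw field is currently output.

```python
from xml.sax.saxutils import escape


class Source:
    """Plain-Python stand-in for a model class in a node's models.py."""

    def __init__(self, url):
        # Stored exactly as it is in the database, i.e. unescaped.
        self.url = url

    def url_escaped(self):
        """Return the URL with '&', '<' and '>' escaped for XML output."""
        return escape(self.url)


# example.org is a placeholder URL, not a real record.
source = Source("http://example.org/article?id=1&acdnat=2")
print(source.url_escaped())
```

This only works if the node knows the stored value is unescaped; applying it to content that is already escaped would escape it a second time.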
I agree - we cannot just escape by default. Sometimes even I as a database provider don't know what content a field has - e.g. a comment field for one piece of data. I can't rule out that somebody puts a series of ampersands there...
However, we could check whether the content of the URL in a Source is already encoded. The escaping function used (xml.sax.saxutils.escape) seems to be rather intelligent. My workaround will be to unescape and then escape all content of the URL field. This should leave all content in an escaped state.
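A minimal sketch of the unescape-then-escape workaround, using `escape` and `unescape` from `xml.sax.saxutils`. The helper name `normalize_escaping` is made up for this example.

```python
from xml.sax.saxutils import escape, unescape


def normalize_escaping(value):
    """Undo any existing &amp;/&lt;/&gt; escaping, then escape exactly once."""
    return escape(unescape(value))


raw = "id=1&acdnat=2"
already_escaped = "id=1&amp;acdnat=2"

# Both inputs end up in the same, singly escaped form.
print(normalize_escaping(raw))
print(normalize_escaping(already_escaped))
```

Two caveats, as far as the standard library behaviour goes: calling `escape` on its own re-escapes an existing `&amp;`, which is why the `unescape` step comes first; and by default `unescape` only handles `&amp;`, `&lt;` and `&gt;`, so a named HTML entity in the field would have its ampersand escaped rather than being preserved.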
Testcase:
http://ideadb.uibk.ac.at/view/107/