<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to bugs</title><link>https://sourceforge.net/p/htmlcleaner/bugs/</link><description>Recent changes to bugs</description><atom:link href="https://sourceforge.net/p/htmlcleaner/bugs/feed.rss" rel="self"/><language>en</language><lastBuildDate>Tue, 30 Sep 2025 16:40:11 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/htmlcleaner/bugs/feed.rss" rel="self" type="application/rss+xml"/><item><title>Whitespaces enclosed in their own tags are sometimes dropped</title><link>https://sourceforge.net/p/htmlcleaner/bugs/242/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Consider the following HTML text:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;hello&lt;span class="nt"&gt;&amp;lt;strong&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/strong&amp;gt;&lt;/span&gt;world
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When using htmlcleaner-gui-2.29 on a file with this content, the output will be:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;head&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;helloworld&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That is, the whitespace enclosed in the &lt;code&gt;strong&lt;/code&gt; tag is dropped. In my Java application, the space is likewise dropped from the tree.&lt;/p&gt;
&lt;p&gt;(It is worth noting that in some cases, for which I could not quite determine the rule, the htmlcleaner-gui-2.29 output will contain a newline instead of the space because it chooses a different formatting rule. Unfortunately, I have not yet found a good way of resolving the issue.)&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Fynn Godau</dc:creator><pubDate>Tue, 30 Sep 2025 16:40:11 -0000</pubDate><guid>https://sourceforge.netfb7bb76ddb42f4a7f50ecb4f20a71136f4ec0865</guid></item><item><title>#227 Running out of memory cleaning HTML</title><link>https://sourceforge.net/p/htmlcleaner/bugs/227/?limit=25#5161</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi &lt;a class="user-mention" href="/u/scottwilson/profile/"&gt;@scottwilson&lt;/a&gt;, this one has been fixed as a result of the changes to prevent stack overflow in 2.29, so it can be closed. &lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Richard Morley-Smith</dc:creator><pubDate>Thu, 05 Jun 2025 13:56:23 -0000</pubDate><guid>https://sourceforge.netc23f38d1eb134b877c1102e445cf893b55b22482</guid></item><item><title>#241 Wrong children of dl incorrectly wrapped in div</title><link>https://sourceforge.net/p/htmlcleaner/bugs/241/?limit=25#0f94</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I forgot to mention: this is with HTML 5 tag definitions; with HTML 4, the example input is kept as-is.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Hamann</dc:creator><pubDate>Fri, 19 Apr 2024 09:18:18 -0000</pubDate><guid>https://sourceforge.net6e77ba2ecc71fc1a4900c904289932ea6cb91875</guid></item><item><title>Wrong children of dl incorrectly wrapped in div</title><link>https://sourceforge.net/p/htmlcleaner/bugs/241/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;When cleaning a dl tag that contains forbidden children 
like p or br, they have been wrapped in a div since version 2.28. This is wrong: when a div is a child of a dl, it &lt;a class="" href="https://html.spec.whatwg.org/multipage/grouping-content.html#the-div-element" rel="nofollow"&gt;only allows&lt;/a&gt; "one or more dt elements followed by one or more dd elements, optionally intermixed with script-supporting elements". Removing div as the preferred child again doesn't fully fix this; for example, the following input shows that plain text is still kept as a child, which also isn't allowed:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dl&amp;gt;&amp;lt;p&amp;gt;&lt;/span&gt;Paragraph1&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Term&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&amp;lt;br&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&amp;lt;dd&amp;gt;&lt;/span&gt;Definition&lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&amp;lt;p&amp;gt;&lt;/span&gt;Paragraph2&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&amp;lt;/dl&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;is cleaned as &lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&amp;lt;br&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;dl&amp;gt;&lt;/span&gt;Paragraph1&lt;span class="nt"&gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Term&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&amp;lt;dd&amp;gt;&lt;/span&gt;Definition&lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&lt;/span&gt;Paragraph2&lt;span class="nt"&gt;&amp;lt;/dl&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael Hamann</dc:creator><pubDate>Fri, 19 Apr 2024 09:17:16 -0000</pubDate><guid>https://sourceforge.netdf5e837df400a4c7159e2a5f3842b6dd4a346b9c</guid></item><item><title>Behaviour on unknown tags depends on capitalization of letters</title><link>https://sourceforge.net/p/htmlcleaner/bugs/240/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Since version 2.19, the behaviour (without any behaviour modifications) on unknown tags is different depending on letter capitalization. This was not the case in versions 2.18 and earlier and is very counterintuitive.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;&amp;lt;atag&amp;gt;1&amp;lt;/atag&amp;gt;&amp;lt;b&amp;gt;2&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;"&lt;/span&gt;
&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HtmlCleaner&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="na"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getElementsByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"p"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="na"&gt;childTagList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// [atag, b]&lt;/span&gt;

&lt;span class="c1"&gt;// Changed only atag -&amp;gt; aTag&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;&amp;lt;aTag&amp;gt;1&amp;lt;/aTag&amp;gt;&amp;lt;b&amp;gt;2&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;"&lt;/span&gt;
&lt;span class="n"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HtmlCleaner&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="na"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getElementsByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"p"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="na"&gt;childTagList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// [aTag]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Please fix it, so that behaviour on these two samples is consistent.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mikhail Dvorkin</dc:creator><pubDate>Wed, 10 Apr 2024 12:18:04 -0000</pubDate><guid>https://sourceforge.neta19863e45e1d0899bdb99c185c293cc4b8f810e9</guid></item><item><title>#239 CDATA added for any kind of scripts even for application/json ones</title><link>https://sourceforge.net/p/htmlcleaner/bugs/239/?limit=25#b441</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I tried to work on that issue, but I think I actually made too many changes: in particular, I saw that XmlSerializer#dontEscape is used both to decide whether the content needs to be escaped and to decide whether CDATA should be added, which is a problem here as we still don't want to escape the content even without a CDATA. &lt;br/&gt;
So I think the same problem might apply to DomSerializer, in which case my code is probably wrong and I might be missing a unit test somewhere. &lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Simon Urli</dc:creator><pubDate>Thu, 28 Mar 2024 17:01:33 -0000</pubDate><guid>https://sourceforge.net204474bc7bf1ba70866e619cb39a119c2651267e</guid></item><item><title>CDATA added for any kind of scripts even for application/json ones</title><link>https://sourceforge.net/p/htmlcleaner/bugs/239/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Right now the API for adding CDATA only allows defining the tags in which to add it, and by default it uses script and style. However, this is a problem when cleaning application/json scripts, as it produces invalid JSON: the CDATA section is indeed emitted with a comment, and JSON cannot contain comments. &lt;br/&gt;
Ideally, the API should allow defining filters that specify where CDATA should not be added. By default, it should probably ignore any script tag with the application/json type. &lt;/p&gt;
&lt;p&gt;See also: &lt;a href="https://jira.xwiki.org/browse/XCOMMONS-2487" rel="nofollow"&gt;https://jira.xwiki.org/browse/XCOMMONS-2487&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Simon Urli</dc:creator><pubDate>Thu, 28 Mar 2024 16:13:28 -0000</pubDate><guid>https://sourceforge.netd9d374bd1b7eae7a40764e6af87eb7d8ed7937b0</guid></item><item><title>Various tags incorrectly not marked as phrasing content in HTML5</title><link>https://sourceforge.net/p/htmlcleaner/bugs/238/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello, &lt;/p&gt;
&lt;p&gt;I'm currently working on upgrading XWiki to use HtmlCleaner 2.29 and it seems &lt;a href="https://sourceforge.net/p/htmlcleaner/bugs/228/"&gt;https://sourceforge.net/p/htmlcleaner/bugs/228/&lt;/a&gt; has been fixed without taking into account Michael Hamann's comment here: &lt;a href="https://sourceforge.net/p/htmlcleaner/bugs/228/#442b"&gt;https://sourceforge.net/p/htmlcleaner/bugs/228/#442b&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So right now it seems that the following tags are not properly handled as phrasing content: &lt;br/&gt;
* data&lt;br/&gt;
* embed&lt;br/&gt;
* iframe&lt;br/&gt;
* img&lt;br/&gt;
* math&lt;br/&gt;
* object&lt;br/&gt;
* picture&lt;br/&gt;
* q&lt;br/&gt;
* template&lt;br/&gt;
* video&lt;/p&gt;
&lt;p&gt;Note that we're taking this list of phrasing tags from the spec: &lt;a href="https://html.spec.whatwg.org/#phrasing-content-2" rel="nofollow"&gt;https://html.spec.whatwg.org/#phrasing-content-2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can check the unit test we're using to spot issues there: &lt;a href="https://github.com/xwiki/xwiki-commons/blob/xwiki-commons-15.10.6/xwiki-commons-core/xwiki-commons-xml/src/test/java/org/xwiki/xml/internal/html/HTML5HTMLCleanerTest.java#L148-L184" rel="nofollow"&gt;https://github.com/xwiki/xwiki-commons/blob/xwiki-commons-15.10.6/xwiki-commons-core/xwiki-commons-xml/src/test/java/org/xwiki/xml/internal/html/HTML5HTMLCleanerTest.java#L148-L184&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Simon Urli</dc:creator><pubDate>Fri, 01 Mar 2024 09:36:47 -0000</pubDate><guid>https://sourceforge.net9890b2379a78af6b4843c41e8d1c7c4d51e7a714</guid></item><item><title>#228 svg incorrectly not marked as phrasing content in HTML5</title><link>https://sourceforge.net/p/htmlcleaner/bugs/228/?limit=25#6b3c</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;&lt;a class="user-mention" href="/u/scottwilson/profile/"&gt;@scottwilson&lt;/a&gt; it seems you forgot to close that one: I can see a commit related to it before release 2.28, see &lt;a href="https://sourceforge.net/p/htmlcleaner/code/595/"&gt;https://sourceforge.net/p/htmlcleaner/code/595/&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Simon Urli</dc:creator><pubDate>Fri, 01 Mar 2024 09:29:40 -0000</pubDate><guid>https://sourceforge.net84dcb329be6ac26b16ce9bf69b30ba7f650ea723</guid></item><item><title>#94 General suggestion: copy mutable field values / arguments instead of returning / using them directly</title><link>https://sourceforge.net/p/htmlcleaner/bugs/94/?limit=50#57ea</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Sure, let me see what I can do.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Dave</dc:creator><pubDate>Wed, 21 Jun 2023 03:00:19 -0000</pubDate><guid>https://sourceforge.net44f93331f56e659496b67c8ae77b18f39bd2b559</guid></item></channel></rss>