Menu

#172 Problem with the remove function

open
nobody
None
2025-11-06
2019-06-22
No

This code is supposed to remove the a tag within the span within each div, but this fails on the second iteration ("ARGH!!") Is there something wrong with the remove function?

  $html = str_get_html("
<div>
  <span>
    <a>DELETE</a>KEEP
  </span>
</div>

<div>
  <p>HELLO</p>
  <span>
    <a>DELETE</a>KEEP
  </span>
</div>
");

  foreach($html->find("div") as $item) {
    $foo = $item->find("span", 0);
    if($a = $foo->find("a", 0))
      $a->remove();
    else
      echo "ARGH!!\n";
  }

Discussion

  • LogMANOriginal

    LogMANOriginal - 2019-06-22

    Is there something wrong with the remove function?

    No... slowly walks away

    Dammit, I was hoping to find a solution before anyone noticed.

    The remove function currently takes out elements from an array, which obviously makes it impossible to use indices to iterate over said array, unless you re-index the array using array_values.

    However, unfortunately each node also stores the index of the closing element from the array, which means we can't simply use array_values as all nodes must be updated as well or we need a different solution for indexing.

    This will probably take a while to find a solution for, sorry.

     
  • Anonymous

    Anonymous - 2019-06-22

    Noted. So remove() must be used with caution, or not at all, until further notice.

     
  • LogMANOriginal

    LogMANOriginal - 2019-06-22

    Yes, sorry about that. I'm currently working on getting better insight over the DOM tree to figure out a better solution. Here is an example:

    The document

    <body>
    <p>Hello, World!</p>
    <p class="remove">This <em>will be</em> removed</p>
    <p>PHP Simple HTML DOM Parser</p>
    </body>
    

    The tree before removing .remove

    [00000] * root
            |\ 
    [00001] | * body
            | |\ 
    [00002] | | * p
            | | |\ 
    [00003] | | | * text: "Hello, World!"
            | | |/ 
    [00004] | | * p
            | | |\ 
    [00005] | | | * text: "This"
    [00006] | | | * em
            | | | |\ 
    [00007] | | | | * text: "will be"
            | | | |/ 
    [00008] | | | * text: "removed"
            | | |/ 
    [00009] | | * p
            | | |\ 
    [00010] | | | * text: "PHP Simple HTML DOM Parser"
    

    The tree after removing .remove

    [00000] * root
            |\ 
    [00001] | * body
            | |\ 
    [00002] | | * p
            | | |\ 
    [00003] | | | * text: "Hello, World!"
    [00009] | | | * p
            | | | |\ 
    [00010] | | | | * text: "PHP Simple HTML DOM Parser"
    

    You can see that nesting and indices (numbers on the left) are incorrectly aligned. (the p after text: "Hello, World!" is one level too low. This totally breaks find operations after removal.

    You can still use the remove function but you need to save and load afterwards until this is fixed.

     
  • LogMANOriginal

    LogMANOriginal - 2019-06-25

    Your issue as well as a few others that weren't detected before are fixed in master. Let me know if you find more issues related to this.

     
  • Anonymous

    Anonymous - 2019-06-26

    I can confirm that my enclosed example doesn't scream "ARGH!!" anymore. Sounds like you had a good understanding of the problem. I'll let you know if I run into similar issues.

     
  • atomeek

    atomeek - 2020-03-23

    Hi, I faced the same issue ( 1.9-dev branch ).
    I fixed it locally and then compared it with the master branch.
    As I see it is fixed with array_intersect() but there is one more array_slice() call that will cause the same problems in some cases (for selectors with ~).
    So it's needed to be changed too. :)

     

    Last edit: atomeek 2020-03-23
    • LogMANOriginal

      LogMANOriginal - 2020-05-21

      Can you provide an example that reliably fails in current master and is fixed by this change?
      I would like to add more tests for this to ensure it doesn't break in the future.

       
  • Anonymous

    Anonymous - 2025-11-06
    Post awaiting moderation.

Log in to post a comment.

MongoDB Logo MongoDB