you seem to be vastly overcomplicating things?
a simple regex on the post body should suffice?
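For what it's worth, here is a rough sketch of what a regex-based pass over the post body could look like. The reply doesn't say which pattern it has in mind, so everything here is an assumption: the `$replacements` map (href to replacement markup, or null to clip the link) is hypothetical and stands in for the links table, and the pattern only handles double-quoted href attributes and non-nested links.

```php
<?php
// Hypothetical lookup built from the links table:
// href => replacement markup, or null when the link should be clipped.
$replacements = [
    'https://www.mindviewinc.com/Books/' => '<a href="https://example.com/books">Books</a>',
    'https://dead.example.com/' => null,
];

function replace_links(string $body, array $replacements): string
{
    // Matches a whole <a ...>...</a> element and captures the href value.
    // "\s" before href keeps attributes such as data-proxy-href from matching.
    $pattern = '~<a\b[^>]*?\shref="([^"]*)"[^>]*>.*?</a>~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($replacements) {
            $href = html_entity_decode($m[1]);
            if (!array_key_exists($href, $replacements)) {
                return $m[0]; // Not in the lookup: leave the link untouched.
            }
            // A null entry means "delete the link".
            return $replacements[$href] ?? '#-LINK-CLIPPED-#';
        },
        $body
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $body;
}
```

Because the replacement works on the original string and never round-trips the markup through a parser, the rest of the post body is left byte-for-byte as it was. The trade-off is that a regex can miss unusual markup (single-quoted attributes, links nested inside other links), so it is worth logging any rows that produce no match.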
I'm trying to find and replace problematic <a href> tags in my posts. I've created a links table. The following code handles about 70% of the links as expected but skips the rest. Basically, str_replace() fails to match some of the links and I can't determine why. My code is as follows:
<?php
$link_rows = DB::table('links')->orderBy('id')->get();
foreach ($link_rows as $link_row)
{
    // For each link row, we'll get the post from the database. We'll process only 1 link from each post at a time. This means
    // we'll have to fetch the same post multiple times for multiple links. But that's ok. This is a one-time job.
    $post = Thread::where('id', $link_row->post_id)->first();
    $post_body = $post->data['body'];
    $updated_body = '';
    $domDocument = new \DOMDocument();
    $domDocument->loadHTML($post_body, LIBXML_NOERROR);
    // Pull all the links in the post body
    $old_links = $domDocument->getElementsByTagName('a');
    foreach ($old_links as $old_link)
    {
        // We are now dealing with one link at a time among all links found in the post. We'll only update the link in $link_row->element
        // and revisit the post in the next $link_row.
        // The `links` table has an `element` column with all the formatted elements, where the element attributes are sorted in alphabetical order.
        // In order to match the links, we'll need to create a formatted and an 'as_it_is' version of the <a href> tag.
        $old_link_formatted = $old_link->C14N(); // Sorts the element attributes in ascending order
        $old_link_as_it_is = $domDocument->saveHTML($old_link); // Preserves the element attributes
        // Match the formatted link with our $link_row->element
        if ($old_link_formatted == $link_row->element)
        {
            // We have found a matching link. Now we'll check if the status of the $link_row is 200. If yes, we'll perform the
            // replacement. Otherwise, we'll simply delete the link from the post
            if ($link_row->status == 200)
            {
                $updated_body = str_replace($old_link_as_it_is, $link_row->replacement_link, $post_body, $did_replace_work);
                dd("Status 200: " . $did_replace_work);
            } else {
                $updated_body = str_replace($old_link_as_it_is, '#-LINK-CLIPPED-#', $post_body, $did_replace_work);
                dd("Status not 200: " . $did_replace_work);
            }
            // If the replacement didn't work, we'll skip to the next row
            // Write code to update the database below.
        }
    }
}
?>
I'm confused because even though the comparison $old_link_formatted == $link_row->element succeeds, str_replace() won't find the link in the text. Here's an example of $old_link_as_it_is and $post_body, obtained using dd():
$old_link_as_it_is = "<a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel's Free Electronic Books</a>";
$post_body =""" // app/Console/Commands/FinalReplace.php:68
<b>Re: prgraming with java</b><br />\n
<br />\n
I would also recommend 'Thinking in Java' by Bruce Eckel available for free download at <a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel's Free Electronic Books</a><br />\n
<br />\n
-Pradeep
"""
My best guess so far is that str_replace() has issues with text containing quotes, special characters, etc. I did try encoding entities with htmlentities(), but the replacement still didn't happen.
I have already spent 2 days on this but couldn't find a fix. I would really appreciate your suggestions.
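One way to test the guess about quotes and special characters is to compare the bytes directly. str_replace() does a plain byte-for-byte search, so quotes and ampersands are fine as long as the needle is byte-identical to the text in the post. The sketch below uses made-up sample strings (not the real post) and a hypothetical helper, first_difference(), to find where a needle and the stored text start to diverge. The `&amp;` variant is only an assumption about what a re-serialized link might look like; nothing in this thread confirms that it is the actual cause.

```php
<?php
// Made-up sample data for illustration; not taken from the real database.
$post_body = 'See <a href="https://example.com/?a=1&b=2" rel="nofollow">Bruce\'s "books"</a><br />';

// Needle that is byte-identical to the markup inside $post_body.
$needle_exact = '<a href="https://example.com/?a=1&b=2" rel="nofollow">Bruce\'s "books"</a>';

// Needle that differs only in how the ampersand is written (assumed variant).
$needle_reserialized = '<a href="https://example.com/?a=1&amp;b=2" rel="nofollow">Bruce\'s "books"</a>';

// The fourth argument receives the number of replacements performed.
str_replace($needle_exact, '#-LINK-CLIPPED-#', $post_body, $count_exact);
str_replace($needle_reserialized, '#-LINK-CLIPPED-#', $post_body, $count_reserialized);

// Hypothetical helper: byte offset of the first difference between two
// strings, or null when they are identical.
function first_difference(string $a, string $b): ?int
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? null : $len;
}

$offset = first_difference($needle_exact, $needle_reserialized);
if ($offset !== null) {
    // Show a few bytes around the point where the two strings diverge.
    echo substr($needle_exact, max(0, $offset - 10), 20), "\n";
    echo substr($needle_reserialized, max(0, $offset - 10), 20), "\n";
}
```

In the real job, the same comparison could be run between $old_link_as_it_is and the matching slice of $post_body for one of the rows that gets skipped. If the replacement count is 0, the offset printed by the helper points at the exact characters that differ.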