
thebigk

str_replace() not working - What are my options?

I'm trying to find and replace problematic <a href> tags in my posts, and I've created a links table for this. The following code handles about 70% of the links as expected but skips the rest: str_replace() refuses to replace some of the links, and I can't work out why. My code is as follows:

<?php
$link_rows = DB::table('links')->orderBy('id')->get();
    
    foreach ($link_rows as $link_row)
    {
        // For each of the links_row, we'll get the post from database. We'll process only 1 link from each post at a time. This means
        // we'll have to fetch the same post multiple times for multiple links. But that's ok. This is a one-time job.
        $post = Thread::where('id', $link_row->post_id)->first();
        $post_body = $post->data['body'];
        $updated_body = '';
        $domDocument = new \DOMDocument();
        $domDocument->loadHTML($post_body, LIBXML_NOERROR);

        // Pull all the links in the post body
        $old_links = $domDocument->getElementsByTagName('a');

        foreach ($old_links as $old_link)
        {
            // We now are dealing with one link at a time among all links found in the post. We'll only update the link in link_row->href
            // and revisit the post in the next $link_row.

            // The `links` table has an `element` column with all the formatted elements, where the element attributes are sorted in alphabetical order.
            // In order to match the links, we'll need to create a formatted and 'as_it_is' version of the <a href tag. 
            $old_link_formatted = $old_link->C14N(); // Sorts the element attributes in ascending order
            $old_link_as_it_is = $domDocument->saveHTML($old_link); // Preserves the element attributes

            // Match the formatted link with $link_row->element
            if($old_link_formatted == $link_row->element)
            {
                // We have found a matching link. Now we'll check if the status of the $link_row is 200. If yes, we'll perform the
                // replacement. Otherwise, we'll simply delete the link from the post
                if($link_row->status == 200)
                {
                    $updated_body = str_replace($old_link_as_it_is, $link_row->replacement_link, $post_body, $did_replace_work);
                    dd("Status 200: " . $did_replace_work);
                } else {
                    $updated_body = str_replace($old_link_as_it_is, '#-LINK-CLIPPED-#', $post_body, $did_replace_work);
                    dd("Status not 200: " . $did_replace_work);
                }
                // If the replacement didn't work, we'll skip to the new row
                // Write code to update the database below.
            }
        }
    }
?>

I'm confused because even though $old_link_formatted == $link_row->element matches, str_replace() won't find the link in the text. Here's an example of $old_link_as_it_is and $post_body, obtained using dd():

$old_link_as_it_is = "<a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&amp;hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel's Free Electronic Books</a>";

$post_body =""" // app/Console/Commands/FinalReplace.php:68
<b>Re: prgraming with java</b><br />\n
<br />\n
I would also recommend &#039;Thinking in Java&#039; by Bruce Eckel available for free download at <a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&amp;hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel&#039;s Free Electronic Books</a><br />\n
<br />\n
-Pradeep
"""

My best guess so far is that str_replace() has issues with text with quotes, special characters etc. I did try encoding entities with htmlentities() but the replacement didn't happen.
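A minimal sketch of why the replacement can silently fail (the sample string is a shortened, assumed copy of the post above). str_replace() compares bytes exactly and has no special trouble with quotes, but DOMDocument::saveHTML() re-serializes the parsed node, so an entity such as &#039; in the stored body comes back as a plain apostrophe in the needle:

```php
<?php
// Shortened copy of the stored post body (assumed sample based on the dump above).
$postBody = 'Free download at <a href="https://www.mindviewinc.com/Books/" rel="nofollow">'
    . 'Bruce Eckel&#039;s Free Electronic Books</a><br />';

$dom = new DOMDocument();
$dom->loadHTML($postBody, LIBXML_NOERROR);
$anchor = $dom->getElementsByTagName('a')->item(0);

// saveHTML() serializes the parsed node: &#039; was decoded to a plain
// apostrophe during parsing and is not written back as &#039;.
$needle = $dom->saveHTML($anchor);

// str_replace() compares bytes exactly, so this needle is not found.
$result = str_replace($needle, '#-LINK-CLIPPED-#', $postBody, $count);

echo $needle, PHP_EOL;
echo "replacements: $count", PHP_EOL;
```

If this holds for the real data, the fix is to make both strings use the same representation before comparing, not to change str_replace().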

I've already spent two days on this but couldn't find a fix. Would really appreciate your suggestions.

Snapey

You seem to be vastly overcomplicating things?

A simple regex on the post body should suffice?
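Snapey's regex suggestion could look roughly like the sketch below. The helper replace_anchor_by_href() is hypothetical, and matching HTML with a regex assumes simple, non-nested anchors whose attribute values contain no ">". It matches on the href alone (decoded, so &amp; and & compare equal), and requires whitespace before "href" so that data-proxy-href is not picked up by mistake:

```php
<?php
/**
 * Hypothetical helper: replace every <a ...>...</a> whose href equals $href.
 * Regex-based, so it assumes well-formed, non-nested anchors.
 */
function replace_anchor_by_href(string $html, string $href, string $replacement, ?int &$count = null): string
{
    $count = 0;
    $result = preg_replace_callback(
        // \s before "href" avoids matching the data-proxy-href attribute.
        '~<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1[^>]*>.*?</a>~is',
        function (array $m) use ($href, $replacement, &$count): string {
            // Compare the decoded href so &amp; vs & does not cause a mismatch.
            if (html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8') !== $href) {
                return $m[0]; // leave non-matching anchors untouched
            }
            $count++;
            return $replacement;
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; keep the input then.
    return $result ?? $html;
}
```

Matching on the href sidesteps the anchor-text encoding problem entirely, at the cost of replacing every link to that URL in the post.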

thebigk

@Snapey - the part above if($old_link_formatted == $link_row->element) may look complex, but it's actually required. I can't do the replacement without it.

I'm not comfortable with regex, and preg_match() may not be the right fit for the job. Could you suggest an approach, please?
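One alternative to both str_replace() and regex, sketched under assumptions: do the replacement on the parsed DOM itself, so the re-serialized anchor never has to be found in the original string. The helper replace_anchors_in_dom() and its $isMatch callback are hypothetical; $replacement must be a well-formed XML fragment (plain text is fine). The mb_encode_numericentity() call needs the mbstring extension:

```php
<?php
/**
 * Hypothetical helper (sketch): replace matching <a> elements inside the parsed
 * DOM and serialize the result, instead of searching with str_replace().
 * $isMatch receives each DOMElement and returns true if it should be replaced.
 */
function replace_anchors_in_dom(string $html, callable $isMatch, string $replacement, ?int &$count = null): string
{
    $count = 0;
    $dom = new DOMDocument();
    // Turn non-ASCII characters into numeric entities; loadHTML() otherwise
    // tends to assume ISO-8859-1 for input without a charset declaration.
    $safe = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    // Wrap in a <div> so the fragment has one container that is easy to find.
    $dom->loadHTML('<div>' . $safe . '</div>', LIBXML_NOERROR | LIBXML_NOWARNING);
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    // Copy the live NodeList first: replacing nodes while iterating it skips items.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $anchor) {
        if (!$isMatch($anchor)) {
            continue;
        }
        $fragment = $dom->createDocumentFragment();
        $fragment->appendXML($replacement);
        $anchor->parentNode->replaceChild($fragment, $anchor);
        $count++;
    }

    // Serialize only the wrapper's children, not the implied <html><body>.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

In the original loop, the callback could be something like fn (DOMElement $a) => $a->C14N() === $link_row->element. Note that the whole body is re-serialized, so entities such as &#039; come back as plain characters and non-ASCII text may come back as entities; check a few posts before bulk-saving.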

thebigk

Just to be more specific, I need help with the following -

if ($old_link_formatted == $link_row->element)
{
    // We have an <a href> element from the database that matches the one extracted from the text.
    $updated_body = str_replace($old_link_as_it_is, $link_row->replacement_link, $post_body, $did_replace_work);
    // But str_replace() doesn't work for some reason: $did_replace_work is 0.
}

As I mentioned, str_replace() works only about 70% of the time. Even when the == comparison says the two strings match, str_replace() can't find the link inside the text to replace it.

That's the exact issue I'm trying to solve. Would appreciate responses.

Sinnbeck

If I select the text in $old_link_as_it_is and hit Ctrl+F in the browser, it does not match the text in $post_body, so they are not the same.

This is because special characters are not encoded the same way in both strings, so before comparing you need to make sure they are encoded identically.

Example

Bruce Eckel's Free Electronic Books
Bruce Eckel&#039;s Free Electronic Books

You can use htmlspecialchars_decode()
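A short check of this suggestion, using the two strings above. Passing ENT_QUOTES explicitly matters on PHP versions before 8.1, where the default flags (ENT_COMPAT | ENT_HTML401) leave &#039; undecoded:

```php
<?php
$fromPost = 'Bruce Eckel&#039;s Free Electronic Books'; // as stored in the post body
$fromDom  = "Bruce Eckel's Free Electronic Books";       // as returned by saveHTML()

// ENT_QUOTES makes the single-quote decoding explicit on every PHP version.
$decoded = htmlspecialchars_decode($fromPost, ENT_QUOTES);

var_dump($decoded === $fromDom); // bool(true)
```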

thebigk

@Sinnbeck - I think it's because of the "" or '. I simply took a dump from the console. What is the right way to output the values? I tried using $this->info($old_link_as_it_is), but it outputs it as a plain element.

I used the following to get the output -

$this->info($old_link_as_it_is);
$this->info('-----------------------');
$this->info($post_body);

The output is -

<a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&amp;hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel's Free Electronic Books</a>
-----------------------
<b>Re: prgraming with java</b><br />
<br />
I would also recommend &#039;Thinking in Java&#039; by Bruce Eckel available for free download at <a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&amp;hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel&#039;s Free Electronic Books</a><br />
<br />
-Pradeep
""" // app/Console/Commands/FinalReplace.php:71
<b>Re: prgraming with java</b><br />\n
<br />\n
I would also recommend &#039;Thinking in Java&#039; by Bruce Eckel available for free download at <a href="https://www.mindviewinc.com/Books/" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.mindviewinc.com%2FBooks%2F&amp;hash=8b24c60bf18b7fdcd9e9d9dbaaef563a" rel="nofollow">Bruce Eckel&#039;s Free Electronic Books</a><br />\n
<br />\n
-Pradeep
"""
thebigk

I think that's a good catch. My database has the text as Bruce Eckel&#039;s Free Electronic Books, while the link that I got from the database has it as Bruce Eckel's ....

How do I take care of the single quote? Would appreciate any suggestions.

Sinnbeck

@thebigk

You can use htmlspecialchars_decode()

So something like

$updated_body = str_replace($old_link_as_it_is, $link_row->replacement_link, htmlspecialchars_decode($post_body), $did_replace_work); 
thebigk

@Sinnbeck

I had to apply htmlspecialchars_decode() to $old_link_as_it_is as well as to $post_body to make it work. I'll now run this for all the links and report back.

Thank you for this!

Sinnbeck

If it's solved, please mark a best answer to set the thread as solved

thebigk

@Sinnbeck sorry, not solved. Still trying to figure out the right way to make replacements.

thebigk

@Sinnbeck Appreciate your response. As discussed, I changed to -

$updated_body = str_replace(htmlspecialchars_decode($old_link_as_it_is), $link_row->replacement_link, htmlspecialchars_decode($post_body), $did_replace_work);

Using htmlspecialchars_decode() got more of the links converted. However, posts like the following still fail. Here's how a post link appears in my database:

<a href=\"https:\/\/superblog.crazyengineers.com\" target=\"_blank\" class=\"externalLink ProxyLink\" data-proxy-href=\"proxy.php?link=http%3A%2F%2Fsuperblog.crazyengineers.com&amp;hash=9a7aac0cd57767be80d77678a66ad518\" rel=\"nofollow\">The Big K\u00e2\u20ac\u2122s Superblog\u00e2\u201e\u00a2 \u00bb Go Away! Demons Of Stupidity!<\/a>

However, when I fetch it and run it through htmlspecialchars_decode, I get -

<a href="https://superblog.crazyengineers.com" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fsuperblog.crazyengineers.com&hash=9a7aac0cd57767be80d77678a66ad518" rel="nofollow">The Big Kââ¬â¢s Superblogâ⢠» Go Away! Demons Of Stupidity!</a>

While the $post_body, when run through htmlspecialchars_decode gives me-

Let's get started with our blog ids. <br />
I'm sure many of us write blogs. So publish your blog links here. Let others read your blog &#128521;<br />
<br />
I'll set the ball rolling-<br />
<br />
Mine is available at :-<br />
<br />
<a href="https://superblog.crazyengineers.com" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fsuperblog.crazyengineers.com&hash=9a7aac0cd57767be80d77678a66ad518" rel="nofollow">The Big K’s Superblogâ„¢ » Go Away! Demons Of Stupidity!</a> &#128513;<br />
<br />
Yours?<br />
<br />
<b>-The Big K-</b>

The subtle difference is this:-

The Big Kââ¬â¢s Superblogâ⢠» Go Away! Demons Of Stupidity!
Vs
The Big K’s Superblogâ„¢ » Go Away! Demons Of Stupidity!

How do I take care of this?

thebigk

The interesting part is that the same function, htmlspecialchars_decode(), gives different text for the link and for the post body. Not sure why. In general, special characters seem to be a problem when running str_replace().
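A likely explanation, offered as an assumption rather than a diagnosis: DOMDocument::loadHTML() typically treats input without a charset declaration as ISO-8859-1, so multi-byte UTF-8 characters such as ’ and ™ get split into several single-byte characters in anything extracted through the parser, while the raw $post_body never goes through it. The sketch compares a plain load with one that first converts non-ASCII characters to numeric entities via mb_encode_numericentity() (mbstring extension required). The plain load's exact result can vary with the libxml version, so it is shown without an expected value:

```php
<?php
$body = '<p>The Big K’s Superblog™</p>'; // UTF-8 source text

// Without a charset hint, loadHTML() may read the UTF-8 bytes as ISO-8859-1.
$naive = new DOMDocument();
$naive->loadHTML($body, LIBXML_NOERROR);
$naiveText = $naive->getElementsByTagName('p')->item(0)->textContent;

// Converting every character >= U+0080 to &#NNNN; first keeps it intact.
$safe = new DOMDocument();
$safe->loadHTML(
    mb_encode_numericentity($body, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'),
    LIBXML_NOERROR
);
$safeText = $safe->getElementsByTagName('p')->item(0)->textContent;

var_dump($naiveText === 'The Big K’s Superblog™'); // may be false (mojibake)
var_dump($safeText === 'The Big K’s Superblog™');  // bool(true)
```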

Sinnbeck

@thebigk Yeah, it compares strictly, so you will need to normalize the strings. Personally, I would write a normalizing function and run everything through it so both sides are identical.
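The normalizing idea, sketched with a hypothetical normalize_html() helper. It uses html_entity_decode(), which decodes all HTML entities into UTF-8 (not just the five that htmlspecialchars_decode() handles), and applies it to both sides before str_replace(). Caveat: decoding the whole body also turns escaped text such as &lt; into raw markup, so writing the normalized body back to the database changes more than the one link:

```php
<?php
/**
 * Hypothetical normalizer: decode every HTML entity (&#039;, &amp;, &trade;, ...)
 * to UTF-8 so both sides of the comparison use the same representation.
 */
function normalize_html(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$needle   = normalize_html("<a href=\"https://example.com/?a=1&amp;b=2\">Eckel's Books</a>");
$haystack = normalize_html('Read <a href="https://example.com/?a=1&amp;b=2">Eckel&#039;s Books</a> today');

$result = str_replace($needle, '#-LINK-CLIPPED-#', $haystack, $count);
echo $result, PHP_EOL; // Read #-LINK-CLIPPED-# today
```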

Sadly I have never worked with a language with those special characters, so I'm not quite sure how to handle them.

thebigk

@Sinnbeck Thanks. Why am I getting different output for htmlspecialchars_decode() for the link and post body?

Sinnbeck

@thebigk not currently at a computer but perhaps try utf8_decode()

Edit: it's deprecated. Perhaps use mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8') instead (or google similar functions).

thebigk

@Sinnbeck - Yes, it's deprecated. But once I use mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8'), can I save the post back to my database? Do I need to make further changes to encoding before saving to database?

Sinnbeck

@thebigk hard to say. I would need to dig into it for a while to get a proper overview of what works for me. If I were you I would start doing some testing when saving your data and see what works. Also make sure your database encoding is correct

thebigk

@Sinnbeck - I've been researching this, and it turns out that 99% of my problem can be solved if I take care of the single and double quotes in my database. Allow me to explain. Here's how the post looks in my database:

{"body":"You may check this out   <a href=\"https:\/\/www.uptucsengineers.blogspot.com\" target=\"_blank\" class=\"externalLink ProxyLink\" data-proxy-href=\"proxy.php?link=http%3A%2F%2Fwww.uptucsengineers.blogspot.com&amp;hash=98aad47adf96ef7363be0e605ff60cdb\" rel=\"nofollow\">Engineering in UPTU... &#039;Aall Is Well&#039;<\/a>"}

The search-string looks like this -

<a href="https://www.uptucsengineers.blogspot.com" target="_blank" class="externalLink ProxyLink" data-proxy-href="proxy.php?link=http%3A%2F%2Fwww.uptucsengineers.blogspot.com&amp;hash=98aad47adf96ef7363be0e605ff60cdb" rel="nofollow">Engineering in UPTU... 'Aall Is Well'</a>"

The only difference here is the anchor text: 'Aall Is Well'.

Upon searching further, I found that I could make use of

$post_body = htmlspecialchars($post_body, ENT_QUOTES);
$search_string = htmlspecialchars($old_link_as_it_is, ENT_QUOTES);

However, the output I get is this -

1. &#039;Aall Is Well&#039; 
2. &amp;#039;Aall Is Well&amp;#039;

I'm not sure why this &amp; is popping up or how to fix it. It looks like htmlspecialchars() is encoding the & in &#039; as &amp;.
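The extra &amp; comes from htmlspecialchars()'s fourth parameter, $double_encode, which defaults to true, so the & of an existing &#039; is encoded again. A minimal sketch with $double_encode set to false, using the anchor text from this post: both strings then encode to the same value. Decoding both sides with htmlspecialchars_decode() is usually simpler, since encoding the whole body also escapes every <, > and " in it:

```php
<?php
$fromPost = '&#039;Aall Is Well&#039;'; // already encoded in the stored body
$fromDom  = "'Aall Is Well'";           // decoded by DOMDocument

// $double_encode = false: existing entities such as &#039; are left alone
// instead of having their & turned into &amp;.
$a = htmlspecialchars($fromPost, ENT_QUOTES, 'UTF-8', false);
$b = htmlspecialchars($fromDom, ENT_QUOTES, 'UTF-8', false);

echo $a, PHP_EOL;     // &#039;Aall Is Well&#039;
var_dump($a === $b); // bool(true)
```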

Would really appreciate if you could suggest a way to address this issue. Lost my thinking power already.
