Published 10 months ago by mathewparet
I need to identify similar "news" articles from among, say, thousands. All the articles are saved in a table with title, description, and date published (which might vary by plus or minus one day between sources).
Ideally, I am trying to identify whether a particular news story is reported by multiple news sources and, if so, group those reports together.
Is there a way I can accomplish this without using AI?
I download normal RSS feeds and save them into a table (I save the title and description). Based on the data stored in these fields, I need to identify whether there is a duplicate entry for a news story (not an exact duplicate record).
For example, source A reports "Dog landed on moon for the first time". Source B reports "Crown, a dog, lands on the moon". I need to identify that both of these are the same news story. How do I do that?
I can give you a rough idea of how to do that. To identify similar text, you can use PHP's similar_text() function. It returns the number of matching characters and can also report the similarity as a percentage through its optional third argument. Then set a threshold for the percentage you will accept: for example, if the similarity is more than 50%, you can treat the two items as the same story.
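As a rough sketch of that idea (not a definitive implementation), the snippet below compares titles with similar_text() and groups the ones that pass the threshold. The helper names isSimilarHeadline() and groupSimilar(), the normalization step, and the greedy grouping strategy are all assumptions made for illustration; the 50% default is just the figure suggested above and would need tuning against real feeds.

```php
<?php
// Hypothetical helper: lowercases a headline and strips punctuation so that
// "Moon!" and "moon" compare as equal. Not part of any library.
function normalizeHeadline(string $text): string
{
    return trim(preg_replace('/[^a-z0-9 ]+/', '', strtolower($text)));
}

// Hypothetical helper: true when two headlines are at least $threshold
// percent similar according to similar_text().
function isSimilarHeadline(string $a, string $b, float $threshold = 50.0): bool
{
    // similar_text() returns the count of matching characters and writes
    // the similarity percentage into its third (by-reference) argument.
    similar_text(normalizeHeadline($a), normalizeHeadline($b), $percent);

    return $percent >= $threshold;
}

// Hypothetical greedy grouping: each title joins the first group whose
// first member it resembles, otherwise it starts a new group.
function groupSimilar(array $titles, float $threshold = 50.0): array
{
    $groups = [];
    foreach ($titles as $title) {
        foreach ($groups as $i => $group) {
            if (isSimilarHeadline($group[0], $title, $threshold)) {
                $groups[$i][] = $title;
                continue 2; // placed in a group, move on to the next title
            }
        }
        $groups[] = [$title]; // no match found, start a new group
    }

    return $groups;
}
```

Since the published dates only differ by about a day, it may help to pass groupSimilar() only the titles from a one or two day window, which keeps the number of comparisons small. Note that similar_text() compares characters rather than meaning, so reworded headlines can score lower than expected.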
Of course, there are a few other ways to solve this one.