thebookcollector's avatar

Need help with fuzzy matching book titles

Background:

I have 2 tables in my book collecting database (small examples below). One is the main table called (‘books’) with all of the official book titles. The second table (‘auctions’) is a massive collection of auction data with user generated titles that will often not directly match the exact correct book title.

‘books’

The Great Gatsby
Mody Dick
Don Quixote
Ulysses
etc…

‘auctions’

F. Scott Fitzgerald - The Great Gatsby
Moby Dick by Herman Melville
One of a kind book - Moby Dick
Don Quixote (gold spine)
The Great Gatsby (1925)
Rare misprint of Ulysses
The Great Gatsby First Printing
Great Gatsby new leatherbound
Mobby Dick Book from 1800s
Signed copy - The Great Gatsby
etc…

Goal:

My ultimate goal is to have a new table where these auctions correctly map to the actual book entries in the ‘books’ table so that users can go to a book page in the app and see what the book has sold for at various auctions. I realize that this is going to be a majorly manual process, so I’m trying to make up a blade page where I can fairly quickly go through all of the auction records and associate them with the correct book.

Approach:

The solution I’m attempting is to have a page where it basically shows me titles from the ‘books’ table (maybe 20 at a time) and for each title, it shows me all of the titles from ‘auctions’ that may be a match. I can then quickly look through them and click a button that says “yes, all of these are for this book title” or skip some auction items that don’t match so that I can then click the “yes, these match now” button. I’m currently using https://github.com/caneara/quest for fuzzy matching and it seems to be working well when I hard code a book title to test it (‘done’ is just a column I’m using to keep track of auction items I’ve already reviewed).

eg.

'auctionsToSort' => Auction::whereFuzzy(‘title’, 'The Great Gatsby')->where('done', '=', 0)->having('_fuzzy_relevance_', '>',  19)->limit(200)->get()

This returns ~29 auctions that have titles that are close to “The Great Gatsby”.

eg.

F. Scott Fitzgerald - The Great Gatsby
The Great Gatsby (1925)
The Great Gatsby First Printing
Great Gatsby new leatherbound
Signed copy - The Great Gatsby
etc…

Question:

I’m having trouble figuring out how to craft the query (eloquent or otherwise) that will grab the first 20 books from ‘books’ *that have matches in the ‘auctions’ table* along with all of the applicable matches from ‘auctions’ to display on my blade. Skip the book if there are no matches in 'auctions'. I’ll be going through the entire ‘books’ table, just thinking of limiting to 20 at a time just so the page doesn’t get massive.

eg.

*Book*
The Great Gatsby

*Auctions*
F. Scott Fitzgerald - The Great Gatsby
The Great Gatsby (1925)
The Great Gatsby First Printing
Great Gatsby new leatherbound
Signed copy - The Great Gatsby

<button>Correct?</button>

*Book*
Mody Dick

*Auctions*
Moby Dick by Herman Melville
One of a kind book - Moby Dick
Mobby Dick Book from 1800s

<button>Correct?</button>

I can play with the _fuzzy_relevance_ to get the desired results and I have a plan for the rest of the logic, just need help with the query(s) to basically getting the above output.

0 likes
1 reply
thebookcollector's avatar

Trying something like this as a starting point but only getting errors so far...

'auctionsToSort' => Book::orderBy(‘title’)
->rightJoin('auctions', function ($join) {
	$join->whereFuzzy(‘auctions.title’, ‘books.title’);
})
->where('auctions.done', '=', 0)->having('_fuzzy_relevance_', '>',  19)->limit(200)->get(),

Please or to participate in this conversation.