troelsjohnsen's avatar

How to sync data from (sports) API with local database

Hi Everybody

I am currently working on integrating with a football API to provide data about matches, venues, competitions etc. The API will provide data for my own database, which can also be populated by manually entered data.

I am struggling a bit with how to match data coming from the API with data in my own database?

I need to be able to check whether the entity coming from the database already exists in the database. In this case three scenarios exist.

  1. The entity doesn't exist
  2. The entity exist and have been manually created
  3. The entity exist and have been synced from the API

To be able to handle scenario 1 and 2 I need to have a way to lookup data - I guess I need to identify fields for each entity that could be used to identify them.

To handle scenario 3 I am thinking about creating an alias table to connect local ID's with API id's.

Anyone have any tips, ideas, experiences og ressources to share how to handle this?

0 likes
7 replies
bugsysha's avatar

I do not see the value in knowing how it got to the database. For me I see only 2 cases. First where it does exist, and second where it doesn't. If you can say more than that might help for better ideas on how to do this.

troelsjohnsen's avatar

@bugsysha

I need to be sure, that the same entity (match, player, venue etc) is not added to the database multiple times.

If I add a match to the database and I sync with the API at a later time, I would want it to update the existing match instead of creating a new entry.

It is also a realistic scenario that we would be working with multiple providers - or change provider.

I think I'll go for a solution in which I create a table for aliases, so a model can have alternative ID's. And then I'll have to find a way to match data referring to the same entity but having different ID's.

For a match this could the datetime of the match start combined with the team names and for a play it could be the birthday combined with first and last name.

troelsjohnsen's avatar

Perhaps this as total overenginering, but something like this is what I am currently going after :-)

Scenario 1 - Match records from the API with unsynced records in the local database

$match = Match::where( DB::raw('md5(concat("db_matchstart","db_hometeam","db_awayteam")') , md5($api_matchstart.$api_hometeam.$api_awayteam) );

Scenario 2 - Match records from the API with synced records in the local database

$match = Match::whereHas('aliasable', function (Builder $query) {
   $query->where('alias', '=', $api_match_id)->where('provider','api_provider');
})->get();     
bugsysha's avatar

If you want to be sure that there are no duplicates than you can use something like

Model::firstOrCreate([
    // first array contains items that identify unique record
    // from your example above it will be probably
    'db_matchstart' => $api_matchstart,
    'db_hometeam' => $api_hometeam,
    'db_awayteam' => $api_awayteam,
], [
    // and then here are items that do not identify unique record
    'field1' => $field1,
    'field2' => $field2,
]);
troelsjohnsen's avatar

The implementation I am pursuing is the following - just leaving it here for reference.

I have created my own version of updateOrCreate

function updateOrCreate($class, $alias, $fields, $data)
    {
        $model = $class::whereHas('aliases', function($query) use ($alias){
            $query->where('alias', $alias)->where('provider','[name of provider]');
        })->first();

        if(!$model){
            $model = $class::firstOrCreate($fields);

            $model->aliases()->firstOrCreate([
                'alias' => $alias,
                'provider' => '[name of provider]'
            ]);
        }

        $model->fill($data);
        $model->save();

        return $model;
    }
  1. The function takes a class (Match, Team, Stadium etc.), an alias, some attributes and some data.
  2. It then checks whether this entity exists in the database using the ID (alias) from the API - which is configured as a polymorphic one-to-many
  3. If a model is not found using the alias, it then makes a call to firstOrCreate using a configurable set of attributes and then attaches the alias to that model for future reference.
  4. And last it updates the model using the new data and returns the saved model.

I like this implementation as this is supports having multiple providers as long as we can match models using attributes.

@bugsysha Thanks for reminding me of firstOrCreate :-)

bugsysha's avatar

No problem. Strange use case you have there. You know best what it should do 😉

1 like
rogerwick's avatar

A practical way to handle syncing from a sports API with your local database is to rely on stable unique identifiers from the API (such as match_id, team_id, venue_id) and map them directly to fields in your database so you can quickly check for existence before inserting or updating. If the API doesn’t guarantee consistent IDs, then create a matching layer using a combination of attributes (like team name + competition + date) and normalize the data to avoid duplicates caused by formatting differences. For your three scenarios, treat them as insert (new record), update (existing record with changed data), or ignore (no changes), and consider keeping a last_synced timestamp or version hash so you only process what’s actually new or modified instead of rechecking everything each time....

Please or to participate in this conversation.