Oct 20, 2020

Level 10

How to refactor large repositories?

I have a repository with 8 methods for create/update calls and 10 methods for retrieving different kind of results. Its a repository for a tree model and the get methods are something like getLeftSubtree getRightSubtree getFullSubtree, getUpline getSponsorUpline, .. things like that.

Now I feel the repository size is too large with its 18 methods. Do you have any recommendations how to refactor this? I am thinking about splitting the create/update and get methods, but I also think they belong in the tree-repository.

martinbean

5 years ago

Level 80

I am thinking about splitting the create/update and get methods, but I also think they belong in the tree-repository.

@elenktik Well, repositories shouldn’t be creating/updating things. Repositories are meant to be classes used to access entities in a collection-like manner.

As for the actual question, if you use a repository as a class to group methods for querying then by that very nature they’re going to grow in terms of number of methods. I’m not sure what solution you’re looking for? If you need methods and you put methods in a class, then that class is going to have a lot of methods.

You could put tree-related methods in a trait, but then that’s just sweeping methods to somewhere else for the sake of keeping your repository class under some arbitrary number of methods.

1 like

NielsNumbers

5 years ago

Level 10

@martinbean Thanks for your respond. I thought the purpose of the repository class is to be a layer between domain and data and therefor it is responsible for all CRUD tasks.

I just googled Laravel repository pattern and the first hit was for example https://asperbrothers.com/blog/implement-repository-pattern-in-laravel/ which defines an EloquentRepositoryInferface by a find and create method:

{
   /**
    * @param array $attributes
    * @return Model
    */
   public function create(array $attributes): Model;

   /**
    * @param $id
    * @return Model
    */
   public function find($id): ?Model;
}

In my case, I have a Tree model and a TreeRepository, so moving tree-related methods in a trait is like moving the complete repository in a train.

So if I understood you right, you would say let the repository just grow as much as it needs? Also, where do you usually put the logic for creating/updating things when you don't put them in a repository?

martinbean

5 years ago

Level 80

I thought the purpose of the repository class is to be a layer between domain and data and therefor it is responsible for all CRUD tasks.

@elenktik Nope. And just because someone’s implement a “repository” like that doesn’t mean it’s intended for that purpose either; it just means they’ve also misinterpreted what a repository is for. You pass criteria to repositories, and a repository returns objects that satisfy that criteria.

In terms of persisting data, there are various candidates: services and commands are two popular solutions. But then you’re just getting in the murky world of domain-driven design, which isn’t applicable for a lot of projects.

If you have a Tree model and a TreeRepository, then just keep your tree-related methods there. There’s nothing wrong if the repository contains many methods if all of those methods are for accessing different representations or sub-sets of trees.

NielsNumbers

5 years ago

Level 10

Yes ofc a singe opinion on the web is not the ultimate truth, just wanted to point out that I have seen such an implementation before. Was just the first hit on google, I have seen more of that.

Also I didn't find any good source for the definition of the repository pattern, if you have one, I would be happy to see it. I found a German wiki entry at https://de.wikipedia.org/wiki/Repository_(Entwurfsmuster) which claimes that it should contain CRUD:

Conceptually, the repository encapsulates the objects persisted by the data access layer and the access to them - regardless of whether they are stored in a database or made available via a web service (or otherwise)

I also found an entry on https://en.wikipedia.org/wiki/Domain-driven_design

Repository: Methods for retrieving domain objects should delegate to a specialized Repository object such that alternative storage implementations may be easily interchanged.

So at least according to Wikipedia and the random dude it seems that storage is part of the repository pattern.

Still thanks for your feedback.

rodrigo.pedra

5 years ago

Level 56

Reading both definitions I still think Repositories are meant for data retrieval, highlighting for example:

Repository: Methods for retrieving domain objects should delegate to a specialized Repository object such that alternative storage implementations may be easily interchanged.

EDIT: It states right at the beginning: "Methods for retrieving domain objects". Added this line as the bold face is not very "bold" inside the quote above

In this definition the storage is implied to the source of the domain objects, but it does not imply you should manipulate that storage through the repository.

A similar case could be made for the second quote, but as I can't read German I might have missed something from the original.

I like to have inserts/updates/deletes delegated to commands/services as @martinbean suggested. It is easier to have a read DB and a write DB, aligns better with the CQRS architecture, and make tests cleaner.

But at the end of the day choose whatever better fits your team's preferences.

Regarding on how to refactor a large code-base, I would ensure first I have a good set of tests in place covering most use-cases. And then will try to identify low-hanging fruits and then apply SOLID principles to clean things up.

Laracasts recently started a series on refactoring that might come in handy:

https://laracasts.com/series/refactoring-workshops

It still only has one episode but it is already a very good one to watch.

1 like

NielsNumbers

5 years ago

Level 10

@martinbean , @rodrigo.pedra I just read in from Martin Fowler's book Patterns of Enterprise Application Architecture (p.322):

A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declarative and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.

So looks like some developer with good reputation think of a repository as using inserts and update.

It also makes sense to me, because the repositories task should be to be a layer between the data model and your business logic. If you only read, than you end up doing the inserts somewhere in your service classes and you only have a 50% layer.

PS: Thanks for the link, its a nice episode.

1 like

rodrigo.pedra

5 years ago

Level 56

@elenktik thanks for the quote, that book is in my read-list but I didn't get to it yet.

I think this is more a matter of concepts definition.

My notion of repositories to read and commands (I prefer calling them actions) to write comes from the CQRS architecture pattern:

https://en.wikipedia.org/wiki/Command%E2%80%93query_separation#Command_query_responsibility_segregation

I grew to adopt this pattern as I found it easier to compose actions/commands and test them in isolation.

For example, this is an excerpt from a project I worked on previously:

<?php

namespace App\Actions\Users;

use App\DTO\SystemUserDTO;
use App\Models\User;
use Illuminate\Database\ConnectionInterface;

class CreateSystemUser
{
    private ConnectionInterface $connection;
    private CreateUser $userCreator;
    private UpdateSystemUser $userUpdater;
    private GrantSystemAccess $accessGranter;

    public function __construct(
        ConnectionInterface $connection,
        CreateUser $userCreator, // another action
        UpdateSystemUser $userUpdater, // another action
        GrantSystemAccess $accessGranter // another action
    ) {
        $this->connection = $connection;
        $this->userCreator = $userCreator;
        $this->userUpdater = $userUpdater;
        $this->accessGranter = $accessGranter;
    }

    public function perform(SystemUserDTO $dto): User
    {
        $user = User::query()->firstWhere('email', $dto->email);

        if ($user) {
            // defer to more specialized action
            return $this->userUpdater->perform($user, $dto);
        }

        return $this->connection->transaction(function () use ($user, $dto) {
            $user = $this->userCreator->perform($dto);

            return $this->accessGranter->perform($dto->role, $user);
        });
    }
}

You can see here I am composing different actions to build a more specific one.

In this code, for example, the GrantSystemAccess class is responsible for sending a welcome email if it is a new user or if the user was granted a new role.

When testing, I can test the actions that actually interact with the database separately, and test this is one using mocks from the others and asserting the expected methods were called.

When composing actions like this, the constructor dependencies are resolved from the container. So it is easy to swap implementation of dependencies.

In a controller I call this action like this:

public function store(SystemUserRequest $request, CreateSystemUser $action): RedirectResponse
{
    $user = $action->perform($request->dto());

    // removed flash message and simplified redirect details 
    // to preserve client's info
    return redirect()->back(); 
}

Also some actions that are composed here are used directly in other parts of the system, for example from an API webhook that is only called for existing users and calls GrantSystemAccess (and its counter-part RevokeSystemAccess) directly.

Another example is from a CSV import ran from an artisan command, which calls a different composition of actions.

On the other hand, I have Repository classes that only care about data retrieval and adding constraints to the SELECT queries (where clauses, group by's, having clauses, etc.)

Of course I could have all these actions on a single "Repository" (quoted to differentiate the concept) class. But, in my opinion, having the commands separated allows for more composable code.

In the end of the day, it is a matter on finding the architecture that "clicks" best for you, and allows you to understand and maintain the code in the best way.

Hope this helps somehow.

And I apologize for not knowing before hand the same name was used for a different concept, and implying your concept might be wrong. My intention was the best.

Have a nice day =)

rodrigo.pedra

5 years ago

Level 56

One final comment about my code exceprt. If at the time I wrote that code PHP 8 was available, the code would be much shorter using constructor property promotion.

rodrigo.pedra

5 years ago

Level 56

I know I said my last comment was the final one about the code, but just to clarify one thing:

This project had a requirement to different roles for different modules, and users could have one role for each module.

So there is also a CreateAccountUser, and others. That is why there is a lower level CreateUser, which is shared between the actions specific to a module.

rodrigo.pedra

5 years ago

Level 56

One thing I noted by re-reading the thread is the we derailed from your original question.

In another project I worked with I had a tree-like data structure for managing a large tree.

I had a TreeBuilder that built a Tree object which was basically to hold Node instances. I used a TreeNode interface and had three implementations: Node (which was basically where the data lied), RootNode (a marker for the root node to facilitate traversal) and NullNode (to model dead ends and avoid null checks when traversing the tree).

The TreeNode interface had a changeParent and changeChildren methods that allowed me to manipulate the tree nodes in memory directly in the node instances.

The Tree object was just a proxy to the RootNode and let apply some high-level operations (filter, map, traversal, etc.) from the RootNode.

I know it seems to be breaking the CQRS pattern I just explained on my last post, but I as it was basically a data-structure code (that held data in-memory), no storage related code was present in the nodes, nor in any tree-manipulation methods.

Every node implemented the IteratorAggregate interface (from PHP itself) and I used a RecursiveIteratorIterator (I know it is a bad name, but it is a actually PHP built-in class) to traverse the tree in a linear fashion.

So after manipulating the tree in memory, I then had action classes that traversed/iterated the tree and actually persisted the changes to the database.

So basically the TreeBuilder read data from database and built the data-structure, and action classes were responsible to persist the tree back to the database. The tree itself was a data-structure unaware of any storage specific code.

I unfortunately can't share the code for this, as it was done under a contract and I would have to ask the client for permission to share it.

But it was a very nice implementation, I used generators for building the tree from the database, and learned a lot about iterators when writing the traversal code.

Hope this helps, somehow, with your original problem.

rodrigo.pedra

5 years ago

Level 56

Just for the record, the link to the RecursiveIteratorIterator class:

https://www.php.net/manual/en/class.recursiveiteratoriterator.php

One project from which I learned how to work with it was this:

https://github.com/sebastianbergmann/php-code-coverage/blob/3dd5d2deb8bbfd0e13c221330d9170689897393e/src/Node/Directory.php#L89-L95

The class linked above has a lot of specific code to its use-case that I ended up not using, but it was a good starting point at the time.

Please or to participate in this conversation.