mibu31's avatar

Page slugs and nesting

Hi there,

Could someone please point me in the right direction for dealing with nested pages and slugs, e.g. "about-us/meet-the-team/steve-o"?

Unfortunately, as the number of pages or the depth of nesting increases, the number of queries can grow quite drastically!

The two routes I have in mind are: 1. Use "parent_id" and recursively loop through parents to generate the slug, with caching on top. 2. Use nested sets.
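To make the first option concrete, here is a minimal sketch of walking "parent_id" links with a memoised cache. It is plain PHP with a hypothetical in-memory `$pages` array standing in for the database table; the function name `buildPath` and the array layout are assumptions for illustration, not part of any package.

```php
<?php
// Hypothetical stand-in for a "pages" table, keyed by id.
$pages = [
    1 => ['parent_id' => null, 'slug' => 'about-us'],
    2 => ['parent_id' => 1,    'slug' => 'meet-the-team'],
    3 => ['parent_id' => 2,    'slug' => 'steve-o'],
];

/**
 * Build the full path of a page by recursing up through its parents.
 * $cache memoises finished paths, so each page is resolved only once.
 */
function buildPath(array $pages, int $id, array &$cache = []): string
{
    if (isset($cache[$id])) {
        return $cache[$id];
    }

    $page = $pages[$id];
    $prefix = $page['parent_id'] === null
        ? ''
        : buildPath($pages, $page['parent_id'], $cache) . '/';

    return $cache[$id] = $prefix . $page['slug'];
}

$cache = [];
$path = buildPath($pages, 3, $cache);
```

Against a real table each uncached level would cost one query, which is why the cache (or a stored path column) matters as the tree gets deeper.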

Could someone who's "been there done that" with a variety of methods please advise me on what worked best for them?

Thank you.

zefman's avatar

I'd be interested in this too; it's a tricky topic!

pmall's avatar

Are all the pages of your website really dynamic? In about-us/meet-the-team/steve-o, it seems like only the name of the team member is dynamic.

And I don't think you will have URLs with hundreds of nested slugs; three queries will not kill your application. You can cache them too.

mibu31's avatar

Sorry, I didn't give a very good example :( Let's just imagine a scenario where all the pages are dynamic and therefore all slugs are changeable.

bashy's avatar

CMS based pages

In CodeIgniter, I use a route caching system to cache the links to controllers and branch IDs so they're all mapped without needing to do queries to the database. Could use something like it to save having to do some query to work out all the possible nested routes.

For example, these routes are generated by the function below:

$route["services/data-management"] = "defaultSublevelController/index/15";
$route["services/applications-support"] = "defaultSublevelController/index/16";
$route["legal/terms-conditions"] = "defaultController/index/5";
$route["legal/privacy-policy"] = "defaultController/index/6";
$route["about"] = "defaultController/index/13";

function cache_routes()
{
    $branches = new Branch();

    // order by left_id for page ordering
    $branches->order_by('left_id', 'ASC')->get();

    $data[] = $data2[] = "<?php";

    $data2[] = "\nreturn\narray(";

    foreach ($branches as $branch)
    {
        // the uri path to match
        if ($branch->id == 1)
        {
            $uri = 'default_controller'; // CodeIgniter default route
        }
        elseif ($branch->id == 2)
        {
            $uri = '404_override'; // CodeIgniter 404
        }
        elseif ($branch->level <= 1)
        {
            $uri = $branch->url_title;
        }
        else
        {
            $path = $branch->get_clone()->get_path()->all_to_single_array('url_title');

            // remove the home page
            array_shift($path);

            $path = implode('/', $path); // full url, without the leading 'home/' segment

            $uri = $path;
        }

        // the controller to load
        $parent = $branch->get_clone()->get_parent();

        // determine the controller to use. If a controller is set, use it
        if ( ! empty($branch->controller))
        {
            $destination = $branch->controller;
        }
        // otherwise, if the parent branch has a child controller, use it
        elseif ( ! empty($parent->child_controller))
        {
            $destination = $parent->child_controller;
        }
        // fallback if no controller is found
        else
        {
            $destination = 'default_controller';
        }

        $data[] = '$route["' . $uri . '"] = "' . $destination . '/index/' . $branch->id . '";';

        $data2[] = $branch->id . ' => "' . $uri . '",';

    }

    $output = implode("\n", $data);

    $data2[] = ");";
    $output2 = implode("\n", $data2);

    $CI =& get_instance();

    $CI->load->helper('file');

    write_file(SHARED . "cache/frontend_routes.php", $output);
    write_file(SHARED . "cache/frontend_uri_paths.php", $output2);
}
harryg's avatar

OK I'm trying to implement this myself. The first step I take is to use the nested set package Baum.

Once it's installed, ensure your model extends Baum\Node and that you have your migrations etc. in place.

I have a route like this, which is the last route in my routes file so that it doesn't get matched before more specific routes:

Route::get('{slug1}/{slug2?}/{slug3?}', 'PagesController@show');

Currently it supports 3 levels of nesting but you can just add more optional parameters to achieve the max nesting level you want.

Next, implement the show method:

// PagesController.php
public function show()
{
    $slugs = collect(func_get_args());

    // Walk down the tree one slug at a time; a missing level yields null (404 below)
    $page = $slugs->reduce(function($page, $slug) {
        return $page ? $page->children()->where('slug', $slug)->first() : null;
    }, Page::whereSlug($slugs->shift())->with('children')->first());

    abort_if(!$page, 404);

    return view('pages.single-page', compact('page'));
}

Now, this is not optimal as it performs an extra query for each level of nesting. You could cache this but it would be great to have a more efficient method.

Finally, in order to derive the permalink for a given page I added the following method to my Page model:

public function getPermalink()
{
    return $this->ancestorsAndSelf()->pluck('slug')->implode('/');
}

This will extract all the parents up the hierarchy and join their slugs together. This is where the Baum package is great, since it takes only one query for this, no matter how deep the page is.
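As a rough illustration of why a nested set needs only one query here: the ancestors of a node are exactly the rows whose left/right bounds enclose it. The sketch below is plain PHP over a hypothetical in-memory array; the `lft`/`rgt` column names and the `permalink` helper are assumptions for illustration, and in a database the filter would be a single `WHERE lft <= ? AND rgt >= ? ORDER BY lft`.

```php
<?php
// Hypothetical rows from a nested-set "pages" table.
$pages = [
    ['slug' => 'about-us',      'lft' => 1, 'rgt' => 8],
    ['slug' => 'meet-the-team', 'lft' => 2, 'rgt' => 5],
    ['slug' => 'steve-o',       'lft' => 3, 'rgt' => 4],
    ['slug' => 'contact',       'lft' => 6, 'rgt' => 7],
];

function permalink(array $pages, array $node): string
{
    // Ancestors-and-self: every row whose bounds enclose the node's bounds.
    $chain = array_filter($pages, function (array $p) use ($node) {
        return $p['lft'] <= $node['lft'] && $p['rgt'] >= $node['rgt'];
    });

    // Order from the root down to the node itself.
    usort($chain, function (array $a, array $b) {
        return $a['lft'] <=> $b['lft'];
    });

    return implode('/', array_column($chain, 'slug'));
}

$steve   = permalink($pages, $pages[2]);
$contact = permalink($pages, $pages[3]);
```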

I still need to implement the functionality for organising pages, but it should be pretty much achievable with Baum by just using the makeChildOf($otherNode) method (or makeRoot()).

Any further suggestions welcome, especially with regards to deriving the page from a set of slugs.

MikeHopley's avatar

Any further suggestions welcome, especially with regards to deriving the page from a set of slugs.

I'm using Baum to organise my content, which can be at any level of depth. I simply have a catch-all route for content pages, like this:

// No "special" routes match, so try matching a content page
get('{path?}', 'ContentController@show')->where('path', '.+');

In the ContentController, I check my database for this url. If that fails, I check for a redirected URL. If that fails, I 404.

These content urls are added to the database when I publish the content page. The URL is generated from the position of the page in the nested set, together with the page breadcrumb (which I have to specify). This means that breadcrumbs and URLs are always consistent.

Using this approach I can easily add, remove, or move content. When content is moved, I automatically create a redirect. Redirects are also stored on the database.

To create content, I made a CMS that lets me choose the location of the new page relative to an existing page. In the background, this will run the appropriate Baum methods.

It gets a little more complicated when you add the ability to make drafts and delay publication. My solution was to keep the drafts table separate, but reserve a placeholder space for the draft on the main pages table. These placeholders are assigned a "draft" status flag on the table. Doing this greatly simplified the logic.

harryg's avatar

@MikeHopley That is a nice approach. You'll certainly save SQL statements by storing the permalink on save and update, although you must remember to regenerate it whenever this happens, as well as save a redirect (well suited to a queued job!). I suppose you listen for create/delete/update model events to fire this logic off.

Regarding redirects, how is it avoiding conflicts? I.e. you change the location of a page and create a new page in its place. Won't a redirect mean that the original page will get loaded first or do you check the pages table before applying a redirect? Either way I imagine it can get a bit complex - I'm gonna skip the redirects for now! Good reason to have decent integration tests.

MikeHopley's avatar

I suppose you listen for create/delete/update model events to fire this logic off.

I thought about that, but it didn't seem to work out well. You have to consider that the create event and the moved event (Baum-specific) are different. I did try hooking into the moved event, but I seem to remember it not being sufficient. For one thing, it only covers the case where the page has moved in the nested set. What really matters to me is whether the page URL has changed, which could also happen because I edit a breadcrumb.

In the end, I created some classes to help me. I have a ContentRouteManager that updates the routes for pages. Whenever a page is published or moved, I call (new ContentRouteManager)->updateRoute($page). This rebuilds the route for the page, and adds/changes a redirect if necessary, and then recursively calls itself on all the page's children.
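A minimal sketch of that kind of recursive update, in plain PHP over hypothetical in-memory arrays. The real ContentRouteManager is not shown in this thread, so the function below, its signature, and the use of slugs (rather than breadcrumbs) to build the URL are all assumptions for illustration.

```php
<?php
// Hypothetical state: page 1 was just renamed from "about-us" to "company",
// so the stored paths are stale.
$pages = [
    1 => ['parent_id' => null, 'slug' => 'company', 'path' => 'about-us'],
    2 => ['parent_id' => 1,    'slug' => 'team',    'path' => 'about-us/team'],
    3 => ['parent_id' => 2,    'slug' => 'steve-o', 'path' => 'about-us/team/steve-o'],
];
$redirects = []; // old path => new path

function updateRoute(array &$pages, array &$redirects, int $id): void
{
    $page = $pages[$id];

    // Rebuild this page's path from its parent's (already updated) path.
    $prefix = $page['parent_id'] === null
        ? ''
        : $pages[$page['parent_id']]['path'] . '/';
    $newPath = $prefix . $page['slug'];

    // Only record a redirect when the URL actually changed.
    if ($newPath !== $page['path']) {
        $redirects[$page['path']] = $newPath;
        $pages[$id]['path'] = $newPath;
    }

    // Recurse into the children, whose paths depend on this one.
    foreach ($pages as $childId => $child) {
        if ($child['parent_id'] === $id) {
            updateRoute($pages, $redirects, $childId);
        }
    }
}

updateRoute($pages, $redirects, 1);
```

Comparing the old and new path, rather than reacting to a "moved" event, is what lets the same routine cover both tree moves and renamed segments.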

I do think using model events might potentially be more elegant. I just didn't arrive at a sensible solution when I tried them. That may say more about me than anything else!

Regarding redirects, how is it avoiding conflicts? I.e. you change the location of a page and create a new page in its place. Won't a redirect mean that the original page will get loaded first or do you check the pages table before applying a redirect?

"Current" urls are checked first. If none matches, the redirects table is checked.

If I move a page and then create a new page at the original location, then the new page will get loaded and the redirect will be ignored. The redirect could then be removed automatically, since it will never be reached.

If necessary I can also manually add or remove redirects.
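The lookup order described above could be sketched like this. It is plain PHP with hypothetical arrays in place of the routes and redirects tables; the `resolve` function and the 301 status code are assumptions for illustration.

```php
<?php
/**
 * Resolve a request path: current URLs win, then redirects, then 404.
 */
function resolve(string $path, array $routes, array $redirects): array
{
    if (isset($routes[$path])) {
        return ['status' => 200, 'page_id' => $routes[$path]];
    }

    if (isset($redirects[$path])) {
        return ['status' => 301, 'location' => $redirects[$path]];
    }

    return ['status' => 404];
}

// Page 3 was moved from about-us/team to company/team,
// then a new page 7 was created at the original location.
$routes = [
    'company/team'  => 3,
    'about-us/team' => 7,
];
$redirects = [
    'about-us/team' => 'company/team', // shadowed by page 7, never reached
    'old-news'      => 'news',
];

$shadowed = resolve('about-us/team', $routes, $redirects);
$moved    = resolve('old-news', $routes, $redirects);
$missing  = resolve('nope', $routes, $redirects);
```

Because the routes table is consulted first, the stale redirect for about-us/team is harmless and can be pruned whenever convenient.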

Good reason to have decent integration tests.

Definitely. Since these CRUD operations are rather more involved than usual, I've organised them into jobs. For example, I have a PublishContent job, as well as separate EditContentPage and EditContentDraft jobs. Each job is thoroughly tested. There are plenty of opportunities to screw up!

These jobs are run synchronously.

...well suited to a queued job!

Yeah, some of these things might work well as queued jobs. However, I haven't used any because all these things are plenty fast enough. Even with hundreds of queries, you're still looking at a small fraction of a second, which is fine for admin work.

Of course, that could change with more content, in which case I'd look at queuing some stuff.

harryg's avatar

OK I've implemented having the path column in my pages table which removes the need for the complex recursive query to get the page from a set of slugs.

In terms of ensuring a page's path is always correct I made a job to handle the moved and created events. So in a service provider I do this:

public function boot()
{
    Page::moved(function($page) {
        dispatch(new \App\Jobs\UpdatePagePath($page));
    });

    Page::created(function($page) {
        dispatch(new \App\Jobs\UpdatePagePath($page));
    });
}

I don't yet make use of breadcrumbs, but I could easily derive a set by exploding the path to get each slug in the trail. You need to dispatch the job for both events, as a page is not moved when it's created.

The job itself looks like this:

class UpdatePagePath extends Job
{
    private $page;

    /**
     * Create a new job instance.
     *
     * @return void
     */
    public function __construct(Page $page)
    {
        $this->page = $page;
    }

    /**
     * Execute the job.
     *
     * @return void
     */
    public function handle()
    {
        $this->setPath($this->page);
    }

    /**
     * Re-calculate the path of the page and recursively update the paths of children
     * 
     * @return void
     */
    private function setPath(Page $page)
    {
        $page->update(['path' => $page->getPath()]);

        // each() rather than map(): we only want the side effect, not a new collection
        $page->getImmediateDescendants()->each(function($page) {
            $this->setPath($page);
        });
    }
}

This is a recursive operation because, if you move a page in the tree, the paths of all its descendant pages also change. Thus I might consider making this a queued job to improve admin panel speed for pages with many children, but as you say it's probably perfectly fast to do synchronously in most cases.

A nice set of integration tests ensure everything works as expected.
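The breadcrumb idea mentioned above (exploding the stored path) might look something like this. It is a sketch in plain PHP; the `breadcrumbs` helper and the shape of its return value are hypothetical.

```php
<?php
/**
 * Turn a stored path into a breadcrumb trail: one entry per slug,
 * each carrying the cumulative path up to that level.
 */
function breadcrumbs(string $path): array
{
    $path = trim($path, '/');
    if ($path === '') {
        return [];
    }

    $trail = [];
    $current = '';
    foreach (explode('/', $path) as $slug) {
        $current = ltrim($current . '/' . $slug, '/');
        $trail[] = ['slug' => $slug, 'path' => $current];
    }

    return $trail;
}

$trail = breadcrumbs('about-us/meet-the-team/steve-o');
```

This only gives slugs, so human-readable titles would still need a lookup against the pages table.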

MikeHopley's avatar

Well done, that was quicker than when I did it!

If I come to refactor my code, I may look again at your code and see what I can learn from it.

MikeHopley's avatar

Thus I might consider making this a queued job to improve admin panel speed for pages with many children, but as you say it's probably perfectly fast to do synchronously in most cases.

Since I've been using the system with ~100 pages, I can definitely notice some delay when publishing a page. It's up to a few seconds sometimes, so definitely worth considering.

On the other hand, if I make the jobs asynchronous I might need to make more of the admin dashboard "dynamic", because I need to see the results of adding pages to the tree.

I'm sure I could do a better job of separating these things out too. But the main thing is that it works.

harryg's avatar

@MikeHopley I have done some further refactoring. I'll post a link to my repo tomorrow in case you want to review it.

A good way to make use of queues is to recalculate the URL path of the page you're editing synchronously, but delegate the child pages to a queue. It probably doesn't matter that they're not up to date on page refresh, since you're only interested in the current page's attributes.

You also need to account for a slug being changed, and recalculate the paths then as well.

Need to write a few more tests and tidy up but I think I've got something pretty elegant.
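The split described above (current page synchronously, children deferred) can be sketched as follows. This is plain PHP using an array as a stand-in queue; in Laravel the deferred part would be a queued job, and every name here (`onPageSaved`, `runWorker`, the array layout) is hypothetical.

```php
<?php
// Hypothetical state: page 1's slug just changed from "about-us" to "company".
$pages = [
    1 => ['parent_id' => null, 'slug' => 'company', 'path' => 'about-us'],
    2 => ['parent_id' => 1,    'slug' => 'team',    'path' => 'about-us/team'],
    3 => ['parent_id' => 2,    'slug' => 'steve-o', 'path' => 'about-us/team/steve-o'],
];
$queue = []; // ids of pages whose paths still need recalculating

function onPageSaved(array &$pages, array &$queue, int $id): void
{
    // Synchronous part: fix the path of the page being saved.
    $page = $pages[$id];
    $prefix = $page['parent_id'] === null
        ? ''
        : $pages[$page['parent_id']]['path'] . '/';
    $pages[$id]['path'] = $prefix . $page['slug'];

    // Deferred part: children are only queued, not updated yet.
    foreach ($pages as $childId => $child) {
        if ($child['parent_id'] === $id) {
            $queue[] = $childId;
        }
    }
}

function runWorker(array &$pages, array &$queue): void
{
    // Stand-in for a queue worker; each job may enqueue its own children.
    while (($id = array_shift($queue)) !== null) {
        onPageSaved($pages, $queue, $id);
    }
}

onPageSaved($pages, $queue, 1);
$stalePath = $pages[2]['path']; // still the old path until the worker runs
runWorker($pages, $queue);
```

The trade-off is that descendant URLs are briefly stale, so a lookup that falls back to redirects (or the old path) is useful while the queue drains.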

MikeHopley's avatar

@harryg thanks, I'll be very interested to see that! It could be a rare opportunity for me, as I learn best by seeing how something I already made could be done better.

harryg's avatar

@MikeHopley Oh sorry, yeah I merged the branch in and deleted it. Have updated the links to the dev branch so should be working now.

MikeHopley's avatar

Thanks -- it looks pretty elegant! While your code doesn't cover everything I need, I think I can learn some cleaner ways of doing things from it. I will come back to it in the future. :)
