Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

keevitaja's avatar

Finding the "shortest path" with SQLite and Lua

Hello!

I have relational SQLite setup with 2 tables:

nodes table with columns id and connector_id
connectors table with columns id, name, node_from_id and node_to_id

I need to find all nodes between x and y node.

Example from 2 to 7

like node 2 leads to node 12, nodel 12 leads to 3 and 3 leads to 7

So i would get nodes 12 and 3. But there can be other connectors as well and i always need to find the shortest possible path.

How can it be done? Amount of nodes can easily be 100 000 or more

0 likes
34 replies
pmall's avatar

This is an extremely difficult graph theory problem ^^ With database involved it gets even more difficult.

https://en.wikipedia.org/wiki/Shortest_path_problem

I dont think there is a solution other than selecting all nodes and edges, construct the network and applying some graph theory algorithm.

phildawson's avatar

So its like this kind of thing but on a bigger scale?

2  leads to 5,12,8
5  leads to 2,4
4  leads to 3
3  leads to 7
12 leads to 3,6
6  leads to 5
8  leads to 9,1,5
9  leads to 4
1  leads to 2

[ 2 ] -> [  5 ] -> [ 2 ] X stop checking
[ 2 ] -> [  5 ] -> [ 4 ] -> [ 3 ] -> [ 7 ] 3 steps to reach 7
[ 2 ] -> [ 12 ] -> [ 3 ] -> [ 7 ] 2 steps to reach 7 (Winner)
[ 2 ] -> [ 12 ] -> [ 6 ] -> [ 5 ] -> [ 2 ] X stop checking
[ 2 ] -> [ 12 ] -> [ 6 ] -> [ 5 ] -> [ 4 ] -> [ 3 ] -> [ 7 ] 5 steps to reach 7
[ 2 ] -> [  8 ] -> [ 9 ] -> [ 4 ] -> [ 3 ] -> [ 7 ] 4 steps to reach 7
[ 2 ] -> [  8 ] -> [ 1 ] -> [ 2 ] X stop checking
[ 2 ] -> [  8 ] -> [ 5 ] -> [ 2 ] X stop checking
[ 2 ] -> [  8 ] -> [ 5 ] -> [ 4 ] -> [ 3 ] -> [ 7 ] 4 steps to reach 7

I would have though you just recursively loop each node's nodes until you either get to the value or itself, and record the num steps to determine a winner after all have been checked.

1 like
keevitaja's avatar

But with lot of nodes, 100k+, this can take forever...

phildawson's avatar

this can take forever...

Have you tried any code you can show? or is that a guess it will take a long time?

How is the data being used? Is this a one time calculation or is the dataset constantly changing?

pmall's avatar

or is that a guess it will take a long time?

As a mentioned above it is a well known problem.

You cant calculate this live. If you want to achieve this you will have to run an algorithm on this dataset and store the shortest path for each pair of nodes.

keevitaja's avatar

I am just guessing, but there might be up to 8 connections per node...

And data is changing.

phildawson's avatar

@keevitaja I was just trying to establish how frequent? Every second, min, or hour, day, wk, month, or year?

If you can cache the results then it'll be fine.

I would go about turning the records into something like this, each time you work out from the current node to the target you cache which node took you there the quickest and how many steps it took. Then it just needs to check does this particular node have the path to 7 already, yes, well then how many steps is it, if not find out and cache for future checks.

$graph = [
    2 =>  [
        'nodes' => [5,12,8],
        'paths' => [
            7 => ['node' => 12, 'steps' => 2]
        ]
    ],
    5 =>  [
        'nodes' => [2,4],
        'paths' => [
            7 => ['node' => 4, 'steps' => 2]
        ]
    ],
    4 =>  [
        'nodes' => [3],
        'paths' => [
            7 => ['node' => 3, 'steps' => 1]
        ]
    ],
    3 =>  [
        'nodes' => [7],
        'paths' => [
            7 => ['node' => 7, 'steps' => 0]
        ]
    ],
    12 => [
        'nodes' => [3,6],
        'paths' => [
            7 => ['node' => 3, 'steps' => 1]
        ]
    ],
    6 =>  [
        'nodes' => [5],
        'paths' => [
            7 => ['node' => 5, 'steps' => 3]
        ]
    ],
    8 =>  [
        'nodes' => [9,1,5],
        'paths' => [
            7 => ['node' => 5, 'steps' => 3]
        ]
    ],
    9 =>  [
        'nodes' => [4],
        'paths' => [
            7 => ['node' => 4, 'steps' => 2]
        ]
    ],
    1 =>  [
        'nodes' => [2],
        'paths' => [
            7 => ['node' => 2, 'steps' => 3]
        ]
    ],
    7 =>  [
        'nodes' => [23,47],
        'paths' => []
    ],

];

eg so going from 5 to 7 take route 4 (2 steps away)

    5 =>  [
        'nodes' => [2,4],
        'paths' => [
            7 => ['node' => 4, 'steps' => 2]
        ]
    ],
    4 =>  [
        'nodes' => [3],
        'paths' => [
            7 => ['node' => 3, 'steps' => 1]
        ]
    3 =>  [
        'nodes' => [7],
        'paths' => [
            7 => ['node' => 7, 'steps' => 0]
        ]
    ],

then you find say 2 to 6, looks like 12 is quickest.

2 =>  [
    'nodes' => [5,12,8],
    'paths' => [
        6 => ['node' => 12, 'steps' => 1],
        7 => ['node' => 12, 'steps' => 2]
    ]
],
    12 => [
        'nodes' => [3,6],
        'paths' => [
            6 => ['node' => 6, 'steps' => 0],
            7 => ['node' => 3, 'steps' => 1]
        ]
    ],
willvincent's avatar

This blog post will probably be useful, it covers exactly this topic, and how to achieve it with recursive queries: http://www.vertabelo.com/blog/technical-articles/sql-recursive-queries

EDIT: the bad news is, this isn't very easy to do with mysql, since it lacks some necessary functionality. It can still be accomplished with some workarounds though. Research hierarchical queries in mysql, probably will have to make use of mysql views and/or stored procedures to accomplish in MySQL

jimmck's avatar

By shortest possible path, do you mean smallest sized result set with no duplicates? Because that is what a database query is for. There are no distances involved besides the size of your result set. Each node can have an infinite number of nodes between from_node and two_node. Does the node record imply that there are n nodes between from and to, or exactly 2 nodes; from and to? This is a good way to test the solder joints on your motherboard, other than just taking up disk space.

willvincent's avatar

shortest path is the number of hops between two distinct elements that are related through one or more other elements.

Take this diagram, for instance:

graph

Any node on that graph can be related to any other node, one or more steps away. The path 0 -> 1 -> 2 is a shorter path than 0 -> 1 -> 3 -> 5 -> 8 -> 6 -> 2 though both are valid relational paths between nodes 0 and 2.

1 like
uxweb's avatar

Isn't this related to the nested set model?

keevitaja's avatar

@uxweb no, as it does not have nested nodes, eg no parant-child relations.

it is a "relational" set

phildawson's avatar

This isn't really the find the shortest-path just the least number of hops should be relatively straightforward.

@keevitaja Is there a sample dataset to try or any existing code you have produced to try and solve it?

Also going back to my q are the nodes constantly changing as in every second, min, or hour, day, wk, month, or year? And most importantly are you controlling when the dataset changes via an import or is this live data you have no control over when new nodes are inserted?

pmall's avatar

All the most efficient algorithms for this problem are in the first link I posted. Again, there is no solution without constructing the whole network.

neo4j would probably be better suited for this.

1 like
phildawson's avatar

@pmall @keevitaja @willvincent

This is exactly what I had in mind using DLL in php.

http://www.sitepoint.com/data-structures-4/

Based on my example using the adjacency list on the previous page.

(new Graph([
    2  => [5,12,8],
    5  => [2,4],
    4  => [3],
    3  => [7],
    12 => [3,6],
    6  => [5],
    8  => [9,1,5],
    9  => [4],
    1  => [2],
    7  => [5]
]))->breadthFirstSearch(2, 7); // 2 to 7 in 3 hops 2->12->3->7

I'd be interested at the speed throwing 100000 nodes with 8 connectors at it.

keevitaja's avatar

This actually a mapper engine for a MUD i play! To be correct the mapper engine i am trying to build.

And now the goal is to build it with Sqlite. So forget i mentioned Mysql! Scripting language i use is Lua, but it is not important.

https://en.wikipedia.org/wiki/MUD

Each room may have north, west, east, south, up, down and n+1 custom exits. There are also "global exits" in a form of hand held portals. For instance if i have portal to room 222, then each room has this "global exit" which leads room 222.

Of course there are many more variables to consider, but these are the basics.

And yes, there are written mappers for various clients. But that is not the goal. Goal is DIY - do it yourself.

@phildawson so it does run when i need to use the mapper.

phildawson's avatar

@keevitaja I'll give it a go, looks like it would need only minor alterations to get it to get the command after finding the path :)

Have you got some to -> from paths you would like to test?

phildawson's avatar

@keevitaja So I have a working example, and have added a debug so you can see the path it took.

With 489 exits it takes ~0.00001696090698242 seconds to calculate the route on my desktop iMac (3.2 i5).

use App\Graph;
use App\Exceptions\NoRouteException;

$graph = [];

foreach(DB::table('exits')->get() as $exit) {
    $graph[$exit->from_num][$exit->command] = $exit->to_num;
}

$g = new Graph($graph);

try {
    $result = $g->breadthFirstSearch(30602, 30638);
    dd($result);

} catch(NoRouteException $e) {
    dd($e->getMessage());
}
<?php

namespace App;

use App\Exceptions\NoRouteException;
use SplDoublyLinkedList;
use SplQueue;

class Graph
{
    protected $graph;
    protected $visited = array();

    public function __construct($graph) {
        $this->graph = $graph;
    }

    // find least number of hops (edges) between 2 nodes
    // (vertices)
    public function breadthFirstSearch($origin, $destination) {
        // mark all nodes as unvisited
        foreach ($this->graph as $vertex => $adj) {
            $this->visited[$vertex] = false;
        }

        // create an empty queue
        $q = new SplQueue();

        // enqueue the origin vertex and mark as visited
        $q->enqueue($origin);
        $this->visited[$origin] = true;

        // this is used to track the path back from each node
        $path = array();
        $path[$origin] = new SplDoublyLinkedList();
        $path[$origin]->setIteratorMode(
            SplDoublyLinkedList::IT_MODE_FIFO|SplDoublyLinkedList::IT_MODE_KEEP
        );

        $path[$origin]->push($origin);

        $found = false;
        // while queue is not empty and destination not found
        while (!$q->isEmpty() && $q->bottom() != $destination) {
            $t = $q->dequeue();

            if (!empty($this->graph[$t])) {
                // for each adjacent neighbor
                foreach ($this->graph[$t] as $vertex) {

                    if (isset($this->visited[$vertex]) && !$this->visited[$vertex]) {
                        // if not yet visited, enqueue vertex and mark
                        // as visited
                        $q->enqueue($vertex);
                        $this->visited[$vertex] = true;
                        // add vertex to current path
                        $path[$vertex] = clone $path[$t];
                        $path[$vertex]->push($vertex);
                    }
                }
            }
        }

        if (isset($path[$destination])) {

            $result = [];

            $result['rooms'] = [];
            $result['commands'] = [];
            $result['debug'] = [];

            while($path[$destination]->valid()) {

                $current = (int)$path[$destination]->current();
                $path[$destination]->next();
                $next = $path[$destination]->current();

                $result['rooms'][] = $current;

                if (!is_null($next)) {
                    $result['commands'][] = array_search($next, $this->graph[$current]);
                    $result['debug'][$current] = $this->graph[$current];
                }
            }

            return $result;
        }
        else {
            throw new NoRouteException("No route from $origin to $destination");
        }
    }
}
array:3 [▼
  "rooms" => array:11 [▼
    0 => 30602
    1 => 30607
    2 => 30611
    3 => 30612
    4 => 30618
    5 => 30619
    6 => 30620
    7 => 30621
    8 => 30622
    9 => 30623
    10 => 30638
  ]
  "commands" => array:10 [▼
    0 => "e"
    1 => "e"
    2 => "s"
    3 => "s"
    4 => "e"
    5 => "s"
    6 => "e"
    7 => "e"
    8 => "n"
    9 => "w"
  ]
  "debug" => array:10 [▼
    30602 => array:4 [▼
      "e" => "30607"
      "w" => "30697"
      "s" => "30601"
      "n" => "30603"
    ]
    30607 => array:4 [▼
      "e" => "30611"
      "w" => "30602"
      "s" => "30608"
      "n" => "30606"
    ]
    30611 => array:3 [▼
      "w" => "30607"
      "s" => "30612"
      "n" => "30610"
    ]
    30612 => array:3 [▼
      "w" => "30608"
      "s" => "30618"
      "n" => "30611"
    ]
    30618 => array:2 [▼
      "e" => "30619"
      "n" => "30612"
    ]
    30619 => array:2 [▼
      "s" => "30620"
      "w" => "30618"
    ]
    30620 => array:2 [▼
      "e" => "30621"
      "n" => "30619"
    ]
    30621 => array:2 [▼
      "e" => "30622"
      "w" => "30620"
    ]
    30622 => array:2 [▼
      "w" => "30621"
      "n" => "30623"
    ]
    30623 => array:2 [▼
      "s" => "30622"
      "w" => "30638"
    ]
  ]
]
phildawson's avatar

Reading the rooms the journey makes sense.

    $rooms = DB::table('rooms')->select('num','name')->whereIn('num', $result['rooms'])->get();
    dd($rooms);
     .-' _/   _/\_   \_'-.
    |__ /   _/\__/\_   \__|
       |___/\_\__/  \___|
              \__/
              \__/
               \__/
                \__/
             ____\__/___
       . - '             ' -.
      /                      \
~~~~~~~  ~~~~~ ~~~~~  ~~~ ~~~  ~~~~~
  ~~~   ~~~~~   ~!~~   ~~ ~  ~ ~ ~pjb
array:11 [▼
  0 => {#566 ▼
    +"num": "30602"
    +"name": "Off the Coast of Verume"
  }
  1 => {#567 ▼
    +"num": "30607"
    +"name": "The Shores of Verume"
  }
  2 => {#568 ▼
    +"num": "30611"
    +"name": "Along the Sands"
  }
  3 => {#569 ▼
    +"num": "30612"
    +"name": "Along the Sands"
  }
  4 => {#570 ▼
    +"num": "30618"
    +"name": "A worn pathway"
  }
  5 => {#571 ▼
    +"num": "30619"
    +"name": "A worn pathway"
  }
  6 => {#572 ▼
    +"num": "30620"
    +"name": "A worn pathway"
  }
  7 => {#573 ▼
    +"num": "30621"
    +"name": "A worn pathway"
  }
  8 => {#574 ▼
    +"num": "30622"
    +"name": "A worn pathway"
  }
  9 => {#575 ▼
    +"num": "30623"
    +"name": "Entering the Jungles of Verume"
  }
  10 => {#576 ▼
    +"num": "30638"
    +"name": "Jungles of Verume"
  }
]
      {0O}
      \__/
      /^/
     ( (              -Peter Bier-
     \_\_____
     (_______)
    (_________()Oo
keevitaja's avatar

BFS is too slow. I tried the algorithm on 90k+ exits, and it takes forever.

I ported https://github.com/lextoumbourou/bfs-php to Lua:

require "tbl" -- table.contains and table.copy
local queue = require "queue"

local function bfs(graph, start, goal)
    if not graph[start] then return false end

    local visited = {}
    local queue = queue:init()

    queue:push({start})
    table.insert(visited, start)

    while queue:count() > 0 do
        local path = queue:pull()
        local node = path[#path]

        if node == goal then return path end

        for _, exit in pairs(graph[node]) do
            if not table.contains(visited, exit) then
                table.insert(visited, exit)

                if graph[exit] then
                    local new = table.copy(path)

                    table.insert(new, exit)
                    queue:push(new)
                end
            end
        end
    end

    return false
end

return bfs

So i think the only option is to do get the path out from the database. I'll dig...

keevitaja's avatar

@phildawson yes, i do.

https://github.com/keevitaja/bfs-lua-test

One thing for sure, my queue implementation is slow. Very slow. So perhaps PHP will be more speedy. But as i am not going to use this with PHP, there is no point int trying it with PHP.

run the test.lua , all times are from the os.clock() first time call

goal room 30483 is just few rooms away from start and bfs takes just few milliseconds to run. 41624 takes 10 seconds.

also fetching takes lot of time. and there is not much difference, if i play with tables inside the fetching loop or not. so i guess i would need a sql query, which takes the correct nodes out.

Running Dell E5420, 7200rpm hdd, i3

phildawson's avatar

@keevitaja

I've just downloaded the aardwolf.db with the 90,939 exits, this again running on a low-spec desktop iMac.

52180 -> 30483

0.046905040740967 seconds

array:2 [▼
  "rooms" => array:4 [▼
    0 => 52180
    1 => 30468
    2 => 30469
    3 => 30483
  ]
  "commands" => array:3 [▼
    0 => "n"
    1 => "n"
    2 => "n"
  ]
]

52180 -> 41624

0.15206694602966 seconds

array:2 [▼
  "rooms" => array:21 [▼
    0 => 52180
    1 => 34652
    2 => 12885
    3 => 12945
    4 => 13005
    5 => 13065
    6 => 13125
    7 => 13185
    8 => 13245
    9 => 13305
    10 => 13365
    11 => 13425
    12 => 13485
    13 => 13545
    14 => 13546
    15 => 13547
    16 => 13548
    17 => 13549
    18 => 13550
    19 => 13551
    20 => 41624
  ]
  "commands" => array:20 [▼
    0 => "e"
    1 => "s"
    2 => "s"
    3 => "s"
    4 => "s"
    5 => "s"
    6 => "s"
    7 => "s"
    8 => "s"
    9 => "s"
    10 => "s"
    11 => "s"
    12 => "s"
    13 => "e"
    14 => "e"
    15 => "e"
    16 => "e"
    17 => "e"
    18 => "e"
    19 => "e"
  ]
]

On a decent production server it should totally nail these back down to 0.0001 seconds

phildawson's avatar

@keevitaja so I've just done

brew install lua
brew install luarocks
luarocks install luasql-sqlite3

52180 -> 30483

fetching: 0.27418
bfs: 0.274727
time elapsed: 0.274734

52180 -> 41624

fetching: 0.239803
bfs: 6.882752
time elapsed: 6.882805

thats 6.88 seconds vs the PHP 0.15 seconds :P

keevitaja's avatar

yeah, and afaik both scripts have the same bfs implementation

Next

Please or to participate in this conversation.