Indra7667's avatar

non numeric primary key

Just a curiosity of mine about using non-numeric column as primary key (PK) so there's no code to show nor actual database table

Let's say I have a table of computers, each of those computers have unique code in real life so I make the table like this:
id -> smallint (auto increment)
uid -> varchar (8), unique (that computer's irl unique id)
room -> varchar (32)
etc.

and then I have a table of employee_computer like this:
employee_id -> int
computer_id -> smallint/varchar(?)
active -> bit

I know I could use the uid as pk and use varchar with on update cascade on the employee_computer,
but should I do that?

I know that most of the time, using number for pk is better due to fast search time and could easily set unique by using auto-increment, but having id and uid just feel redundant,

especially when searching what computer is being used by who, thing will go
employee.name->employee.id->employee_computer->computer.id->computer.uid

if I use the uid as PK, it will shorten the steps to
employee.name->employee.id->employee_computer->computer.id(varchar)

0 likes
13 replies
jlrdw's avatar

You can still have a PK and index your other column. No different than searching a user unique name (Handle).

1 like
Indra7667's avatar

@jlrdw Yes, but doesn't that mean the computer id became an unused column if I index the unique column?

jlrdw's avatar

@Indra7667 No an indexed column is still used for search, orderby, etc.

No different than having a customer table and searching for a name, or orderby name.

I suggest reading in the MySql manual on indexes.

1 like
Indra7667's avatar

@jlrdw yeah, the more I read the answers in this discussion, the more I'm confused on the definition of indexing. I thought indexing is like marking a column as important to tell the database to "start looking from these column first" during query execution (and some kind of anchor for cross-table references)

Snapey's avatar

What would be the benefit, in your mind?

Is there something about your use case that you think would benefit?

1 like
Indra7667's avatar

@Snapey I'm not sure(?) maybe if a company often synchronize their data with other company's database while both are adding/removing their computer list based on their own situation?\

since the other company might have difference in id/uid combination, the import will become harder to do(?)\

honestly I don't know, it's a made up scenario, it's hard to imagine how things would go

Tray2's avatar

I would suggest against using anything else other than ints as the primary key, since it takes less space and is very indexable. You could use charcters of course as long as they are easy to index, things like english words are quite fast to search for. However any kind of random unique hash makes it trickier to index.

If you have like 100 records, it doesn't matter, but if you have several thousands or even millions of rows, the indexability of you primary an foreign key becomes very important.

I suggest giving this one a read https://tray2.se/posts/properly-formed-foreign-keys-are-your-best-friends

1 like
Indra7667's avatar

@Tray2 If I understand your article correctly, it's about the importance of having foreign table constraint right?\ I understand that part from a personal test to differentiate the speed of filtering indexed column and non-indexed column\

but it's true I haven't considered the difference between indexing(?) random string and indexing(?) int

Tray2's avatar

@Indra7667 Thats is correcy, there is a difference, the random string will be slower since it needs to take more steps, or rather the tree will have more branches.

Simple example showing a simplified index of integers.

To find the value 16, if will first check if the value is between 1 and 10, then 11 and 20, and then look for the value 16 on the second branch and in total look at 8 values, without the index it would look at 16 values before finding the correct one.

  • 1-10
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
  • 11-20
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
  • 21-30
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30

The simple between comparison of a uuid is way harder to do.

1 like
Indra7667's avatar

@Tray2 Aah, I see... so that's how it works. I never thought of how they do comparison before

Snapey's avatar

an index is a lookup table of data that is sorted in favour of one column, so that when you want to find a specific record it speeds up the search if there is an index for the column you are matching.

Each index comes at a cost of write speed, and storage. Everytime you insert a record the indexes need to be updated. So a lot of indexes slows writing, but speeds up reading. The skill comes in finding the right balance for each application.

In your hypothetical case, I would have auto incrementing primary key, and a unique index on the equipment serial number, although I would worry about clashes if two manifacturers have overlapping serial number ranges. For this type of situation, you can have a unique index which is made from two columns combined, such as vendor and serial.

This latter point is why I would always use my own primary key, rather than using external data which cannot be guaranteed to be unique, but still has to be stored.

1 like
Indra7667's avatar

@Snapey I see, I understand. So I take it the primary key id is used for relations and opening specific computer's detail in web (something.net/detail?id={id}) while the serial number column is specifically for searching a device (something.net/search?q={serial}) right?

I don't quite understand the function of the combined column though. I mean, I don't think the combined column would be good for relations key nor is it good for searching a computer

martinbean's avatar

@indra7667 Yes, you can have non-numeric primary keys, but I’d advise against it as an integer primary key is always going to be more performant for look-ups and foreign keys than something string-based.

Given your example:

Let's say I have a table of computers, each of those computers have unique code in real life

I wouldn’t use that as a reason to stray away from numeric primary keys. Store additional identifiers in additional columns. So if the code is something like a SKU, then create a column to hold that value:

Schema::create('computers', function (Blueprint $table) {
    $table->id();
    $table->string('sku')->unique();
    // Any other computer-related columns...
});
1 like

Please or to participate in this conversation.