I am trying to modernize an animal genetics hobby project of mine (more than 15 years old).
I have entries with title, description, gene code and images.
The gene code consists of gene pairs currently saved in table columns.
These genes are similar. There are currently 17, but there will be more in the future.
While upgrading, I am thinking about extending the project so it can be used for more than one species.
All species in fact have the same genes, but not every gene is relevant for every species.
Maybe I am going to explain a little too much, but I am unsure how much you need to know to understand the question.
- Each animal has 2 genes of each sort (one from each parent, you know).
- Genes may have dominant variants, recessive ones or variants with partial dominance.
- Often only the dominant one matters, but not always.
- For the partially dominant ones it's a sort of addition of both.
I use numbers to save the different gene variants.
This allows me to sort them by dominance and to use whatever notation I want (scientific names, common names, ...).
So for each gene I concatenate the two numbers and save them in a char(2) column.
0 means not set (e.g. in a search), 9 means unknown.
Example:
19 = one is the dominant one, the other is unknown
22 = both are the same - the recessive variant.
From the genes relevant for sorting, a number like 112142001013 is created as sort_order, so that entry lists can be ordered by dominance and gene order.
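To make the encoding above concrete, here is a minimal sketch in PHP. The function names (encodePair, decodePair, buildSortOrder) are hypothetical and not part of the existing project, and the split of 112142001013 into six pairs is only my reading of the example.

```php
<?php
// Hypothetical helpers for illustration; none of these names exist in the project.

const VARIANT_NOT_SET = 0; // "not set", e.g. in a search
const VARIANT_UNKNOWN = 9; // variant is unknown

/** Concatenate two single-digit variant numbers into a char(2) code. */
function encodePair(int $a, int $b): string
{
    foreach ([$a, $b] as $v) {
        if ($v < 0 || $v > 9) {
            throw new InvalidArgumentException("Variant out of range: $v");
        }
    }
    // The caller decides the order (e.g. the more dominant variant first).
    return $a . $b;
}

/** Split a char(2) code back into its two variant numbers. */
function decodePair(string $code): array
{
    if (strlen($code) !== 2 || !ctype_digit($code)) {
        throw new InvalidArgumentException("Invalid pair code: $code");
    }
    return [(int) $code[0], (int) $code[1]];
}

/** Concatenate the codes of all sort-relevant genes, in gene order. */
function buildSortOrder(array $pairCodes): string
{
    return implode('', $pairCodes);
}
```

The sort key is kept as a string here, so a leading 00 pair is not lost.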
The genes range from those with very common mutations in most species to those relevant for only one species.
Non-relevant genes may be saved as NULL or with their wild type as the default value.
I need to enlarge the pool of variant numbers per gene to more than 10.
Maybe 1090 and 2020 rather than 19 and 22. This causes problems with 00, which works as a string but not as a real number.
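A minimal sketch of the wider encoding, assuming two digits per variant (so the old variant 1 becomes 10 and 9 becomes 90) and assuming the code stays a string so that leading zeros survive. The function names are hypothetical.

```php
<?php
// Hypothetical sketch: two digits per variant, stored as a char(4) string.

/** Encode two variant numbers (0..99) as a zero-padded 4-character code. */
function encodeWidePair(int $a, int $b): string
{
    foreach ([$a, $b] as $v) {
        if ($v < 0 || $v > 99) {
            throw new InvalidArgumentException("Variant out of range: $v");
        }
    }
    return sprintf('%02d%02d', $a, $b);
}

/** Split a 4-character code back into its two variant numbers. */
function decodeWidePair(string $code): array
{
    if (strlen($code) !== 4 || !ctype_digit($code)) {
        throw new InvalidArgumentException("Invalid pair code: $code");
    }
    return [(int) substr($code, 0, 2), (int) substr($code, 2, 2)];
}

// Casting to int is what loses the information: "0000" becomes 0.
$asInteger = (int) encodeWidePair(0, 0);
```

As long as every code has the same width, comparing the strings gives the same order as comparing the numbers, so a string column can still be used for sorting.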
As a first step of refactoring, I created a bunch of similar Enums for the gene variants and a bunch of classes which hold pairs of them for each gene.
I made Enums like this:
\App\Enums\Species1\gene1, gene2, ...
\App\Enums\Species2\gene1, gene2, ...
All Enums use the same trait to get the number, name, symbol, ...
The classes are like this:
\App\Models\Species1\Gene1, Gene2, ...
\App\Models\Species2\Gene1, Gene2, ...
\App\Models\BaseGene, Gene1, Gene2, ...
A final class such as Species1\Gene1 inherits from class Gene1, which inherits from the abstract class BaseGene.
So I can use common methods for common genes and may override them at each level.
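A rough sketch of the structure described above, for PHP 8.1 or later. Namespaces are flattened into name prefixes to keep it in one file, and the interface GeneVariant, the trait DescribesVariant, the SYMBOLS constant and the method names are assumptions made for illustration, not the real code.

```php
<?php
// Illustrative only: flattened names stand in for \App\Enums\Species1\... etc.

interface GeneVariant
{
    public function number(): int;
    public function symbol(): string;
}

// Shared by all variant enums; traits used in enums must not declare properties.
trait DescribesVariant
{
    public function number(): int
    {
        return $this->value;
    }

    public function symbol(): string
    {
        return self::SYMBOLS[$this->value] ?? '?';
    }
}

enum Species1Gene1: int implements GeneVariant
{
    use DescribesVariant;

    const SYMBOLS = [1 => 'A', 2 => 'a', 9 => '?'];

    case Dominant = 1;
    case Recessive = 2;
    case Unknown = 9;
}

abstract class BaseGene
{
    public function __construct(
        public readonly GeneVariant $first,
        public readonly GeneVariant $second,
    ) {
    }

    /** The stored code: both variant numbers concatenated. */
    public function code(): string
    {
        return $this->first->number() . $this->second->number();
    }

    public function notation(): string
    {
        return $this->first->symbol() . $this->second->symbol();
    }
}

// Behaviour common to gene 1 in every species would go here.
class Gene1 extends BaseGene
{
}

// Species-specific override of a common method.
final class Species1Gene1Pair extends Gene1
{
    public function notation(): string
    {
        return $this->first->symbol() . '/' . $this->second->symbol();
    }
}
```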
Question
Now I need to put these genes back into my entry model. What is the best or most efficient way to do this?
Several groups of genes are used to generate name parts of the generic name for the genetic variant of the animal.
Like this:
Genes ABC -> part1
Gene DEF -> part2
Genes GHI -> part3
generic Name: __('part1') . __('part2') . __('part3')
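One way the grouping could look in code, as a sketch only. The array layout, the function name genericName and the lookup keys are invented for this example, and the translator is passed in as a callable so the snippet runs without Laravel's __() helper.

```php
<?php
// Hypothetical sketch: each gene group maps the pair codes of its genes to one name part.

/**
 * @param array<string, string> $genes  gene key => pair code, e.g. ['A' => '11']
 * @param array<int, array{genes: string[], names: array<string, string>}> $groups
 * @param callable(string): string $translate  stand-in for Laravel's __()
 */
function genericName(array $genes, array $groups, callable $translate): string
{
    $name = '';
    foreach ($groups as $group) {
        // Join the codes of the group's genes into one lookup key, e.g. "11-22-19".
        $codes = [];
        foreach ($group['genes'] as $gene) {
            $codes[] = $genes[$gene] ?? '00'; // 00 = not set
        }
        $key = implode('-', $codes);

        // Groups without a matching name contribute nothing.
        if (isset($group['names'][$key])) {
            $name .= $translate($group['names'][$key]);
        }
    }
    return $name;
}

// Made-up example data: two groups, three genes each.
$groups = [
    ['genes' => ['A', 'B', 'C'], 'names' => ['11-22-19' => 'part1']],
    ['genes' => ['D', 'E', 'F'], 'names' => ['22-22-22' => 'part2']],
];
$genes = ['A' => '11', 'B' => '22', 'C' => '19', 'D' => '22', 'E' => '22', 'F' => '22'];

$result = genericName($genes, $groups, fn (string $key): string => strtoupper($key));
```

Since the groups are plain data here, each species could get its own list of groups without needing a separate class.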
I am not sure if it's useful to use gene groups. These groups may be different between species.
I could use BaseGenecode and GenecodeSpecies1, 2, ...
- Put all the genes in a JSON column? Probably not. I need to search entries with certain genes.
- Let them stay in the entries table and add more columns for new genes?
- Make a separate table gene_codes with all the genes (mostly to split it up somehow)?
- Make separate gene_code tables per species? (Probably not.)
- Make a hasMany genes table with all the single gene pairs? But then I need to query all the genes of all the entries to list them with their gene codes...
- ???
Thank you for reading down here! I would be glad if someone has time to help me think around this problem.