Not sure I understand the upside. You will need to load the whole file if you want to search, add, delete, or change anything, versus a simple SQL query.
Does it make sense to store uploaded file tree in a JSON file instead of DB?
It was suggested to me that I store the tree of users' uploaded files in a JSON file somewhere in the project folder. So instead of having a row for every uploaded file, it will be stored in a file. Isn't that much better, both for access times and for less stress on the DB?
Any downside for that?
@Sinnbeck I thought it might be faster in this way because there isn't the part of connecting to a remote service (database), and it's on the filesystem with the project so perhaps relatively fast read? And might also even have files based on date so they won't grow too large in size?
@Ligonsker you can try it out. But be aware that you need to rebuild it from scratch at any change.
I don't understand this bit:
And might also even have files based on date so they won't grow too large in size?
@Sinnbeck I was thinking that this json basically stores plenty of file paths and it can get very large if there are many uploads, so I thought it might make lookup time slower? Then just every x period of time store that new upload entries in a new json file to keep it small size for less search time?
What do you mean build it from scratch? If I plan to change a column for example?
@Ligonsker yes exactly. JSON isn't great for searching. So if a user wants to find a file named "foo", JSON isn't the best. And if you want to change a folder name, you need to rebuild the file tree and resave the JSON file.
Any reason to not just make the actual folders and files on disk?
@Sinnbeck it's a personal project I'm doing for fun. To be honest, the number of files is probably going to be relatively small, and it's not the large-scale app I make it sound like. I just do this to learn new things in the process, like in this case designing an app that would work at large scale. Of course it probably makes no sense, but I like to explore things I've never dealt with before.
In this case I read that for storing a large number of files I'd probably want to shard them, so that no single path holds too many files and access stays fast.
In my example I store them in a path that matches the hash name: h6M3gbh.jpg will be stored in h/6/m/3/g/h6M3gbh.jpg, for example.
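A minimal sketch of the sharding scheme described above, in plain PHP. The helper name `shardedPath` and its `$depth` parameter are hypothetical. The depth of 5 and the lowercasing of the directory names are assumptions taken from the example path, not from Laravel itself.

```php
<?php
declare(strict_types=1);

/**
 * Build a sharded relative path from a stored file name.
 * Hypothetical helper: depth 5 and lowercase directories mirror the example above.
 */
function shardedPath(string $fileName, int $depth = 5): string
{
    // Use only the base name (without the extension) to pick the shard directories.
    $base = pathinfo($fileName, PATHINFO_FILENAME);
    $prefix = strtolower(substr($base, 0, $depth));

    // One directory level per character: "h6m3g" becomes "h/6/m/3/g".
    $dirs = implode('/', str_split($prefix));

    return $dirs . '/' . $fileName;
}

echo shardedPath('h6M3gbh.jpg'), PHP_EOL; // h/6/m/3/g/h6M3gbh.jpg
```

Lowercasing the directory names only matters on case-sensitive filesystems, where "M" and "m" would otherwise be separate directories.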
@Ligonsker If you are only storing the file names, it's always faster to store it in a proper table. If it for some reason would become huge, talking about hundreds of millions of rows in the database, then there are ways around that with partitioning in the database.
I would also not recommend having more than 10000 files per directory. If you store more, the bottleneck will be the file system and not the database.
I would probably use the id as the filename
1 => 1.jpg
2 => 2.jpg
356 => 356.jpg
It will be lightning fast to search for the record containing the image.
A json file or several json files will never ever be faster than the database.
@Tray2 Thank you, understood, then DB it is! About IDs - I'm just using Laravel's store method and it already hashes the files so I might leave it as it is
And if I choose db, which is better: (Note that Folders is not actual filesystem folder - it's a folder the user creates on the website, like you can create a folder in your phone's gallery - and then the files will be linked to this folder by their id in the table - since the file paths are random by the file hash)
@Ligonsker I would suggest since this is a hobby project for learning try it both ways. Later when you have it as Json you will discover that Json is very hard to work with when it's stored in a field.
In a real project I would use a table.
But congratulations to you, you are doing the correct thing by doing projects to learn from.
@Ligonsker So you are talking about the users creating galleries, then just use a regular id for the gallery as well, then allow the user to name it if they want to, otherwise give it a random name.
Galleries
- id
- name
- user_id
images
- id
- file name
- user_id
Gallery_image
- image_id
- gallery_id
@Tray2 thanks! Can you please explain why you chose the many-to-many in this case? And not the first option where an image "belongs to" a gallery and then the folder_id would be a foreign key on the images table?
@jlrdw thank you :) I will attempt both ways, it's fun to learn these things
@Ligonsker I'm thinking that an image might belong to more than one gallery.
For example if you are on vacation and you take a photo of your wife. You put in the vacation gallery, then your wife might want to put it in another gallery with good photos of her.
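The three-table layout suggested earlier, with one image linked to two galleries, could look roughly like this. It is only a sketch: it uses plain PDO with an in-memory SQLite database instead of Laravel migrations, it assumes the pdo_sqlite extension is available, and the column types, the composite primary key, and the sample rows are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Assumption: pdo_sqlite is installed; an in-memory database keeps the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE galleries (id INTEGER PRIMARY KEY, name TEXT NOT NULL, user_id INTEGER NOT NULL)');
$pdo->exec('CREATE TABLE images (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL, user_id INTEGER NOT NULL)');
// Pivot table: one row per (image, gallery) link, so an image can sit in many galleries.
$pdo->exec('CREATE TABLE gallery_image (
    image_id INTEGER NOT NULL REFERENCES images(id),
    gallery_id INTEGER NOT NULL REFERENCES galleries(id),
    PRIMARY KEY (image_id, gallery_id)
)');

// Hypothetical sample data: one uploaded file, linked to two galleries.
$pdo->exec("INSERT INTO galleries (id, name, user_id) VALUES (1, 'Vacation', 1), (2, 'Favourites', 1)");
$pdo->exec("INSERT INTO images (id, file_name, user_id) VALUES (1, '1.jpg', 1)");
$pdo->exec('INSERT INTO gallery_image (image_id, gallery_id) VALUES (1, 1), (1, 2)');

// List every gallery that contains image 1.
$stmt = $pdo->prepare(
    'SELECT g.name FROM galleries g
     JOIN gallery_image gi ON gi.gallery_id = g.id
     WHERE gi.image_id = ? ORDER BY g.id'
);
$stmt->execute([1]);
$galleryNames = $stmt->fetchAll(PDO::FETCH_COLUMN);

print_r($galleryNames);
```

The file 1.jpg exists once on disk. Only the two pivot rows say which galleries show it.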
@jlrdw Well, that is up to her. The ones she thinks are good ;)
Btw, in case she wants it in another folder in a gallery - shouldn't I make it an actual copy in the filesystem instead of doing symlink to both folders with the same file? I thought that if someone wants to display same photo in two folders - it means it's either moved or copied
I was thinking that in case it's copied - I'll create a real copy in the filesystem and if it's moved I'll change the folder it belongs to
@Ligonsker The file system should only be changed when the image is uploaded or deleted. Let the database keep track of which image belongs to which gallery. There is no reason to store the same image over and over again on disk.
@Tray2 And what do you think happens on cloud services like Google Drive or iCloud when a user copies a file? Because these systems track the storage in use by the user to limit him.
You think that if a user copies a file there, what happens is that they just create a new row in the DB with path to this file, and then do some query to calculate the new total used storage? (Btw I added a filesize column in any case to my files table)
I can imagine problems with this approach - because if a user copied same image to 2 folders, and he deletes them from one - I will need to soft delete it and not hard delete it, and if he deletes from both - I then can hard delete it. But that requires extra DB queries no?
@Ligonsker I'd say you are overthinking it a bit here.
- User One uploads a picture.
- User One sets the visibility of uploaded picture by linking it to a gallery.
- User Two likes the picture and adds it to their own gallery.
- User One decides to delete the picture and all references to the picture are deleted.
- User Two no longer sees the picture in his gallery since he doesn't own the picture.
Copying the picture would create a whole new bunch of issues since it
- Would need to be unlinked x number of times.
- The id of the picture would change, which means you would need to keep track of that as well.
- The disk would fill up with hundreds of duplicate pictures.
- The uploaded picture might be "copyrighted", and should not be copied.
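@Ligonsker Just putting stuff on disk makes it easy to work with. You can easily find out how much space the files take up in total:
use Illuminate\Support\Facades\Storage;

$size = 0;
// Sum the size in bytes of every file on the default disk.
foreach (Storage::allFiles() as $file) {
    $size += Storage::size($file);
}
dd($size); // dump the total and stop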
Regarding the issues you mentioned:
- I don't need to unlink x number of times - it's an actual copy of the image - I just delete the one the user wants to delete - Just like on your phone's gallery - if you now create new folder and copy an image there, and delete the original image - the copy would still be in the new folder.
- I won't need to track any change in this case because I just copy to another random place on the filesystem, for example - if the original image is da2b113b21878cde19e96f4afe69e714.jpg and is stored in uploads/d/a/2/b/1/da2b113b21878cde19e96f4afe69e714.jpg, then I create an actual copy with random hash and store it: uploads/c/1/9/9/9/c199909d0b5fdc22c9db625e4edf0918.jpg, then create a new row in the DB for that new file.
- The disk will fill, but again each user has his limit, so he can have 10,000 of the same image, but then he won't be able to upload anymore
- There would be permissions for that: A user's account is private and by default only he can access the files he uploads, any copy he makes is only his and no one can access it. If he ever decides to let others view it - he set public permission on that folder/file. And in such cases - even if someone uploads a public photo to Facebook/Instagram/Other platforms - then it's hard to control copyrights because people can just copy it once they can view it. What will stop another user to just copy this image quickly before the original user deletes his once it's public?
What do you think of what I wrote? :D Would love to get your opinion on that
@Sinnbeck, yep, that's why I want to do an actual copy as opposed to symlink DB copy
@Ligonsker I think it's a bad solution, there should only be a single point of truth for the images. You should handle everything in the gallery_image pivot table, but hey it's your application, you do as you please.
I think the user would be annoyed if he only has 3000 images uploaded but the 10000 limit is hit because almost all of them belong to more than one gallery.
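To illustrate the quota point, here is a rough sketch in plain PHP. It assumes usage is computed from the images table alone, using the filesize column mentioned earlier in the thread. The function name `usageFor` and the sample rows are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical rows; filesize is in bytes.
$images = [
    ['id' => 1, 'user_id' => 1, 'filesize' => 2000000],
    ['id' => 2, 'user_id' => 1, 'filesize' => 3500000],
    ['id' => 3, 'user_id' => 2, 'filesize' => 1000000],
];
// Image 1 is shown in two galleries, but it is still one file on disk.
$galleryImage = [
    ['image_id' => 1, 'gallery_id' => 10],
    ['image_id' => 1, 'gallery_id' => 11],
    ['image_id' => 2, 'gallery_id' => 10],
];

/** Count and total bytes of the images a user uploaded; pivot rows are not consulted. */
function usageFor(int $userId, array $images): array
{
    $own = array_filter($images, fn (array $img): bool => $img['user_id'] === $userId);

    return [
        'count' => count($own),
        'bytes' => array_sum(array_column($own, 'filesize')),
    ];
}

$usage = usageFor(1, $images);
echo count($galleryImage), ' gallery links, ', $usage['count'], ' images, ', $usage['bytes'], ' bytes', PHP_EOL;
```

With real copies on disk, every extra gallery link would add another row and another file, so both numbers would grow with each "copy".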
@Tray2 Ok, your point makes more sense and I will use it, and since everything is handled in the gallery_image table, then even if the image is deleted from the original uploaded location - it would still be recognized in the other album it was added to without any problems.
The only thing I'd need to take care of is probably do an extra query every time a user deletes a photo from his album - because I will need to do a query that counts the number of times the same image appears in the entire gallery - and if it's only 1 (the current image that the user is deleting) - then I should also delete the image from the filesystem right?
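The check described above might be sketched like this in plain PHP, with the pivot table modelled as an array of rows. The function name `detachImage` and the returned `delete_file` flag are hypothetical. In a real app the same logic would be a delete on gallery_image followed by a count query.

```php
<?php
declare(strict_types=1);

/**
 * Remove one (image, gallery) link and report whether the file itself can go.
 * Hypothetical helper operating on in-memory pivot rows.
 */
function detachImage(array $pivot, int $imageId, int $galleryId): array
{
    // Drop only the link the user asked to remove.
    $remaining = array_values(array_filter(
        $pivot,
        fn (array $row): bool => !($row['image_id'] === $imageId && $row['gallery_id'] === $galleryId)
    ));

    // Count how many galleries still reference the image.
    $linksLeft = count(array_filter(
        $remaining,
        fn (array $row): bool => $row['image_id'] === $imageId
    ));

    return ['pivot' => $remaining, 'delete_file' => $linksLeft === 0];
}

$pivot = [
    ['image_id' => 1, 'gallery_id' => 10],
    ['image_id' => 1, 'gallery_id' => 11],
];

$first = detachImage($pivot, 1, 10);            // still linked to gallery 11
$second = detachImage($first['pivot'], 1, 11);  // last link removed

var_dump($first['delete_file'], $second['delete_file']);
```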
@Ligonsker If you are hellbent on keeping the image then you need to do something like that yes.
I would probably make a job that runs every night and removes images that don't belong to any gallery, rather than doing that each time an image is removed from a gallery.
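The core of such a nightly job is finding images with no pivot rows. A minimal sketch in plain PHP, with both tables modelled as arrays. The function name `orphanedImageIds` and the sample rows are hypothetical, and the scheduling itself is left out.

```php
<?php
declare(strict_types=1);

/** Return the ids of images that no gallery links to. Hypothetical helper. */
function orphanedImageIds(array $images, array $pivot): array
{
    $allIds = array_column($images, 'id');
    $linkedIds = array_unique(array_column($pivot, 'image_id'));

    // Ids present in images but absent from the pivot table are orphans.
    return array_values(array_diff($allIds, $linkedIds));
}

$images = [
    ['id' => 1, 'file_name' => '1.jpg'],
    ['id' => 2, 'file_name' => '2.jpg'],
    ['id' => 3, 'file_name' => '3.jpg'],
];
$pivot = [
    ['image_id' => 1, 'gallery_id' => 10],
    ['image_id' => 1, 'gallery_id' => 11],
    ['image_id' => 2, 'gallery_id' => 10],
];

$orphans = orphanedImageIds($images, $pivot);
print_r($orphans); // only image 3 has no gallery
```

In a Laravel app this would more likely be an Eloquent query, for example `doesntHave` on an assumed `galleries` relationship, run from a scheduled command that deletes both the row and the file.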
@Tray2 I mean imagine that a user adds his photo to a second album then he thinks "alright, I don't want it in the first album anymore" and deletes it - I can't automatically delete it from the other album right? because the user deleted it from one album not both. Also the job idea sounds good
@Ligonsker You only delete the image if the user has removed it from all his albums.
@Tray2 Yes exactly, I think I misunderstood your previous comment because you said "If you are hellbent on keeping the image", so I thought you meant keeping the image on the folder the image was added to
@Ligonsker The image should only be stored in one place, there is no reason to copy it anywhere, except for backup purposes.
@Tray2 Yep, that's what I will do, thank you!