memele
2 months ago

Remove unnecessary text from string

I am learning about web scraping and have built a website that lists movies and TV shows from the OMDb API. I want to copy subtitles from other websites to mine. Many websites save subtitles under names that carry a lot of extra info, like 1080p.BluRay.x264.DTS-FGT, but I only need the movie/TV show title so I can look it up in my database and attach the subtitle file to it. How do I strip all the extra info from the name? I was thinking of creating an array of the most common tags, like ['720p', '1080p', 'webrip', 'x264', 'en', 'web', 'dl', 'hevc', 'bluray', 'hd', 'aac', 'hdcam'], etc., but that won't work 100% of the time. Is there a better way to do this?
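
To make the tag-list idea concrete, here is a rough Python sketch of one variation: instead of deleting every known tag, it cuts the name off at the first token that looks like metadata (a known tag, a release year, or an episode code such as S01E02). The names `MARKERS` and `extract_title` are made up for illustration, the marker set is an assumed, non-exhaustive subset of the list above, and the approach is only a heuristic.

```python
import re

# Hypothetical, non-exhaustive marker set based on the list in the post.
# Short tags such as "en" are left out here because they are more likely
# to collide with real title words.
MARKERS = {
    "720p", "1080p", "2160p", "webrip", "x264", "x265", "web", "dl",
    "hevc", "bluray", "hd", "aac", "hdcam", "dts",
}

YEAR = re.compile(r"^(19|20)\d{2}$")                       # e.g. 1999, 2014
EPISODE = re.compile(r"^s\d{1,2}e\d{1,2}$", re.IGNORECASE)  # e.g. S01E02


def extract_title(release_name: str) -> str:
    """Return the part of a release name that precedes the first metadata token."""
    # Split on the separators release names commonly use. Hyphens are
    # separators too, so a hyphenated title comes back with spaces instead.
    tokens = [t for t in re.split(r"[.\s_\-\[\]()]+", release_name) if t]
    kept = []
    for i, token in enumerate(tokens):
        looks_like_metadata = (
            token.lower() in MARKERS
            or YEAR.match(token) is not None
            or EPISODE.match(token) is not None
        )
        # Never cut at the very first token, so a title that starts with
        # a year-like number (for example "2012") is not lost entirely.
        if i > 0 and looks_like_metadata:
            break
        kept.append(token)
    return " ".join(kept)


print(extract_title("The.Matrix.1999.1080p.BluRay.x264.DTS-FGT"))
print(extract_title("Breaking.Bad.S01E02.720p.WEBRip"))
```

This still misfires on titles that contain a year-like number or a tag word after the first token, so the result is probably best treated as a search candidate to confirm against the database rather than as a final answer.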
