@Sinnbeck You're totally right. The problem was caused by too much parallelism. We sharded what each job is allowed to read, so no job can steal records from another job, and we also moved some tasks to RabbitMQ.
RabbitMQ uses more disk and less RAM; Redis, from what I can see, tries to keep everything in RAM. So when we need to pass 2M of serialized data [please, just accept it, it's absolutely needed] or 130k jobs, RabbitMQ uses A LOT less RAM. At first I didn't even know why (presumably because Redis is an in-memory store by design, while RabbitMQ can page messages out to disk).
But the main problem was parallelism on MySQL, and, I repeat, we ended up creating 'numbered' queues, where every job is dispatched using a `% 8` on the record id to decide which shard of records it may access. It works. It's a little slower, but it also lets us keep far fewer jobs sitting in Redis.
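To illustrate the idea, here's a minimal sketch of that routing, assuming Laravel-style queued jobs (the `ProcessRecord` job class and the `records-*` queue names are hypothetical, not our actual code):

```php
<?php

// Sketch of the "numbered queues" sharding, assuming Laravel.
// Each record id maps to exactly one of 8 queues via % 8.
$shard = $record->id % 8;                    // shards 0..7

ProcessRecord::dispatch($record->id)
    ->onQueue('records-' . $shard);          // e.g. queue "records-3"

// Then run one worker per shard, so no two workers
// ever compete for the same records:
//   php artisan queue:work redis --queue=records-0
//   ...
//   php artisan queue:work redis --queue=records-7
```

Since each record id lands on exactly one queue, and each queue has a single worker, two jobs can never fight over the same rows, which is what was killing us on MySQL.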
Actually, Redis is still my preferred tool. It's just the first time I've had this many jobs, this much data, and a customer who isn't paying for a good server...
We also moved queue handling to a different VPS, with its own support DB, which greatly lowered CPU usage on the backend and on the main DB node. So the total RAM and CPU is the same as before, but it's used much better: the app CPU and the DB are less loaded, so the frontend stays responsive.
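For completeness, a sketch of how that split looks on the worker VPS, again assuming Laravel's stock Redis queue driver (host names and the `queue` connection are placeholders, not our real setup):

```php
// config/queue.php on the worker VPS -- a sketch, assuming the
// standard Laravel Redis queue connection. The workers here point
// at the shared Redis; the app server only enqueues.
'connections' => [
    'redis' => [
        'driver'      => 'redis',
        'connection'  => 'queue',   // Redis connection defined in config/database.php
        'queue'       => env('REDIS_QUEUE', 'records-0'),
        'retry_after' => 90,
    ],
],
```

That way the worker CPU cost lives entirely on the second box, and the main app and DB node never pay for it.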
A hard but very instructive week, and a good lesson for my future work.
Thanks to all for your patience.