I'd like to know whether there is another, better way to parse user-generated content and keep it safe. Of course, I'm aware of DomDocument and the Purify libraries, which help to a certain extent. But let's say you have to perform the following tasks (I could do a few of these using DomDocument) -
I wouldn't worry about using regex in your use case (a one-time conversion of a database). See the second answer to the question on SO that you linked to.
DomDocument is a little more forgiving, though, depending on the HTML you're trying to parse. Coming up with a be-all and end-all regex parser isn't easy, given all the inconsistencies you're likely to find in "user generated content".
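To illustrate the "forgiving" part, here is a minimal sketch of loading sloppy markup with DOMDocument and removing `<script>` elements. The function name `stripScripts` and the sample input are made up for illustration, and the wrapper `<div>` is just one way to get the fragment back out. It assumes ASCII or entity-encoded input, and it is not a complete sanitizer: for real safety you would still want a purifier library on top.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML and drop <script> elements.
function stripScripts(string $html): string
{
    // Collect parser warnings internally instead of emitting PHP warnings
    // for every unclosed or misnested tag.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment so its content can be located again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // Serialize only the children of the wrapper <div>, not the
    // <html>/<body> scaffolding the parser adds.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Note the unclosed <b> tag in the input.
echo stripScripts('<p>Hello <b>world</p><script>alert(1)</script>');
```

The parser repairs the unclosed `<b>` on its own, which is the kind of inconsistency that is painful to cover with a regex.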
If you also have the CssSelector component installed, you can use CSS selectors to query the DOM tree, which is a lot easier than writing XPath selectors or trying to find things directly with DOMDocument.