From an architectural point of view, I would advice to think about this in the vain of there being two mempools.
First the one you know, inside of the full node. Its task is to make the network function (detect double spends etc) and for normies as well as miners this mempool is important to make sure that block transfer is cheap. Which means, it should have all the transactions that might be in the next block so we can avoid fetching them afterwards.
The second mempool would be one explicitly for mining software.
Ideally it can be connected to a full node and slave to is so it is simply a subset from the full node’s mempool in order to avoid any and all validation. Keep it simple, and all that.
But adding new transactions to this mempool can be done much more leniently and take more CPU time (probably multi-threaded) in order to give weights and scores to transactions. Include the highest scoring ones and bumb less happy ones.
So, I imagine this second mempool to be snapshotted for a block template regularly and various algorithms can be used to make the most profitable block within the parameters given. Because we’ll inevitably end up with more transactions than will fit in blocks and selection then should be done smarter than just on fees.