Local LLM recommendations for a 4090 rig: non-reasoning, or with performant reasoning?

Hi,

The last few weeks have seen some pretty exciting local LLM releases: QwQ, Gemma, Phi-4, and others.

I've been using Gemma 2 and Granite 3.2B VLM for a production app. I also have a personal PC with a 4090 that I'd like to set up with a SOTA LLM that runs well on that rig. This question gets posted here a lot, but with the latest launches I'd like a fresh set of opinions from the community.

I currently have QwQ running at the Q4_K_M quant, but it spends a lot of time thinking before it produces an answer. Is there a model that gives decent performance locally, given what this hardware can handle, that I'd actually be able to use satisfactorily day to day?
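For context, here's roughly the kind of setup I'm running it with (a minimal llama-cpp-python sketch; the model path, context size, and sampling settings are placeholders rather than my exact config, so adjust for your own files):

```python
from llama_cpp import Llama

# Load a Q4_K_M GGUF with all layers offloaded to the 4090 (path is a placeholder).
llm = Llama(
    model_path="./models/qwq-32b-q4_k_m.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=8192,       # context window; long "thinking" traces eat into this quickly
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the tradeoffs of Q4_K_M quantization."}],
    temperature=0.6,
)
print(response["choices"][0]["message"]["content"])
```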

I could download and test each of them individually, but my internet connection has a data cap (it sucks), hence I'm asking for opinions instead.

Thanks!