<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Local AI |</title><link>https://nicolasfbportfolio.netlify.app/tags/local-ai/</link><atom:link href="https://nicolasfbportfolio.netlify.app/tags/local-ai/index.xml" rel="self" type="application/rss+xml"/><description>Local AI</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 21 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://nicolasfbportfolio.netlify.app/media/icon_hu_3795d420522f6b97.png</url><title>Local AI</title><link>https://nicolasfbportfolio.netlify.app/tags/local-ai/</link></image><item><title>llmfit: What LLMs run on my PC?</title><link>https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/</link><pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate><guid>https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/</guid><description>&lt;p&gt;Hi! LinkedIn told me there is a repository that shows you which models (LLMs, VLMs) best fit your GPU and PC. It is called &lt;em&gt;llmfit&lt;/em&gt; (link to the repo in the resources section). Let&amp;rsquo;s take a look!&lt;/p&gt;
&lt;h2 id="installation"&gt;Installation&lt;/h2&gt;
&lt;p&gt;Couldn&amp;rsquo;t be easier. Run this in your Linux terminal; no sudo required:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="install"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/install_hu_7d8537d108961c06.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/install_hu_41d4c8741e2b2bc7.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/install_hu_f43e73e55b5dce1b.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/install_hu_7d8537d108961c06.webp"
width="760"
height="239"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Then type &lt;code&gt;llmfit&lt;/code&gt; in your terminal and that&amp;rsquo;s it!&lt;/p&gt;
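&lt;p&gt;If your shell can&amp;rsquo;t find the command right away, the install directory is probably just missing from your PATH (a quick sanity check; &lt;code&gt;~/.local/bin&lt;/code&gt; is my assumption for a no-sudo install):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;command -v llmfit || export PATH="$HOME/.local/bin:$PATH"  # assumed install dir
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;llmfit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;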
&lt;h2 id="execution-and-filtering"&gt;Execution and Filtering&lt;/h2&gt;
&lt;p&gt;As soon as you run it, you are greeted with a list of 900+ models beneath your PC&amp;rsquo;s main hardware specs (CPU, RAM, GPU and VRAM), although 72 models are hidden due to an incompatible backend&amp;hellip;&lt;/p&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="all"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/all_hu_5ddd83016a049f01.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/all_hu_7eff320ef3122dd1.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/all_hu_3658eee1c9cef34f.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/all_hu_5ddd83016a049f01.webp"
width="760"
height="550"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;My list comprised 913 models, ordered by score. In addition to the name and score columns, we have: provider, number of parameters in billions, throughput in tokens per second, quantization type, size on disk, mode, memory percentage, context window, date, fit and use case. I couldn&amp;rsquo;t find the models from the Swallow LLM team in Japan, fine-tuned for Japanese proficiency (no nitpicking intended!).&lt;/p&gt;
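&lt;p&gt;As a quick aside, you can sanity-check any fit verdict with back-of-the-envelope math: weight size is roughly parameters times bits-per-weight divided by 8, plus runtime overhead for the KV cache and activations. A minimal sketch (the 20% overhead factor is my own ballpark, not llmfit&amp;rsquo;s formula):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# rough VRAM need for a 14B model at 4-bit quantization, with ~20% overhead
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;echo "14 * 4 / 8 * 1.2" | bc -l  # ~8.4 GB, comfortable on a 12 GB card
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;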
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="absent"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/absent_hu_d867c96a63c7cdd3.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/absent_hu_af8638647ebbab29.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/absent_hu_b508dfae7291dc53.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/absent_hu_d867c96a63c7cdd3.webp"
width="760"
height="345"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id="alibaba"&gt;Alibaba&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="alibaba"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/alibaba_hu_3acde906548bd18f.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/alibaba_hu_bf6162e86fde7423.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/alibaba_hu_b85c5f88d8170e32.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/alibaba_hu_3acde906548bd18f.webp"
width="760"
height="546"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;If any company is committed to the local, open-model scene, it is Alibaba from China, with their famous Qwen models. &lt;strong&gt;Qwen2.5-Coder-14B-Instruct-AWQ&lt;/strong&gt; scored 94 on my hardware. It could be a nice prospect for a coding agent of some sort.&lt;/p&gt;
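&lt;p&gt;One way to put that to the test is serving the AWQ checkpoint with vLLM, which exposes an OpenAI-compatible endpoint a coding agent can point at (a sketch; I&amp;rsquo;m assuming the checkpoint id as Qwen publishes it on Hugging Face):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;pip install vllm
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# serves an OpenAI-compatible API on http://localhost:8000/v1
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vllm serve Qwen/Qwen2.5-Coder-14B-Instruct-AWQ --quantization awq --max-model-len 8192
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;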
&lt;h3 id="google"&gt;Google&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="google"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/google_hu_3e7c1a77c585238e.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/google_hu_cf5c633b7aa6b37a.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/google_hu_1420cbaeaf46503c.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/google_hu_3e7c1a77c585238e.webp"
width="760"
height="339"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Nearly as renowned as its bigger sibling Gemini, the Gemma family of models is great for general and multimodal local AI. &lt;strong&gt;gemma-3-12b-it&lt;/strong&gt; scored 90 on my hardware. I&amp;rsquo;ve used it before and got good results on OCR tasks, although I got better results with Mistral-Small3.2 (not on my GPU but on a more capable RTX 5090).&lt;/p&gt;
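&lt;p&gt;For OCR-style experiments, a prompt like the one below is roughly how I exercise it (a sketch using Ollama, which attaches a local image when its path appears in the prompt; the file name is a placeholder):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama pull gemma3:12b
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# multimodal prompt: Ollama detects the image path inside the prompt text
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama run gemma3:12b "Transcribe all the text in this image: ./scan.png"
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;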
&lt;h3 id="mistral"&gt;Mistral&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="mistral"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral_hu_cfff3c257ee597c.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral_hu_bed2c44c5442f1b8.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral_hu_9e0c265d431fe205.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral_hu_cfff3c257ee597c.webp"
width="760"
height="168"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;And speaking of Mistral AI&amp;hellip; &lt;em&gt;Vive la France!&lt;/em&gt; I am very fond of the Mistral models. While the spotlight is elsewhere, they have built great local models, especially for text extraction from images. That&amp;rsquo;s why I immediately noticed the absence of Mistral-Small3.1:24b and Mistral-Small3.2:24b from my list. However, checking the official llmfit repo, I found that this might have been an update problem on my end, because Mistral-Small3.1:24b is indeed tracked by the llmfit team.&lt;/p&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="mistral2"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral2_hu_7f663eb32a6813a5.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral2_hu_755bfbd9fecf56a6.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral2_hu_2b0c62dbc7fb7e68.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/mistral2_hu_7f663eb32a6813a5.webp"
width="760"
height="312"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 id="meta"&gt;Meta&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="meta"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/meta_hu_200e8e2f2cef28b.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/meta_hu_a07bbe24bf23d421.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/meta_hu_617b1f6d00865e55.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/meta_hu_200e8e2f2cef28b.webp"
width="760"
height="323"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Zuck&amp;rsquo;s company also has a varied model catalog, ranging from small 1.2B models to huge 405B ones. To no one&amp;rsquo;s surprise, the 400+ billion parameter models can&amp;rsquo;t run on my system; I would need two DGX Sparks to run one of those for research purposes. However, &lt;strong&gt;Llama-3.2-11B-Vision-Instruct&lt;/strong&gt; scored 94 on my PC. I&amp;rsquo;ve used it in the past on a more powerful PC at work, and it had fair multimodal capabilities.&lt;/p&gt;
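&lt;p&gt;The arithmetic behind that verdict is straightforward (my own rough numbers, reusing the params-times-bits rule of thumb from earlier):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;echo "405 * 4 / 8" | bc -l  # ~202 GB at 4-bit, hence roughly two 128 GB DGX Sparks
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;echo "11 * 4 / 8" | bc -l   # ~5.5 GB at 4-bit, which is why the 11B vision model fits easily
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;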
&lt;h3 id="microsoft"&gt;Microsoft&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="microsoft"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/microsoft_hu_d42237147fd8c218.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/microsoft_hu_d3c4188ce9751f26.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/microsoft_hu_f10d00a4536c7fa.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/microsoft_hu_d42237147fd8c218.webp"
width="760"
height="289"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Microsoft also has interesting local models. I&amp;rsquo;m very fond of &lt;strong&gt;phi-4:14b&lt;/strong&gt;, which scored 90 on my PC. It&amp;rsquo;s meant for coding according to llmfit, but in practice it&amp;rsquo;s good for chat and multilingual NLP too; a 14B powerhouse in my opinion, after using it for almost a year at work. I need to try their newer Phi-4-reasoning-vision-15B, released on March 4, 2026.&lt;/p&gt;
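&lt;p&gt;If you want to form your own opinion, the quickest route is probably Ollama (assuming its registry tag for the model):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama run phi4 "Explain the difference between AWQ and GGUF quantization in two sentences."
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;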
&lt;h3 id="nvidia"&gt;NVIDIA&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="nvidia"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/nvidia_hu_fa97d093972cd3a.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/nvidia_hu_51b048f46acc01b5.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/nvidia_hu_554d34ba1c5022bc.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/nvidia_hu_fa97d093972cd3a.webp"
width="760"
height="423"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Even though I own 3 NVIDIA GPUs, I&amp;rsquo;ve never used an LLM released by them. Thanks to llmfit, I noticed a model that could be useful for my recent Japanese endeavours: &lt;strong&gt;NVIDIA-Nemotron-Nano-9B-v2-Japanese&lt;/strong&gt;. It fits perfectly on my system, so I should give it a try.
&lt;/p&gt;
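&lt;p&gt;Serving it would look something like this (a sketch; I&amp;rsquo;m assuming the checkpoint sits under NVIDIA&amp;rsquo;s Hugging Face org and that your vLLM build supports the Nemotron-Nano architecture):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese --max-model-len 8192
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# then chat against the OpenAI-compatible endpoint at http://localhost:8000/v1
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;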
&lt;h3 id="openai"&gt;OpenAI&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="openai"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/openai_hu_e234b6dc9be3228c.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/openai_hu_e4189252837d8e29.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/openai_hu_1b7cc82c7df31f83.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/openai_hu_e234b6dc9be3228c.webp"
width="760"
height="117"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Who would have thought OpenAI is not that open? Just kidding. With only two models on the list, &lt;strong&gt;gpt-oss-20b&lt;/strong&gt; is my best option, although the 120B variant apparently runs on my system (in Mixture-of-Experts mode). I know I need to take advantage of my RAM somehow, by loading some of the parameters there; I just have to learn how.&lt;/p&gt;
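&lt;p&gt;The usual trick is partial offloading with llama.cpp: keep as many layers as fit in VRAM and let the rest run from system RAM. A minimal sketch (the GGUF file names are placeholders, the layer counts depend on your card, and the MoE flag reflects my understanding of recent llama.cpp builds):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# keep 24 transformer layers on the GPU; the rest run from RAM
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;llama-server -m gpt-oss-20b.gguf --n-gpu-layers 24
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;# for the 120B MoE: recent builds can park expert weights in RAM while attention stays on GPU
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;llama-server -m gpt-oss-120b.gguf --n-gpu-layers 99 --n-cpu-moe 30
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;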
&lt;h3 id="perfect-fit"&gt;Perfect Fit&lt;/h3&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="perfect"
srcset="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/perfect_hu_31bdeb94551df318.webp 320w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/perfect_hu_2d7d83b137bf9162.webp 480w, https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/perfect_hu_78b1f0d9b7c7f566.webp 760w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://nicolasfbportfolio.netlify.app/blog/llmfit-blog/perfect_hu_31bdeb94551df318.webp"
width="760"
height="546"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;By pressing &lt;code&gt;f&lt;/code&gt;, you can filter by &lt;em&gt;fit&lt;/em&gt;. It turns out 657 of the 913 models fit my system perfectly. The best one comes from a provider called &lt;em&gt;huihui-ai&lt;/em&gt;: their version of Llama-3.2-11B-Vision-Instruct. What might be the differences from the one made by Meta?&lt;/p&gt;
&lt;h2 id="takeaways"&gt;Takeaways&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;llmfit is easy to install, run and navigate.&lt;/li&gt;
&lt;li&gt;Beautiful, clean CLI.&lt;/li&gt;
&lt;li&gt;I could not find some models, e.g. the whole Swallow LLM family from Japan.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Thank you for reading!&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="resources"&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;llmfit: &lt;a href="https://llmfit.axjns.dev/"&gt;https://llmfit.axjns.dev/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description></item></channel></rss>