Frustratingly bad at self-hosting. Can someone help me access LLMs on my rig from my phone?

BlackSnack@lemmy.zip · 7 months ago

tal@lemmy.today · edit-2 7 months ago

Ollama does have some features that make it easier to use for a first-time user, including:

Automatically calculating how many layers fit in VRAM, loading that many onto the GPU, and leaving the rest in main memory to run on the CPU. kobold.cpp can’t do that automatically yet.
Automatically unloading the model from VRAM after a period of inactivity.
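
Both behaviors can be overridden per request if the defaults don't suit you. Here's a rough sketch in Python, assuming Ollama is listening on its default local port (11434) and using the `keep_alive` and `num_gpu` request fields as I understand them from the Ollama API docs. The helper names and the model name in the usage note are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default address on the machine running it.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, keep_alive="5m", num_gpu=None):
    """Build the JSON body for a non-streaming /api/generate call.

    keep_alive: how long the model stays loaded after the request
                (e.g. "30m"; 0 asks Ollama to unload right away).
    num_gpu:    number of layers to offload to the GPU; None leaves
                the automatic calculation in place.
    """
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    if num_gpu is not None:
        # Only send an override when the caller explicitly asks for one.
        body["options"] = {"num_gpu": num_gpu}
    return body


def generate(model, prompt, **kwargs):
    """Send the request to a running Ollama server and return the reply text."""
    data = json.dumps(build_generate_request(model, prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Something like `generate("llama3", "hello", keep_alive="30m")` would then keep the model in VRAM for half an hour after the call, assuming a model by that name has been pulled. Leaving `num_gpu` unset is usually the better choice, since the automatic split is the whole point.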

I had an easier time setting up Ollama than the alternatives, and OP apparently already has it set up.