https://hotio.dev/containers/qbittorrent/
Why don’t you use the hotio container? That already has it baked in
Thanks for the reply, still reading here. Yeah, thanks to the comments and some benchmarks I read, I’ve abandoned the idea of getting an Apple; it’s just too slow.
I was hoping to test Qwen 32B or Llama 70B with longer contexts, which is why the Apple seemed appealing.
Congrats on being that guy
You’re aware that there’s the official OpenAI Python library, right? https://github.com/openai/openai-python
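A minimal sketch of using it; the model name and prompt here are just placeholders:

```python
# pip install openai
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Minimal chat completion call; "gpt-4o-mini" is just an example model name.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```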
It’s really nothing fancy, especially on Lemmy where like 99% of people are software engineers…
Are you drunk?
Yeah, I found some stats now, and indeed you’re gonna wait like an hour for prompt processing if you throw 80–100k tokens into a powerful model. With APIs that works pretty much instantly; not surprising, but just for comparison. Bummer.
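Back-of-the-envelope, that hour checks out if prompt processing runs at a few dozen tokens per second; the 25 tok/s below is an assumption for illustration, not a measured benchmark:

```python
# Rough prompt-processing (prefill) time estimate.
prompt_tokens = 90_000   # middle of the 80-100k range
prefill_speed = 25       # tokens/s, assumed ballpark for big models on consumer hardware
seconds = prompt_tokens / prefill_speed
print(f"{seconds / 3600:.1f} hours")  # -> 1.0 hours
```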
Yeah, I was thinking about running something like Code Qwen 72B, which apparently requires 145GB of RAM for the full model. But if it’s super slow, especially with large contexts, and I can only run small models at acceptable speed anyway, it may be worth going NVIDIA for CUDA alone.
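That 145GB lines up with FP16 weights at roughly 2 bytes per parameter; quantization is what shrinks it. Rough math, ignoring KV cache and runtime overhead:

```python
# Approximate weight memory for a 72B-parameter model at different precisions.
# Real usage is higher once you add KV cache and runtime overhead.
params = 72e9
for name, bytes_per_param in [("FP16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# FP16:  ~144 GB  <- matches the ~145GB "full model" figure
# 8-bit: ~72 GB
# 4-bit: ~36 GB
```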
Meh, ofc I don’t.
I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space, consume lots of power, and be loud?
It seems like in terms of price and speed the Macs suck compared to other options, but if you don’t have a lot of space and don’t want to hear an airplane engine constantly, I’m wondering what the options are.
Shit just works as usual