• 2 Posts
  • 14 Comments
Joined 2 years ago
cake
Cake day: March 22nd, 2024

help-circle

  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldDo you host your own AI?
    link
    fedilink
    English
    arrow-up
    12
    arrow-down
    1
    ·
    edit-2
    8 hours ago

    I completely disagree.

    Frankly, I find the description “VC funding a FOSS” offensive. They aren’t funding the engine. I’ve been messing with LLM inference engines since 2022, and Ollama is the worst I’ve seen in the community.

    They misname models for SEO. They leech off llama.cpp while deliberately hiding attribution yet redirecting GH support requests there. They sometimes make their own GGUFs+forked releases which are broken and incompatibile with upstream llama.cpp, just so they can get a release out a day ahead for hype, even though it doesn’t really work and they’ll never upstream one line. They set a default context size thats basically unusable, they screw up chat templates and deep internal code with no obvious indicators, they release suboptimal quants without iMatrix, they gate you into their internal quantization repo and model card format, they hide model downloads on your hard drive, they mess with standard APIs for no good reason other than to mess up other backends. I could go on and on.

    And if that’s all fine, they’re enshittifying the app with closed code, and pointers to cloud models.

    They GIVE LLM inference a bad name, by making it a terrible quality engine that happens to show up in search as the “default.” Hence the comments below of people being unimpressed with local inference. And they sap attention from actual llama.cpp devs, without contributing a single dime. Everyone in the localllama communtity hates their guts, and that’s not even getting into the interpersonal drama they’ve stirred.

    They are a leech that’s a net drag to the whole community, that we can’t get rid of because they’re attention grifters. And they’ve gotten worse and worse over time.


    It’s more morale to use any cloud API over Ollama, in my eyes. They’re a grift.


    EDIT: And, to be clear, I’m not against VC funded downstream stuff.

    LM Studio is good! Even though it’s closed source.

    Tons of downstream projects are great.




  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldDo you host your own AI?
    link
    fedilink
    English
    arrow-up
    6
    arrow-down
    2
    ·
    11 hours ago

    https://sleepingrobots.com/dreams/stop-using-ollama/

    And that’s not even all of it. Basically they break models in many ways, and they’re slimey Tech Bros.

    LM Studio is better, and easy.

    If you’re on Nvidia, and want to run optimally, I would use the ik_llama.cpp fork. On AMD, regular llama.cpp. On a Mac, use an MLX runner (Like LM Studio) with an MLX quant (ideally an MLX-DWQ quant).

    It’s all pretty technical, and… thats kinda the point. LLMs are just too performance sensitive and too finicky to not have a grasp of how they work. There is no “easy button” to run them, there can’t be.

    But if you don’t have time for that and just want to see if it’s worth it, I’d suggest self hosing your own UI, and trying the dirt cheap APIs of models you can theoretically run on your setup. This will give you a “best case” taste of what they’re capable of.





  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldDo you host your own AI?
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    1
    ·
    edit-2
    11 hours ago

    Yep.

    I have a RTX 3090 + 128GB CPU RAM.

    Currently I run my own custom IQ3_KT quantization of MiMo 2.5 300B, and it’s crazy good. It’s better than API models from not that long ago, and it’s served at about reading speed.

    Never thought I’d ever run such a thing on my lowly desktop.

    For quick scripts or code assistant, sometimes I use Qwen 27B (another custom quant, currently experimenting with exllama). Or Gemini 12B for messing with image/audio input. But TBH MiMo 2.5 with thinking disabled is smarter than 27B with it.


    …And honestly, I use GLM 5.2 API a good bit.

    I was lucky enough to get a yearly subscription for like $30, 6 months ago. I do self host the UIs or whatever takes the prompts, though.






  • If you’re wondering about Fedora vs CachyOS, it comes down to what you do on your PC. And what you’re used to.

    If you want better “preconfiguration” for graphics stuff, CachyOS is the way to go. With Fedora you will end up referencing and maintaining a whole lot more yourself, while the CachyOS maintainers basically do all that maintinance and config optimization for you.

    But Fedora might be better for a less GPU-focused “workstation” type system.

    Generally, I’d look at the “style” and interests of distro maintainers. CachyOS is built by a collective of linux gaming/compute enthusiasts that snowballed into popularity, though it does inherit all the work from Arch. Fedora is a long standing workstation/server workhorse, a “pre release” for Red Hat enterprise linux.