Recent post re: AI as utility

https://www.tomsguide.com/ai/people-will-buy-intelligence-from-us-on-a-meter-chatgpts-ceo-sam-altman-has-critics-worried-with-his-ai-vision

Myself, I’m a fan of local LLMs / self-hosted ML… but if you ever needed a clarion call that a hard pivot is coming (soon) for online / cloud-based AI… Altman et al. are making some concerning mouth noises (to say nothing of broader concerns with OAI, Anthropic etc).

Right now, I’m sketching out a plan where my Raspberry Pi (always on, 2-3 W) sends a Wake-on-LAN magic packet to wake up my modest AI server (Lenovo P330 with Tesla P4) if/when needed (Qwen 3.6-35B-A3B); no point in chugging down 80-100 W, 24/7, for no good reason.
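A minimal sketch of the wake-up step, assuming the server's NIC and BIOS have Wake-on-LAN enabled and the Pi sits on the same broadcast domain. The MAC address, broadcast address, and function names are placeholders for illustration; UDP port 9 is only a common convention for magic packets.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, followed by the target MAC repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example call with a placeholder MAC:
# send_magic_packet("00:11:22:33:44:55")
```

Sending the packet is the easy part; deciding when to send it (for example, on the first incoming request) and when to let the server sleep again is a separate problem.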

If the trend continues in the direction it appears to be heading (increasing costs, environmental impacts etc) then I’d feel a lot better hosting my own as first port of call and replacing simpler tasks with more traditional programs. YMMV.

  • brucethemoose@lemmy.world
    1 month ago

    Yeah.

    It’s not even about efficiency, really, but independence from corporations, privacy, and principle. Kind of like Lemmy.

  • Alavi@programming.dev
    29 days ago

    I started working toward self-hosting an LLM for my small company, using ollama and opencode as the agent. But I realized a good model like GLM 5 requires 250GB of RAM and 24GB VRAM with a 4080?? I don’t know, this is what the LLM told me itself.

    I ended up using qwen-code2.7-7b-16k.

    Currently the best thing I have is my laptop: 16GB RAM, i7-9750H, GTX 1650.

    How do you guys selfhost? What models do you use that are actually good?
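RAM/VRAM figures like the ones quoted above can be sanity-checked with weights-only arithmetic: parameter count times bits per weight, divided by eight. A rough sketch follows; the function name is made up for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage is higher. Quantized files also tend to use somewhat more than their nominal bit width.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare a 7B model at a few precisions (weights only).
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: {weight_memory_gib(7, bits):.1f} GiB")
```

The same function shows why very large models land in the hundreds of gigabytes even when quantized, and why a 7B model at around 4 bits is a plausible fit for a small GPU.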

    • SuspiciousCarrot78@aussie.zone (OP)
      29 days ago

      I mean… that entirely depends on your use case - and I hate saying that. For me and what I do, the Qwen SLMs (esp. Qwen3-4B 2507 Instruct and Qwen3.5-2B) are exceptional. But I’m not trying to do Claude at home.

      Best bet? Spend $10 on OpenRouter and try different models. In a head-to-head with ChatGPT 5.4 mini (excellent for coding BTW), I’ve found Qwen 3.5 27B more than able to hold its own for coding tasks… IF you narrowly gate it/confine it. The last batch of Qwens really are something. Dunno about the 3.7 series.

      Having said ALL that, I’m really tempted to go back in time and code myself a deterministic expert system, with a user-updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we used to just call that “a program” lol.

      Really tempted actually, because if 50% of LLM use cases are basically Super Google but not shit… well, I can make that myself. I just need to point my autism at it.
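For a sense of scale, a deterministic rule cascade with a crude form of tool calling fits in a few lines. This is only an illustrative sketch: `add_rule`, `answer`, and the two example rules are hypothetical, and the Markov chain garnish is left out.

```python
import re
from typing import Callable

# Each rule pairs a pattern with a handler; order matters, first match wins.
Handler = Callable[[re.Match], str]
RULES: list[tuple[re.Pattern, Handler]] = []


def add_rule(pattern: str, handler: Handler) -> None:
    """Append a rule to the end of the cascade (the user-updatable part)."""
    RULES.append((re.compile(pattern, re.IGNORECASE), handler))


def answer(query: str) -> str:
    """Walk the cascade in order and return the first matching rule's output."""
    for pattern, handler in RULES:
        match = pattern.search(query)
        if match:
            return handler(match)
    return "No rule matched."


# A "tool call": the handler computes the answer instead of looking it up.
add_rule(r"(\d+)\s*\+\s*(\d+)", lambda m: str(int(m[1]) + int(m[2])))
# A plain knowledge entry.
add_rule(r"wake[- ]on[- ]lan", lambda m: "Send a magic packet to the server's MAC address.")
```

Same input, same output, every time, and when it gets something wrong the fix is editing one rule rather than re-prompting.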

      PS: this might help

      https://www.youtube.com/watch?v=0AqpaFm11oI

  • Auli@lemmy.ca
    1 month ago

    Sure, but all these self-hosted AIs are still made by companies that used massive amounts of power and water to train them.

    • KatherinaReichelt@feddit.org
      1 month ago

      Which is an interesting dilemma: those AIs are already trained. That power and water has already been used. If you use them, you will not pollute anything. But you may encourage those companies to train another AI.

    • brucethemoose@lemmy.world
      1 month ago

      No.

      Even the biggest open weights models are trained on pennies compared to OpenAI and Claude. They just don’t have the hardware to be so wasteful.

      In fact, the Nvidia GPU ban was the best thing to ever happen to “small” AI devs. It made them thrifty.

  • GreenKnight23@lemmy.world
    28 days ago

    If you’re self-hosting AI, make sure you at least firewall it off from the internet. Many providers still send metrics back home that include usage and content.

    • SuspiciousCarrot78@aussie.zone (OP)
      25 days ago

      Respectfully, that’s not really how local LLMs work.

      A GGUF model sitting on my hard drive has no ability to “send content back home” any more than a PDF or a JPEG does. If you’re running something like llama.cpp or Ollama entirely locally, the model weights are just data files.

      The real privacy concerns are cloud APIs, telemetry in front-ends, browser extensions, analytics, update services, or accidentally exposing a service to the public internet.

      “Self-hosted AI” isn’t one thing. There’s a huge difference between:

      • Running ChatGPT through an API
      • Running a commercial AI appliance
      • Running a local Qwen/Mistral/Llama model on your own hardware

      Firewalling internet-facing services is good advice. Assuming every local model is secretly uploading prompts is not.
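One way to check the "accidentally exposed" case is to test whether the inference server answers only on loopback or also on the machine's LAN address. A rough sketch, with hypothetical function names and a made-up LAN IP in the usage note; note that a check run from the same machine says nothing about firewall rules, so probing from a second device is the more honest test.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe_exposure(lan_ip: str, port: int) -> str:
    """Classify a service by which local addresses accept connections."""
    if is_port_open(lan_ip, port):
        return "reachable beyond loopback"
    if is_port_open("127.0.0.1", port):
        return "loopback only"
    return "not listening"
```

For example, `describe_exposure("192.168.1.50", 11434)` would probe Ollama's default port on an assumed LAN address; "loopback only" is the result you want for a purely local setup.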

      EDIT: for the record, I didn’t downvote you - that was someone else.

  • Noxy@pawb.social
    1 month ago

    not gonna self host bullshit that wastes resources and makes me dumber.

  • Hiro8811@lemmy.world
    1 month ago

    You’re still paying for electricity, and a big part of the world is in an electricity crisis. “AI” has few real uses and LLMs are not one of them.