LlamaEdge is the easiest and fastest way to run customized and fine-tuned LLMs locally or on the edge.

  • Lightweight inference apps. LlamaEdge apps are measured in MBs instead of GBs
  • Native and GPU accelerated performance
  • Supports many GPU and hardware accelerators
  • Supports many optimized inference libraries
  • Wide selection of AI / LLM models

Click the links to learn why you should use LlamaEdge instead of Python / PyTorch, llama.cpp, or standalone API servers such as Ollama.