LlamaEdge

LlamaEdge is the easiest and fastest way to run customized and fine-tuned LLMs locally or on the edge.

Lightweight inference apps: LlamaEdge is measured in MBs instead of GBs
Native and GPU accelerated performance
Supports many GPU and hardware accelerators
Supports many optimized inference libraries
Wide selection of AI / LLM models

Click on the links to learn why you would use LlamaEdge instead of Python / PyTorch, llama.cpp, or standalone API servers such as Ollama.