LlamaEdge vs llama.cpp
The llama.cpp project is one of the inference backends for LlamaEdge. LlamaEdge provides high-level application components for interacting with AI models, such as encoding and decoding data, managing prompts and contexts, knowledge supplementation, and tool use. These components simplify how business applications use the models. LlamaEdge and llama.cpp are complementary technologies.
In fact, LlamaEdge is designed to be agnostic to the underlying native runtime. You can swap out llama.cpp for a different LLM runtime, such as Intel's Neural Speed engine or Apple's MLX runtime, without changing or even recompiling the application code.
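The idea of a runtime-agnostic application layer can be illustrated with a toy sketch. This is not LlamaEdge's actual API: `InferenceBackend`, `LlamaCppBackend`, `MlxBackend`, `load_backend`, and `answer` are hypothetical names invented for this example, and the "backends" only return canned strings instead of running a model. The point is that the application function depends on an interface alone, so the backend can be chosen by name at run time.

```python
from typing import Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface that every runtime adapter implements."""

    def generate(self, prompt: str) -> str: ...


class LlamaCppBackend:
    """Stand-in for a llama.cpp adapter (returns a canned string)."""

    def generate(self, prompt: str) -> str:
        return f"[llama.cpp] {prompt}"


class MlxBackend:
    """Stand-in for an Apple MLX adapter (returns a canned string)."""

    def generate(self, prompt: str) -> str:
        return f"[mlx] {prompt}"


# Registry mapping a backend name to its adapter class.
# The names are illustrative, not real configuration values.
BACKENDS = {"llama.cpp": LlamaCppBackend, "mlx": MlxBackend}


def load_backend(name: str) -> InferenceBackend:
    """Pick a backend by name at run time."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None


def answer(backend: InferenceBackend, question: str) -> str:
    """Application code: builds a prompt and knows only the interface."""
    prompt = f"Q: {question}\nA:"
    return backend.generate(prompt)
```

Calling `answer(load_backend("llama.cpp"), ...)` and `answer(load_backend("mlx"), ...)` runs the same application function against two different adapters. Only the name passed to `load_backend` changes, which mirrors the swap described above.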
Besides LLMs, LlamaEdge can also support runtimes for other types of AI models, such as Stable Diffusion, YOLO, whisper.cpp, and Google MediaPipe.