Quick start with LLM models

LlamaEdge is a suite of component libraries and command line tools for developers to embed and run LLMs in their own apps. The quickest way to experience LlamaEdge is through the ready-to-use utilities built on top of it.

It takes only a few minutes to start chatting with any open source LLM on your own laptop using LlamaEdge.

Step 1: Install WasmEdge

First off, you'll need WasmEdge, a high-performance, lightweight, and extensible WebAssembly (Wasm) runtime optimized for server-side and edge computing. To install WasmEdge along with the necessary plugin for AI inference, open your terminal and execute the following command:

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s

This command fetches and runs the WasmEdge installation script, which automatically installs WasmEdge and the WASI-NN plugin, essential for running LLMs like Llama 3.2.
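
To verify the installation, open a new terminal (so the updated environment variables are picked up) and check the runtime version. The plugin path below assumes the script's default install location under $HOME/.wasmedge.

wasmedge --version
# Plugins are installed under $HOME/.wasmedge/plugin by default
ls ~/.wasmedge/plugin/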

Step 2: Download the LLM model

Next, you'll need to obtain a model file. For this tutorial, we're focusing on the Llama 3.2 1B model fine-tuned for instruction following, but the steps apply to other models too. Use the following command to download the model file.

curl -LO https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q5_K_M.gguf

This command downloads a quantized (Q5_K_M) GGUF build of the Llama-3.2-1B-Instruct model from Hugging Face, an AI model hosting platform.
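
Before moving on, it is worth confirming that the download completed. As a rough sanity check, the Q5_K_M quantization of this 1B-parameter model should be a single file on the order of 1 GB.

# List the file with a human-readable size
ls -lh Llama-3.2-1B-Instruct-Q5_K_M.gguf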

Step 3: Download a portable API server app

Next, you need an application that serves an OpenAI-compatible API for the model. The LlamaEdge API server app is a lightweight, cross-platform Wasm app that works on any device you might have. Just download the compiled binary app.

curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-api-server.wasm

The LlamaEdge apps are written in Rust and compiled to portable Wasm. That means they run across devices and operating systems without any change to the binary. You can simply download and run the compiled Wasm app regardless of your platform.
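
Since the app is an ordinary Wasm binary, you can run it in WasmEdge right away. For example, it should print its available command line options when invoked with the standard --help flag (assuming the usual help output generated by Rust CLI apps):

wasmedge llama-api-server.wasm --help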

Step 4: Use the API

Start the web server by running the llama-api-server.wasm app in WasmEdge.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-api-server.wasm -p llama-3-chat
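
The long --nn-preload argument packs several colon-separated fields. Roughly, as documented for the WASI-NN plugin:

# --nn-preload format: <alias>:<backend>:<target>:<model file>
#   default — the alias under which the app looks up the model
#   GGML    — the WASI-NN backend (llama.cpp's GGML/GGUF support)
#   AUTO    — let the runtime pick the best device (GPU if available, else CPU)
#   Llama-3.2-1B-Instruct-Q5_K_M.gguf — path to the downloaded model file
# The -p llama-3-chat flag selects the Llama 3 chat prompt template,
# which must match the model family you loaded.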

The llama-api-server.wasm app is a web server. You can use the OpenAI-compatible /v1/chat/completions API endpoint directly.
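
As a quick smoke test, you can first list the loaded models; the server exposes an OpenAI-compatible /v1/models endpoint for this.

curl http://localhost:8080/v1/models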

curl -X POST http://localhost:8080/v1/chat/completions \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"system", "content": "You are a helpful assistant. Try to be as brief as possible."}, {"role":"user", "content": "Where is the capital of Texas?"}]}'

The response looks like this:

{"id":"chatcmpl-5f0b5247-7afc-45f8-bc48-614712396a05","object":"chat.completion","created":1751945744,"model":"Mistral-Small-3.1-24B-Instruct-2503-Q5_K_M","choices":[{"index":0,"message":{"content":"The capital of Texas is Austin.","role":"assistant"},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":38,"completion_tokens":8,"total_tokens":46}}

Step 5: Chat with the chatbot UI

The Chatbot UI is a web app that talks to the OpenAI-compatible /v1/chat/completions API to provide a human-friendly chatbot in your browser.

Download and unzip the HTML and JS files for the Chatbot UI as follows.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Restart the web server to serve those HTML and JS files.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-api-server.wasm -p llama-3-chat

Go to http://localhost:8080 on your computer to access the chatbot UI in your browser!

Congratulations! You have now started an LLM app on your own device.