Unleash the Power of LLMs with LocalAI
LocalAI is a remarkable open-source project that unlocks the potential of large language models (LLMs) and brings them directly to your own hardware. Think of it as a locally-hosted, self-contained alternative to cloud-based AI solutions like OpenAI’s GPT-3. With LocalAI, you gain:
- Privacy and Control: Your data and AI interactions remain entirely within your own environment.
- Reduced Costs: Avoid recurring usage fees associated with third-party cloud services.
- Customization: Adapt and fine-tune models to suit your specific needs.
- Offline Functionality: Use LocalAI even without an internet connection.
What Can You Do with LocalAI?
LocalAI’s capabilities closely mirror those of cloud-based AI APIs, including:
- Text Generation: Create human-quality text, write different creative content forms, or translate languages.
- Code Generation: Assist with coding tasks and generate code snippets.
- Text Summarization: Condense lengthy text into concise summaries.
- Question Answering: Provide answers to factual inquiries based on your data.
- Image Generation (with certain models): Craft original images from text descriptions.
Why Choose LocalAI?
- Open-Source: Benefit from a community-driven project with transparent development and the freedom to modify.
- Cost-Effective: LocalAI is free to use, with the main cost being your hardware resources.
- Hardware Flexibility: Run LocalAI on standard consumer-grade hardware (CPU-based), or leverage a GPU for accelerated performance if you have one.
Streamlined Installation with Docker
While installing LocalAI’s dependencies directly is possible, Docker significantly simplifies the process:
- Prerequisites: Ensure you have Docker and Docker Compose installed.
- Project Setup: Create a project directory and the
docker-compose.yml
file as outlined in the previous article. - Run with Docker Compose: Use the command
docker-compose up -d
. - Access LocalAI: Open http://localhost:8080 in your web browser.
Exploring LocalAI
LocalAI provides both a web interface and a REST API. Start experimenting with different models and discover the incredible capabilities of LLMs on your own machine. The LocalAI documentation offers in-depth guidance.
Docker compose
This docker compose
services:
localai:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
image: quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
container_name: localai
tty: true # enable colorized logs
restart: unless-stopped
ports:
- 8080:8080
env_file:
- .env
volumes:
- ./models:/models:cached
- ./images/:/tmp/generated/images/
environment:
- 'PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"},{"url": "github:go-skynet/model-gallery/mixtral-Q3.yaml", "name": "mixtral-Q3"},{"url": "github:go-skynet/model-gallery/stablediffusion.yamll", "name": "stablediffusion"},{"url": "github:go-skynet/model-gallery/whisper-base.yaml", "name": "whisper"}]'
- MODELS_PATH=/models
command: ["/usr/bin/local-ai" ]
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
interval: 1m
timeout: 20m
retries: 20
Docker compose explanation
services:
- localai: This defines a single Docker container service named ‘localai’.
deploy:
- resources:
- reservations:
- devices:
- driver: nvidia: This section reserves a single NVIDIA GPU for use by the container.
- count: 1: Specifies that only one GPU should be reserved.
- capabilities: [gpu]: Ensures the container has the necessary permissions to access and use the GPU.
- devices:
- reservations:
- This whole deploy section can be removed if you DON’T have a GPU.
image: quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
- Specifies the Docker image to use. This image is a CUDA 12-enabled version of LocalAI that includes FFmpeg for additional multimedia capabilities.
- remove –ffmepg if you dont need to work with audio
- remove –cude12 if you dont want cuda support or change to cuda11 if you are on older cuda.
container_name: localai
- Gives the container a user-friendly name.
tty: true
- Enables colorized logs in your terminal for better readability.
restart: unless-stopped
- Ensures the container restarts automatically unless you explicitly stop it.
ports:
- 8080:8080 Maps port 8080 on your host machine to port 8080 inside the container, allowing you to access the LocalAI service.
env_file:
- .env Loads environment variables from an external
.env
file. This file might contain sensitive API keys or other configuration settings.
volumes:
- ./models:/models:cached Mounts your local
./models
directory into the container’s/models
directory using ‘cached’ mode for potential performance improvements. - ./images/:/tmp/generated/images/ Maps a local directory for storing images generated by LocalAI.
environment:
- PRELOAD_MODELS=[…] Defines a list of models to automatically load at startup. It includes GPT-3.5-turbo, Mixtral-Q3, Stable Diffusion, and Whisper models. you can remove any you dont want.
- gpt-3.5-turbo
- URL: github:go-skynet/model-gallery/gpt4all-j.yaml
- Name: gpt-3.5-turbo
- Description: This likely refers to a GPT-3 like text generation model. The “turbo” designation might suggest optimizations for speed or efficiency.
- mixtral-Q3
- URL: github:go-skynet/model-gallery/mixtral-Q3.yaml
- Name: mixtral-Q3
- Description: This model’s purpose is less evident from the name alone. It might be a multilingual model, a code generation model, or something else entirely. You’ll likely need to refer to the project’s documentation for a more specific description.
- stablediffusion
- URL: github:go-skynet/model-gallery/stablediffusion.yaml
- Name: stablediffusion
- Description: This is a Stable Diffusion model, used for generating images from text descriptions.
- whisper
- URL: github:go-skynet/model-gallery/whisper-base.yaml
- Name: whisper
- Description: This is an automatic speech recognition (ASR) model from OpenAI, capable of transcribing audio into text. The “base” likely indicates it’s a smaller or foundational version of the model.
- gpt-3.5-turbo
- MODELS_PATH=/models specifies the model directory within the container.
command: [“/usr/bin/local-ai”]
- This is the command executed when the container starts, launching the LocalAI application.
healthcheck:
- Provides a way for Docker to monitor the container’s health. It periodically runs
curl -f http://localhost:8080/readyz
to check if LocalAI is responsive. The settings define the interval, timeout, and number of retries.
LocalAI provides a variety of Docker images to accommodate different hardware setups and model preferences. You can select the most suitable image by adjusting the image tag in your docker-compose.yml file. Here’s a breakdown of some common options:
Latest Development Versions:
quay.io/go-skynet/local-ai:master-cublas-cuda12
or quay.io/go-skynet/local-ai:master-cublas-cuda11
: These images offer the latest features and models from the development branch. Choose the CUDA version (11 or 12) that matches your NVIDIA GPU.quay.io/go-skynet/local-ai:master
: This image is for CPU-only systems without a compatible NVIDIA GPU.
Stable Releases:
quay.io/go-skynet/local-ai:latest-cublas-cuda12
or quay.io/go-skynet/local-ai:latest-cublas-cuda11
: These images provide tested and stable functionality, along with GPU support (choose the matching CUDA version).quay.io/go-skynet/local-ai:latest
: Use this image if you don’t have a GPU.
Important Notes:
GPU Acceleration: CUDA-enabled images are specifically designed for systems with a compatible NVIDIA GPU.
Development vs. Stable: Choose development images if you want the absolute latest features but are willing to accept a potentially less stable experience. For the most reliable setup, use the stable releases.
Conclusion
LocalAI empowers developers, enthusiasts, and businesses to harness the power of AI in a private, flexible, and cost-effective manner. Its easy integration with Docker further enhances its usability. If you’re seeking to leverage large language models in your projects or research, LocalAI is an invaluable tool to explore.