Using Ollama

Learn how to set up and connect a self-hosted Ollama instance to generate detailed captions and accurate labels for your pictures with vision-capable LLMs.

Setup

Step 1: Install Ollama

To run Ollama on the same server as PhotoPrism, add the ollama service to the services section of your compose.yaml (or docker-compose.yml) file, as shown in the example below.²

Alternatively, most of the compose.yaml configuration examples on our download server already include a preconfigured ollama service, so you can start it with the following command (remove profiles: ["ollama"] from the ollama service to start it by default, without using --profile ollama):

docker compose --profile ollama up -d

compose.yaml

services:
  photoprism:
    ## The ":preview" build gives early access to new features:
    image: photoprism/photoprism:preview
    ...

  ## Ollama Large-Language Model Runner (optional)
  ## Run "ollama pull [name]:[version]" to download a vision model
  ## listed at <https://ollama.com/search?c=vision>, for example:
  ## docker compose exec ollama ollama pull gemma3:latest
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    stop_grace_period: 15s
    ## Insecurely exposes the Ollama service on port 11434
    ## without authentication (for private networks only):
    # ports:
    #  - "11434:11434"
    environment:
      ## Ollama Configuration Options:
      OLLAMA_HOST: "0.0.0.0:11434"
      OLLAMA_MODELS: "/root/.ollama"  # model storage path (see volumes section below)
      OLLAMA_MAX_QUEUE: "100"         # maximum number of queued requests
      OLLAMA_NUM_PARALLEL: "1"        # maximum number of parallel requests
      OLLAMA_MAX_LOADED_MODELS: "1"   # maximum number of loaded models per GPU
      OLLAMA_LOAD_TIMEOUT: "5m"       # maximum time for loading models (default "5m")
      OLLAMA_KEEP_ALIVE: "5m"         # duration that models stay in memory (default "5m")
      OLLAMA_CONTEXT_LENGTH: "4096"   # maximum input context length
      OLLAMA_MULTIUSER_CACHE: "false" # optimize prompt caching for multi-user scenarios
      OLLAMA_NOPRUNE: "false"         # disables pruning of model blobs at startup
      OLLAMA_NOHISTORY: "true"        # disables readline history
      OLLAMA_FLASH_ATTENTION: "false" # enables the experimental flash attention feature
      OLLAMA_KV_CACHE_TYPE: "f16"     # cache quantization (f16, q8_0, or q4_0)
      OLLAMA_SCHED_SPREAD: "false"    # allows scheduling models across all GPUs
      OLLAMA_NEW_ENGINE: "true"       # enables the new Ollama engine
      # OLLAMA_DEBUG: "true"            # shows additional debug information
      # OLLAMA_INTEL_GPU: "true"        # enables experimental Intel GPU detection
      ## NVIDIA GPU Hardware Acceleration (optional):
      # NVIDIA_VISIBLE_DEVICES: "all"
      # NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    volumes:
      - "./ollama:/root/.ollama"
    ## NVIDIA GPU Hardware Acceleration (optional):
    # deploy:
    #  resources:
    #    reservations:
    #      devices:
    #        - driver: "nvidia"
    #          capabilities: [ gpu ]
    #          count: "all"

Note that the NVIDIA Container Toolkit must be installed for GPU hardware acceleration to work. Experienced users may also run Ollama on a separate, more powerful server.
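
If GPU acceleration is enabled, you can check that the ollama container can actually see your GPU. The command below assumes an NVIDIA GPU; the nvidia-smi tool is typically made available inside the container by the NVIDIA Container Toolkit when the "utility" driver capability is enabled:

docker compose exec ollama nvidia-smi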

Ollama does not enforce authentication by default. Only expose port 11434 inside trusted networks or behind a reverse proxy that adds access control.

Step 2: Download Models

Once the Ollama service is running (see Step 1), download one or more of the listed vision models that match your hardware capabilities and preferences. You will need at least one model for the next step. For example:

docker compose exec ollama ollama pull gemma3:latest

View Model Comparison ›
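
After the download completes, you can confirm that the model is available to the service by listing the installed models:

docker compose exec ollama ollama list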

Step 3: Configure Models

Now, create a new config/vision.yml file or edit the existing file in the storage folder of your PhotoPrism instance, following the example below. Its absolute path from inside the container is /photoprism/storage/config/vision.yml:

vision.yml

Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://ollama:11434/api/generate
- Type: labels
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://ollama:11434/api/generate
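
If you run Ollama on a separate server instead of the ollama container, point the Service Uri at that host. This is only a sketch: the hostname below is a placeholder, and the endpoint must be reachable from inside the photoprism container:

Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://your-ollama-host.example.com:11434/api/generate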

Scheduling Options

  • Run: newly-indexed (recommended): Runs after indexing completes via the metadata worker, avoiding slowdowns during import while still processing new files automatically. Also supports manual invocations.
  • Run: manual disables automatic execution, so you can invoke the model explicitly with photoprism vision run -m caption (see the example below).
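
Since the photoprism CLI runs inside the container, a manual run is typically started with Docker Compose, for example:

docker compose exec photoprism photoprism vision run -m caption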

Step 4: Restart PhotoPrism

Run the following commands to restart photoprism and apply the new settings:

docker compose stop photoprism
docker compose up -d

You should now be able to use the photoprism vision CLI commands when opening a terminal, e.g. photoprism vision run -m caption to generate captions.

Troubleshooting

Verifying Your Configuration

If you encounter issues, a good first step is to verify how PhotoPrism has loaded your vision.yml configuration. You can do this by running:

docker compose exec photoprism photoprism vision ls

This command outputs the settings for all supported and configured model types. Compare the results with your vision.yml file to confirm that your configuration has been loaded correctly and to identify any parsing errors or misconfigurations.

Performing Test Runs

The following terminal commands will perform a single run for the specified model type:

photoprism vision run -m labels --count 1 --force
photoprism vision run -m caption --count 1 --force

If the output is empty, temporarily enable trace logging (PHOTOPRISM_LOG_LEVEL=trace) and re-run the command to inspect the requests and responses, as shown below.
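
One way to enable trace logging temporarily is to set the log level in the environment section of the photoprism service; this is a minimal sketch, and the variable should be removed again after debugging:

services:
  photoprism:
    environment:
      PHOTOPRISM_LOG_LEVEL: "trace"  # temporary, for debugging only
    ...

Apply the change with docker compose up -d photoprism, and revert it the same way once you are done.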

GPU Performance Issues

When using Ollama with GPU acceleration, you may experience performance degradation over time due to VRAM management issues. This typically manifests as processing times gradually increasing and the Ollama service appearing to "crash" while still responding to requests, but without GPU acceleration.

The issue occurs because Ollama's VRAM allocation doesn't properly recover after processing multiple requests, leading to memory fragmentation and eventual GPU processing failures.
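
To check whether a model is still loaded and whether it is running on the GPU or has fallen back to the CPU, list the running models; the PROCESSOR column of the output shows how the model is currently being served:

docker compose exec ollama ollama ps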

The Ollama service does not automatically recover from these VRAM issues. To restore full GPU acceleration, manually restart the Ollama container:

docker compose down ollama
docker compose up -d ollama

This will clear the VRAM and restore normal GPU-accelerated processing performance.
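
If the slowdown recurs on a regular basis, you can automate this workaround with a scheduled job on the Docker host. The crontab entry below is only a sketch: the project path is a placeholder and must point at the directory containing your compose.yaml:

0 4 * * * cd /path/to/photoprism && docker compose down ollama && docker compose up -d ollama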


  1. Available to all users with the next stable version, see our release notes for details. 

  2. Unrelated configuration details have been omitted for brevity.