Using Ollama

Learn how to set up and connect a self-hosted Ollama instance to generate detailed captions and accurate labels for your pictures with vision-capable LLMs.

Setup

Step 1: Install Ollama

To run Ollama on the same server as PhotoPrism, add the ollama service to the services section of your compose.yaml (or docker-compose.yml) file, as shown in the example below.²

Alternatively, most of the compose.yaml configuration examples on our download server already include a preconfigured ollama service, so you can start it with the following command (remove profiles: ["ollama"] from the ollama service to start it by default, without using --profile ollama):

docker compose --profile ollama up -d

compose.yaml

services:
  photoprism:
    ## The ":preview" build gives early access to new features:
    image: photoprism/photoprism:preview
    ...

  ## Ollama Large-Language Model Runner (optional)
  ## Run "ollama pull [name]:[version]" to download a vision model
  ## listed at <https://ollama.com/search?c=vision>, for example:
  ## docker compose exec ollama ollama pull gemma3:latest
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    stop_grace_period: 15s
    ## Insecurely exposes the Ollama service on port 11434
    ## without authentication (for private networks only):
    # ports:
    #  - "11434:11434"
    environment:
      ## Ollama Configuration Options:
      OLLAMA_HOST: "0.0.0.0:11434"
      OLLAMA_MODELS: "/root/.ollama"  # model storage path (see volumes section below)
      OLLAMA_MAX_QUEUE: "100"         # maximum number of queued requests
      OLLAMA_NUM_PARALLEL: "1"        # maximum number of parallel requests
      OLLAMA_MAX_LOADED_MODELS: "1"   # maximum number of loaded models per GPU
      OLLAMA_LOAD_TIMEOUT: "5m"       # maximum time for loading models (default "5m")
      OLLAMA_KEEP_ALIVE: "5m"         # duration that models stay in memory (default "5m")
      OLLAMA_CONTEXT_LENGTH: "4096"   # maximum input context length
      OLLAMA_MULTIUSER_CACHE: "false" # optimize prompt caching for multi-user scenarios
      OLLAMA_NOPRUNE: "false"         # disables pruning of model blobs at startup
      OLLAMA_NOHISTORY: "true"        # disables readline history
      OLLAMA_FLASH_ATTENTION: "false" # enables the experimental flash attention feature
      OLLAMA_KV_CACHE_TYPE: "f16"     # cache quantization (f16, q8_0, or q4_0)
      OLLAMA_SCHED_SPREAD: "false"    # allows scheduling models across all GPUs
      OLLAMA_NEW_ENGINE: "true"       # enables the new Ollama engine
      # OLLAMA_DEBUG: "true"            # shows additional debug information
      # OLLAMA_INTEL_GPU: "true"        # enables experimental Intel GPU detection
      ## NVIDIA GPU Hardware Acceleration (optional):
      # NVIDIA_VISIBLE_DEVICES: "all"
      # NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    volumes:
      - "./ollama:/root/.ollama"
    ## NVIDIA GPU Hardware Acceleration (optional):
    # deploy:
    #  resources:
    #    reservations:
    #      devices:
    #        - driver: "nvidia"
    #          capabilities: [ gpu ]
    #          count: "all"

Note that the NVIDIA Container Toolkit must be installed for GPU hardware acceleration to work. Experienced users may also run Ollama on a separate, more powerful server.
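
If GPU acceleration is enabled, you can check that the ollama container can actually see your GPU. The command below assumes an NVIDIA GPU; the nvidia-smi tool is typically made available inside the container by the NVIDIA Container Toolkit when the "utility" driver capability is enabled:

docker compose exec ollama nvidia-smi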

Ollama does not enforce authentication by default. Only expose port 11434 inside trusted networks or behind a reverse proxy that adds access control.

Step 2: Download Models

Once the Ollama service is running (see Step 1), download one or more of the listed vision models that match your hardware capabilities and preferences. You will need at least one model for the next step. For example:

docker compose exec ollama ollama pull gemma3:latest

View Model Comparison ›
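
After the download completes, you can confirm that the model is available to the service by listing the installed models:

docker compose exec ollama ollama list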

Step 3: Configure Models

Now, create a new config/vision.yml file or edit the existing file in the storage folder of your PhotoPrism instance, following the example below. Its absolute path from inside the container is /photoprism/storage/config/vision.yml:

vision.yml

Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://ollama:11434/api/generate
- Type: labels
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://ollama:11434/api/generate
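
If you run Ollama on a separate server instead of the ollama container, point the Service Uri at that host. This is only a sketch: the hostname below is a placeholder, and the endpoint must be reachable from inside the photoprism container:

Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Service:
    Uri: http://your-ollama-host.example.com:11434/api/generate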

Scheduling Options

  • Run: newly-indexed (recommended): Runs after indexing completes via the metadata worker, avoiding slowdowns during import while still processing new files automatically. Also supports manual invocations.
  • Run: manual disables automatic execution, so you can invoke the model explicitly with photoprism vision run -m caption (see the example below).
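
Since the photoprism CLI runs inside the container, a manual run is typically started with Docker Compose, for example:

docker compose exec photoprism photoprism vision run -m caption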

Step 4: Restart PhotoPrism

Run the following commands to restart photoprism and apply the new settings:

docker compose stop photoprism
docker compose up -d

You should now be able to use the photoprism vision CLI commands when opening a terminal, e.g. photoprism vision run -m caption to generate captions.

Troubleshooting

Verifying Your Configuration

If you encounter issues, a good first step is to verify how PhotoPrism has loaded your vision.yml configuration. You can do this by running:

docker compose exec photoprism photoprism vision ls

This command outputs the settings for all supported and configured model types. Compare the results with your vision.yml file to confirm that your configuration has been loaded correctly and to identify any parsing errors or misconfigurations.

Performing Test Runs

The following terminal commands will perform a single run for the specified model type:

photoprism vision run -m labels --count 1 --force
photoprism vision run -m caption --count 1 --force

If the output is empty, temporarily enable trace logging (PHOTOPRISM_LOG_LEVEL=trace) and re-run the command to inspect the requests and responses, as shown below.
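
One way to enable trace logging temporarily is to set the log level in the environment section of the photoprism service; this is a minimal sketch, and the variable should be removed again after debugging:

services:
  photoprism:
    environment:
      PHOTOPRISM_LOG_LEVEL: "trace"  # temporary, for debugging only
    ...

Apply the change with docker compose up -d photoprism, and revert it the same way once you are done.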

GPU Performance Issues

When using Ollama with GPU acceleration, you may experience performance degradation over time due to VRAM management issues. This typically manifests as processing times gradually increasing and the Ollama service appearing to "crash" while still responding to requests, but without GPU acceleration.

The issue occurs because Ollama's VRAM allocation doesn't properly recover after processing multiple requests, leading to memory fragmentation and eventual GPU processing failures.
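
To check whether a model is still loaded and whether it is running on the GPU or has fallen back to the CPU, list the running models; the PROCESSOR column of the output shows how the model is currently being served:

docker compose exec ollama ollama ps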

The Ollama service does not automatically recover from these VRAM issues. To restore full GPU acceleration, manually restart the Ollama container:

docker compose down ollama
docker compose up -d ollama

This will clear the VRAM and restore normal GPU-accelerated processing performance.
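
If the slowdown recurs on a regular basis, you can automate this workaround with a scheduled job on the Docker host. The crontab entry below is only a sketch: the project path is a placeholder and must point at the directory containing your compose.yaml:

0 4 * * * cd /path/to/photoprism && docker compose down ollama && docker compose up -d ollama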


  1. Available to all users with the next stable version, see our release notes for details. 

  2. Unrelated configuration details have been omitted for brevity.