
Caption Generation

In addition to its built-in AI capabilities, PhotoPrism lets you generate image captions through a direct Ollama integration, as described in this guide.1

It allows you to choose from the available vision models and customize the prompts according to your needs.

The Ollama integration is under active development, so the configuration, commands, and other details may change or break unexpectedly. Please keep this in mind and notify us when something doesn't work as expected. Thank you for your help in keeping this documentation updated!

Ollama Setup Guide

Follow the steps below to connect PhotoPrism directly to an Ollama instance and generate captions with vision-capable LLMs.

Step 1: Install Ollama

To run Ollama on the same server as PhotoPrism, add the ollama service to the services section of your compose.yaml (or docker-compose.yml) file, as shown in the example below.2

Alternatively, most of the compose.yaml configuration examples on our download server already include a preconfigured ollama service, so you can start it with the following command (remove profiles: ["ollama"] from the ollama service to start it by default, without --profile ollama):

docker compose --profile ollama up -d
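For reference, the profile is set directly in the ollama service definition. A minimal sketch of such a service (removing the profiles line makes Compose start it by default):

services:
  ollama:
    image: ollama/ollama:latest
    ## Remove this line to start the service without "--profile ollama":
    profiles: ["ollama"]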

compose.yaml

services:
  photoprism:
    ## The ":preview" build gives early access to new features:
    image: photoprism/photoprism:preview
    ...

  ## Ollama Large-Language Model Runner (optional)
  ## Run "ollama pull [name]:[version]" to download a vision model
  ## listed at <https://ollama.com/search?c=vision>, for example:
  ## docker compose exec ollama ollama pull gemma3:latest
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    stop_grace_period: 15s
    ## Insecurely exposes the Ollama service on port 11434
    ## without authentication (for private networks only):
    # ports:
    #  - "11434:11434"
    environment:
      ## Ollama Configuration Options:
      OLLAMA_HOST: "0.0.0.0:11434"
      OLLAMA_MODELS: "/root/.ollama"  # model storage path (see volumes section below)
      OLLAMA_MAX_QUEUE: "100"         # maximum number of queued requests
      OLLAMA_NUM_PARALLEL: "1"        # maximum number of parallel requests
      OLLAMA_MAX_LOADED_MODELS: "1"   # maximum number of loaded models per GPU
      OLLAMA_LOAD_TIMEOUT: "5m"       # maximum time for loading models (default "5m")
      OLLAMA_KEEP_ALIVE: "5m"         # duration that models stay in memory (default "5m")
      OLLAMA_CONTEXT_LENGTH: "4096"   # maximum input context length
      OLLAMA_MULTIUSER_CACHE: "false" # optimize prompt caching for multi-user scenarios
      OLLAMA_NOPRUNE: "false"         # disables pruning of model blobs at startup
      OLLAMA_NOHISTORY: "true"        # disables readline history
      OLLAMA_FLASH_ATTENTION: "false" # enables the experimental flash attention feature
      OLLAMA_KV_CACHE_TYPE: "f16"     # cache quantization (f16, q8_0, or q4_0)
      OLLAMA_SCHED_SPREAD: "false"    # allows scheduling models across all GPUs
      OLLAMA_NEW_ENGINE: "true"       # enables the new Ollama engine
      # OLLAMA_DEBUG: "true"            # shows additional debug information
      # OLLAMA_INTEL_GPU: "true"        # enables experimental Intel GPU detection
      ## NVIDIA GPU Hardware Acceleration (optional):
      # NVIDIA_VISIBLE_DEVICES: "all"
      # NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    volumes:
      - "./ollama:/root/.ollama"
    ## NVIDIA GPU Hardware Acceleration (optional):
    # deploy:
    #  resources:
    #    reservations:
    #      devices:
    #        - driver: "nvidia"
    #          capabilities: [ gpu ]
    #          count: "all"

Note that the NVIDIA Container Toolkit must be installed for GPU hardware acceleration to work. Experienced users may also run Ollama on a separate, more powerful server.
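To check whether the container can actually see your GPU, you can run nvidia-smi inside it once it is up. This assumes the NVIDIA Container Toolkit is installed and the GPU device reservation shown above is enabled:

docker compose exec ollama nvidia-smi

If GPU passthrough works, the detected GPU(s) are listed; otherwise, the command fails or shows no devices.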

Ollama does not enforce authentication by default. Only expose port 11434 inside trusted networks or behind a reverse proxy that adds access control.
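If you only need access from the Docker host itself, for example for local testing, one option is to bind the published port to the loopback interface instead of all interfaces (sketch; adjust to your setup):

    ports:
      - "127.0.0.1:11434:11434"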

Step 2: Download Models

Once the Ollama service is running (see Step 1), you can download any of the listed vision models that match your hardware capabilities and preferences; you will need at least one model for the next step. For example:

docker compose exec ollama ollama pull gemma3:latest

View Model Comparison ›
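To confirm which models have been downloaded and are available to the service, you can list them with the standard Ollama CLI:

docker compose exec ollama ollama list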

Step 3: Configure PhotoPrism

Now, create a new config/vision.yml file or edit the existing file in the storage folder of your PhotoPrism instance, following the example below. Its absolute path from inside the container is /photoprism/storage/config/vision.yml:

vision.yml

Models:
- Type: labels
  Default: true
- Type: nsfw
  Default: true
- Type: face
  Default: true
- Type: caption
  Name: gemma3:latest
  Engine: ollama
  Run: newly-indexed
  Prompt: Create a caption with exactly one sentence in the active voice that describes
    the main visual content. Begin with the main subject and clear action. Avoid text
    formatting, meta-language, and filler words.
  Service:
    # Ollama API endpoint (adjust as needed):
    Uri: http://ollama:11434/api/generate
Thresholds:
  Confidence: 10

The config file must be named vision.yml, not vision.yaml, as otherwise it won't be found and will have no effect.
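To verify that the photoprism container can reach the configured endpoint, you can query the Ollama API directly, assuming curl is available inside the container:

docker compose exec photoprism curl -s http://ollama:11434/api/version

A small JSON response with the Ollama version indicates that the service is reachable; a connection error usually points to a networking or service-name issue.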

Model Defaults

When using a custom vision.yml config file, you can apply the default settings to one or more model types by setting the Default flag to true, as shown in the following example:

vision.yml

Models:
- Type: labels
  Default: true
- Type: nsfw
  Default: true
- Type: face
  Default: true
- Type: caption
  Default: true
Thresholds:
  Confidence: 10

This simplifies your configuration, allowing you to customize only specific model types.

Scheduling Options

  • Run: newly-indexed (recommended): Runs after indexing completes via the metadata worker, avoiding slowdowns during import while still processing new files automatically. Manual invocations are also supported.
  • Run: manual: Disables automatic execution so you can invoke the model explicitly with photoprism vision run -m caption, as shown in the sketch below.
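As a sketch, a caption model restricted to manual runs could be configured in vision.yml like this:

Models:
- Type: caption
  Name: gemma3:latest
  Engine: ollama
  Run: manual
  Service:
    Uri: http://ollama:11434/api/generate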

Step 4: Restart PhotoPrism

Run the following commands to restart photoprism and apply the new settings:

docker compose stop photoprism
docker compose up -d

You should now be able to use the photoprism vision CLI commands in a terminal, e.g. photoprism vision run -m caption to generate captions.
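For example, to generate captions from inside the container:

docker compose exec photoprism photoprism vision run -m caption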

Troubleshooting

Verifying Your Configuration

If you encounter issues, a good first step is to verify how PhotoPrism has loaded your vision.yml configuration. You can do this by running:

docker compose exec photoprism photoprism vision ls

This command outputs the settings for all supported and configured model types. Compare the results with your vision.yml file to confirm that your configuration has been loaded correctly and to identify any parsing errors or misconfigurations.
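If the configuration looks correct but no captions are generated, checking the logs of both services often helps to narrow down the cause:

docker compose logs -f photoprism ollama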

GPU Performance Issues

When using Ollama with GPU acceleration, you may experience performance degradation over time due to VRAM management issues. This typically manifests as processing times gradually increasing and the Ollama service appearing to "crash" while still responding to requests, but without GPU acceleration.

The issue occurs because Ollama's VRAM allocation doesn't properly recover after processing multiple requests, leading to memory fragmentation and eventual GPU processing failures.

The Ollama service does not automatically recover from these VRAM issues. To restore full GPU acceleration, manually restart the Ollama container:

docker compose down ollama
docker compose up -d ollama

This will clear the VRAM and restore normal GPU-accelerated processing performance.
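If this happens regularly, one possible workaround is to restart the Ollama service on a schedule, for example with a crontab entry on the Docker host (the project path below is a placeholder):

# Restart Ollama every night at 04:00 (adjust the path to your Compose project):
0 4 * * * cd /path/to/compose-project && docker compose down ollama && docker compose up -d ollama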


  1. Available to all users with the next stable version, see our release notes for details. 

  2. Unrelated configuration details have been omitted for brevity.