Label Generation

PhotoPrism’s built-in image classification relies on TensorFlow models such as Nasnet. With the new Ollama integration, you can generate labels via multimodal LLMs.

Ollama Setup Guide

Follow the steps in our User Guide to connect PhotoPrism directly to an Ollama instance and replace (or augment) the default Nasnet classifier with labels generated by a vision-capable LLM.


Configuration Tips

PhotoPrism evaluates models from the bottom of the list up, so the last matching entry wins. Placing the Ollama entries after the others ensures Ollama is chosen first, while the earlier entries remain available as fallback options.
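
The bottom-up order described above can be illustrated with a minimal sketch. The `Model` struct, the `pickModel` helper, and the model name `example-vision-model` are hypothetical simplifications for illustration and are not PhotoPrism's actual types or API:

```go
package main

import "fmt"

// Model is a hypothetical, simplified stand-in for one model entry in vision.yml.
type Model struct {
	Type   string // e.g. "labels", "caption", "nsfw"
	Name   string
	Engine string
}

// pickModel walks the list from the last entry to the first and returns the
// first entry whose type matches, so later entries take precedence.
func pickModel(models []Model, modelType string) (Model, bool) {
	for i := len(models) - 1; i >= 0; i-- {
		if models[i].Type == modelType {
			return models[i], true
		}
	}
	return Model{}, false
}

func main() {
	models := []Model{
		{Type: "labels", Name: "nasnet", Engine: "tensorflow"},
		{Type: "nsfw", Name: "nsfw", Engine: "tensorflow"},
		// Listed last, so it is found first when searching for "labels".
		{Type: "labels", Name: "example-vision-model", Engine: "ollama"},
	}

	if m, ok := pickModel(models, "labels"); ok {
		fmt.Println(m.Name, m.Engine)
	}
}
```

In this sketch, removing the last entry would make the same lookup return the Nasnet entry, which is the fallback behavior the tip relies on.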

Ollama-generated captions and labels are stored with the ollama metadata source automatically, so you do not need to request a specific source field in the schema or pass --source to the CLI unless you want to override the default.

Prompt Localization

To generate output in other languages, keep the base instructions in English and add the desired language (e.g., "Respond in German"). This method works for both caption and label prompts.
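
As a rough sketch of this approach, the helper below keeps an English base prompt and appends the language instruction. The `localizePrompt` function and the sample base prompt are assumptions made for illustration; in practice you would edit the prompt text in your model configuration directly:

```go
package main

import (
	"fmt"
	"strings"
)

// localizePrompt keeps the English base instructions unchanged and appends
// a short directive naming the desired output language.
func localizePrompt(base, language string) string {
	base = strings.TrimSpace(base)
	if language == "" {
		return base
	}
	return base + " Respond in " + language + "."
}

func main() {
	// Hypothetical base prompt, not PhotoPrism's default.
	base := "Describe the main subjects of this image as short labels."
	fmt.Println(localizePrompt(base, "German"))
}
```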

NSFW Detection Through Labels

When an Ollama or OpenAI model is configured with Type: labels, PhotoPrism can ask it to return an NSFW classification alongside the regular label fields. The flag that enables this is set in internal/ai/vision/config.go:

vision.DetectNSFWLabels = c.DetectNSFW() && c.Experimental()

When DetectNSFWLabels is true, the engine builders in internal/ai/vision/engine_ollama.go and engine_openai.go swap their default user prompts for LabelPromptNSFW, and the JSON schema generators (SchemaLabels(includeNSFW=true)) add the nsfw and nsfw_confidence fields. When it is false, the prompt and schema describe only name, confidence, and topicality, so the LLM response cannot trigger NSFW flagging.
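
The gating described above can be sketched as follows. This is a simplified illustration under stated assumptions: `labelFields` is a hypothetical stand-in for the real schema generator and only returns field names rather than a full JSON schema:

```go
package main

import "fmt"

// labelFields is a hypothetical simplification of a label schema generator.
// It returns the per-label field names the model is asked to produce.
func labelFields(includeNSFW bool) []string {
	fields := []string{"name", "confidence", "topicality"}
	if includeNSFW {
		fields = append(fields, "nsfw", "nsfw_confidence")
	}
	return fields
}

func main() {
	// Both settings must be enabled, mirroring DetectNSFW() && Experimental().
	detectNSFW, experimental := true, false
	detectNSFWLabels := detectNSFW && experimental

	fmt.Println(detectNSFWLabels, labelFields(detectNSFWLabels))
}
```

With either setting disabled, the NSFW fields are never requested, so a response cannot carry them into the flagging logic.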

Downstream, the index pipeline (internal/photoprism/index_mediafile.go) and the vision worker (internal/workers/vision.go) both guard the labels-based NSFW promotion with conf.DetectNSFW():

if w.conf.DetectNSFW() && !m.PhotoPrivate {
    if labels.IsNSFW(vision.Config.Thresholds.GetNSFW()) {
        m.PhotoPrivate = true
    }
}
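
To make the guard easier to follow, here is a self-contained sketch of how a threshold check of this kind could work. The `Label` fields, the percentage scale, and the `promote` helper are assumptions for illustration and may differ from the actual implementation of labels.IsNSFW:

```go
package main

import "fmt"

// Label is a hypothetical, reduced label record.
type Label struct {
	Name           string
	NSFW           bool
	NSFWConfidence int // assumed to be a percentage from 0 to 100
}

// Labels is the list of labels returned for one picture.
type Labels []Label

// IsNSFW reports whether any label is flagged as NSFW with a confidence
// that meets or exceeds the given threshold.
func (l Labels) IsNSFW(threshold int) bool {
	for _, label := range l {
		if label.NSFW && label.NSFWConfidence >= threshold {
			return true
		}
	}
	return false
}

// promote returns the new private state: a picture that is already private
// stays private, and nothing changes when detection is disabled.
func promote(detectNSFW, private bool, labels Labels, threshold int) bool {
	if detectNSFW && !private && labels.IsNSFW(threshold) {
		return true
	}
	return private
}

func main() {
	labels := Labels{
		{Name: "beach", NSFW: false},
		{Name: "swimwear", NSFW: true, NSFWConfidence: 85},
	}
	fmt.Println(promote(true, false, labels, 75))  // detection enabled
	fmt.Println(promote(false, false, labels, 75)) // detection disabled
}
```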

The dedicated ModelTypeNsfw entry (TensorFlow by default, overridable in vision.yml) is a separate inference pass. It only runs when DetectNSFW is true and the caller includes nsfw in the active model list: use --models labels,nsfw for the CLI, while the scheduler picks it up from VisionModelShouldRun automatically.

The user-facing matrix and threshold details are in NSFW Detection.

Troubleshooting

Verify Active Configuration

docker compose exec photoprism photoprism vision ls

Ensure the output lists both Nasnet (default) and your Ollama label model. If the custom entry is missing, double-check the YAML indentation, file name (vision.yml, not .yaml), or environment overrides.

Schema or JSON Errors

If PhotoPrism logs vision: invalid label payload from ollama, the model returned data that didn’t match the expected structure. Confirm that:

  • The adapter injected schema instructions (keep System/Prompt intact or reuse the defaults).

PhotoPrism may fall back to the existing TensorFlow Nasnet model when the Ollama response cannot be parsed.
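
The parse-then-fall-back behavior can be sketched as shown below. The JSON payload shape, the `parseLabels` and `labelsWithFallback` helpers, and the error messages are assumptions for illustration; they do not reproduce PhotoPrism's actual parser or log output:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Label holds the three regular label fields named in this guide.
type Label struct {
	Name       string  `json:"name"`
	Confidence float64 `json:"confidence"`
	Topicality float64 `json:"topicality"`
}

// labelResponse is an assumed envelope for the model's structured output.
type labelResponse struct {
	Labels []Label `json:"labels"`
}

// parseLabels decodes the raw response and rejects payloads that do not
// contain at least one named label.
func parseLabels(raw []byte) ([]Label, error) {
	var resp labelResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("invalid label payload: %w", err)
	}
	if len(resp.Labels) == 0 {
		return nil, errors.New("invalid label payload: no labels")
	}
	for _, l := range resp.Labels {
		if l.Name == "" {
			return nil, errors.New("invalid label payload: label without name")
		}
	}
	return resp.Labels, nil
}

// labelsWithFallback uses the parsed labels when possible and otherwise
// calls the fallback classifier.
func labelsWithFallback(raw []byte, fallback func() []Label) []Label {
	labels, err := parseLabels(raw)
	if err != nil {
		return fallback()
	}
	return labels
}

func main() {
	fallback := func() []Label { return []Label{{Name: "fallback-label"}} }

	valid := []byte(`{"labels":[{"name":"cat","confidence":0.9,"topicality":0.8}]}`)
	fmt.Println(labelsWithFallback(valid, fallback))
	fmt.Println(labelsWithFallback([]byte("not json"), fallback))
}
```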

Latency & Timeouts

Structured responses introduce additional parsing overhead. If you encounter timeouts:

  • Increase the global service timeout (e.g., ServiceTimeout in advanced deployments) if needed.
  • Reduce image resolution (Resolution: 500) or use smaller models.
  • Keep Options.Temperature low to encourage deterministic output.

GPU Considerations

When Ollama uses GPUs, long-running sessions might degrade over time due to VRAM fragmentation. Restart the Ollama container to recover performance:

docker compose down ollama
docker compose up -d ollama