Label Generation¶
PhotoPrism’s built-in image classification relies on TensorFlow models such as Nasnet. With the new Ollama integration, you can generate labels via multimodal LLMs.
Ollama Setup Guide¶
Follow the steps in our User Guide to connect PhotoPrism directly to an Ollama instance and replace (or augment) the default Nasnet classifier with labels generated by a vision-capable LLM.
Configuration Tips¶
PhotoPrism evaluates models from the bottom of the list up, so placing the Ollama entries after the others ensures Ollama is chosen first while the others remain available as fallback options.
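The ordering rule can be illustrated with a small Go sketch. It is not PhotoPrism's actual selection code: the `Model` struct, the `pickModel` helper, and the model name `example-vision-llm` are hypothetical, and the sketch only assumes, as described above, that the last matching entry in the list wins.

```go
package main

import "fmt"

// Model is a hypothetical, simplified stand-in for one vision.yml entry.
type Model struct {
	Type   string
	Name   string
	Engine string
}

// pickModel walks the list from the bottom up and returns the first entry
// of the requested type, so later entries take precedence over earlier ones.
func pickModel(models []Model, modelType string) (Model, bool) {
	for i := len(models) - 1; i >= 0; i-- {
		if models[i].Type == modelType {
			return models[i], true
		}
	}
	return Model{}, false
}

func main() {
	models := []Model{
		{Type: "labels", Name: "nasnet", Engine: "tensorflow"},
		{Type: "labels", Name: "example-vision-llm", Engine: "ollama"},
	}
	if m, ok := pickModel(models, "labels"); ok {
		fmt.Println(m.Name, m.Engine) // prints: example-vision-llm ollama
	}
}
```

Removing the Ollama entry from the slice makes `pickModel` return the Nasnet entry, which mirrors the fallback behavior described above.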
Ollama-generated captions and labels are stored with the ollama metadata source automatically, so you do not need to request a specific source field in the schema or pass --source to the CLI unless you want to override the default.
Prompt Localization
To generate output in other languages, keep the base instructions in English and add the desired language (e.g., "Respond in German"). This method works for both caption and label prompts.
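As a rough illustration, a localized prompt can be built by appending a language hint to the English instructions. The `localizePrompt` helper and the `basePrompt` text below are hypothetical and not part of PhotoPrism; substitute the prompt you actually use.

```go
package main

import (
	"fmt"
	"strings"
)

// basePrompt is a placeholder for the English instructions.
const basePrompt = "Describe the image with up to five labels."

// localizePrompt keeps the English base instructions unchanged and appends
// a "Respond in <language>." hint. An empty language returns the base prompt.
func localizePrompt(base, language string) string {
	base = strings.TrimSpace(base)
	language = strings.TrimSpace(language)
	if language == "" {
		return base
	}
	return base + " Respond in " + language + "."
}

func main() {
	fmt.Println(localizePrompt(basePrompt, "German"))
}
```

The resulting string would go into the prompt field of the caption or label model in place of the default prompt.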
NSFW Detection Through Labels¶
When an Ollama or OpenAI model is wired up for Type: labels, PhotoPrism can ask it to return NSFW classification alongside the regular label fields. The shortcut is implemented in internal/ai/vision/config.go:
vision.DetectNSFWLabels = c.DetectNSFW() && c.Experimental()
When DetectNSFWLabels is true, the engine builders in internal/ai/vision/engine_ollama.go and engine_openai.go swap their default user prompts for LabelPromptNSFW, and the JSON schema generators (SchemaLabels(includeNSFW=true)) add the nsfw and nsfw_confidence fields. When it is false, the prompt and schema describe only name, confidence, and topicality, so the LLM response cannot trigger NSFW flagging.
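The gating described above can be sketched as follows. This is a simplified model rather than the code in `internal/ai/vision`: `detectNSFWLabels` and `labelFields` are hypothetical helpers that only reproduce the documented condition and the field names listed in this section.

```go
package main

import "fmt"

// detectNSFWLabels mirrors the documented condition: NSFW fields are only
// requested when both NSFW detection and experimental features are enabled.
func detectNSFWLabels(detectNSFW, experimental bool) bool {
	return detectNSFW && experimental
}

// labelFields lists the per-label fields the schema would describe.
// It is an illustrative stand-in for SchemaLabels, not its implementation.
func labelFields(includeNSFW bool) []string {
	fields := []string{"name", "confidence", "topicality"}
	if includeNSFW {
		fields = append(fields, "nsfw", "nsfw_confidence")
	}
	return fields
}

func main() {
	for _, experimental := range []bool{false, true} {
		include := detectNSFWLabels(true, experimental)
		fmt.Println(experimental, labelFields(include))
	}
}
```

With NSFW detection enabled but experimental features disabled, only the three regular fields are listed, so the model is never asked for an NSFW verdict.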
Downstream, the index pipeline (internal/photoprism/index_mediafile.go) and the vision worker (internal/workers/vision.go) both guard the labels-based NSFW promotion with conf.DetectNSFW():
if w.conf.DetectNSFW() && !m.PhotoPrivate {
    if labels.IsNSFW(vision.Config.Thresholds.GetNSFW()) {
        m.PhotoPrivate = true
    }
}
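To make the guard concrete, here is a self-contained sketch of how such a threshold check could behave. The `Label` struct, `isNSFW`, and `promote` are hypothetical, and the 0-100 confidence scale is an assumption; the actual `labels.IsNSFW` implementation may differ.

```go
package main

import "fmt"

// Label is a simplified, hypothetical label record.
type Label struct {
	Name           string
	NSFW           bool
	NSFWConfidence int // assumed to be a percentage from 0 to 100
}

// isNSFW reports whether any label is flagged as NSFW with a confidence
// at or above the threshold.
func isNSFW(labels []Label, threshold int) bool {
	for _, l := range labels {
		if l.NSFW && l.NSFWConfidence >= threshold {
			return true
		}
	}
	return false
}

// promote returns the new private state: a picture becomes private only
// when detection is enabled, it is not private yet, and the labels qualify.
func promote(detectNSFW, private bool, labels []Label, threshold int) bool {
	if detectNSFW && !private && isNSFW(labels, threshold) {
		return true
	}
	return private
}

func main() {
	labels := []Label{{Name: "beach", NSFW: true, NSFWConfidence: 90}}
	fmt.Println(promote(true, false, labels, 75))  // prints: true
	fmt.Println(promote(false, false, labels, 75)) // prints: false
}
```

Because the outer condition checks the detection setting first, NSFW fields in a model response have no effect when detection is switched off.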
The dedicated ModelTypeNsfw entry (TensorFlow by default, overridable in vision.yml) is a separate inference pass. It only runs when DetectNSFW is true and the caller includes nsfw in the active model list. On the CLI, pass --models labels,nsfw; the scheduler picks it up from VisionModelShouldRun automatically.
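The two conditions for the dedicated pass can be expressed in a few lines. `parseModels` and `shouldRunNSFW` are hypothetical helpers written for this example; they are not the CLI parser or VisionModelShouldRun.

```go
package main

import (
	"fmt"
	"strings"
)

// parseModels splits a comma-separated list such as "labels,nsfw" into a set.
func parseModels(list string) map[string]bool {
	active := make(map[string]bool)
	for _, name := range strings.Split(list, ",") {
		if name = strings.ToLower(strings.TrimSpace(name)); name != "" {
			active[name] = true
		}
	}
	return active
}

// shouldRunNSFW requires both the global setting and an explicit "nsfw"
// entry in the active model list.
func shouldRunNSFW(detectNSFW bool, models string) bool {
	return detectNSFW && parseModels(models)["nsfw"]
}

func main() {
	fmt.Println(shouldRunNSFW(true, "labels,nsfw")) // prints: true
	fmt.Println(shouldRunNSFW(true, "labels"))      // prints: false
	fmt.Println(shouldRunNSFW(false, "labels,nsfw")) // prints: false
}
```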
The user-facing matrix and threshold details are in NSFW Detection.
Troubleshooting¶
Verify Active Configuration¶
docker compose exec photoprism photoprism vision ls
Ensure the output lists both Nasnet (default) and your Ollama label model. If the custom entry is missing, double-check the YAML indentation, file name (vision.yml, not .yaml), or environment overrides.
Schema or JSON Errors¶
If PhotoPrism logs vision: invalid label payload from ollama, the model returned data that didn’t match the expected structure. Confirm that:
- The adapter injected schema instructions (keep System/Prompt intact or reuse the defaults).
PhotoPrism may fall back to the existing TensorFlow Nasnet model when the Ollama response cannot be parsed.
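The validate-then-fall-back pattern can be sketched with Go's `encoding/json` package. The payload shape (a top-level `labels` array) and the helpers `parseLabels` and `labelsOrFallback` are assumptions made for this example; PhotoPrism's actual parser and error handling may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Label holds the regular label fields named in this guide.
type Label struct {
	Name       string  `json:"name"`
	Confidence float64 `json:"confidence"`
	Topicality float64 `json:"topicality"`
}

// payload is the assumed response envelope.
type payload struct {
	Labels []Label `json:"labels"`
}

// parseLabels rejects responses that are not valid JSON, contain no labels,
// or contain a label without a name.
func parseLabels(raw []byte) ([]Label, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("invalid label payload: %w", err)
	}
	if len(p.Labels) == 0 {
		return nil, errors.New("invalid label payload: no labels")
	}
	for _, l := range p.Labels {
		if l.Name == "" {
			return nil, errors.New("invalid label payload: empty name")
		}
	}
	return p.Labels, nil
}

// labelsOrFallback returns the parsed labels or, on error, the result of
// a fallback classifier supplied by the caller.
func labelsOrFallback(raw []byte, fallback func() []Label) []Label {
	labels, err := parseLabels(raw)
	if err != nil {
		return fallback()
	}
	return labels
}

func main() {
	fallback := func() []Label { return []Label{{Name: "fallback", Confidence: 0.5}} }
	fmt.Println(labelsOrFallback([]byte(`{"labels":[{"name":"cat","confidence":0.9,"topicality":0.8}]}`), fallback))
	fmt.Println(labelsOrFallback([]byte(`Sure! Here are some labels: cat`), fallback))
}
```

The second call shows the typical failure mode: a model that answers in free text instead of JSON triggers the fallback path.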
Latency & Timeouts¶
Structured responses introduce additional parsing overhead. If you encounter timeouts:
- Increase the global service timeout (e.g., ServiceTimeout in advanced deployments) if needed.
- Reduce image resolution (Resolution: 500) or use smaller models.
- Keep Options.Temperature low to encourage deterministic output.
GPU Considerations¶
When Ollama uses GPUs, long-running sessions might degrade over time due to VRAM fragmentation. Restart the Ollama container to recover performance:
docker compose down ollama
docker compose up -d ollama