NPU Model Conversion
This section covers converting your own Stable Diffusion checkpoints into NPU-compatible assets that Local Dream can load on supported Snapdragon devices.
When You Need This
- ✅ You want to run a custom SD1.5 or SDXL checkpoint on the NPU path.
- ❌ You want to run a custom SD1.5 checkpoint on the CPU/GPU path — this is supported directly in the app, no host-side conversion required.
Available Workflows
| Workflow | Status | Guide |
|---|---|---|
| SD1.5 → NPU | Stable | SD1.5 Conversion Guide |
| SDXL → NPU | Experimental | SDXL Conversion Guide |
What to Expect
- Conversion is host-side, not on-device. You will need a Linux or WSL machine.
- The pipeline produces W8A16-quantized QNN binaries packaged into a zip that the app imports.
- For SD1.5 you build one zip per chip tier (_min/_8gen1/_8gen2). For SDXL there is only one chip tier (_8gen3).
- A single SD1.5 conversion run takes several hours of CPU time. SDXL takes substantially longer.
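To make the per-tier output concrete, the sketch below lists how many zips each workflow produces. The tier suffixes come from this guide. The `<model_name><suffix>.zip` naming pattern and the `expected_zip_names` helper are assumptions made for illustration and may not match the names the conversion scripts actually emit.

```python
# Chip-tier suffixes as listed in this guide.
TIER_SUFFIXES = {
    "sd15": ["_min", "_8gen1", "_8gen2"],
    "sdxl": ["_8gen3"],
}


def expected_zip_names(model_name: str, workflow: str) -> list[str]:
    """Return one illustrative zip file name per chip tier for a workflow.

    The "<model_name><suffix>.zip" pattern is an assumption, not the
    documented output naming of the conversion scripts.
    """
    if workflow not in TIER_SUFFIXES:
        raise ValueError(f"unknown workflow: {workflow!r}")
    return [f"{model_name}{suffix}.zip" for suffix in TIER_SUFFIXES[workflow]]


if __name__ == "__main__":
    for workflow in TIER_SUFFIXES:
        # "mymodel" is a placeholder checkpoint name.
        print(workflow, expected_zip_names("mymodel", workflow))
```

An SD1.5 conversion therefore yields three artifacts to keep track of, while SDXL yields one.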
Why two QNN SDK versions?
The conversion scripts pin QNN SDK 2.28, but the Android app itself ships with QNN SDK 2.39 as its runtime. This is intentional: 2.28 is the version known to produce correct quantized binaries for the conversion pipelines in this guide, while the runtime stays current. You do not need 2.39 to convert models, and you should not mix versions inside a single conversion run.
Hardware Requirements
| Workflow | RAM + swap | Disk | GPU |
|---|---|---|---|
| SD1.5 @ 512×512 | ~20 GB | ~30 GB | optional |
| SD1.5 @ higher resolutions | 64 GB+ | 60 GB+ | optional |
| SDXL @ 1024×1024 | 64 GB+ | 60 GB+ | optional |
A CUDA-enabled GPU is optional — it only speeds up the data preparation phase. The actual quantization runs on CPU.
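Before starting a run that takes hours, it can help to confirm that the host meets the figures in the table above. The following is a minimal sketch, not part of the conversion scripts: the workflow keys (`sd15_512`, `sd15_highres`, `sdxl_1024`) and function names are hypothetical, the thresholds are copied from the table and treated as approximate, and the memory check assumes a Linux or WSL host that exposes `/proc/meminfo`.

```python
import os
import shutil

# Approximate minimums from the table above, in GB: (RAM + swap, free disk).
# The dictionary keys are hypothetical labels for this sketch.
REQUIREMENTS_GB = {
    "sd15_512": (20, 30),
    "sd15_highres": (64, 60),
    "sdxl_1024": (64, 60),
}


def parse_meminfo_gb(text: str) -> float:
    """Sum MemTotal and SwapTotal from /proc/meminfo text, in GiB."""
    total_kb = 0
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "SwapTotal"):
            total_kb += int(rest.split()[0])  # values are reported in kB
    return total_kb / (1024 ** 2)


def check_host(workflow: str, mem_gb: float, disk_gb: float) -> list[str]:
    """Return a list of human-readable shortfalls; empty means the host passes."""
    need_mem, need_disk = REQUIREMENTS_GB[workflow]
    problems = []
    if mem_gb < need_mem:
        problems.append(f"RAM + swap: have {mem_gb:.1f} GB, need about {need_mem} GB")
    if disk_gb < need_disk:
        problems.append(f"Free disk: have {disk_gb:.1f} GB, need about {need_disk} GB")
    return problems


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            mem_gb = parse_meminfo_gb(f.read())
        # Free space on the filesystem holding the current directory.
        disk_gb = shutil.disk_usage(".").free / (1024 ** 3)
        issues = check_host("sd15_512", mem_gb, disk_gb)
        print("\n".join(issues) if issues else "Host meets the SD1.5 @ 512x512 minimums.")
    else:
        print("/proc/meminfo not found; run this on the Linux or WSL conversion host.")
```

If the memory check falls short, adding swap is one way to reach the combined RAM + swap figure, since the table counts both.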
Skip the Conversion?
If you just want a model that works without the conversion overhead, check the pre-converted community collections first. Many popular SD1.5 and SDXL checkpoints are already available there.