Stable Diffusion: AI Image Generation Guide
Master Stable Diffusion for AI image generation. Learn prompting, ControlNet, LoRA fine-tuning, and build image generation applications.
Moshiour Rahman
What is Stable Diffusion?
Stable Diffusion is an open-source latent diffusion model that generates images from text descriptions. Unlike hosted services such as DALL-E, its weights are publicly available, so it can run on your own hardware for unlimited, private image generation.
Key Features
| Feature | Description |
|---|---|
| Text-to-Image | Generate images from prompts |
| Image-to-Image | Transform existing images |
| Inpainting | Edit specific areas |
| ControlNet | Precise control over output |
| Fine-tuning | Custom models with LoRA |
Getting Started
Installation
pip install diffusers transformers accelerate torch
pip install safetensors xformers # Optional optimizations
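Before downloading multi-gigabyte weights, it is worth confirming that PyTorch can see a CUDA GPU. The snippets below assume one is available; with the memory optimizations covered later, cards with around 6 GB of VRAM are generally workable (a rough guideline, not a hard requirement).
import torch
# Sanity check: confirm a CUDA device is visible before loading models
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))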
Basic Generation
from diffusers import StableDiffusionPipeline
import torch
# Load model
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Enable memory optimizations
pipe.enable_attention_slicing()
# Generate image
prompt = "A serene mountain landscape at sunset, photorealistic"
image = pipe(prompt).images[0]
image.save("mountain.png")
Stable Diffusion XL
from diffusers import StableDiffusionXLPipeline
# Load SDXL
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True
)
pipe = pipe.to("cuda")
# Generate with SDXL
prompt = "An astronaut riding a horse on Mars, digital art, highly detailed"
negative_prompt = "blurry, low quality, distorted"
image = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
num_inference_steps=30,
guidance_scale=7.5,
width=1024,
height=1024
).images[0]
image.save("astronaut.png")
Prompt Engineering
Effective Prompts
# Structure: Subject + Details + Style + Quality modifiers
# Basic prompt
prompt = "a cat"
# Better prompt
prompt = """
A fluffy orange tabby cat sitting on a windowsill,
soft natural lighting, bokeh background,
professional photography, 8k resolution,
highly detailed fur texture
"""
# Negative prompts (what to avoid)
negative_prompt = """
blurry, low quality, distorted, deformed,
bad anatomy, extra limbs, watermark, text,
oversaturated, cartoon, anime
"""
Style Keywords
styles = {
"photorealistic": [
"photorealistic", "hyperrealistic", "8k", "RAW photo",
"professional photography", "DSLR", "sharp focus"
],
"digital_art": [
"digital art", "concept art", "artstation",
"trending on artstation", "illustration"
],
"anime": [
"anime style", "manga", "studio ghibli",
"makoto shinkai", "cel shaded"
],
"oil_painting": [
"oil painting", "classical art", "renaissance",
"rembrandt lighting", "canvas texture"
],
"3d_render": [
"3d render", "octane render", "unreal engine",
"cinema 4d", "blender", "ray tracing"
]
}
def build_prompt(subject: str, style: str) -> str:
style_words = ", ".join(styles.get(style, []))
return f"{subject}, {style_words}, masterpiece, best quality"
Advanced Generation
Image-to-Image
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Load and prepare image
init_image = Image.open("input.jpg").convert("RGB")
init_image = init_image.resize((512, 512))
# Transform image
prompt = "A painting in the style of Van Gogh"
image = pipe(
prompt=prompt,
image=init_image,
strength=0.75, # 0.0 = no change, 1.0 = complete change
guidance_scale=7.5
).images[0]
image.save("vangogh_style.png")
Inpainting
from diffusers import StableDiffusionInpaintPipeline
pipe = StableDiffusionInpaintPipeline.from_pretrained(
"runwayml/stable-diffusion-inpainting",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Load images
image = Image.open("photo.jpg").convert("RGB")
mask = Image.open("mask.png").convert("RGB") # White = edit, Black = keep
# Inpaint
prompt = "A golden retriever dog"
result = pipe(
prompt=prompt,
image=image,
mask_image=mask,
num_inference_steps=50
).images[0]
result.save("inpainted.png")
ControlNet
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector
from PIL import Image
import torch
# Load ControlNet for edge detection
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/sd-controlnet-canny",
torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Prepare control image
image = Image.open("reference.jpg")
canny = CannyDetector()
control_image = canny(image)
# Generate with control
prompt = "A futuristic cityscape, cyberpunk style"
result = pipe(
prompt=prompt,
image=control_image,
num_inference_steps=30
).images[0]
result.save("controlled_output.png")
LoRA Fine-Tuning
Using LoRA Models
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
# Load LoRA weights
pipe.load_lora_weights("path/to/lora/weights.safetensors")
# Generate with LoRA
image = pipe(
"A portrait in custom style",
num_inference_steps=30
).images[0]
# Unload LoRA
pipe.unload_lora_weights()
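The influence of a loaded LoRA can also be dialed down at inference time. Depending on your diffusers version, passing a scale through cross_attention_kwargs is the common pattern:
# 0.0 ignores the LoRA entirely, 1.0 applies it at full strength
image = pipe(
    "A portrait in custom style",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.7}
).images[0]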
Training LoRA
# Using the kohya-ss/sd-scripts trainer (a popular community choice)
# Illustrative settings below -- kohya-ss accepts them as CLI flags or a TOML config
training_config = """
pretrained_model_name_or_path: runwayml/stable-diffusion-v1-5
train_data_dir: ./training_images
output_dir: ./lora_output
resolution: 512
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 1e-4
max_train_steps: 1000
save_every_n_steps: 200
network_dim: 32
network_alpha: 16
"""
# Training images should be in format:
# training_images/
# ├── image1.png
# ├── image1.txt (caption)
# ├── image2.png
# └── image2.txt
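Each .txt file holds a caption for its paired image, usually starting with a rare trigger token for the concept being taught. A small helper to create caption stubs (the token "ohwx" is just a placeholder convention):
from pathlib import Path
# Write a caption stub for each training image that lacks one
for img in Path("training_images").glob("*.png"):
    caption = img.with_suffix(".txt")
    if not caption.exists():
        caption.write_text("ohwx person, portrait photo, natural lighting")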
Batch Generation
from typing import List

import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

class BatchGenerator:
    def __init__(self, model_id: str = "runwayml/stable-diffusion-v1-5"):
        self.pipe = StableDiffusionPipeline.from_pretrained(
            model_id,
            torch_dtype=torch.float16
        ).to("cuda")
        self.pipe.enable_attention_slicing()

    def generate_batch(
        self,
        prompts: List[str],
        negative_prompt: str = "",
        num_images_per_prompt: int = 1
    ) -> List[Image.Image]:
        """Run one pipeline call per prompt and collect all images."""
        images = []
        for prompt in prompts:
            result = self.pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,
                num_images_per_prompt=num_images_per_prompt,
                num_inference_steps=30
            )
            images.extend(result.images)
        return images

    def generate_variations(
        self,
        prompt: str,
        num_variations: int = 4,
        seed_start: int = 0
    ) -> List[Image.Image]:
        """Generate variations of one prompt from consecutive seeds."""
        images = []
        for i in range(num_variations):
            # Fixing the seed makes each variation reproducible
            generator = torch.Generator("cuda").manual_seed(seed_start + i)
            result = self.pipe(
                prompt=prompt,
                generator=generator,
                num_inference_steps=30
            )
            images.append(result.images[0])
        return images
# Usage
generator = BatchGenerator()
prompts = [
"A sunset over the ocean",
"A forest in autumn",
"A snowy mountain peak"
]
images = generator.generate_batch(prompts)
for i, img in enumerate(images):
img.save(f"batch_{i}.png")
FastAPI Service
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import io
import base64

import torch
from diffusers import StableDiffusionPipeline
app = FastAPI()
# Load model once at startup (newer FastAPI versions prefer lifespan handlers over on_event)
@app.on_event("startup")
async def load_model():
global pipe
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()
class GenerationRequest(BaseModel):
prompt: str
negative_prompt: str = ""
width: int = 512
height: int = 512
steps: int = 30
guidance_scale: float = 7.5
seed: int = -1
@app.post("/generate")
async def generate_image(request: GenerationRequest):
try:
generator = None
if request.seed >= 0:
generator = torch.Generator("cuda").manual_seed(request.seed)
image = pipe(
prompt=request.prompt,
negative_prompt=request.negative_prompt,
width=request.width,
height=request.height,
num_inference_steps=request.steps,
guidance_scale=request.guidance_scale,
generator=generator
).images[0]
# Convert to base64
buffer = io.BytesIO()
image.save(buffer, format="PNG")
img_str = base64.b64encode(buffer.getvalue()).decode()
return {"image": img_str}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
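A quick client-side check, assuming the service runs locally on port 8000 (URL and filename are placeholders):
import base64
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "A lighthouse at dawn", "steps": 25, "seed": 42}
)
resp.raise_for_status()
# The endpoint returns the PNG as a base64 string
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["image"]))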
Memory Optimization
# For limited VRAM
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
# Enable optimizations (pick the combination that fits your GPU)
pipe.enable_attention_slicing(1)   # compute attention in single-head slices
pipe.enable_vae_slicing()          # decode VAE outputs one image at a time
pipe.enable_model_cpu_offload()    # keep whole submodels on CPU until needed
# Or trade speed for the lowest VRAM (alternative to the offload above)
# pipe.enable_sequential_cpu_offload()
# xformers memory-efficient attention (largely supersedes attention slicing)
pipe.enable_xformers_memory_efficient_attention()
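To see what each optimization buys you, measure peak VRAM around a generation (a minimal sketch):
import torch

torch.cuda.reset_peak_memory_stats()
_ = pipe("A test prompt", num_inference_steps=20).images[0]
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")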
Summary
| Feature | Method |
|---|---|
| Text-to-Image | StableDiffusionPipeline |
| Image-to-Image | StableDiffusionImg2ImgPipeline |
| Inpainting | StableDiffusionInpaintPipeline |
| ControlNet | StableDiffusionControlNetPipeline |
| SDXL | StableDiffusionXLPipeline |
Stable Diffusion enables powerful, customizable AI image generation locally or at scale.