Intro

ControlNet lets users steer Stable Diffusion image generation with spatial context. One of the most common examples is a “pose”: a stick-figure drawing that controls the position of a person in the output. There are many other use cases.

You must have the ControlNet extension installed in Automatic1111.

Encode Input in Base64

Since we need to send an image to our server, we need to encode the image in base64. This example uses JavaScript (Node.js):

const fs = require('fs');
// readFileSync returns a Buffer, which can be base64-encoded directly
const buf = fs.readFileSync('my-file.png');
const myBase64EncodedImage = buf.toString('base64');
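
Before sending the request, it can help to sanity-check the encoded string. The sketch below assumes the input is a PNG file (as with my-file.png above) and checks that the decoded bytes begin with the standard 8-byte PNG signature. The helper name looksLikePngBase64 is hypothetical and not part of any API.

```javascript
// Hypothetical helper (not part of the API): checks that a base64 string
// decodes to data starting with the 8-byte PNG file signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function looksLikePngBase64(b64) {
  if (typeof b64 !== 'string' || b64.length === 0) return false;
  const decoded = Buffer.from(b64, 'base64');
  return (
    decoded.length >= PNG_SIGNATURE.length &&
    decoded.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Example: check the encoded signature on its own
console.log(looksLikePngBase64(PNG_SIGNATURE.toString('base64')));
```

For other image formats this check would need a different signature, so treat it as an optional guard only.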

API Request

POST /sdapi/v1/txt2img

To use ControlNet, we use the same txt2img endpoint, but we pass in extra params:

{
  "prompt": "Sunset in the mountains, lake in front",
  // other txt2img params
  "alwayson_scripts": {
    "controlnet": {
      "args": [
        {
          "input_image": myBase64EncodedImage,
          "model": "depth_fp16",
          "module": "depth"
        }
      ]
    }
  }
}

The input_image param should be the base64-encoded string of your input image.

The module param is the name of the preprocessor to use.

The model param should be set to the name of a ControlNet model you have installed, without the file extension. For example, if your model file is named depth_fp16.safetensors, the value should be depth_fp16.
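
Putting the three params together, a request might be assembled and sent as in the sketch below. The helper names and BASE_URL are assumptions for illustration (substitute the address of your own server), and the sketch assumes a runtime with a global fetch, such as Node.js 18+.

```javascript
// BASE_URL is an assumption; use the address of your own server.
const BASE_URL = 'http://127.0.0.1:7860';

// Hypothetical helper: wraps one ControlNet unit into a txt2img body.
function buildControlNetRequest(prompt, base64Image, model, preprocessor) {
  return {
    prompt,
    // other txt2img params can be added here
    alwayson_scripts: {
      controlnet: {
        args: [{ input_image: base64Image, model, module: preprocessor }],
      },
    },
  };
}

// Sends the body to the txt2img endpoint and returns the parsed JSON.
async function txt2imgWithControlNet(body) {
  const res = await fetch(`${BASE_URL}/sdapi/v1/txt2img`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`txt2img failed: HTTP ${res.status}`);
  return res.json();
}

const requestBody = buildControlNetRequest(
  'Sunset in the mountains, lake in front',
  'iVBORw0KGgo=', // placeholder; pass your real base64 image here
  'depth_fp16',
  'depth'
);
console.log(JSON.stringify(requestBody, null, 2));
```

Calling txt2imgWithControlNet(requestBody) would then perform the actual POST. Building the body in a separate function keeps it easy to inspect before anything is sent.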

Response

See the txt2img guide for response examples.
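
As a rough sketch of handling the result, the code below assumes the response JSON carries an images array of base64-encoded strings. That shape is an assumption here; confirm it against the txt2img guide. The helper name decodeResponseImages is hypothetical.

```javascript
// Assumption: the response object has an `images` array of base64 strings.
// Hypothetical helper: decodes each entry into a Buffer of raw image bytes.
function decodeResponseImages(response) {
  const images = Array.isArray(response?.images) ? response.images : [];
  return images.map((b64) => Buffer.from(b64, 'base64'));
}

// Example with a stand-in response object (not real server output)
const fakeResponse = { images: [Buffer.from('hello').toString('base64')] };
const decodedImages = decodeResponseImages(fakeResponse);
console.log(decodedImages.length, decodedImages[0].toString('utf8'));
```

Each returned Buffer could then be written to disk, for example with fs.writeFileSync.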

Full List of Params

| Key | Description | Type | Default |
| --- | --- | --- | --- |
| model | Name of the installed model | string | none |
| module | Name of the control type to use. One of | string | none |
| input_image | Base64 encoded input image | string | none |
| resize_mode | How to handle an input image that does not match the resolution of the output image. Possible values: Envelope (Outer Fit), Scale to Fit (Inner Fit), Just Resize | string | Just Resize |
| weight | The weight of the control image | number | 1 |
| control_mode | How much impact the control image should have on the generated image. Possible values: Balanced, My prompt is more important, ControlNet is more important | string | Balanced |
| pixel_perfect | Enables the Pixel Perfect preprocessor | boolean | |
| mask | Base64 encoded mask image | string | none |
| lowvram | Whether to compensate for low VRAM | boolean | |
| processor_res | The resolution to use for the preprocessor | number | 64 |
| threshold_a | The first preprocessor parameter, if applicable | number | 64 |
| threshold_b | The second preprocessor parameter, if applicable | number | 64 |
| guidance_start | Ratio of generation where ControlNet starts to have an effect | number | 0 |
| guidance_end | Ratio of generation where ControlNet stops having an effect | number | 1 |
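
To show how these params might be combined in practice, here is a minimal sketch that merges caller overrides onto the defaults listed above and rejects values outside the listed options. The helper makeControlNetUnit is hypothetical, and the check that guidance_start and guidance_end lie between 0 and 1 is an assumption based on their description as ratios.

```javascript
// Defaults copied from the list of params above. Keys listed with "none"
// or without a default are left out and must be supplied by the caller.
const CONTROLNET_DEFAULTS = {
  resize_mode: 'Just Resize',
  weight: 1,
  control_mode: 'Balanced',
  processor_res: 64,
  threshold_a: 64,
  threshold_b: 64,
  guidance_start: 0,
  guidance_end: 1,
};

const RESIZE_MODES = ['Envelope (Outer Fit)', 'Scale to Fit (Inner Fit)', 'Just Resize'];
const CONTROL_MODES = ['Balanced', 'My prompt is more important', 'ControlNet is more important'];

// Hypothetical helper: merges overrides onto the defaults and validates them.
function makeControlNetUnit(overrides) {
  const unit = { ...CONTROLNET_DEFAULTS, ...overrides };
  if (!RESIZE_MODES.includes(unit.resize_mode)) {
    throw new Error(`Unknown resize_mode: ${unit.resize_mode}`);
  }
  if (!CONTROL_MODES.includes(unit.control_mode)) {
    throw new Error(`Unknown control_mode: ${unit.control_mode}`);
  }
  // Assumption: both values are ratios in [0, 1] with start <= end.
  if (unit.guidance_start < 0 || unit.guidance_end > 1 ||
      unit.guidance_start > unit.guidance_end) {
    throw new RangeError('guidance_start and guidance_end must satisfy 0 <= start <= end <= 1');
  }
  return unit;
}

const unit = makeControlNetUnit({
  input_image: 'iVBORw0KGgo=', // placeholder; pass your real base64 image here
  model: 'depth_fp16',
  module: 'depth',
  control_mode: 'ControlNet is more important',
  guidance_end: 0.8,
});
console.log(unit);
```

The resulting object can be placed in the args array of the controlnet entry in the request body.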