Instant Character

InstantCharacter creates high-quality, consistent characters from a reference image and a text prompt, supporting diverse poses, styles, and appearances with strong identity control.

Features

InstantCharacter is a model for open-domain character personalization. It addresses the limited generalization ability and image quality of traditional models. By combining a scalable Diffusion Transformer architecture with adapter modules and a large character dataset, InstantCharacter delivers high-quality, highly customizable character imagery across diverse scenarios.

Key Features

Open-Domain Personalization: InstantCharacter generates a wide variety of character appearances, poses, and styles, maintaining high fidelity while allowing extensive personalization, which suits a wide range of creative and professional applications.
Scalable Adapter Modules: The model integrates stacked Transformer encoders as adapters that specialize in handling open-domain character features. These adapters interact with the latent space of the Diffusion Transformer, improving the diversity and consistency of generated results.
Large-Scale Character Dataset Training: To train InstantCharacter, Tencent AI Lab built a character dataset containing tens of millions of samples, covering multi-view character images and paired text-image examples. This dataset enables strong identity consistency and text-based editability.

Use Cases

Game Development: Rapidly create a wide variety of high-fidelity character assets, accelerating production pipelines.
Virtual Reality: Generate immersive, personalized avatars to enhance user experiences.
Digital Content Creation: Produce diverse character designs for animations, comics, and more, empowering creative expression.
Social Media: Create personalized profile pictures and stickers that reflect individual styles and preferences.

Accelerated Inference

Accelerated inference relies on optimization technology from WaveSpeedAI, which reduces computational overhead and latency so that images are generated quickly without compromising quality. The system is designed to handle large-scale inference workloads while balancing speed and accuracy for real-time applications. For further details, please refer to the blog post.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/instant-character" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "A futuristic android girl with glowing blue eyes, wearing green suit, standing next to a holographic sign displaying \"I Love WaveSpeedAI\", neon-lit cyberpunk city background, sleek and high-tech vibe, Sci-fi movie style",
    "image": "https://d3gnftk2yhz9lr.wavespeed.ai/media/images/1745853771145444122_WW40XTQM.jpg",
    "size": "1024*1024",
    "negative_prompt": "",
    "guidance_scale": 3.5,
    "num_inference_steps": 28,
    "num_images": 1,
    "enable_safety_checker": true
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
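
The same two calls can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_submit_request`, `build_result_request`, `send`, and `main` are hypothetical, while the endpoints, headers, and body fields are copied from the curl commands above. Reading the task ID from `data.id` is an assumption based on the response parameter table in this document.

```python
import json
import os
import urllib.request

# Base URL taken from the curl examples above.
API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits an InstantCharacter task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/instant-character",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that queries the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body. Performs network I/O."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    """Submit one task and fetch its current result (hypothetical usage)."""
    api_key = os.environ["WAVESPEED_API_KEY"]
    payload = {
        "prompt": "A futuristic android girl with glowing blue eyes",
        "image": "https://d3gnftk2yhz9lr.wavespeed.ai/media/images/1745853771145444122_WW40XTQM.jpg",
        "size": "1024*1024",
        "num_images": 1,
    }
    submitted = send(build_submit_request(api_key, payload))
    request_id = submitted["data"]["id"]  # assumed location of the task ID
    print(send(build_result_request(api_key, request_id)))
```

Calling `main()` with `WAVESPEED_API_KEY` set would perform the two real requests. Because a task can still be in the `created` or `processing` state, the result request may need to be repeated before any outputs appear.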

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	The prompt to generate an image from.
image	string	Yes		-	The URL of the input image to generate an image from.
size	string	No	1024*1024	256 ~ 1536 per dimension	The size of the generated image.
negative_prompt	string	No		-	The negative prompt to use. Use it to address details that you don't want in the image. This could be colors, objects, scenery and even the small details (e.g. moustache, blurry, low resolution).
seed	integer	No	-	-1 ~ 2147483647	The same seed and the same prompt given to the same version of the model will output the same image every time.
guidance_scale	number	No	3.5	0 ~ 20	The CFG (Classifier-Free Guidance) scale controls how closely the model follows your prompt. Higher values adhere to the prompt more strictly.
num_inference_steps	integer	No	28	1 ~ 50	The number of inference steps to perform.
num_images	integer	No	1	1 ~ 4	The number of images to generate.
enable_safety_checker	boolean	No	true	-	If set to true, the safety checker will be enabled.
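
The ranges in the table above can be checked on the client before a task is submitted. The following is a minimal sketch under stated assumptions: `parse_size` and `validate_request` are hypothetical helper names, the bounds are treated as inclusive, and `size` is assumed to always use the `width*height` form shown in the example request. The server remains the authority on what it accepts.

```python
# Numeric bounds copied from the request parameter table (assumed inclusive).
NUMERIC_RANGES = {
    "seed": (-1, 2147483647),
    "guidance_scale": (0, 20),
    "num_inference_steps": (1, 50),
    "num_images": (1, 4),
}


def parse_size(size: str) -> tuple:
    """Split a 'width*height' string such as '1024*1024' into two integers."""
    width_text, height_text = size.split("*")
    return int(width_text), int(height_text)


def validate_request(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # prompt and image are the only required parameters.
    for name in ("prompt", "image"):
        if not params.get(name):
            problems.append(f"{name} is required")

    # size: each dimension must fall within 256 to 1536.
    if "size" in params:
        try:
            width, height = parse_size(params["size"])
        except (ValueError, AttributeError):
            problems.append("size must look like '1024*1024'")
        else:
            if not (256 <= width <= 1536 and 256 <= height <= 1536):
                problems.append("size: each dimension must be 256 to 1536")

    # Optional numeric parameters are checked only when present.
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a payload at once.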

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Query Parameters

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID (the `data.id` value returned when the task was submitted), used in the result URL path

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
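
Because `data.status` moves through `created` and `processing` before reaching `completed` or `failed`, a client usually queries the result more than once. The sketch below shows one way to poll. It is an illustration only: `wait_for_result` is a hypothetical helper, the one-second interval and 60-attempt cap are arbitrary assumptions, and `fetch` stands in for whatever function performs the GET request and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    interval_seconds: float = 1.0,  # assumed polling interval
    max_attempts: int = 60,         # assumed upper bound on queries
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the task is completed, failed, or attempts run out.

    Returns the `data` object of the completed task.
    Raises RuntimeError if the task failed, TimeoutError if it never finished.
    """
    for attempt in range(max_attempts):
        data = fetch()["data"]
        status = data["status"]
        if status == "completed":
            return data
        if status == "failed":
            # data.error carries the error message for failed tasks.
            raise RuntimeError(data.get("error") or "task failed")
        # Still 'created' or 'processing': wait before the next query,
        # except after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

Passing `fetch` and `sleep` as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.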
