Auto Steering allows you to automatically generate feature interventions based on natural language descriptions of desired behaviors. This provides an easy way to steer model outputs without manually selecting features.
Basic Usage
The simplest way to use Auto Steering is with the AutoSteer method:
# Create automatic feature edits for desired behavior
edits = client.features.AutoSteer(
    specification="be funny",  # Natural language description
    model=variant,  # Model variant to use
)
# Apply the edits to your variant
variant.set(edits)
# The model will now attempt to be funnier
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model=variant
)
print(response.choices[0].message["content"])
How It Works
Under the hood, Auto Steer:
- Generates contrastive examples of content with and without the desired behavior
- Identifies the most relevant features that distinguish the behavior
- Determines optimal feature values to encourage the desired behavior
- Creates a set of feature edits that can be applied to the model
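The steps above can be illustrated with a toy sketch. This is not the library's implementation: the function toy_auto_steer, its top_k and strength parameters, the feature names, and the activation numbers are all hypothetical, and plain strings stand in for feature objects. It only shows the general idea of scoring features by how much they differ between contrastive examples and turning the largest gaps into target values.

```python
from statistics import mean


def toy_auto_steer(with_behavior, without_behavior, top_k=2, strength=1.0):
    """Toy contrastive feature selection (hypothetical, for illustration only).

    Each example is a dict mapping a feature name to an activation value.
    Returns a dict mapping the selected feature names to target values.
    """
    # Collect every feature name seen in either group of examples.
    feature_names = set()
    for example in with_behavior + without_behavior:
        feature_names.update(example)

    # Score each feature by the gap between its mean activation
    # with the behavior and without it.
    gaps = {}
    for name in feature_names:
        on = mean(ex.get(name, 0.0) for ex in with_behavior)
        off = mean(ex.get(name, 0.0) for ex in without_behavior)
        gaps[name] = on - off

    # Keep the top_k features with the largest absolute gap.
    ranked = sorted(gaps, key=lambda n: abs(gaps[n]), reverse=True)[:top_k]

    # Use the (scaled) gap as the target value for each selected feature.
    return {name: round(strength * gaps[name], 3) for name in ranked}


# Made-up activations for contrastive examples of "be funny".
funny = [
    {"humor": 0.9, "formality": 0.1, "length": 0.5},
    {"humor": 0.7, "formality": 0.2, "length": 0.4},
]
plain = [
    {"humor": 0.1, "formality": 0.6, "length": 0.5},
    {"humor": 0.2, "formality": 0.8, "length": 0.6},
]

edits = toy_auto_steer(funny, plain, top_k=2)
print(edits)  # humor gets a positive target, formality a negative one
```

In this toy version, "length" is dropped because it barely differs between the two groups, which mirrors the idea of keeping only the features that distinguish the behavior.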
Advanced Usage
Combining with Manual Edits
Auto Steer edits can be combined with manual feature interventions:
# Generate automatic edits
auto_edits = client.features.AutoSteer(
    specification="be professional",
    model=variant
)
# Combine with manual feature edits
variant.set({
    **auto_edits,
    manual_feature: 0.8
})
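One detail worth keeping in mind when merging this way is ordering: in a Python dict literal, a key written after the unpacking overrides the same key from the unpacked dictionary. The sketch below shows this with plain strings standing in for feature objects; the feature names and values are made up for illustration.

```python
# Stand-in for the dictionary returned by Auto Steer (hypothetical values).
auto_edits = {"formal_tone": 0.6, "politeness": 0.4}

combined = {
    **auto_edits,
    "formal_tone": 0.8,  # manual value listed after the unpacking, so it wins
    "conciseness": 0.5,  # manual edit for a feature Auto Steer did not select
}

print(combined)
# {'formal_tone': 0.8, 'politeness': 0.4, 'conciseness': 0.5}
```

Placing the manual entries before **auto_edits would reverse the precedence, so the automatically generated value would win for any feature that appears in both.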
Using with Conditionals
Auto Steer edits can also be applied conditionally, so that they take effect only when a condition on another feature holds:
# Generate edits for desired behavior
funny_edits = client.features.AutoSteer(
    specification="be funny",
    model=variant
)
# Apply edits conditionally
variant.set_when(context_feature > 0.5, funny_edits)
Best Practices
- Use clear, specific behavior descriptions
- Test generated edits to ensure desired results
- Consider combining multiple Auto Steer edits for complex behaviors
- Adjust the number of features to match the steering precision you need
- Use with conditionals for context-aware behavior
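When combining several sets of Auto Steer edits, two sets may target the same feature with different values. The source does not say how such conflicts should be resolved, so the sketch below is only one possible policy: a hypothetical helper, combine_edits, that averages the values for any shared feature. Plain strings stand in for feature objects, and the numbers are made up.

```python
from collections import defaultdict


def combine_edits(*edit_sets):
    """Merge feature-edit dicts, averaging values for features that repeat.

    Hypothetical helper for illustration; averaging is an assumed policy,
    not something the library prescribes.
    """
    collected = defaultdict(list)
    for edits in edit_sets:
        for feature, value in edits.items():
            collected[feature].append(value)
    return {
        feature: sum(values) / len(values)
        for feature, values in collected.items()
    }


# Stand-ins for two dictionaries returned by separate Auto Steer calls.
funny_edits = {"humor": 0.8, "formality": -0.4}
concise_edits = {"brevity": 0.6, "formality": 0.2}

combined = combine_edits(funny_edits, concise_edits)
print(combined)
```

Here "formality" appears in both sets, so its combined value is the average of the two, while "humor" and "brevity" pass through unchanged. Whatever policy you choose, it is worth testing the combined edits as you would any single set.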
API Reference
AutoSteer
Generate automatic feature edits based on a natural language description.
Parameters:
specification
Natural language description of desired behavior
model
Union[str, Variant]
required
Model or variant to use for generating edits
Returns:
Dictionary mapping features to their target values
Example:
edits = client.features.AutoSteer(
    specification="be more creative",
    model=variant
)