Auto Steer
Auto Steer generates feature interventions automatically from a natural language description of the behavior you want. This lets you steer model outputs without selecting features by hand.
Basic Usage
The simplest way to use Auto Steering is to call the AutoSteer method with a natural language description of the desired behavior and the model or variant to steer.
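AutoSteer returns a dictionary mapping features to target values. The sketch below shows one way such a result could be applied, assuming each target value simply overrides that feature's activation. `apply_edits`, the activation dictionary, and the feature labels are hypothetical and are not part of the library.

```python
# Hypothetical sketch, not the library's API. Assumes an AutoSteer result is a
# plain dict mapping feature labels to target values, and that each target
# value overrides the feature's current activation.
from typing import Dict

Edits = Dict[str, float]


def apply_edits(activations: Dict[str, float], edits: Edits) -> Dict[str, float]:
    """Return a copy of `activations` with each edited feature set to its target value."""
    steered = dict(activations)  # leave the caller's activations untouched
    for feature, target in edits.items():
        steered[feature] = target  # override (or introduce) the feature's activation
    return steered


# Made-up result for a description such as "talk like a pirate".
edits: Edits = {"pirate speech": 0.6, "formal register": -0.2}
activations = {"pirate speech": 0.0, "formal register": 0.4, "greeting": 0.9}
print(apply_edits(activations, edits))
# -> {'pirate speech': 0.6, 'formal register': -0.2, 'greeting': 0.9}
```

Features that are not mentioned in the edits, such as "greeting" here, keep their original activations.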
How It Works
Under the hood, Auto Steer:
- Generates contrastive examples of content with and without the desired behavior
- Identifies the most relevant features that distinguish the behavior
- Determines optimal feature values to encourage the desired behavior
- Creates a set of feature edits that can be applied to the model
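The following toy, self-contained sketch illustrates the four steps. It is not the actual Auto Steer implementation. Here the "features" are hypothetical keyword sets, the contrastive examples for step 1 are written by hand rather than generated, features are ranked by the difference of their mean activations between the two groups, and the scaled difference serves as each target value. `FEATURES`, `activations`, and `auto_steer_sketch` are illustrative names only.

```python
# Toy sketch of the four steps. Everything here is hypothetical: "features" are
# keyword sets and activations are keyword frequencies, standing in for the
# learned features a real model would expose.
from statistics import mean
from typing import Dict, List, Set

FEATURES: Dict[str, Set[str]] = {
    "nautical": {"ship", "sea", "sail", "anchor"},
    "pirate slang": {"arr", "matey", "ahoy", "ye"},
    "finance": {"stock", "bond", "market"},
}


def activations(text: str) -> Dict[str, float]:
    """Fraction of words in `text` that belong to each toy feature."""
    words = text.lower().split()
    return {name: sum(w in vocab for w in words) / max(len(words), 1)
            for name, vocab in FEATURES.items()}


def auto_steer_sketch(positive: List[str], negative: List[str],
                      k: int = 2, scale: float = 1.0) -> Dict[str, float]:
    """Select the k features whose mean activation differs most between examples
    with the behavior (positive) and without it (negative)."""
    pos = [activations(t) for t in positive]
    neg = [activations(t) for t in negative]
    # Step 2: score each feature by the difference of mean activations.
    diffs = {f: mean(a[f] for a in pos) - mean(a[f] for a in neg) for f in FEATURES}
    top = sorted(diffs, key=lambda f: abs(diffs[f]), reverse=True)[:k]
    # Steps 3-4: use the scaled difference as each selected feature's target value.
    return {f: scale * diffs[f] for f in top}


# Step 1 (hand-written here): examples with and without the desired behavior.
positive = ["ahoy matey set sail", "arr ye ship be at sea"]
negative = ["the market closed higher", "please set the meeting"]
edits = auto_steer_sketch(positive, negative)
print({f: round(v, 3) for f, v in edits.items()})
# -> {'pirate slang': 0.417, 'nautical': 0.292}
```

Difference of means is only one plausible way to rank features. Note that a feature more active in the negative examples (here "finance") would receive a negative target value if it were selected.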
Advanced Usage
Combining with Manual Edits
Auto Steer edits can be combined with manual feature interventions.
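If both kinds of edits are dictionaries of feature labels to target values, combining them amounts to a dictionary merge. This sketch assumes that representation and adopts a conflict policy, which is also an assumption: manual values win because they reflect an explicit choice. `combine_edits` and the feature labels are hypothetical.

```python
# Hypothetical sketch: assumes Auto Steer output and manual interventions are
# both dicts of feature label -> target value. Assumed policy: on a conflict,
# the manual value overrides the automatically generated one.
from typing import Dict


def combine_edits(auto_edits: Dict[str, float],
                  manual_edits: Dict[str, float]) -> Dict[str, float]:
    """Merge two edit sets; keys in `manual_edits` override `auto_edits`."""
    return {**auto_edits, **manual_edits}


auto_edits = {"pirate slang": 0.5, "nautical": 0.3}  # e.g. an AutoSteer result
manual_edits = {"nautical": 0.0, "formal register": -0.4}  # hand-picked edits
print(combine_edits(auto_edits, manual_edits))
# -> {'pirate slang': 0.5, 'nautical': 0.0, 'formal register': -0.4}
```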
Using with Conditionals
Auto Steer edits can also be applied within conditional statements, so that steering takes effect only in certain contexts.
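One way to picture context-aware steering is to apply an edit set only when a trigger feature is active above a threshold. In this sketch the function `steer_if`, the feature names, and the threshold rule are illustrative assumptions and do not describe how the library evaluates conditionals.

```python
# Hypothetical sketch of conditional steering: apply edits only when a trigger
# feature's activation exceeds a threshold. Names and the rule are assumptions.
from typing import Dict


def steer_if(activations: Dict[str, float], trigger: str, threshold: float,
             edits: Dict[str, float]) -> Dict[str, float]:
    """Return activations with `edits` applied only if activations[trigger] > threshold."""
    steered = dict(activations)  # never mutate the caller's dict
    if activations.get(trigger, 0.0) > threshold:
        steered.update(edits)  # condition met: override with the target values
    return steered  # condition not met: an unchanged copy


edits = {"reassuring tone": 0.7}  # e.g. an AutoSteer result for "be reassuring"
print(steer_if({"user frustration": 0.8, "reassuring tone": 0.1},
               "user frustration", 0.5, edits))
# -> {'user frustration': 0.8, 'reassuring tone': 0.7}
```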
Best Practices
- Use clear, specific behavior descriptions
- Test generated edits to confirm they produce the desired results
- Consider combining multiple Auto Steer edits for complex behaviors
- Adjust the number of features based on the steering precision you need
- Use Auto Steer with conditionals for context-aware behavior
API Reference
AutoSteer
Generate automatic feature edits based on a natural language description.
Parameters:
- Natural language description of the desired behavior
- Model or variant to use for generating edits
Returns:
- Dictionary mapping features to their target values
Example:
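Because the result is described as a dictionary mapping features to target values, it can be inspected before use, which supports the practice of testing generated edits. This sketch assumes feature labels as keys and numeric target values. `strongest_edits` and the values shown are hypothetical.

```python
# Hypothetical sketch: list the strongest edits in an AutoSteer-style result,
# assuming a dict of feature label -> numeric target value.
from typing import Dict, List, Tuple


def strongest_edits(edits: Dict[str, float], top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` edits with the largest absolute target values."""
    return sorted(edits.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


result = {"pirate slang": 0.5, "formal register": -0.6, "nautical": 0.2}  # made-up values
for feature, value in strongest_edits(result, top=2):
    print(f"{feature}: {value:+.2f}")
# formal register: -0.60
# pirate slang: +0.50
```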