Auto Steering allows you to automatically generate feature interventions based on natural language descriptions of desired behaviors. This provides an easy way to steer model outputs without manually selecting features.

Basic Usage

The simplest way to use Auto Steering is with the AutoSteer method:

How It Works

Under the hood, Auto Steer:

  1. Generates contrastive examples of content with and without the desired behavior
  2. Identifies the most relevant features that distinguish the behavior
  3. Determines optimal feature values to encourage the desired behavior
  4. Creates a set of feature edits that can be applied to the model

Advanced Usage

Combining with Manual Edits

Auto Steer edits can be combined with manual feature interventions:

# Generate automatic edits
auto_edits = client.features.AutoSteer(
    specification="be professional",
    model=variant
)

# Combine with manual feature edits
variant.set({
    **auto_edits,
    manual_feature: 0.8
})

Using with Conditionals

Auto Steer can be used within conditional statements:

# Generate edits for desired behavior
funny_edits = client.features.AutoSteer(
    specification="be funny",
    model=variant
)

# Apply edits conditionally
variant.set_when(context_feature > 0.5, funny_edits)

Best Practices

  • Use clear, specific behavior descriptions
  • Test generated edits to ensure desired results
  • Consider combining multiple Auto Steer edits for complex behaviors
  • Adjust number of features based on steering precision needed
  • Use with conditionals for context-aware behavior

API Reference

AutoSteer

Generate automatic feature edits based on natural language description.

Parameters:

specification
str
required

Natural language description of desired behavior

model
Union[str, Variant]
required

Model or variant to use for generating edits

Returns:

edits
dict[Feature, float]

Dictionary mapping features to their target values

Example:

Auto Steer
edits = client.features.AutoSteer(
    specification="be more creative",
    model=variant
)