Documentation for working with conditional feature interventions
Conditionals allow you to define dynamic feature interventions that are applied based on the activation patterns of other features during model inference. This enables creating more sophisticated steering behaviors that respond to the content being generated.
Before using the Conditionals API, you’ll need a model variant and the features you want to intervene on.
Apply pirate-themed features only when whale-related content is detected:
variant.reset()

# Find relevant features
whale_feature = client.features.search(
    "whales",
    model=variant,
    top_k=1
)
pirate_features = client.features.search(
    "talk like a pirate",
    model=variant,
    top_k=5
)

# Set up conditional intervention
variant.set_when(whale_feature > 0.75, {
    pirate_features[0]: 0.4
})

# The model will now talk like a pirate when discussing whales
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me about whales."}],
    model=variant
)
print(response.choices[0].message["content"])
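To make the behavior of the example above concrete, here is a minimal, self-contained sketch of threshold-gated steering. It does not use the Goodfire SDK: `apply_when`, the activation dictionaries, and the feature names are hypothetical stand-ins, and the real SDK may evaluate conditions differently (for example, per token during generation).

```python
from typing import Dict


def apply_when(
    activations: Dict[str, float],
    trigger: str,
    threshold: float,
    edits: Dict[str, float],
) -> Dict[str, float]:
    """Return a copy of `activations` with `edits` applied only when the
    trigger feature's activation exceeds `threshold` (hypothetical helper)."""
    steered = dict(activations)
    if activations.get(trigger, 0.0) > threshold:
        steered.update(edits)
    return steered


# Hypothetical activations for two different prompts.
whale_prompt = {"whales": 0.9, "pirate_speech": 0.0}
other_prompt = {"whales": 0.1, "pirate_speech": 0.0}

edits = {"pirate_speech": 0.4}
steered_whale = apply_when(whale_prompt, "whales", 0.75, edits)
steered_other = apply_when(other_prompt, "whales", 0.75, edits)

print(steered_whale)  # pirate feature is set because the whale feature is strong
print(steered_other)  # unchanged because the condition does not hold
```

The point of the sketch is that the intervention is inert until the trigger condition holds, so unrelated prompts are left unsteered.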
# Abort if whale features are too strong
variant.abort_when(whale_feature > 0.75)

try:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": "Tell me about whales."}],
        model=variant
    )
except goodfire.exceptions.InferenceAbortedException:
    print("Generation aborted due to whale content")
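The abort pattern above can be pictured as a generation loop that checks a condition at each step and raises as soon as it holds. The sketch below is a toy model under that assumption and does not call the Goodfire SDK; `GenerationAborted`, `generate`, and the (token, activation) pairs are hypothetical names and data chosen for illustration.

```python
from typing import Callable, Iterable, List, Tuple


class GenerationAborted(Exception):
    """Hypothetical stand-in for an inference-aborted exception."""


def generate(
    steps: Iterable[Tuple[str, float]],
    should_abort: Callable[[float], bool],
) -> List[str]:
    """Collect tokens until `should_abort` fires on a step's activation."""
    tokens: List[str] = []
    for token, activation in steps:
        if should_abort(activation):
            raise GenerationAborted(f"aborted before emitting {token!r}")
        tokens.append(token)
    return tokens


# Hypothetical (token, whale-feature activation) pairs.
steps = [("The", 0.1), ("blue", 0.3), ("whale", 0.9), ("sings", 0.8)]

try:
    generate(steps, lambda activation: activation > 0.75)
    aborted = False
except GenerationAborted as exc:
    aborted = True
    print(exc)
```

As in the SDK example, the caller handles the abort with an ordinary `try`/`except`, so the calling code decides what to do when generation stops early.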
Use natural language to automatically generate conditional statements:
Auto-Generated Conditionals example
# Create a variant
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

# Generate conditional based on description - this will create conditions
# for both whales and penguins being present
conditional = client.features.AutoConditional(
    "when the model talks about whales and penguins",
    model=variant
)

# Get pirate feature
pirate_feature = client.features.search(
    "talk like a pirate",
    model=variant,
    top_k=1
)

# Make the model talk like a pirate when it talks about both whales and penguins
variant.set_when(conditional, {
    pirate_feature[0]: 0.9
})

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me about whales and penguins!"}],
    model=variant
)
print(response.choices[0].message["content"])
You can create conditionals by comparing features or feature groups with numeric values or other features using standard comparison operators. This creates a Conditional object that can be used in steering behaviors.
# Compare feature to numeric value
condition = feature > 0.75

# Compare feature group to numeric value
condition = feature_group >= 0.5

# Compare features to each other
condition = feature1 < feature2
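One common way for a comparison such as `feature > 0.75` to yield an object rather than a boolean is operator overloading. The sketch below illustrates that general Python technique with toy classes; `ToyFeature` and `ToyConditional` are hypothetical and are not the SDK's actual classes, whose internals are not described here.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Union

# Map operator symbols to the functions that implement them.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class ToyConditional:
    """A deferred comparison that is evaluated later against activations."""
    left: "ToyFeature"
    op: str
    right: Union["ToyFeature", float]

    def evaluate(self, activations: Dict[str, float]) -> bool:
        lhs = activations[self.left.label]
        if isinstance(self.right, ToyFeature):
            rhs = activations[self.right.label]
        else:
            rhs = self.right
        return _OPS[self.op](lhs, rhs)


@dataclass(frozen=True)
class ToyFeature:
    """A named feature whose comparisons build ToyConditional objects."""
    label: str

    def __gt__(self, other):
        return ToyConditional(self, ">", other)

    def __ge__(self, other):
        return ToyConditional(self, ">=", other)

    def __lt__(self, other):
        return ToyConditional(self, "<", other)

    def __le__(self, other):
        return ToyConditional(self, "<=", other)


whales = ToyFeature("whales")
penguins = ToyFeature("penguins")

numeric_condition = whales > 0.75       # feature vs. number
feature_condition = whales < penguins   # feature vs. feature

activations = {"whales": 0.9, "penguins": 0.2}
print(numeric_condition.evaluate(activations))
print(feature_condition.evaluate(activations))
```

Because the comparison is stored rather than evaluated immediately, the same condition object can be checked against different activations at inference time.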