Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model’s internal units of computation, or “features”.
With Ember, you can modify features to precisely control model outputs, or use them as building blocks for tasks like classification.
import goodfire

client = goodfire.Client(api_key=GOODFIRE_API_KEY)

# Instantiate a model variant.
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")
Our sampling API is OpenAI-compatible, making it easy to integrate.
for token in client.chat.completions.create(
    [{"role": "user", "content": "Hi, how are you?"}],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")
Auto steering automatically finds and adjusts feature weights to achieve your desired behavior. Simply provide a short prompt describing what you want, and auto steering will:
Find the relevant features
Set appropriate feature weights
Return a FeatureEdits object that you can set directly
edits = client.features.AutoSteer(
    specification="be funny",  # or your desired behavior
    model=variant,
)
variant.set(edits)
print(edits)
Now that we have a few funny edits, let’s see how the model responds!
for token in client.chat.completions.create(
    [{"role": "user", "content": "Tell me about pirates"}],
    model=variant,
    stream=True,
    max_completion_tokens=120,
):
    print(token.choices[0].delta.content, end="")
The model worked puns and jokes into its response, even though the user prompt said nothing about comedy.
When setting feature weights manually, start with 0.5 to enhance a feature and -0.3 to ablate a feature. When setting multiple features, you may need to tune down the weights.
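The snippet below assumes funny_features has already been populated, for example via a feature search like the one used later in this guide (the query string and top_k here are illustrative):

# Assumed setup: find humor-related features to steer on manually.
# The query and top_k are illustrative, not prescribed by the guide.
funny_features = client.features.search("funny", model=variant, top_k=3)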
variant.set(funny_features[0], 0.6)

for token in client.chat.completions.create(
    [{"role": "user", "content": "tell me about foxes"}],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")
Feel free to play around with the weights and features to see how the model responds.
Get neighboring features by comparing them to either individual features or groups of features. When comparing to individual features, neighbors() looks at similarity in the embedding space. When comparing to groups, neighbors() finds features closest to the group’s centroid.
neighbors() helps you understand feature relationships beyond just their labels. It can reveal which features might work best for your intended model adjustments.
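As a minimal sketch, pulling the neighbors of one of the funny features might look like this (the exact call signature and top_k value are assumptions):

# Find features similar to an individual feature in embedding space
neighbors = client.features.neighbors(funny_features[0], model=variant, top_k=10)
print(neighbors)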
Contrastive search becomes more powerful when combined with reranking. First, contrastive search finds features that distinguish between your datasets. Then, reranking sorts these features using your description of the desired behavior.
This two-step process ensures you get features that are both:
Mechanistically useful (from contrastive search)
Aligned with your goals (from reranking)
Let’s specify two conversation datasets: in the first, the assistant gives a typical helpful response; in the second, the assistant replies with jokes.
variant.reset()

default_conversation = [
    [
        {"role": "user", "content": "Hello how are you?"},
        {"role": "assistant", "content": "I am a helpful assistant. How can I help you?"},
    ]
]

joke_conversation = [
    [
        {"role": "user", "content": "Hello how are you?"},
        {"role": "assistant", "content": "What do you call an alligator in a vest? An investigator!"},
    ]
]

helpful_assistant_features, joke_features = client.features.contrast(
    dataset_1=default_conversation,
    dataset_2=joke_conversation,
    model=variant,
    top_k=30,
)

# Let's rerank to surface humor-related features
joke_features = client.features.rerank(
    features=joke_features,
    query="funny",
    model=variant,
    top_k=5,
)
joke_features
We now have a list of features to consider adding. Let’s set some plausible-looking ones from joke_features.
Note that we could also explore removing some of the helpful_assistant features.
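For instance, following the -0.3 guideline from earlier, a sketch of suppressing one of them (the index is illustrative):

# Illustrative: dampen a helpful-assistant feature with a negative weight
variant.set(helpful_assistant_features[0], -0.3)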
variant.reset()
variant.set(joke_features[0, 1], 0.6)

for token in client.chat.completions.create(
    [{"role": "user", "content": "Hello. Tell me about whales."}],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")
You can establish relationships between different features (or feature groups) using conditional interventions.
First, let’s reset the variant and pick out the funny features.
variant.reset()
funny_features
Now, let’s find features that activate when the model is talking like a pirate.
pirate_features = client.features.search(
    "talk like a pirate",
    model=variant,
    top_k=3,
)
print(pirate_features)
Now, let’s set up behavior so that when the model is talking like a pirate, it will also be funny.
variant.set_when(pirate_features[1] > 0.75, {
    funny_features[0]: 0.7,
})

# The model will now try to be funny when talking about pirates
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "talk like a pirate and tell me about whales"}],
    model=variant,
)
print(response.choices[0].message["content"])
Say we decide the model isn’t very good at pirate jokes. Let’s set up behavior to stop generation altogether if the pirate features are too strong.
Abort when Pirate Features are too strong
# Abort if pirate features are too strong
variant.abort_when(pirate_features > 0.75)

try:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": "Tell me about pirates."}],
        model=variant,
    )
except goodfire.exceptions.InferenceAbortedException:
    print("Generation aborted due to too much pirate content")
If you aren’t sure of the features you want to condition on, use AutoConditional with a specified prompt to get back an automatically generated condition.
Auto Conditional Code
# Generate an auto conditional based on a description. This will automatically
# choose the relevant features and conditional weight.
conditional = client.features.AutoConditional(
    "pirates",
    model="meta-llama/Llama-3.3-70B-Instruct",
)

# Apply feature edits when the condition is met
variant.set_when(conditional, {
    joke_features[0]: 0.5,
    joke_features[1]: 0.5,
})
You can inspect what features are activating in a given conversation with the inspect API, which returns a context object.
Say you want to understand what model features are important when the model tells a joke. You can pass in the same joke conversation dataset to the inspect endpoint.
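A minimal sketch, assuming inspect accepts the list of messages directly:

# Inspect which features activate across the joke conversation
context = client.features.inspect(
    joke_conversation[0],
    model=variant,
)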
From the context object, you can access a lookup object for browsing the set of feature labels present in the context.
lookup = context.lookup()
lookup
You can select the top k activating features in the context, ranked by activation strength. There are features related to jokes and tongue twisters, among other syntactical features.
top_features = context.top(k=10)
top_features
You can also inspect feature activations at the individual-token level. Let’s see what features are active at the punchline token.
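A sketch under an assumed per-token API (context.tokens and the token-level inspect() call are assumptions here, and the token index is illustrative):

# Assumed API: grab the punchline token and list its strongest features
punchline_token = context.tokens[-1]
punchline_token.inspect().top(k=5)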
Get feature activation vectors for machine learning tasks
To run a machine learning pipeline at the feature level (for instance, humor detection), you can export feature activations directly using client.features.activations to get a matrix, or retrieve a sparse vector for a specific FeatureGroup.
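A minimal sketch, assuming client.features.activations takes the conversation messages and returns a per-token activation matrix:

# Assumed signature: export a dense (tokens x features) activation matrix
activation_matrix = client.features.activations(
    messages=joke_conversation[0],
    model=variant,
)
print(activation_matrix.shape)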
There may be specific features whose activation patterns you’re interested in exploring. In this case, you can specify features such as humor_features and pass them in via the features argument of inspect.
humor_features = client.features.search("jokes and humor", model=variant, top_k=15)
humor_features
Now, let’s see if these features are activating in the joke conversation.
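A sketch, assuming the features argument restricts the returned context to that FeatureGroup:

# Assumed: restrict inspection to the humor features found above
context = client.features.inspect(
    joke_conversation[0],
    model=variant,
    features=humor_features,
)
context.top(k=5)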
Now, let’s generate a response with the loaded variant.
for token in client.chat.completions.create(
    [{"role": "user", "content": "tell me about whales"}],
    model=loaded_variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")