Prerequisite: You’ll need a Goodfire API key to follow this guide. Get one through our platform or contact support.

Quickstart

Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model’s internal units of computation, or “features”. With Ember, you can modify features to precisely control model outputs, or use them as building blocks for tasks like classification.

In this quickstart, you’ll learn how to:

  • Find features that matter for your specific needs
  • Edit features to create model variants
  • Discover which features are active in your data
  • Save and load your model variants
Install the SDK:
pip install goodfire

You can get an API key through our platform:

GOODFIRE_API_KEY = "{YOUR_API_KEY}"

Initialize the SDK

import goodfire
client = goodfire.Client(api_key=GOODFIRE_API_KEY)
# Instantiate a model variant. 
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

Our sampling API is OpenAI compatible, making it easy to integrate.

for token in client.chat.completions.create(
    [{"role": "user", "content": "Hi, how are you?"}],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")

Editing features to create model variants

How to find relevant features for edits

There are three ways to find features you may want to modify:

  • Auto Steer: Simply describe what you want, and let the API automatically select and adjust feature weights

  • Feature Search: Find features using semantic search

  • Contrastive Search: Identify relevant features by comparing two different datasets

Let’s explore each method in detail.

Auto Steer

Auto steering automatically finds and adjusts feature weights to achieve your desired behavior. Simply provide a short prompt describing what you want, and autosteering will:

  • Find the relevant features
  • Set appropriate feature weights
  • Return a FeatureEdits object that you can set directly
edits = client.features.AutoSteer(
    specification="be funny",  # or your desired behavior
    model=variant,
)
variant.set(edits)
print(edits)

Now that we have a few funny edits, let’s see how the model responds!

for token in client.chat.completions.create(
    [{"role": "user", "content": "Tell me about pirates"}],
    model=variant,
    stream=True,
    max_completion_tokens=120,
):
    print(token.choices[0].delta.content, end="")

The model automatically added puns/jokes, even though we didn’t specify anything about comedy in our prompt.

Let’s reset the model to its default state (without any feature edits):

variant.reset()

Feature Search

Feature search helps you explore and discover what capabilities your model has. It can be useful when you want to browse through available features.

funny_features = client.features.search(
    "funny",
    model=variant,
    top_k=10
)
print(funny_features)

When setting feature weights manually, start with 0.5 to enhance a feature and -0.3 to ablate a feature. When setting multiple features, you may need to tune down the weights.

variant.set(funny_features[0], 0.6)
for token in client.chat.completions.create(
    [{"role": "user", "content": "tell me about foxes"}],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")

Feel free to play around with the weights and features to see how the model responds.
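
For example, here is a quick sketch of combining two edits with smaller per-feature weights (assuming funny_features has at least two entries):

# Combining edits: use smaller weights per feature so the cumulative effect stays moderate
variant.reset()
variant.set(funny_features[0], 0.4)
variant.set(funny_features[1], 0.3)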

(Advanced) Look at a feature’s nearest neighbors

Get neighboring features by comparing them to either individual features or groups of features. When comparing to individual features, neighbors() looks at similarity in the embedding space. When comparing to groups, neighbors() finds features closest to the group’s centroid.

neighbors() helps you understand feature relationships beyond just their labels. It can reveal which features might work best for your intended model adjustments.

client.features.neighbors(
    funny_features[0],
    model=variant,
    top_k=5
)

You can do the same for any other feature in the group:

client.features.neighbors(
    funny_features[2],
    model=variant,
    top_k=5
)
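
Since neighbors() also accepts groups of features (as described above), you can compare against the centroid of a whole group. A sketch:

# Find features closest to the centroid of the entire funny_features group
client.features.neighbors(
    funny_features,
    model=variant,
    top_k=5
)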

Contrastive Search

Contrastive search lets you discover relevant features in a data-driven way.

Provide two datasets of chat examples:

  • dataset_1: Examples of behavior you want to avoid
  • dataset_2: Examples of behavior you want to encourage

Examples are paired such that the first example in dataset_1 contrasts the first example in dataset_2, and so on.

Reranking

Contrastive search becomes more powerful when combined with reranking. First, contrastive search finds features that distinguish between your datasets. Then, reranking sorts these features using your description of the desired behavior.

This two-step process ensures you get features that are both:

  • Mechanistically useful (from contrastive search)
  • Aligned with your goals (from reranking)

Let’s specify two conversation datasets. In the first, the assistant gives a typical helpful response; in the second, it replies with a joke.

variant.reset()
default_conversation = [
    [
        {
            "role": "user",
            "content": "Hello how are you?"
        },
        {
            "role": "assistant",
            "content": "I am a helpful assistant. How can I help you?"
        }
    ]
]
joke_conversation = [
    [
        {
            "role": "user",
            "content": "Hello how are you?"
        },
        {
            "role": "assistant",
            "content": "What do you call an alligator in a vest? An investigator!"
        }
    ]
]
helpful_assistant_features, joke_features = client.features.contrast(
    dataset_1=default_conversation,
    dataset_2=joke_conversation,
    model=variant,
    top_k=30
)
# Let's rerank to surface humor related features
joke_features = client.features.rerank(
    features=joke_features,
    query="funny",
    model=variant,
    top_k=5
)
joke_features

We now have a list of features to consider adding. Let’s set some plausible-looking ones from joke_features.

Note that we could also explore removing some of the helpful_assistant features.
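
For instance, here is a sketch of de-emphasizing one of those features with a negative weight (using the -0.3 starting point suggested earlier; the reset below clears this edit):

# De-emphasize a "helpful assistant" feature alongside the joke edits
variant.set(helpful_assistant_features[0], -0.3)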

variant.reset()
variant.set(joke_features[0], 0.6)
variant.set(joke_features[1], 0.6)
for token in client.chat.completions.create(
    [
        {"role": "user", "content": "Hello. Tell me about whales."}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")

(Advanced) Conditional logic for feature edits

You can establish relationships between different features (or feature groups) using conditional interventions.

First, let’s reset the variant and pick out the funny features.

variant.reset()
funny_features

Now, let’s find features where the model is talking like a pirate.

pirate_features = client.features.search(
    "talk like a pirate",
    model=variant,
    top_k=3
)
print(pirate_features)

Now, let’s set up behavior so that when the model is talking like a pirate, it will also be funny.

variant.set_when(pirate_features[1] > 0.75, {
    funny_features[0]: 0.7,
})
# The model will now try to be funny when talking about pirates
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "talk like a pirate and tell me about whales"}],
    model=variant
)
print(response.choices[0].message["content"])

Say we decide the model isn’t very good at pirate jokes. Let’s set up behavior to stop generation altogether if the pirate features are too strong.

# Abort if pirate features are too strong
variant.abort_when(pirate_features > 0.75)
try:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": "Tell me about pirates."}],
        model=variant
    )
except goodfire.exceptions.InferenceAbortedException:
    print("Generation aborted due to too much pirate content")

If you aren’t sure of the features you want to condition on, use AutoConditional with a specified prompt to get back an automatically generated condition.

# Generate auto conditional based on a description. This will automatically
# choose the relevant features and conditional weight
conditional = client.features.AutoConditional(
    "pirates",
    model="meta-llama/Llama-3.3-70B-Instruct",
)

# Apply feature edits when condition is met
variant.set_when(conditional, {
    joke_features[0]: 0.5,
    joke_features[1]: 0.5
})

Discover which features are active in your data

Working with a conversation context

You can inspect what features are activating in a given conversation with the inspect API, which returns a context object.

Say you want to understand what model features are important when the model tells a joke. You can pass in the same joke conversation dataset to the inspect endpoint.

variant.reset()
context = client.features.inspect(
    messages=joke_conversation[0],
    model=variant,
)
context

From the context object, you can access a lookup object for browsing the feature labels present in the context.

lookup = context.lookup()
lookup

You can select the top k activating features in the context, ranked by activation strength. There are features related to jokes and tongue twisters, among other syntactical features.

top_features = context.top(k=10)
top_features

You can also inspect feature activations at the individual token level. Let’s see what features are active at the punchline token.

print(context.tokens[-3])
token_acts = context.tokens[-3].inspect()
token_acts

(Advanced) Look at next-token logits

logits = client.chat.logits(
    messages=joke_conversation[0],
    model=variant,
)
logits.logits
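
If logits.logits is a mapping from candidate tokens to logit values (an assumption; check the shape of the returned object), you could sort it to view the strongest next-token candidates:

# Assumes logits.logits maps token strings to logit values
top_candidates = sorted(logits.logits.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top_candidates)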

Get feature activation vectors for machine learning tasks

To run a machine learning pipeline at the feature level (for instance, for humor detection), you can export features directly using client.features.activations to get a matrix, or retrieve a sparse vector for a specific FeatureGroup.

activations = client.features.activations(
    messages=joke_conversation[0],
    model=variant,
)
activations
top_features.vector()
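
As a sketch of the classification idea (assuming activations returns a token-by-feature matrix, and using scikit-learn, which is not part of the SDK), you could pool each conversation's activations into a single vector and fit a simple humor classifier:

import numpy as np
from sklearn.linear_model import LogisticRegression

def conversation_vector(messages):
    # Max-pool token-level feature activations into one vector per conversation
    acts = client.features.activations(messages=messages, model=variant)
    return np.asarray(acts).max(axis=0)

X = np.stack([
    conversation_vector(joke_conversation[0]),
    conversation_vector(default_conversation[0]),
])
y = np.array([1, 0])  # 1 = humorous, 0 = not

clf = LogisticRegression().fit(X, y)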

Inspecting specific features

There may be specific features whose activation patterns you’re interested in exploring. In this case, you can specify features such as humor_features and pass them into the features argument of inspect.

humor_features = client.features.search("jokes and humor", model=variant, top_k=15)
humor_features

Now, let’s see if these features are activating in the joke conversation.

context = client.features.inspect(
    messages=joke_conversation[0],
    model=variant,
    features=humor_features
)
context

Now you can retrieve the top k activating humor features in the context. This might be a more interesting set of features for downstream tasks.

humor_feature_acts = context.top(k=5)
humor_feature_acts

Save and load your model variants

You can serialize a variant to JSON format for saving.

variant.reset()
variant.set(pirate_features[1], 0.9)
variant_json = variant.json()
variant_json

And load a variant from JSON format.

loaded_variant = goodfire.Variant.from_json(variant_json)
loaded_variant
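
To persist a variant across sessions, you might write the JSON to a file and read it back. A sketch using the standard library (assuming variant.json() returns a JSON-serializable object):

import json

# Save the variant configuration to disk
with open("pirate_variant.json", "w") as f:
    json.dump(variant_json, f)

# Later, load it back and reconstruct the variant
with open("pirate_variant.json") as f:
    loaded_variant = goodfire.Variant.from_json(json.load(f))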

Now, let’s generate a response with the loaded variant.

for token in client.chat.completions.create(
    [
        {"role": "user", "content": "tell me about whales"}
    ],
    model=loaded_variant,
    stream=True,
    max_completion_tokens=150,
):
    print(token.choices[0].delta.content, end="")

Using the OpenAI SDK

You can also work directly with the OpenAI SDK for inference since our endpoint is fully compatible.

!pip install openai --quiet

from openai import OpenAI

oai_client = OpenAI(
    api_key=GOODFIRE_API_KEY,
    base_url="https://api.goodfire.ai/api/inference/v1",
)

response = oai_client.chat.completions.create(
    messages=[
        {"role": "user", "content": "who is this"},
    ],
    model=variant.base_model,
    extra_body={"controller": variant.controller.json()},
)

response.choices[0].message.content

For more advanced usage and detailed API reference, check out our SDK reference and example notebooks.