Features - Goodfire SDK

Accessing Features

Before using the Features API, you’ll need a model variant:

from goodfire import Variant

variant = Variant("meta-llama/Llama-3.3-70B-Instruct")

You can then access features through the client’s features interface. For example, to search for features:

# Search for features related to a concept
features = client.features.search(
    "angry",
    model=variant,
    top_k=5
)

# Print the found features
print(features)

Or inspect feature activations in text:

# Analyze how features activate in text
inspector = client.features.inspect(
    [
        {"role": "user", "content": "What do you think about pirates and whales"},
        {"role": "assistant", "content": "I think pirates are cool and whales are cool"}
    ],
    model=variant
)

# Get top activated features
for activation in inspector.top(k=5):
    print(f"{activation.feature.label}: {activation.activation}")

The Features API provides methods for working with interpretable features of language models. Features represent learned patterns in model behavior that can be analyzed and modified.

Methods

neighbors()

Get the nearest neighbors of a feature or group of features.

Parameters:

features (Feature | FeatureGroup, required): Feature or group of features to find neighbors for
model (str | VariantInterface, required): Model identifier or variant interface
top_k (int, default: 10): Number of neighbors to return

Returns: FeatureGroup

Example:

# Get neighbors of a feature
# (feature is a Feature, e.g. features[0] from a search result)
neighbors = client.features.neighbors(
    feature,
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

# Print neighbor labels
for neighbor in neighbors:
    print(neighbor.label)

search()

Search for features based on semantic similarity to a query string.

Parameters:

query (str, required): Search string to compare against feature labels
model (str | VariantInterface, required): Model identifier or variant interface
top_k (int, default: 10): Number of features to return

Returns: FeatureGroup - Collection of matching features

Example:

Search features

# Search for features related to writing style
features = client.features.search(
    "formal writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

# Print features
for feature in features:
    print(feature.label)

inspect()

Analyzes how features are activated across the input messages.

Parameters:

messages (list[ChatMessage], required): Messages to analyze
model (str | VariantInterface, required): Model identifier or variant interface
features (Feature | FeatureGroup | None): Optional specific features to analyze. If None, inspects all features.
aggregate_by (str, default: "frequency"): Method to aggregate feature activations across tokens:
- "frequency": Count of tokens where feature is active
- "mean": Mean activation value across tokens
- "max": Maximum activation value across tokens
- "sum": Sum of activation values across tokens

Returns: ContextInspector - An inspector object that provides methods for analyzing and visualizing how features are activated in the given context.

Example:

Inspect feature activations

# Analyze how features activate in text
inspector = client.features.inspect(
    [
        {"role": "user", "content": "What do you think about pirates and whales"},
        {"role": "assistant", "content": "I think pirates are cool and whales are cool"}
    ],
    model=variant
)

# Get top activated features
for activation in inspector.top(k=5):
    print(f"{activation.feature.label}: {activation.activation}")
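The four aggregate_by modes listed above can be illustrated on a single feature's per-token activations. This is a minimal sketch, not the SDK's implementation: the helper name `aggregate` and the sample values are hypothetical, and it assumes that "active" means a non-zero activation and that the mean is taken over all tokens, neither of which is stated on this page.

```python
def aggregate(activations, method="frequency"):
    """Collapse one feature's per-token activations into a single score."""
    if method == "frequency":
        # Assumption: a feature counts as active on a token when its value is non-zero.
        return sum(1 for value in activations if value != 0)
    if method == "mean":
        # Assumption: the mean runs over every token, including inactive ones.
        return sum(activations) / len(activations) if activations else 0.0
    if method == "max":
        return max(activations, default=0.0)
    if method == "sum":
        return sum(activations)
    raise ValueError(f"unknown aggregation method: {method!r}")


# Hypothetical activations of one feature across five tokens.
per_token = [0.0, 2.0, 0.0, 1.0, 3.0]
for method in ("frequency", "mean", "max", "sum"):
    print(method, aggregate(per_token, method))
```

With the default "frequency" mode, a feature that fires weakly on many tokens can outrank one that fires strongly on a single token; "max" reverses that preference.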

contrast()

Identify features that differentiate between two conversation datasets.

Parameters:

dataset_1 (list[list[ChatMessage]], required): First dataset of conversations
dataset_2 (list[list[ChatMessage]], required): Second dataset of conversations
model (str | VariantInterface, required): Model identifier or variant interface
top_k (int, default: 5): Number of top features to return for each dataset

Returns: tuple[FeatureGroup, FeatureGroup] - Two FeatureGroups containing:

Features steering towards dataset_1
Features steering towards dataset_2

Example:

Get contrast features

# Compare formal vs informal conversations
dataset_1 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "I'm doing well..."}
]]
dataset_2 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "Arr my spirits be high..."}
]]

formal_features, informal_features = client.features.contrast(
    dataset_1=dataset_1,
    dataset_2=dataset_2,
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5
)

rerank()

Rerank a set of features based on a query.

Parameters:

features (FeatureGroup, required): Features to rerank
query (str, required): Query to rerank features by
model (str | VariantInterface, required): Model identifier or variant interface
top_k (int, default: 10): Number of top features to return

Returns: FeatureGroup

Example:

Rerank

# Rerank features based on relevance to "writing style"
# (formal_features comes from the contrast() example above)
reranked = client.features.rerank(
    features=formal_features,
    query="writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)
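Because rerank() takes a FeatureGroup and search() returns one, the two can be chained: search broadly, then rerank the candidates against a narrower query. The sketch below uses only the parameters documented on this page; the helper name `search_then_rerank` and its `pool` and `keep` arguments are hypothetical.

```python
def search_then_rerank(client, model, broad_query, focus_query, pool=20, keep=5):
    """Search for a wide candidate pool, then keep the best matches for a narrower query."""
    # Step 1: gather a generous pool of candidates with a broad query.
    candidates = client.features.search(broad_query, model=model, top_k=pool)
    # Step 2: reorder that pool against the narrower query and keep the top few.
    return client.features.rerank(
        features=candidates,
        query=focus_query,
        model=model,
        top_k=keep,
    )
```

A call such as `search_then_rerank(client, variant, "writing style", "formal tone")` would return a FeatureGroup of at most `keep` features.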

activations()

Retrieves feature activation values for each token in the input messages.

Parameters:

messages (list[ChatMessage], required): Messages to analyze
model (str | VariantInterface, required): Model identifier or variant interface
features (Feature | FeatureGroup | None): Optional specific features to analyze. If None, analyzes all features.

Returns: NDArray[np.float64] - Sparse activation matrix of shape [n_tokens, n_features] where each element represents the activation strength of a feature at a specific token. Most values are zero due to sparsity.

Example:

Get activation matrix

# Get activation matrix for a conversation
matrix = client.features.activations(
    messages=[{"role": "user", "content": "Hello world"}],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
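Since most entries of the [n_tokens, n_features] matrix are zero, it is usually more convenient to work with only the non-zero entries. The following is a minimal sketch with hypothetical helper names; a small nested list stands in for the returned array, and the same loops should also work on a 2-D NumPy array because they rely only on row and element iteration.

```python
def active_features_per_token(matrix):
    """For each token row, list the (feature_index, activation) pairs that are non-zero."""
    return [
        [(j, float(value)) for j, value in enumerate(row) if value != 0]
        for row in matrix
    ]


def strongest_features(matrix, k=3):
    """Return up to k (feature_index, peak_activation) pairs, strongest first."""
    peaks = {}
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                value = float(value)
                peaks[j] = max(peaks[j], value) if j in peaks else value
    return sorted(peaks.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 3-token x 4-feature matrix standing in for the returned array.
toy = [
    [0.0, 1.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.5, 0.0, 0.0, 0.0],
]
print(active_features_per_token(toy))
print(strongest_features(toy, k=2))
```

The column indices produced here are positions in the matrix; whether they correspond directly to the indices accepted by lookup() is not stated on this page.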

lookup()

Retrieves details for a list of features by their indices.

Parameters:

indices (list[int], required): List of feature indices to fetch
model (str | VariantInterface, required): Model identifier or variant interface

Returns: dict[int, Feature] - Mapping of feature index to Feature object
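No example accompanies lookup() on this page, so here is a hedged usage sketch built only from the documented parameters and return type. The helper name `labels_for_indices` is hypothetical.

```python
def labels_for_indices(client, indices, model):
    """Return {feature_index: label} for the requested indices, ordered by index."""
    # lookup() is documented to return dict[int, Feature].
    found = client.features.lookup(indices=indices, model=model)
    return {index: found[index].label for index in sorted(found)}
```

For instance, `labels_for_indices(client, [12, 7], variant)` would yield a dictionary keyed by 7 and 12, each mapped to that feature's label.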

list()

Retrieves details for a list of features by their UUIDs.

Parameters:

ids (list[str], required): List of feature UUIDs to fetch

Returns: FeatureGroup - Collection of Feature objects
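list() also has no example here. One plausible use is re-fetching features whose UUIDs were stored earlier. The sketch below uses the documented `ids` parameter and the `uuid` property of Feature; the helper name `reload_features` is hypothetical.

```python
def reload_features(client, features):
    """Fetch fresh Feature objects for an existing collection via their UUIDs."""
    # Feature.uuid is documented as a UUID, while list() expects strings.
    ids = [str(feature.uuid) for feature in features]
    return client.features.list(ids=ids)
```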

Classes

Feature

A class representing a human-interpretable “feature” - a model’s conceptual neural unit. Features can be combined into groups and compared using standard operators.

Properties

uuid (UUID): Unique identifier for the feature
label (str): Human-readable label describing the feature
max_activation_strength (float): Maximum activation strength of the feature in the training dataset
index_in_sae (int): Feature index relative to all features in a given model

Methods

json(): Convert feature to JSON format
from_json(data): Create feature from JSON data
Supports comparison operators (==, !=, <=, <, >=, >)
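The json() and from_json(data) methods suggest a simple way to persist features between sessions. The sketch below rests on two assumptions not confirmed on this page: that json() returns a value the standard json module can serialize, and that from_json can be called on the class itself. The helper names `save_features` and `load_features` are hypothetical.

```python
import json


def save_features(features, path):
    """Write each feature's json() payload to a JSON file."""
    # Assumption: feature.json() returns a JSON-serializable value such as a dict.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([feature.json() for feature in features], handle)


def load_features(feature_cls, path):
    """Rebuild features from a file written by save_features()."""
    # Assumption: from_json(data) is callable on the class, e.g. Feature.from_json.
    with open(path, encoding="utf-8") as handle:
        return [feature_cls.from_json(item) for item in json.load(handle)]
```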

FeatureGroup

A collection of Feature instances with group operations.

Methods

add(feature): Add a feature to the group
pop(index): Remove and return feature at specified index
union(feature_group): Combine this group with another feature group
intersection(feature_group): Create new group with features common to both groups
Supports indexing and slicing operations
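The group operations above can be combined to compare two result sets, for example the outputs of two different searches. This is a minimal sketch with a hypothetical helper name, `summarize_overlap`. It assumes that union() returns a combined group rather than modifying the group in place, which this page does not state explicitly.

```python
def summarize_overlap(group_a, group_b, preview=3):
    """Return labels shared by both groups and a short preview of their union."""
    shared = group_a.intersection(group_b)
    # Assumption: union() returns the combined group instead of mutating group_a.
    combined = group_a.union(group_b)
    return {
        "shared": [feature.label for feature in shared],
        # Slicing is documented as supported on FeatureGroup.
        "preview": [feature.label for feature in combined[:preview]],
    }
```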

ConditionalGroup

Groups multiple conditions with logical operators.

Properties

conditionals (list[Conditional]): List of Conditional instances to group
operator (str): Logical operator to join conditions ("AND" or "OR")

Methods

Supports logical operations (& for AND, | for OR)
json(): Convert to JSON format
from_json(): Create from JSON data

FeatureActivation

Represents the activation of a feature.

Properties

feature (Feature): The activated feature
activation (float): Activation strength

ContextInspector

Analyzes feature activations in text.

Methods

top(k): Get top activated features

Example

# Get top 5 activated features
top_features = inspector.top(k=5)
