Accessing Features

Before using the Features API, you’ll need a model variant:

from goodfire import Variant  # assumes the Goodfire Python SDK is installed
variant = Variant("meta-llama/Llama-3.3-70B-Instruct")

You can then access features through the client’s features interface, for example to search for features with search() or to inspect feature activations in text with inspect().

The Features API provides methods for working with interpretable features of language models. Features represent learned patterns in model behavior that can be analyzed and modified.

Methods

neighbors()

Get the nearest neighbors of a feature or group of features.

Parameters:

features
Feature | FeatureGroup
required

Feature or group of features to find neighbors for

model
str | VariantInterface
required

Model identifier or variant interface

top_k
int
default: 10

Number of neighbors to return

Returns: FeatureGroup
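
The reference does not say how neighbors are computed. As a rough mental model only, a nearest-neighbor lookup can be pictured as ranking features by cosine similarity between vectors. The vectors, labels, and the nearest_neighbors helper below are hypothetical stand-ins for illustration and are not part of the Features API.

```python
import math

# Hypothetical feature vectors keyed by label (illustrative values only).
FEATURE_VECTORS = {
    "formal tone": [0.9, 0.1, 0.0],
    "academic writing": [0.8, 0.2, 0.1],
    "pirate speech": [0.0, 0.1, 0.9],
    "nautical terms": [0.1, 0.0, 0.8],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(label, top_k=10):
    """Return up to top_k other labels, most similar first."""
    query = FEATURE_VECTORS[label]
    scored = [
        (cosine(query, vec), name)
        for name, vec in FEATURE_VECTORS.items()
        if name != label
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


print(nearest_neighbors("formal tone", top_k=2))
```

As with the top_k parameter described above, the helper caps how many neighbors come back; the queried feature itself is excluded from its own neighbor list in this sketch.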

search()

Search for features based on semantic similarity to a query string.

Parameters:

query
str
required

Search string to compare against feature labels

model
str | VariantInterface
required

Model identifier or variant interface

top_k
int
default: 10

Number of features to return

Returns: FeatureGroup - Collection of matching features

Example:

Search features
# Search for features related to writing style
features = client.features.search(
    "formal writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

# Print features
for feature in features:
    print(feature.label)

inspect()

Analyzes how features are activated across the input messages.

Parameters:

messages
list[ChatMessage]
required

Messages to analyze

model
str | VariantInterface
required

Model identifier or variant interface

features
Feature | FeatureGroup | None

Optional specific features to analyze. If None, inspects all features.

aggregate_by
str
default: "frequency"

Method to aggregate feature activations across tokens:

  • "frequency": Count of tokens where the feature is active
  • "mean": Mean activation value across tokens
  • "max": Maximum activation value across tokens
  • "sum": Sum of activation values across tokens

Returns: ContextInspector - An inspector object that provides methods for analyzing and visualizing how features are activated in the given context.

Example:

Inspect feature activations
# Analyze how features activate in text
inspector = client.features.inspect(
    [
        {"role": "user", "content": "What do you think about pirates and whales"},
        {"role": "assistant", "content": "I think pirates are cool and whales are cool"}
    ],
    model=variant
)

# Get top activated features
for activation in inspector.top(k=5):
    print(f"{activation.feature.label}: {activation.activation}")
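
To make the four aggregate_by modes concrete, here is a minimal sketch that collapses one feature's per-token activations into a single score. It is not the library's implementation: the aggregate helper is hypothetical, "active" is assumed to mean an activation greater than zero, and the mean is assumed to be taken over all tokens, including inactive ones.

```python
def aggregate(token_activations, aggregate_by="frequency"):
    """Collapse one feature's per-token activations into a single score."""
    if not token_activations:
        return 0
    if aggregate_by == "frequency":
        # Assumption: a feature counts as active when its activation is > 0.
        return sum(1 for a in token_activations if a > 0)
    if aggregate_by == "mean":
        # Assumption: the mean runs over every token, active or not.
        return sum(token_activations) / len(token_activations)
    if aggregate_by == "max":
        return max(token_activations)
    if aggregate_by == "sum":
        return sum(token_activations)
    raise ValueError(f"unknown aggregate_by: {aggregate_by!r}")


# One feature's activation at each of four tokens (illustrative values).
acts = [0.0, 2.0, 0.0, 1.0]
for mode in ("frequency", "mean", "max", "sum"):
    print(mode, aggregate(acts, mode))
```

"frequency" rewards features that fire on many tokens, while "max" surfaces features that fire strongly on even a single token, so the choice changes which features rank highest.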

contrast()

Identify features that differentiate between two conversation datasets.

Parameters:

dataset_1
list[list[ChatMessage]]
required

First dataset of conversations

dataset_2
list[list[ChatMessage]]
required

Second dataset of conversations

model
str | VariantInterface
required

Model identifier or variant interface

top_k
int
default: 5

Number of top features to return for each dataset

Returns: tuple[FeatureGroup, FeatureGroup] - Two FeatureGroups containing:

  • Features steering towards dataset_1
  • Features steering towards dataset_2

Example:

Get contrast features
# Compare formal vs informal conversations
dataset_1 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "I'm doing well..."}
]]
dataset_2 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "Arr my spirits be high..."}
]]

formal_features, informal_features = client.features.contrast(
    dataset_1=dataset_1,
    dataset_2=dataset_2,
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5
)
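
The reference does not describe how contrasting features are selected. One plausible way to picture it, offered purely as an assumption, is to compare each feature's mean activation in the two datasets and keep the features with the largest difference in each direction. The contrast_sketch helper and the activation values below are hypothetical.

```python
def contrast_sketch(mean_acts_1, mean_acts_2, top_k=5):
    """Split features by which dataset they activate more strongly on.

    Each argument maps a feature label to its mean activation in that dataset.
    """
    labels = sorted(set(mean_acts_1) | set(mean_acts_2))
    diffs = {
        label: mean_acts_1.get(label, 0.0) - mean_acts_2.get(label, 0.0)
        for label in labels
    }
    # Largest positive differences lean toward dataset_1.
    toward_1 = sorted((l for l in labels if diffs[l] > 0), key=lambda l: -diffs[l])
    # Largest negative differences lean toward dataset_2.
    toward_2 = sorted((l for l in labels if diffs[l] < 0), key=lambda l: diffs[l])
    return toward_1[:top_k], toward_2[:top_k]


# Illustrative mean activations for three made-up features.
plain = {"polite greeting": 0.8, "pirate speech": 0.0, "question": 0.5}
pirate = {"polite greeting": 0.3, "pirate speech": 0.9, "question": 0.5}

toward_plain, toward_pirate = contrast_sketch(plain, pirate, top_k=5)
print(toward_plain, toward_pirate)
```

A feature that activates equally in both datasets, like "question" here, appears in neither group, which matches the idea of returning only features that differentiate the two.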

rerank()

Rerank a set of features based on a query.

Parameters:

features
FeatureGroup
required

Features to rerank

query
str
required

Query to rerank features by

model
str | VariantInterface
required

Model identifier or variant interface

top_k
int
default: 10

Number of top features to return

Returns: FeatureGroup

Example:

Rerank
# Rerank features based on relevance to "writing style"
reranked = client.features.rerank(
    features=formal_features,
    query="writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)
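
How relevance to the query is scored is not stated in this reference. The sketch below only illustrates the shape of the operation: take an existing set of features, score each one against a query, and keep the top_k. Word overlap between the query and each label is a deliberately crude stand-in for the real scoring, and rerank_sketch is a hypothetical helper.

```python
def rerank_sketch(labels, query, top_k=10):
    """Order labels by a toy relevance score and keep the first top_k."""
    query_words = set(query.lower().split())

    def score(label):
        # Stand-in relevance: number of words shared with the query.
        return len(query_words & set(label.lower().split()))

    # sorted() is stable, so equally scored labels keep their input order.
    return sorted(labels, key=score, reverse=True)[:top_k]


candidates = [
    "pirate speech",
    "formal writing style",
    "writing advice",
    "nautical terms",
]
print(rerank_sketch(candidates, "writing style", top_k=2))
```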

activations()

Retrieves feature activation values for each token in the input messages.

Parameters:

messages
list[ChatMessage]
required

Messages to analyze

model
str | VariantInterface
required

Model identifier or variant interface

features
Feature | FeatureGroup | None

Optional specific features to analyze. If None, analyzes all features.

Returns: NDArray[np.float64] - Sparse activation matrix of shape [n_tokens, n_features] where each element represents the activation strength of a feature at a specific token. Most values are zero due to sparsity.

Example:

Get activation matrix
# Get activation matrix for a conversation
matrix = client.features.activations(
    messages=[{"role": "user", "content": "Hello world"}],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
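
Because most entries in the [n_tokens, n_features] matrix are zero, a common first step is to pull out only the non-zero entries. The sketch below uses a small nested list as a stand-in for the returned array so that it runs without any dependencies; the values and helper names are illustrative assumptions. With a real NumPy array, numpy.nonzero serves the same purpose.

```python
# Stand-in for a [n_tokens, n_features] activation matrix (3 tokens, 4 features).
toy_matrix = [
    [0.0, 0.0, 1.5, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 2.1, 0.0],
]


def nonzero_entries(matrix):
    """List (token_index, feature_index, activation) for every non-zero cell."""
    return [
        (t, f, value)
        for t, row in enumerate(matrix)
        for f, value in enumerate(row)
        if value != 0.0
    ]


def feature_column(matrix, feature_index):
    """Return one feature's activation at each token."""
    return [row[feature_index] for row in matrix]


print(nonzero_entries(toy_matrix))
print(feature_column(toy_matrix, 2))
```

Rows correspond to tokens and columns to features, so reading down a column shows where in the text a single feature fires.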

lookup()

Retrieves details for a list of features by their indices.

Parameters:

indices
list[int]
required

List of feature indices to fetch

model
str | VariantInterface
required

Model identifier or variant interface

Returns: dict[int, Feature] - Mapping of feature index to Feature object

list()

Retrieves details for a list of features by their UUIDs.

Parameters:

ids
list[str]
required

List of feature UUIDs to fetch

Returns: FeatureGroup - Collection of Feature objects

Classes

Feature

A class representing a human-interpretable “feature” - a model’s conceptual neural unit. Features can be combined into groups and compared using standard operators.

FeatureGroup

A collection of Feature instances with group operations.
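
The reference says features can be combined into groups using standard operators but does not list which ones. The sketch below shows one way such classes could be shaped, assuming that | combines features into a group. SketchFeature and SketchGroup are hypothetical stand-ins and do not reproduce the real Feature or FeatureGroup classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchFeature:
    """Hypothetical stand-in for a feature: an index plus a readable label."""

    index: int
    label: str

    def __or__(self, other):
        # Assumed operator choice: combining features yields a group.
        return SketchGroup([self]) | other


class SketchGroup:
    """Hypothetical stand-in for an ordered, duplicate-free feature collection."""

    def __init__(self, features=()):
        self.features = list(features)

    def __or__(self, other):
        extra = other.features if isinstance(other, SketchGroup) else [other]
        merged = self.features + [f for f in extra if f not in self.features]
        return SketchGroup(merged)

    def __iter__(self):
        return iter(self.features)

    def __len__(self):
        return len(self.features)


formal = SketchFeature(1, "formal tone")
pirate_speech = SketchFeature(2, "pirate speech")

group = formal | pirate_speech | formal  # the repeated feature is not added twice
print([feature.label for feature in group])
```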

ConditionalGroup

Groups multiple conditions with logical operators.

FeatureActivation

Represents the activation of a feature.

ContextInspector

Analyzes feature activations in text.