Accessing Features

Before using the Features API, you’ll need a model variant:
# Assumes Variant is importable from the Goodfire Python SDK
from goodfire import Variant
variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
You can then access features through the client’s features interface. The examples on this page assume that client is an already initialized API client. For example, to search for features:
# Search for features related to a concept
features = client.features.search(
    "angry",
    model=variant,
    top_k=5
)

# Print the found features
print(features)
Or inspect feature activations in text:
# Analyze how features activate in text
inspector = client.features.inspect(
    [
        {"role": "user", "content": "What do you think about pirates and whales"},
        {"role": "assistant", "content": "I think pirates are cool and whales are cool"}
    ],
    model=variant
)

# Get top activated features
for activation in inspector.top(k=5):
    print(f"{activation.feature.label}: {activation.activation}")
The Features API provides methods for working with interpretable features of language models. Features represent learned patterns in model behavior that can be analyzed and modified.

Methods

neighbors()

Get the nearest neighbors of a feature or group of features.

Parameters:
  • features (Feature | FeatureGroup, required): Feature or group of features to find neighbors for
  • model (str | VariantInterface, required): Model identifier or variant interface
  • top_k (int, default: 10): Number of neighbors to return

Returns: FeatureGroup

Example:
# Get neighbors of a feature
neighbors = client.features.neighbors(
    feature,
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

# Print neighbor labels
for neighbor in neighbors:
    print(neighbor.label)
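The source does not say which similarity measure neighbors() uses. As a rough illustration of what a top_k nearest-neighbor lookup involves, here is a minimal sketch that assumes cosine similarity over toy 2-D vectors. The names cosine, nearest_neighbors, and toy_vectors are hypothetical and not part of the API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, vectors, top_k=10):
    """Return the top_k labels whose vectors are most similar to `query`, best first."""
    scored = [(cosine(query, vec), label) for label, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:top_k]]

# Toy vectors for illustration only; ASSUMPTION: the real service ranks
# features by some similarity measure, which may not be cosine.
toy_vectors = {
    "anger": [1.0, 0.1],
    "frustration": [0.9, 0.2],
    "whales": [0.0, 1.0],
}
neighbors = nearest_neighbors([1.0, 0.0], toy_vectors, top_k=2)
print(neighbors)
```

As with the top_k parameter above, a smaller top_k simply truncates the ranked list.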

search()

Search for features based on semantic similarity to a query string.

Parameters:
  • query (str, required): Search string to compare against feature labels
  • model (str | VariantInterface, required): Model identifier or variant interface
  • top_k (int, default: 10): Number of features to return

Returns: FeatureGroup - Collection of matching features

Example:
Search features
# Search for features related to writing style
features = client.features.search(
    "formal writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

# Print features
for feature in features:
    print(feature.label)

inspect()

Analyzes how features are activated across the input messages.

Parameters:
  • messages (list[ChatMessage], required): Messages to analyze
  • model (str | VariantInterface, required): Model identifier or variant interface
  • features (Feature | FeatureGroup | None): Optional specific features to analyze. If None, inspects all features.
  • aggregate_by (str, default: "frequency"): Method to aggregate feature activations across tokens:
      - "frequency": Count of tokens where feature is active
      - "mean": Mean activation value across tokens
      - "max": Maximum activation value across tokens
      - "sum": Sum of activation values across tokens

Returns: ContextInspector - An inspector object that provides methods for analyzing and visualizing how features are activated in the given context.

Example:
Inspect feature activations
# Analyze how features activate in text
inspector = client.features.inspect(
    [
        {"role": "user", "content": "What do you think about pirates and whales"},
        {"role": "assistant", "content": "I think pirates are cool and whales are cool"}
    ],
    model=variant
)

# Get top activated features
for activation in inspector.top(k=5):
    print(f"{activation.feature.label}: {activation.activation}")
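To make the four aggregate_by options concrete, the sketch below applies each one to the per-token activations of a single feature. It is an illustration, not the library's implementation. It assumes that "active" means an activation greater than zero and that "mean" averages over all tokens, including inactive ones. The function name aggregate is hypothetical.

```python
def aggregate(token_activations, method="frequency"):
    """Collapse one feature's per-token activations into a single score."""
    if not token_activations:
        return 0
    if method == "frequency":
        # ASSUMPTION: a feature counts as active on a token when its activation is > 0
        return sum(1 for a in token_activations if a > 0)
    if method == "mean":
        # ASSUMPTION: the mean is taken over every token, active or not
        return sum(token_activations) / len(token_activations)
    if method == "max":
        return max(token_activations)
    if method == "sum":
        return sum(token_activations)
    raise ValueError(f"unknown aggregation method: {method}")

# One feature observed over four tokens
acts = [0.0, 2.0, 0.0, 4.0]
scores = {m: aggregate(acts, m) for m in ("frequency", "mean", "max", "sum")}
print(scores)
```

"frequency" favors features that fire on many tokens, while "max" favors features with a single strong spike, so the choice changes which features rank highest.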

contrast()

Identify features that differentiate between two conversation datasets.

Parameters:
  • dataset_1 (list[list[ChatMessage]], required): First dataset of conversations
  • dataset_2 (list[list[ChatMessage]], required): Second dataset of conversations
  • model (str | VariantInterface, required): Model identifier or variant interface
  • top_k (int, default: 5): Number of top features to return for each dataset

Returns: tuple[FeatureGroup, FeatureGroup] - Two FeatureGroups containing:
  • Features steering towards dataset_1
  • Features steering towards dataset_2

Example:
Get contrast features
# Compare formal vs informal conversations
dataset_1 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "I'm doing well..."}
]]
dataset_2 = [[
    {"role": "user", "content": "Hi how are you?"},
    {"role": "assistant", "content": "Arr my spirits be high..."}
]]

formal_features, informal_features = client.features.contrast(
    dataset_1=dataset_1,
    dataset_2=dataset_2,
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5
)
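The source does not describe how contrast() selects its features. One simple way to think about it is to compare each feature's average activation in the two datasets and keep the features with the largest gap in each direction. The sketch below does that on toy data. The function names and the mean-difference criterion are assumptions for illustration, not the library's algorithm.

```python
def mean_activation(dataset):
    """Average each feature's activation over all conversations in a dataset.

    `dataset` is a list of per-conversation activation vectors (toy stand-ins
    for real conversations), all of the same length.
    """
    n_features = len(dataset[0])
    return [sum(row[j] for row in dataset) / len(dataset) for j in range(n_features)]

def contrast_by_mean_difference(dataset_1, dataset_2, top_k=5):
    """Return (indices favoring dataset_1, indices favoring dataset_2)."""
    m1, m2 = mean_activation(dataset_1), mean_activation(dataset_2)
    diff = [a - b for a, b in zip(m1, m2)]
    # Sort feature indices from most dataset_1-leaning to most dataset_2-leaning
    order = sorted(range(len(diff)), key=lambda j: diff[j], reverse=True)
    toward_1 = [j for j in order if diff[j] > 0][:top_k]
    toward_2 = [j for j in reversed(order) if diff[j] < 0][:top_k]
    return toward_1, toward_2

# Three toy features; feature 0 is stronger in d1, feature 1 in d2, feature 2 is equal
d1 = [[1.0, 0.0, 0.5], [0.8, 0.0, 0.5]]
d2 = [[0.0, 1.0, 0.5], [0.2, 0.6, 0.5]]
toward_1, toward_2 = contrast_by_mean_difference(d1, d2, top_k=5)
print(toward_1, toward_2)
```

Features with no difference between the datasets appear in neither group, which mirrors the idea of returning only features that differentiate the two.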

rerank()

Rerank a set of features based on a query.

Parameters:
  • features (FeatureGroup, required): Features to rerank
  • query (str, required): Query to rerank features by
  • model (str | VariantInterface, required): Model identifier or variant interface
  • top_k (int, default: 10): Number of top features to return

Returns: FeatureGroup

Example:
Rerank
# Rerank features based on relevance to "writing style"
reranked = client.features.rerank(
    features=formal_features,
    query="writing style",
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=10
)

activations()

Retrieves feature activation values for each token in the input messages.

Parameters:
  • messages (list[ChatMessage], required): Messages to analyze
  • model (str | VariantInterface, required): Model identifier or variant interface
  • features (Feature | FeatureGroup | None): Optional specific features to analyze. If None, analyzes all features.

Returns: NDArray[np.float64] - Sparse activation matrix of shape [n_tokens, n_features], where each element is the activation strength of a feature at a specific token. Most values are zero due to sparsity.

Example:
Get activation matrix
# Get activation matrix for a conversation
matrix = client.features.activations(
    messages=[{"role": "user", "content": "Hello world"}],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
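Because the returned matrix has shape [n_tokens, n_features] and is mostly zeros, a common first step is to look only at the non-zero entries in each token's row. The sketch below uses a small nested list as a stand-in for the returned array, so it runs without the API. The helper names are hypothetical, and iterating row by row should work the same way on a 2-D NumPy array.

```python
def top_features_per_token(matrix, k=2):
    """For each token row, return the indices of the k strongest non-zero features."""
    result = []
    for row in matrix:
        active = [(value, idx) for idx, value in enumerate(row) if value != 0.0]
        active.sort(key=lambda pair: pair[0], reverse=True)
        result.append([idx for _, idx in active[:k]])
    return result

def sparsity(matrix):
    """Fraction of entries in the matrix that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0.0)
    return zeros / total

# Toy stand-in: 3 tokens (rows) by 4 features (columns)
toy_matrix = [
    [0.0, 0.0, 1.5, 0.0],
    [0.2, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0],
]
top = top_features_per_token(toy_matrix, k=2)
zero_fraction = sparsity(toy_matrix)
print(top, zero_fraction)
```

A token whose row is all zeros yields an empty list, which is expected for tokens where none of the requested features fire.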

lookup()

Retrieves details for a list of features by their indices.

Parameters:
  • indices (list[int], required): List of feature indices to fetch
  • model (str | VariantInterface, required): Model identifier or variant interface

Returns: dict[int, Feature] - Mapping of feature index to Feature object

list()

Retrieves details for a list of features by their UUIDs.

Parameters:
  • ids (list[str], required): List of feature UUIDs to fetch

Returns: FeatureGroup - Collection of Feature objects

Classes

Feature

A class representing a human-interpretable “feature” - a model’s conceptual neural unit. Features can be combined into groups and compared using standard operators.
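The source does not list which operators are supported. To show how "combined into groups and compared using standard operators" can work in Python, here is a toy sketch built on operator overloading. The classes ToyFeature, ToyGroup, and ToyCondition, and the choice of | and >, are hypothetical and are not the library's classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCondition:
    """Records a comparison such as `feature > 0.5` instead of evaluating it."""
    feature: "ToyFeature"
    operator: str
    threshold: float

@dataclass(frozen=True)
class ToyGroup:
    """An ordered collection of toy features."""
    features: tuple

    def __or__(self, other):
        # group | feature or group | group -> larger group
        extra = other.features if isinstance(other, ToyGroup) else (other,)
        return ToyGroup(self.features + extra)

    def __iter__(self):
        return iter(self.features)

    def __len__(self):
        return len(self.features)

@dataclass(frozen=True)
class ToyFeature:
    label: str

    def __or__(self, other):
        # ASSUMPTION: `|` is used here to combine features into a group
        return ToyGroup((self,)) | other

    def __gt__(self, threshold):
        # ASSUMPTION: `>` builds a condition object rather than returning a bool
        return ToyCondition(self, ">", threshold)

formal = ToyFeature("formal writing style")
pirate = ToyFeature("pirate speech")
group = formal | pirate
condition = pirate > 0.5
print([f.label for f in group], condition.operator, condition.threshold)
```

Returning a condition object from a comparison, rather than a boolean, is what lets comparisons be stored and later grouped with logical operators.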

FeatureGroup

A collection of Feature instances with group operations.

ConditionalGroup

Groups multiple conditions with logical operators.

FeatureActivation

Represents the activation of a feature.

ContextInspector

Analyzes feature activations in text.