You can then access features through the client’s features interface. For example, to search for features:
# Search for features related to a conceptfeatures = client.features.search( "angry", model=variant, top_k=5)# Print the found featuresprint(features)
Or inspect feature activations in text:
# Analyze how features activate in textinspector = client.features.inspect( [ {"role": "user", "content": "What do you think about pirates and whales"}, {"role": "assistant", "content": "I think pirates are cool and whales are cool"} ], model=variant)# Get top activated featuresfor activation in inspector.top(k=5): print(f"{activation.feature.label}: {activation.activation}")
The Features API provides methods for working with interpretable features of language models. Features represent learned patterns in model behavior that can be analyzed and modified.
# Get neighbors of a featureneighbors = client.features.neighbors( feature, model="meta-llama/Llama-3.3-70B-Instruct", top_k=10)# Print neighbor labelsfor neighbor in neighbors: print(neighbor.label)
Returns:FeatureGroup - Collection of matching features
Example:
Search features
# Search for features related to writing stylefeatures = client.features.search( "formal writing style", model="meta-llama/Llama-3.3-70B-Instruct", top_k=10)# Print featuresfor feature in features: print(feature.label)
Method to aggregate feature activations across tokens: - “frequency”: Count of
tokens where feature is active - “mean”: Mean activation value across tokens -
“max”: Maximum activation value across tokens - “sum”: Sum of activation
values across tokens
Returns:ContextInspector - An inspector object that provides methods for analyzing and visualizing how features are activated in the given context.
Example:
Inspect feature activations
# Analyze how features activate in textinspector = client.features.inspect( [ {"role": "user", "content": "What do you think about pirates and whales"}, {"role": "assistant", "content": "I think pirates are cool and whales are cool"} ], model=variant)# Get top activated featuresfor activation in inspector.top(k=5): print(f"{activation.feature.label}: {activation.activation}")
# Rerank features based on relevance to "writing style"reranked = client.features.rerank( features=formal_features, query="writing style", model="meta-llama/Llama-3.3-70B-Instruct", top_k=10)
Optional specific features to analyze. If None, analyzes all features.
Returns:NDArray[np.float64] - Sparse activation matrix of shape [n_tokens, n_features] where each element represents the activation strength of a feature at a specific token. Most values are zero due to sparsity.
Example:
Get activation matrix
# Get activation matrix for a conversationmatrix = client.features.activations( messages=[{"role": "user", "content": "Hello world"}], model="meta-llama/Llama-3.3-70B-Instruct")
A class representing a human-interpretable “feature” - a model’s conceptual neural unit. Features can be combined into groups and compared using standard operators.
add(feature): Add a feature to the group - pop(index): Remove and
return feature at specified index - union(feature_group): Combine this
group with another feature group - intersection(feature_group): Create new
group with features common to both groups - Supports indexing and slicing
operations