Get started with the Goodfire Ember SDK
neighbors() looks at similarity in the embedding space. When comparing to groups, neighbors() finds features closest to the group’s centroid.
neighbors() helps you understand feature relationships beyond just their labels. It can reveal which features might work best for your intended model adjustments.
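To make the centroid comparison concrete, here is a minimal sketch in plain Python. It is not the SDK's implementation: the toy embedding vectors, the feature labels, the helper names, and the choice of cosine similarity as the distance measure are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_to_group(group, candidates, top_k=2):
    """Rank candidate labels by similarity to the group's centroid."""
    center = centroid(group)
    ranked = sorted(
        candidates,
        key=lambda label: cosine(candidates[label], center),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical 2-D embeddings; real feature embeddings have many more dimensions.
joke_group = [[1.0, 0.0], [0.8, 0.2]]
candidates = {
    "puns": [0.9, 0.1],
    "tongue twisters": [0.6, 0.4],
    "legal language": [0.0, 1.0],
}

print(nearest_to_group(joke_group, candidates))
```

Comparing against the centroid rather than against each group member means a candidate ranks highly only if it sits near the group as a whole, not just near one outlying member.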
joke_features.
Note that we could also explore removing some of the helpful_assistant features.
inspect API, which returns a context object.
Say you want to understand what model features are important when the model tells a joke. You can pass in the same joke conversation dataset to the inspect endpoint.
k activating features in the context, ranked by activation strength. There are features related to jokes and tongue twisters, among other syntactical features.
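Ranking features by activation strength can be sketched in a few lines of plain Python. This is a conceptual illustration only: the labels and values are invented, and it assumes a feature's strength is the sum of its per-token activations, which may differ from how the SDK aggregates.

```python
def top_k_features(activations, k=3):
    """Return the k (label, strength) pairs with the largest total activation.

    `activations` maps a feature label to its per-token activation values.
    """
    totals = {label: sum(values) for label, values in activations.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical per-token activations for a short joke prompt.
toy_activations = {
    "joke setup": [0.0, 2.0, 3.0],
    "tongue twister": [1.0, 0.0, 1.5],
    "punctuation": [0.5, 0.5, 0.5],
    "legal language": [0.0, 0.0, 0.0],
}

for label, strength in top_k_features(toy_activations, k=2):
    print(f"{label}: {strength}")
```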
client.features.activations to get a matrix or retrieve a sparse vector for a specific FeatureGroup.
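The difference between a dense activation matrix and a sparse vector restricted to a group can be illustrated without the SDK. The sketch below is hypothetical throughout: it assumes one row per token and one column per feature, collapses tokens by taking each feature's peak activation, and uses a plain set of column indices as a stand-in for a feature group.

```python
def to_sparse(dense_row, keep=None):
    """Map feature index -> activation, dropping zeros.

    `keep` optionally restricts the result to a set of feature indices.
    """
    return {
        index: value
        for index, value in enumerate(dense_row)
        if value != 0.0 and (keep is None or index in keep)
    }

# Hypothetical matrix: one row per token, one column per feature.
matrix = [
    [0.0, 1.2, 0.0, 0.0],
    [0.0, 0.4, 0.0, 2.1],
]

# Collapse the token axis by taking each feature's peak activation (assumed aggregation).
peak = [max(column) for column in zip(*matrix)]

group_indices = {1, 2}  # stand-in for the features in a group

print(to_sparse(peak))                      # every nonzero feature
print(to_sparse(peak, keep=group_indices))  # only nonzero features in the group
```

The sparse form stores only nonzero entries, which matters when most features are inactive for any given input.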
features argument of inspect.
context. This might be a more interesting set of features for downstream tasks.