The Chat API provides methods for interacting with Goodfire’s language models in a chat format. The base chat interface is OpenAI-compatible. It supports both streaming and non-streaming completions, as well as logits computation. Once you have a model variant, you can use it to create chat completions.

Examples

Basic Chat Completion

import goodfire

# Initialize the client
client = goodfire.Client(
    '{YOUR_API_KEY}',
)

# Create a non-streaming completion
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct"
)

print(response.choices[0].message["content"])

Streaming Chat Completion

# Stream the response tokens
for chunk in client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Write a short poem"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    print(chunk.choices[0].delta.content, end="")

Using with Model Variants

# Create a variant with feature modifications
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")
pirate_features = client.features.search(
    "talk like a pirate",
    model=variant,
    top_k=1
)
variant.set(pirate_features[0], 0.5)

# Use the variant in chat completion
for token in client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Tell me about the ocean."}
    ],
    model=variant,
    stream=True,
    max_completion_tokens=100,
):
    print(token.choices[0].delta.content, end="")

Computing Token Probabilities

logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    filter_vocabulary=["Paris", "London", "Berlin"]
)
print(logits.logits)
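Because `logits.logits` is a plain token-to-probability mapping (see LogitsResponse below), ranking the filtered candidates is ordinary dictionary work. The probability values in this sketch are invented for illustration, not real model output:

```python
# Hypothetical mapping of the shape found in logits.logits
probs = {"Paris": 0.92, "London": 0.05, "Berlin": 0.03}

# Rank candidates from most to least likely
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
for token, p in ranked:
    print(f"{token}: {p:.2f}")
```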

Methods

create()

Create a chat completion with the model. Parameters:
  • messages (list[ChatMessage], required): List of messages in the conversation. Each message should have role ("user", "assistant", or "system") and content fields.
  • model (Union[str, VariantInterface], required): Model identifier or variant to use for completion.
  • stream (bool, default False): Whether to stream the response tokens.
  • max_completion_tokens (Optional[int], default 2048): Maximum number of tokens to generate.
  • top_p (float, default 0.9): Nucleus sampling parameter.
  • temperature (float, default 0.6): Sampling temperature.
  • stop (Optional[Union[str, list[str]]]): Sequences where the API will stop generating further tokens.
  • seed (Optional[int], default 42): Random seed for reproducible outputs.
  • system_prompt (str): System prompt to prepend to the conversation.
Returns:
  • If stream=False: ChatCompletion object
  • If stream=True: Iterator of StreamingChatCompletionChunk objects
Examples: Non-streaming completion:
Basic Chat Completion
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
print(response.choices[0].message["content"])
Streaming completion:
Streaming Chat Completion
for chunk in client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Write a short poem"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    print(chunk.choices[0].delta.content, end="")

logits()

Compute token probabilities for the next token in a conversation. Parameters:
  • messages (list[ChatMessage], required): List of messages in the conversation.
  • model (Union[str, VariantInterface], required): Model identifier or variant to use.
  • top_k (Optional[int]): Limit the response to the top K most likely tokens.
  • filter_vocabulary (Optional[list[str]]): List of tokens to compute probabilities for.
Returns: LogitsResponse containing token probabilities Example:
Token Probabilities
logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    filter_vocabulary=["Paris", "London", "Berlin"]
)
print(logits.logits)

Response Objects

ChatCompletion

Response from a non-streaming chat completion. Properties:
  • id (str): Unique identifier for the completion.
  • object (str): Object type identifier.
  • created (Optional[int]): Unix timestamp of when the completion was created.
  • model (str): ID of the model used.
  • system_fingerprint (str): System fingerprint for the completion.
  • choices (list[ChatCompletionChoice]): List of completion choices.

StreamingChatCompletionChunk

Individual chunk from a streaming chat completion. Properties:
  • id (str): Unique identifier for the completion.
  • object (str): Object type identifier.
  • created (Optional[int]): Unix timestamp of when the chunk was created.
  • model (str): ID of the model used.
  • system_fingerprint (str): System fingerprint for the completion.
  • choices (list[StreamingChoice]): List of completion choices in this chunk.

LogitsResponse

Response from a logits computation request. Properties:
  • logits (dict[str, float]): Dictionary mapping tokens to their probabilities.
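A common follow-up to a logits request is picking the single most likely token. Since the documented shape is a plain dict of token to probability, `max` with a key function suffices; the values in this sketch are invented for illustration:

```python
# Example mapping of the documented dict[str, float] shape (values invented)
logits = {"Paris": 0.92, "London": 0.05, "Berlin": 0.03}

# The most likely next token is the key with the highest probability
best = max(logits, key=logits.get)
print(best)  # Paris
```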