The Chat API provides methods for interacting with Goodfire's language models in a chat format. The base chat interface is OpenAI-compatible and supports both streaming and non-streaming completions, as well as logits computation.

Once you have a model identifier or variant, you can use it to create chat completions.

Examples

The examples below cover:

  • Basic Chat Completion
  • Streaming Chat Completion
  • Using with Model Variants (see the sketch after this list)
  • Computing Token Probabilities
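The variant example is a minimal sketch; it assumes the SDK exposes a goodfire.Client and a goodfire.Variant class (an implementation of the VariantInterface accepted by the model parameter documented below).

Using with Model Variants
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")

# A variant wraps a base model and can be passed anywhere a model
# identifier string is accepted (goodfire.Variant is assumed here).
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model=variant
)
print(response.choices[0].message["content"])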

Methods

create()

Create a chat completion with the model.

Parameters:

messages
list[ChatMessage]
required

List of messages in the conversation. Each message should have role (“user”, “assistant”, or “system”) and content fields.

model
Union[str, VariantInterface]
required

Model identifier or variant to use for completion

stream
bool
default: "False"

Whether to stream the response tokens

max_completion_tokens
Optional[int]
default: "2048"

Maximum number of tokens to generate

top_p
float
default: "0.9"

Nucleus sampling parameter

temperature
float
default: "0.6"

Sampling temperature

stop
Optional[Union[str, list[str]]]

Sequences where the API will stop generating further tokens

seed
Optional[int]
default: "42"

Random seed for reproducible outputs

system_prompt
str

System prompt to prepend to the conversation

Returns:

  • If stream=False: ChatCompletion object
  • If stream=True: Iterator of StreamingChatCompletionChunk objects

Examples:

Non-streaming completion:

Basic Chat Completion
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")  # assumes the goodfire SDK client

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
print(response.choices[0].message["content"])

Streaming completion:

Streaming Chat Completion
for chunk in client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Write a short poem"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    print(chunk.choices[0].delta.content, end="")
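The sampling parameters documented above can be combined in a single call; the values below are illustrative, not recommendations:

Completion with Sampling Parameters
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Suggest a name for a coffee shop"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_completion_tokens=256,  # cap the response length
    temperature=0.8,            # higher values increase randomness
    top_p=0.9,                  # nucleus sampling cutoff
    stop=["\n\n"],              # stop at the first blank line
    seed=7                      # fixed seed for reproducible outputs
)
print(response.choices[0].message["content"])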

logits()

Compute token probabilities for the next token in a conversation.

Parameters:

messages
list[ChatMessage]
required

List of messages in the conversation

model
Union[str, VariantInterface]
required

Model identifier or variant to use

top_k
Optional[int]

Limit response to top K most likely tokens

filter_vocabulary
Optional[list[str]]

List of tokens to compute probabilities for

Returns: LogitsResponse containing token probabilities

Example:

Token Probabilities
logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    filter_vocabulary=["Paris", "London", "Berlin"]
)
print(logits.logits)
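top_k can be used in place of filter_vocabulary to retrieve the model's K most likely next tokens; a short sketch using the documented parameter:

Top-K Token Probabilities
logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5  # limit the response to the 5 most likely tokens
)
print(logits.logits)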

Response Objects

ChatCompletion

Response from a non-streaming chat completion.

Properties:

id
str

Unique identifier for the completion

object
str

Object type identifier

created
Optional[int]

Unix timestamp of when the completion was created

model
str

ID of the model used

system_fingerprint
str

System fingerprint for the completion

choices
list[ChatCompletionChoice]

List of completion choices
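A short sketch of reading these properties off a completion (property names as documented above; the message access pattern follows the create() examples):

Inspecting a ChatCompletion
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
print(response.id)                  # unique completion identifier
print(response.model)               # ID of the model used
print(response.system_fingerprint)  # system fingerprint
for choice in response.choices:     # one entry per completion choice
    print(choice.message["content"])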

StreamingChatCompletionChunk

Individual chunk from a streaming chat completion.

Properties:

id
str

Unique identifier for the completion

object
str

Object type identifier

created
Optional[int]

Unix timestamp of when the chunk was created

model
str

ID of the model used

system_fingerprint
str

System fingerprint for the completion

choices
list[StreamingChoice]

List of completion choices in this chunk
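Chunks can be accumulated to reconstruct the full message. A minimal sketch, assuming each chunk's delta.content is a string fragment (the guard covers a possible empty or None final delta, as seen in some OpenAI-compatible streams):

Accumulating Streamed Chunks
parts = []
for chunk in client.chat.completions.create(
    messages=[{"role": "user", "content": "Write a short poem"}],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    delta = chunk.choices[0].delta.content
    if delta:  # skip empty or None deltas
        parts.append(delta)
print("".join(parts))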

LogitsResponse

Response from a logits computation request.

Properties:

logits
dict[str, float]

Dictionary mapping tokens to their probabilities
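Because logits is a plain dict of token to probability, standard dict operations apply; for example, ranking tokens by probability:

Ranking Tokens by Probability
logits = client.chat.logits(
    messages=[{"role": "user", "content": "The capital of France is"}],
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5
)
for token, prob in sorted(logits.logits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {prob:.4f}")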