The Chat API provides methods for interacting with Goodfire's language models in a chat format. The base chat interface is OpenAI-compatible and supports both streaming and non-streaming completions, as well as logits computation.

Once you have a model identifier or variant, you can use it to create chat completions.

Examples

The examples below cover:

  • Basic Chat Completion
  • Streaming Chat Completion
  • Using with Model Variants (see the sketch after this list)
  • Computing Token Probabilities
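The variant example is a minimal sketch; it assumes the SDK exposes a goodfire.Client and a goodfire.Variant class (an implementation of the VariantInterface accepted by the model parameter documented below).

Using with Model Variants
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")

# A variant wraps a base model and can be passed anywhere a model
# identifier string is accepted (goodfire.Variant is assumed here).
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model=variant
)
print(response.choices[0].message["content"])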

Methods

create()

Create a chat completion with the model.

Parameters:

messages
list[ChatMessage]
required

List of messages in the conversation. Each message should have role (“user”, “assistant”, or “system”) and content fields.

model
Union[str, VariantInterface]
required

Model identifier or variant to use for completion

stream
bool
default: "False"

Whether to stream the response tokens

max_completion_tokens
Optional[int]
default: "2048"

Maximum number of tokens to generate

top_p
float
default: "0.9"

Nucleus sampling parameter

temperature
float
default: "0.6"

Sampling temperature

stop
Optional[Union[str, list[str]]]

Sequences where the API will stop generating further tokens

seed
Optional[int]
default: "42"

Random seed for reproducible outputs

system_prompt
str

System prompt to prepend to the conversation

Returns:

  • If stream=False: ChatCompletion object
  • If stream=True: Iterator of StreamingChatCompletionChunk objects

Examples:

Non-streaming completion:

Basic Chat Completion
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")  # assumes the goodfire SDK client

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
print(response.choices[0].message["content"])

Streaming completion:

Streaming Chat Completion
for chunk in client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Write a short poem"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    print(chunk.choices[0].delta.content, end="")
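The sampling parameters documented above can be combined in a single call; the values below are illustrative, not recommendations:

Completion with Sampling Parameters
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Suggest a name for a coffee shop"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_completion_tokens=256,  # cap the response length
    temperature=0.8,            # higher values increase randomness
    top_p=0.9,                  # nucleus sampling cutoff
    stop=["\n\n"],              # stop at the first blank line
    seed=7                      # fixed seed for reproducible outputs
)
print(response.choices[0].message["content"])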

logits()

Compute token probabilities for the next token in a conversation.

Parameters:

messages
list[ChatMessage]
required

List of messages in the conversation

model
Union[str, VariantInterface]
required

Model identifier or variant to use

top_k
Optional[int]

Limit response to top K most likely tokens

filter_vocabulary
Optional[list[str]]

List of tokens to compute probabilities for

Returns: LogitsResponse containing token probabilities

Example:

Token Probabilities
logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    filter_vocabulary=["Paris", "London", "Berlin"]
)
print(logits.logits)
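top_k can be used in place of filter_vocabulary to retrieve the model's K most likely next tokens; a short sketch using the documented parameter:

Top-K Token Probabilities
logits = client.chat.logits(
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5  # limit the response to the 5 most likely tokens
)
print(logits.logits)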

Response Objects

ChatCompletion

Response from a non-streaming chat completion.

Properties:

id
str

Unique identifier for the completion

object
str

Object type identifier

created
Optional[int]

Unix timestamp of when the completion was created

model
str

ID of the model used

system_fingerprint
str

System fingerprint for the completion

choices
list[ChatCompletionChoice]

List of completion choices
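A short sketch of reading these properties off a completion (property names as documented above; the message access pattern follows the create() examples):

Inspecting a ChatCompletion
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="meta-llama/Llama-3.3-70B-Instruct"
)
print(response.id)                  # unique completion identifier
print(response.model)               # ID of the model used
print(response.system_fingerprint)  # system fingerprint
for choice in response.choices:     # one entry per completion choice
    print(choice.message["content"])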

StreamingChatCompletionChunk

Individual chunk from a streaming chat completion.

Properties:

id
str

Unique identifier for the completion

object
str

Object type identifier

created
Optional[int]

Unix timestamp of when the chunk was created

model
str

ID of the model used

system_fingerprint
str

System fingerprint for the completion

choices
list[StreamingChoice]

List of completion choices in this chunk
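Chunks can be accumulated to reconstruct the full message. A minimal sketch, assuming each chunk's delta.content is a string fragment (the guard covers a possible empty or None final delta, as seen in some OpenAI-compatible streams):

Accumulating Streamed Chunks
parts = []
for chunk in client.chat.completions.create(
    messages=[{"role": "user", "content": "Write a short poem"}],
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True
):
    delta = chunk.choices[0].delta.content
    if delta:  # skip empty or None deltas
        parts.append(delta)
print("".join(parts))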

LogitsResponse

Response from a logits computation request.

Properties:

logits
dict[str, float]

Dictionary mapping tokens to their probabilities
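Because logits is a plain dict of token to probability, standard dict operations apply; for example, ranking tokens by probability:

Ranking Tokens by Probability
logits = client.chat.logits(
    messages=[{"role": "user", "content": "The capital of France is"}],
    model="meta-llama/Llama-3.3-70B-Instruct",
    top_k=5
)
for token, prob in sorted(logits.logits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {prob:.4f}")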