Completions And Streaming
This page covers the two base execution modes:
- client.complete() for one-shot responses
- client.stream() for incremental output
Complete Requests
Use complete() when you want one resolved response object.
import { LLMClient } from 'unified-llm-client';
const client = LLMClient.fromEnv({
defaultModel: 'gpt-4o',
});
const response = await client.complete({
maxTokens: 300,
temperature: 0.2,
messages: [
{ role: 'user', content: 'Write a two-line release note for a bug fix.' },
],
});
console.log(response.text);
console.log(response.finishReason);
console.log(response.usage);
Request Options You Will Use Most Often
- messages: Canonical chat history
- model: The model id for this request
- provider: Provider override when you want to force a route
- system: Top-level system prompt
- maxTokens: Maximum generated output tokens
- temperature: Sampling control
- tools and toolChoice: Tool definitions and tool policy
- sessionId and tenantId: Tracking fields used by persistence, routing, and usage logging
- budgetUsd: Estimated spend cap for the request
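As a rough illustration of how several of these options might combine in one request, the sketch below builds a request object against a locally defined type. The `CompleteRequestSketch` interface, the role names other than `user`, and all ids and values are assumptions for illustration; the library's own exported types may differ.

```typescript
// Hypothetical local type mirroring the option names listed above.
// This is NOT the library's exported type; field types are assumptions.
interface CompleteRequestSketch {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  model?: string;
  provider?: string;
  system?: string;
  maxTokens?: number;
  temperature?: number;
  sessionId?: string;
  tenantId?: string;
  budgetUsd?: number;
}

const request: CompleteRequestSketch = {
  model: 'gpt-4o',
  system: 'You are a concise release-notes writer.',
  maxTokens: 300,
  temperature: 0.2,
  sessionId: 'session-123', // illustrative tracking id
  tenantId: 'tenant-acme', // illustrative tracking id
  budgetUsd: 0.05, // illustrative spend cap in US dollars
  messages: [
    { role: 'user', content: 'Write a two-line release note for a bug fix.' },
  ],
};

// With a configured client, this object would be passed as:
//   const response = await client.complete(request);
```

Keeping the request in a named object like this makes it straightforward to log or reuse the same options for both complete() and stream().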
Message Shapes
Plain text messages are the most common case:
const response = await client.complete({
messages: [
{ role: 'user', content: 'Summarise this ticket in one sentence.' },
],
});
The library also supports structured multimodal parts:
const response = await client.complete({
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Describe what is in this image.' },
{
type: 'image_url',
url: 'https://example.com/diagram.png',
mediaType: 'image/png',
},
],
},
],
});
The benefit of the canonical message format is that your application code does not have to branch deeply by provider once the request is inside the library.
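To make the two content shapes concrete, here is a minimal sketch of how they could be typed and handled uniformly in application code. The type names and the `messageText` helper are hypothetical and modelled only on the examples above; the library's exported types may be named or shaped differently.

```typescript
// Hypothetical local types based on the plain-text and multimodal examples above.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image_url'; url: string; mediaType: string };
type ContentPart = TextPart | ImagePart;

interface ChatMessage {
  role: 'user' | 'assistant'; // 'assistant' is an assumption; only 'user' appears above
  content: string | ContentPart[];
}

// Collapse either content shape into plain text, e.g. for logging or search indexing.
// Image parts are skipped; text parts are joined with newlines.
function messageText(message: ChatMessage): string {
  if (typeof message.content === 'string') {
    return message.content;
  }
  return message.content
    .filter((part): part is TextPart => part.type === 'text')
    .map((part) => part.text)
    .join('\n');
}

const plain: ChatMessage = {
  role: 'user',
  content: 'Summarise this ticket in one sentence.',
};

const multimodal: ChatMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Describe what is in this image.' },
    { type: 'image_url', url: 'https://example.com/diagram.png', mediaType: 'image/png' },
  ],
};
```

Because `type` acts as a discriminant, the TypeScript compiler can narrow each part inside the filter without any provider-specific checks.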
Streaming Requests
Use stream() when the caller needs tokens as they arrive.
const stream = client.stream({
messages: [{ role: 'user', content: 'Stream a short product update.' }],
});
let text = '';
for await (const chunk of stream) {
if (chunk.type === 'text-delta') {
text += chunk.delta;
process.stdout.write(chunk.delta);
}
if (chunk.type === 'done') {
console.log('\nusage', chunk.usage);
}
}
Stream Chunk Types
- text-delta: Incremental text content
- tool-call-start: The model started building a tool call
- tool-call-delta: Partial tool-call argument JSON
- tool-call-result: Executed tool result surfaced back into the stream
- done: Final usage and finish reason
- error: Terminal error frame
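One way to handle every chunk type in a single loop is a `switch` over `chunk.type`. The sketch below is an assumption-heavy illustration: only `type`, `delta` on text-delta chunks, and `usage` on done chunks appear in the examples on this page. The fields `id`, `name`, `argsDelta`, `result`, `finishReason`, and `error` are hypothetical names chosen for the sketch, and `fakeStream` is a stand-in so the code runs without a provider.

```typescript
// Hypothetical chunk shapes. Field names other than `type`, `delta`, and `usage`
// are assumptions and may not match the library.
type StreamChunk =
  | { type: 'text-delta'; delta: string }
  | { type: 'tool-call-start'; id: string; name: string }
  | { type: 'tool-call-delta'; id: string; argsDelta: string }
  | { type: 'tool-call-result'; id: string; result: unknown }
  | { type: 'done'; usage?: unknown; finishReason?: string }
  | { type: 'error'; error: Error };

interface StreamSummary {
  text: string;
  toolArgs: Record<string, string>; // raw argument JSON accumulated per tool-call id
  finishReason?: string;
}

// Consume a stream, accumulating text and partial tool-call argument JSON.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<StreamSummary> {
  const summary: StreamSummary = { text: '', toolArgs: {} };
  for await (const chunk of stream) {
    switch (chunk.type) {
      case 'text-delta':
        summary.text += chunk.delta;
        break;
      case 'tool-call-start':
        summary.toolArgs[chunk.id] = '';
        break;
      case 'tool-call-delta':
        // Argument JSON is only parseable once all deltas have arrived.
        summary.toolArgs[chunk.id] = (summary.toolArgs[chunk.id] ?? '') + chunk.argsDelta;
        break;
      case 'tool-call-result':
        break; // result handling is application-specific
      case 'done':
        summary.finishReason = chunk.finishReason;
        break;
      case 'error':
        throw chunk.error; // terminal frame: surface it to the caller
    }
  }
  return summary;
}

// Stand-in stream for trying the function without calling a provider.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { type: 'text-delta', delta: 'Hel' };
  yield { type: 'text-delta', delta: 'lo' };
  yield { type: 'tool-call-start', id: 'call_1', name: 'lookup' };
  yield { type: 'tool-call-delta', id: 'call_1', argsDelta: '{"q":' };
  yield { type: 'tool-call-delta', id: 'call_1', argsDelta: '"x"}' };
  yield { type: 'done', finishReason: 'stop' };
}
```

Buffering the tool-call deltas as a string and parsing only after the stream ends avoids calling JSON.parse on incomplete input.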
Cancel A Stream
The returned stream is cancelable.
const stream = client.stream({
messages: [{ role: 'user', content: 'Write a long answer.' }],
});
setTimeout(() => {
stream.cancel(new Error('Client disconnected.'));
}, 200);
for await (const chunk of stream) {
if (chunk.type === 'text-delta') {
process.stdout.write(chunk.delta);
}
}
This is especially useful in HTTP servers, where the browser tab may close before the model finishes.
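A minimal sketch of that HTTP-server pattern follows. It assumes only that the stream exposes `cancel()` as shown above and that the connection object emits a `close` event; the `Cancelable` and `CloseSource` interfaces and the `cancelOnClose` helper are hypothetical names defined locally so the sketch needs no imports.

```typescript
// Minimal structural types so the sketch runs without any imports.
interface Cancelable {
  cancel(reason?: Error): void;
}
interface CloseSource {
  once(event: 'close', listener: () => void): unknown;
}

// Cancel the stream when the connection closes, unless the stream already finished.
// Returns a function the caller invokes after normal completion.
function cancelOnClose(connection: CloseSource, stream: Cancelable): () => void {
  let finished = false;
  connection.once('close', () => {
    if (!finished) {
      stream.cancel(new Error('Client disconnected.'));
    }
  });
  return () => {
    finished = true;
  };
}
```

In a Node.js `http` handler, the response object is a plausible `connection` here, since it emits `close` both on normal completion and when the underlying connection is terminated early. That is why the sketch tracks a `finished` flag: call the returned function after the `for await` loop ends so a normal close does not trigger a cancel.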
Estimated Cost And Token Helpers
For preflight estimates and display formatting, the library exports helpers from unified-llm-client/utils.
import {
estimateMessageTokens,
formatCost,
openaiCountTokens,
} from 'unified-llm-client/utils';
const messages = [{ role: 'user', content: 'Estimate token count for this request.' }];
console.log(estimateMessageTokens(messages));
console.log(formatCost(0.0132));
console.log(await openaiCountTokens({ messages, model: 'gpt-4o' }));
Use estimateMessageTokens() for lightweight approximations and openaiCountTokens() when you want closer OpenAI-specific counting.
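A token estimate can feed a simple preflight check against a spend cap. The sketch below shows the arithmetic only: the `Pricing` shape, both helper functions, and the prices are hypothetical, and the prompt token count is assumed to come from a helper such as estimateMessageTokens().

```typescript
// Hypothetical per-million-token prices in USD; substitute your provider's published rates.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Worst-case cost: every prompt token plus the full maxTokens of generated output.
function estimateMaxCostUsd(promptTokens: number, maxTokens: number, pricing: Pricing): number {
  return (promptTokens * pricing.inputPerMTok + maxTokens * pricing.outputPerMTok) / 1e6;
}

// True when the worst-case cost stays within the given cap.
function fitsBudget(
  promptTokens: number,
  maxTokens: number,
  pricing: Pricing,
  budgetUsd: number,
): boolean {
  return estimateMaxCostUsd(promptTokens, maxTokens, pricing) <= budgetUsd;
}

const pricing: Pricing = { inputPerMTok: 2, outputPerMTok: 8 }; // made-up numbers
const worstCase = estimateMaxCostUsd(1200, 300, pricing);
// (1200 * 2 + 300 * 8) / 1e6 = 4800 / 1e6 = 0.0048 USD
```

Using maxTokens as the output count gives an upper bound, so the real cost of a request that stops early will usually be lower.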
Error Handling
Provider-specific transport differences are normalized into library errors.
import {
AuthenticationError,
ProviderError,
RateLimitError,
} from 'unified-llm-client';
try {
await client.complete({
messages: [{ role: 'user', content: 'Hello' }],
});
} catch (error) {
if (error instanceof AuthenticationError) {
console.error('Check your provider API key configuration.');
} else if (error instanceof RateLimitError) {
console.error('Retry later or route to a fallback model.');
} else if (error instanceof ProviderError) {
console.error('Provider responded with an upstream error.');
} else {
throw error;
}
}
When To Use complete() Vs stream()
- Use complete() for background jobs, cron tasks, and simple server endpoints.
- Use stream() for chat UIs, CLI tools, and long-form responses where latency matters.
- Use conversation() instead of manually passing history once you need multi-turn state or tool loops.
Next Step
If you need persistent history, context management, or tool execution, continue with Conversations And Tools.