Production Guide

This page covers the parts of the library that matter once the first happy-path integration already works.

For concrete production setup details, including `.env` configuration, Postgres wiring, and where embedding vectors are stored, read Production Setup alongside this guide.

Route Traffic With `ModelRouter`

`ModelRouter` lets you centralize model selection logic instead of scattering it across request handlers.

```typescript
import { LLMClient, ModelRouter } from 'unified-llm-client';

const router = new ModelRouter({
  rules: [
    {
      name: 'tool-traffic',
      match: { hasTools: true },
      target: { provider: 'openai', model: 'gpt-4o' },
      fallback: [{ provider: 'openai', model: 'gpt-4o-mini' }],
    },
    {
      name: 'default-fast-path',
      target: { provider: 'openai', model: 'gpt-4o-mini' },
    },
  ],
});

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o-mini',
  modelRouter: router,
});
```
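The library defines its own matching semantics. The following is only a minimal sketch of how a rule list like the one above can be read. It assumes that rules are evaluated top to bottom, that the first rule whose `match` conditions all hold wins, that a rule without `match` acts as a catch-all, and that the winning `target` is tried before its `fallback` entries in order. `RouteRule`, `resolveCandidates`, and `callWithFallback` are hypothetical names and are not `unified-llm-client` exports.

```typescript
// Hypothetical types mirroring the shape of the rules in the config above.
interface ModelTarget {
  provider: string;
  model: string;
}

interface RequestFeatures {
  hasTools: boolean;
}

interface RouteRule {
  name: string;
  match?: Partial<RequestFeatures>;
  target: ModelTarget;
  fallback?: ModelTarget[];
}

// True when every condition listed in `match` equals the request's value.
// A rule without `match` matches everything (assumed catch-all behavior).
function ruleMatches(rule: RouteRule, request: RequestFeatures): boolean {
  if (!rule.match) return true;
  return (Object.keys(rule.match) as (keyof RequestFeatures)[]).every(
    (key) => rule.match![key] === request[key],
  );
}

// The first matching rule wins. Its target comes first, then each fallback in order.
function resolveCandidates(rules: RouteRule[], request: RequestFeatures): ModelTarget[] {
  const rule = rules.find((r) => ruleMatches(r, request));
  if (!rule) return [];
  return [rule.target, ...(rule.fallback ?? [])];
}

// Tries each candidate in order and moves on only when the call throws.
async function callWithFallback<T>(
  candidates: ModelTarget[],
  call: (target: ModelTarget) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error('no candidate models');
  for (const target of candidates) {
    try {
      return await call(target);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Under these assumptions, a request with tools resolves to `gpt-4o` followed by `gpt-4o-mini`, and every other request resolves to `gpt-4o-mini` alone.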

Common reasons to add a router:

- Send tool-heavy traffic to a model with stronger tool support
- Keep low-value requests on a cheaper model
- Define fallback chains during provider outages
- Run deterministic weighted experiments
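"Deterministic" usually means that the same key, such as a user or session ID, always lands in the same arm. This page does not show the library's weighting configuration, so the following is a hypothetical, library-independent sketch of the idea. It hashes the key with 32-bit FNV-1a followed by the MurmurHash3 finalizer, maps the result into `[0, total weight)`, and walks the cumulative weights. `Arm` and `pickArm` are illustrative names.

```typescript
interface Arm {
  id: string;
  weight: number; // relative weight; weights need not sum to 1
}

// 32-bit FNV-1a over UTF-16 code units, then the MurmurHash3 fmix32 finalizer
// so that keys differing only in their last character still spread across the range.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit
}

// Maps a stable key to one arm in proportion to the arm weights.
// The same key and arm list always produce the same arm.
function pickArm(key: string, arms: Arm[]): Arm {
  const total = arms.reduce((sum, arm) => sum + arm.weight, 0);
  if (arms.length === 0 || total <= 0) {
    throw new Error('at least one arm with positive weight is required');
  }
  const point = (hash32(key) / 2 ** 32) * total; // in [0, total)
  let cumulative = 0;
  for (const arm of arms) {
    cumulative += arm.weight;
    if (point < cumulative) return arm;
  }
  return arms[arms.length - 1]; // guards against floating-point rounding at the top edge
}
```

Keying on a stable identifier rather than on a random draw per request keeps a user's experience consistent. It also makes experiment results reproducible.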

Use Budget Policies Intentionally

Both request-level and conversation-level calls accept budget controls. Client-wide defaults are set when you construct the client:

```typescript
const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
  budgetExceededAction: 'warn',
  onWarning: (message) => {
    console.warn('llm warning', message);
  },
});
```

You can also override budget behavior per call:

```typescript
await client.complete({
  budgetUsd: 0.05,
  budgetExceededAction: 'throw',
  messages: [{ role: 'user', content: 'Write a short answer.' }],
});
```

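Choose `budgetExceededAction` according to how overspend should be handled:

- `'throw'` when overspend is unacceptable
- `'warn'` when you want observability without interrupting the request
- `'skip'` when you want a graceful fallback that makes no provider call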
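A minimal sketch of how the three actions differ. It assumes the check compares an estimated cost against `budgetUsd` before the provider call, and that per-call settings override client defaults. This is not the library's implementation. `checkBudget`, `BudgetDecision`, and the `'warn'` fallback default are hypothetical choices made for this sketch only.

```typescript
type BudgetExceededAction = 'throw' | 'warn' | 'skip';

interface BudgetOptions {
  budgetUsd?: number;
  budgetExceededAction?: BudgetExceededAction;
}

// 'proceed' means make the provider call; 'skip' means return without calling.
type BudgetDecision = 'proceed' | 'skip';

function checkBudget(
  estimatedUsd: number,
  clientDefaults: BudgetOptions,
  callOptions: BudgetOptions,
  onWarning: (message: string) => void,
): BudgetDecision {
  // Per-call settings take precedence over client-wide defaults.
  const budgetUsd = callOptions.budgetUsd ?? clientDefaults.budgetUsd;
  const action =
    callOptions.budgetExceededAction ?? clientDefaults.budgetExceededAction ?? 'warn'; // sketch-only default

  if (budgetUsd === undefined || estimatedUsd <= budgetUsd) return 'proceed';

  const message = `estimated $${estimatedUsd.toFixed(4)} exceeds budget $${budgetUsd.toFixed(4)}`;
  switch (action) {
    case 'throw':
      throw new Error(message);
    case 'warn':
      onWarning(message);
      return 'proceed';
    case 'skip':
      return 'skip';
  }
}
```

The per-call example above (`budgetUsd: 0.05` with `'throw'`) would raise for a $0.10 estimate even though the client default is `'warn'`.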

Choose The Right Runtime

The core client surface is safe for Edge-style runtimes:

- `LLMClient`
- `Conversation`
- `SessionApi`
- in-memory storage
- routing and utility helpers

Node-only features are loaded lazily:

- `PostgresSessionStore`
- `PostgresUsageLogger`

Practical rules:

- Use Edge for stateless request execution and streaming.
- Use Node when you need Postgres-backed persistence or in-process usage aggregation.
- Use Node for OpenAI speech-to-text uploads when your runtime does not provide stable `Blob` and `FormData` support.
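When the same code ships to both Edge and Node, a small capability check can decide which path to take. The sketch below is hypothetical (`detectRuntime` is not a library export). It reports whether the runtime looks like Node, based on `process.versions.node`, and whether `Blob` and `FormData` constructors exist. The mere presence of these constructors does not guarantee the stable behavior the rule above refers to, so treat the check as a coarse gate.

```typescript
interface RuntimeCapabilities {
  isNode: boolean;
  hasMultipartUpload: boolean;
}

// Takes the global object as a parameter so the check can be exercised with fakes.
function detectRuntime(
  g: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): RuntimeCapabilities {
  const proc = g.process as { versions?: { node?: string } } | undefined;
  return {
    // Node exposes its version string at process.versions.node.
    isNode: typeof proc?.versions?.node === 'string',
    // Multipart uploads need both constructors to be available.
    hasMultipartUpload: typeof g.Blob === 'function' && typeof g.FormData === 'function',
  };
}
```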

Logging And Data Hygiene

The library sanitizes logged usage and error payloads before writing them through the built-in logging paths, but you still need to decide what your own application logs.
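For your own application logs, one common approach is to redact values by key before serializing them. The sketch below is a hypothetical helper and does not reproduce the library's sanitizer. The key list is an assumption to adapt to your payloads, for example whether prompt and completion text counts as sensitive in your product.

```typescript
// Keys whose values should never reach application logs (adjust for your data).
const SENSITIVE_KEYS = new Set(['apikey', 'authorization', 'password', 'content', 'text']);

// Returns a deep copy with sensitive values replaced. Key matching is case-insensitive.
// Non-plain objects (Date, Map, ...) are reduced to their enumerable own properties.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}
```

Usage: `console.info('llm request', JSON.stringify(redact(requestPayload)))`, where `requestPayload` stands in for whatever object you are about to log.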

Speech In Production

Speech is available through `client.speak()` and `client.transcribe()` for OpenAI batch endpoints. Keep it separate from conversation persistence: the library returns audio bytes and transcript text, but it does not store audio files or transcript history automatically.

Use `PostgresUsageLogger` when you need speech cost attribution. Text completions continue to use the normal usage table, while speech events are written to a sibling table named `${tableName}_speech`, such as `llm_usage_events_speech`. Query those totals with `client.getSpeechUsage()` or export them with `client.exportSpeechUsage()`.
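If you read these tables directly, for example from a reporting job, instead of going through `client.getSpeechUsage()`, the speech table name follows the `${tableName}_speech` rule. Below is a hypothetical helper that derives that name and rejects anything that is not a plain, unquoted identifier before you interpolate it into SQL. Schema-qualified names are deliberately out of scope. The 63-character cap reflects Postgres's default identifier length limit.

```typescript
// Unquoted identifiers only: letters, digits, underscores, not starting with a digit.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Postgres truncates identifiers longer than 63 bytes by default.
const MAX_IDENTIFIER_LENGTH = 63;

function speechTableName(tableName: string): string {
  const name = `${tableName}_speech`;
  if (!IDENTIFIER.test(tableName) || name.length > MAX_IDENTIFIER_LENGTH) {
    throw new Error(`unsafe or too-long table name: ${name}`);
  }
  return name;
}
```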

For budgets, pass explicit durations when they cannot be derived from the payload:

- `estimatedOutputSeconds` or `maxOutputSeconds` for text-to-speech.
- `inputAudioSeconds` for compressed speech-to-text inputs.
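When you have to supply these values yourself, rough estimates are usually enough for a budget check. The sketch below is hypothetical. Its 150 words-per-minute speaking rate is an assumption, not a provider figure. The bitrate formula (duration = bits ÷ bits per second) holds only for constant-bitrate audio.

```typescript
// Assumed average speaking rate; tune it against your own voices and content.
const ASSUMED_WORDS_PER_MINUTE = 150;

// Rough text-to-speech duration from word count, e.g. for estimatedOutputSeconds.
function estimateSpeechSeconds(text: string, wordsPerMinute = ASSUMED_WORDS_PER_MINUTE): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil((words / wordsPerMinute) * 60);
}

// Duration of constant-bitrate audio, e.g. for inputAudioSeconds.
// Uses 1 kbps = 1000 bits per second. Container headers and metadata tags
// also count toward byteLength, so the result slightly overestimates.
function cbrAudioSeconds(byteLength: number, bitrateKbps: number): number {
  return (byteLength * 8) / (bitrateKbps * 1000);
}
```

For example, a 300-word script estimates to 120 seconds, and a 1,200,000-byte file at 128 kbps is 75 seconds.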

As with text usage, use `usage.costUSD` (a number) for billing logic. `usage.cost` is a formatted string such as `'$0.0000'` and is for display only.
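A short illustration of that distinction, using usage objects shaped like the one in the mock example on this page: do arithmetic on `costUSD` and format only for display. `totalCostUsd` is a hypothetical helper. It accumulates in integer micro-dollars so that many small additions do not drift. That is a design choice of this sketch, not something the library requires, and it rounds away precision finer than one micro-dollar per event.

```typescript
interface UsageLike {
  cost: string; // display string such as '$0.0000'; never parse it for billing
  costUSD: number;
}

// Sums costs in integer micro-dollars (1e-6 USD) to avoid floating-point drift.
function totalCostUsd(usages: UsageLike[]): number {
  const micros = usages.reduce((sum, u) => sum + Math.round(u.costUSD * 1e6), 0);
  return micros / 1e6;
}
```

Here, summing `0.1` and `0.2` yields exactly `0.3`, whereas naive floating-point addition does not.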

Testing Without Live Providers

Use `LLMClient.mock()` for deterministic tests.

```typescript
import { LLMClient } from 'unified-llm-client';

// Queue canned responses; each one mirrors the shape of a real completion result.
const client = LLMClient.mock({
  responses: [
    {
      content: [{ type: 'text', text: 'MOCK_OK' }],
      finishReason: 'stop',
      model: 'mock-model',
      provider: 'mock',
      raw: null,
      text: 'MOCK_OK',
      toolCalls: [],
      usage: {
        cachedTokens: 0,
        cost: '$0.0000',
        costUSD: 0,
        inputTokens: 5,
        outputTokens: 2,
      },
    },
  ],
});

// No network call is made; the result comes from the queued mock response.
const response = await client.complete({
  messages: [{ role: 'user', content: 'Ping' }],
});
```

Use mock clients for:

- unit tests
- CI checks that must not depend on external APIs
- deterministic examples and snapshots

Keep live-provider tests opt-in and separate from the default test suite.
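One way to keep live-provider tests opt-in is to gate them on an environment variable. The flag name `LLM_LIVE_TESTS` and the `shouldRunLiveTests` helper are hypothetical. Wire the result into your test runner's skip mechanism, for example the `skip` option of `test()` from `node:test`.

```typescript
// Hypothetical opt-in flag; pick a name your CI never sets by default.
const LIVE_FLAG = 'LLM_LIVE_TESTS';

// True only when the flag is exactly '1' and every required credential is non-empty.
function shouldRunLiveTests(
  env: Record<string, string | undefined>,
  requiredKeys: string[] = [],
): boolean {
  return env[LIVE_FLAG] === '1' && requiredKeys.every((key) => Boolean(env[key]));
}
```

Usage: `test('live completion', { skip: !shouldRunLiveTests(process.env, ['OPENAI_API_KEY']) }, async () => { ... })`. Replace `OPENAI_API_KEY` with whichever credential variables your setup actually reads.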

Versioning And Reuse Across Projects

You can install directly from the GitHub repository:

```bash
pnpm add github:07rjain/LLMlibrary
```

For more stable reuse across projects, create tags and install a specific version:

```bash
pnpm add github:07rjain/LLMlibrary#v0.1.0
```

That gives consumers a pinned dependency instead of tracking `main`.
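To stop consuming projects from drifting back to an unpinned spec, a CI step can reject GitHub dependencies that lack a `#ref`. The helper below is a hypothetical sketch. It treats any `github:` spec that has no `#` suffix, or that points at `main` or `master`, as unpinned. It does not verify that the ref is a tag rather than some other branch.

```typescript
// Dependency specs as they appear in package.json "dependencies".
type Dependencies = Record<string, string>;

// Returns the names of GitHub-sourced dependencies that are not pinned to a ref.
function unpinnedGithubDeps(deps: Dependencies): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => spec.startsWith('github:'))
    .filter(([, spec]) => {
      const hashIndex = spec.indexOf('#');
      if (hashIndex === -1) return true; // no ref at all
      const ref = spec.slice(hashIndex + 1);
      return ref === '' || ref === 'main' || ref === 'master';
    })
    .map(([name]) => name);
}
```

In CI, parse `package.json` with `JSON.parse`, pass its `dependencies` object to the helper, and fail the build when the returned list is non-empty.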

Rollout Checklist

- Start with one provider and one model.
- Confirm the first `complete()` path in production-like logs.
- Add streaming only where it improves user experience.
- Add session persistence only where continuity matters.
- Add tool execution only when prompts alone are insufficient.
- Add usage logging before you need billing or cost attribution.
- Add routing rules after you have real traffic patterns to optimize against.
- Tag versions before multiple projects begin depending on the library.

Supporting Docs

- API reference: ./api/index.html
- Session API contract: SESSION_API_REFERENCE.md
- Production setup: PRODUCTION_SETUP.md
- Provider comparison: PROVIDER_COMPARISON.md
- Cost policy: COST_AND_PRICING.md
