Production Guide

This page covers the parts of the library that matter once the first happy-path integration already works.

For concrete setup details (.env, Postgres wiring, and where embedding vectors are stored), read Production Setup alongside this guide.

Route Traffic With ModelRouter

ModelRouter lets you centralize model selection logic instead of scattering it across request handlers.

ts
import { LLMClient, ModelRouter } from 'unified-llm-client';

const router = new ModelRouter({
  rules: [
    {
      // Requests that carry tools go to the larger model, with a cheaper fallback.
      name: 'tool-traffic',
      match: { hasTools: true },
      target: { provider: 'openai', model: 'gpt-4o' },
      fallback: [{ provider: 'openai', model: 'gpt-4o-mini' }],
    },
    {
      // No match clause; intended as the default path.
      name: 'default-fast-path',
      target: { provider: 'openai', model: 'gpt-4o-mini' },
    },
  ],
});

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o-mini',
  modelRouter: router,
});

Common reasons to add a router:

  • Send tool-heavy traffic to a model with stronger tool support
  • Keep low-value requests on a cheaper model
  • Define fallback chains during provider outages
  • Run deterministic weighted experiments
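
To make the routing ideas above concrete, here is a minimal, library-independent sketch of rule selection, fallback ordering, and deterministic bucketing for weighted experiments. It is not the ModelRouter implementation: the names RouteRule, RouteTarget, pickRule, candidateTargets, and bucketFor are hypothetical, and the first-match evaluation order is an assumption.

```ts
// Hypothetical sketch of rule routing; not the ModelRouter implementation.
type RouteTarget = { provider: string; model: string };

interface RouteRule {
  name: string;
  match?: { hasTools?: boolean };
  target: RouteTarget;
  fallback?: RouteTarget[];
}

// Assumption: the first matching rule wins. A rule without a match clause
// acts as a catch-all; otherwise hasTools must agree with the request.
function pickRule(rules: RouteRule[], input: { hasTools: boolean }): RouteRule | undefined {
  return rules.find((rule) => {
    const wanted = rule.match?.hasTools;
    return wanted === undefined || wanted === input.hasTools;
  });
}

// Ordered list of targets to try: the primary target, then its fallbacks.
function candidateTargets(rule: RouteRule): RouteTarget[] {
  return [rule.target, ...(rule.fallback ?? [])];
}

// Deterministic bucket in [0, 100) from an FNV-1a style hash over UTF-16 code
// units, so the same key (for example a session id) always gets the same bucket.
function bucketFor(key: string): number {
  let hash = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 16777619); // FNV 32-bit prime
  }
  return (hash >>> 0) % 100;
}

const rules: RouteRule[] = [
  {
    name: 'tool-traffic',
    match: { hasTools: true },
    target: { provider: 'openai', model: 'gpt-4o' },
    fallback: [{ provider: 'openai', model: 'gpt-4o-mini' }],
  },
  { name: 'default-fast-path', target: { provider: 'openai', model: 'gpt-4o-mini' } },
];

const toolRule = pickRule(rules, { hasTools: true });
const plainRule = pickRule(rules, { hasTools: false });
```

Hashing a stable key instead of calling Math.random() is what keeps an experiment deterministic: a given session stays in the same arm across requests and restarts.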

Use Budget Policies Intentionally

Both request-level and conversation-level calls accept budget controls.

ts
const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
  budgetExceededAction: 'warn',
  onWarning: (message) => {
    console.warn('llm warning', message);
  },
});

You can also override budget behavior per call:

ts
// Per-call override: cap this request at $0.05 and fail instead of warning.
await client.complete({
  budgetUsd: 0.05,
  budgetExceededAction: 'throw',
  messages: [{ role: 'user', content: 'Write a short answer.' }],
});

Use:

  • throw when overspend is unacceptable
  • warn when you want observability without interruption
  • skip when you want a graceful no-call fallback
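
The three actions can be pictured as a pre-flight check that runs before the provider call. The sketch below is only an illustration of those semantics. applyBudgetPolicy and its return values are hypothetical; how the library estimates cost and how it surfaces a skipped call are not described here.

```ts
// Hypothetical sketch of the three budget actions; not the library's internals.
type BudgetAction = 'throw' | 'warn' | 'skip';

function applyBudgetPolicy(
  estimatedCostUsd: number,
  budgetUsd: number,
  action: BudgetAction,
  onWarning: (message: string) => void,
): 'proceed' | 'skip' {
  // Within budget: nothing to enforce.
  if (estimatedCostUsd <= budgetUsd) return 'proceed';

  const message =
    `estimated cost $${estimatedCostUsd.toFixed(4)} exceeds budget $${budgetUsd.toFixed(4)}`;

  switch (action) {
    case 'throw':
      // Overspend is unacceptable: fail before any provider call is made.
      throw new Error(message);
    case 'warn':
      // Observability without interruption: report, then continue.
      onWarning(message);
      return 'proceed';
    case 'skip':
      // Graceful fallback: the caller should not make the provider call.
      return 'skip';
  }
}

const warnings: string[] = [];
const warnDecision = applyBudgetPolicy(0.08, 0.05, 'warn', (m) => warnings.push(m));
const skipDecision = applyBudgetPolicy(0.08, 0.05, 'skip', (m) => warnings.push(m));
```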

Choose The Right Runtime

The core client surface is safe for Edge-style runtimes:

  • LLMClient
  • Conversation
  • SessionApi
  • in-memory storage
  • routing and utility helpers

Node-only features are loaded lazily:

  • PostgresSessionStore
  • PostgresUsageLogger

Practical rules:

  • Use Edge for stateless request execution and streaming.
  • Use Node when you need Postgres-backed persistence or usage aggregation in-process.
  • Use Node for OpenAI speech-to-text uploads when your runtime does not provide stable Blob and FormData support.
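
One way to act on the last rule is a small feature check at startup. This is a sketch under assumptions: supportsMultipartUploads is a hypothetical helper, and checking that the constructors exist only shows that they are present, not that the runtime's implementation is stable.

```ts
// Hypothetical helper: reports whether Blob and FormData constructors exist
// on the given global scope. Presence does not guarantee stable behaviour.
function supportsMultipartUploads(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  return typeof scope['Blob'] === 'function' && typeof scope['FormData'] === 'function';
}

// Decide where speech-to-text uploads should run in the current runtime.
const transcriptionHost = supportsMultipartUploads() ? 'current-runtime' : 'node-service';
```

If the check fails, route speech-to-text requests to a Node service instead of attempting the upload in place.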

Logging And Data Hygiene

The library sanitizes logged usage and error payloads before writing them through the built-in logging paths, but you still need to decide what your own application logs.

Recommended production posture:

  • Log request ids, session ids, tenant ids, model ids, finish reasons, duration, and cost.
  • Avoid logging raw prompts or tool payloads unless you have a clear compliance reason.
  • Avoid logging raw audio, base64 audio, or full transcripts unless your product has explicit retention and consent controls.
  • Keep tool results narrow and structured so downstream logging stays predictable.
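
For application-level logs, an allowlist is usually easier to audit than a blocklist. The sketch below builds a log record from only the fields recommended above. CallContext, ResponseLike, and toLogRecord are hypothetical names; the response fields used (model, finishReason, text, usage.costUSD) mirror the mock response shape shown later in this guide.

```ts
// Hypothetical allowlist logger: copies only known-safe metadata fields.
interface CallContext {
  requestId: string;
  sessionId?: string;
  tenantId?: string;
}

interface ResponseLike {
  model: string;
  finishReason: string;
  text: string; // present on the response, deliberately never logged
  usage: { costUSD: number };
}

interface CallLogRecord {
  requestId: string;
  sessionId: string | null;
  tenantId: string | null;
  model: string;
  finishReason: string;
  durationMs: number;
  costUSD: number;
}

function toLogRecord(ctx: CallContext, response: ResponseLike, durationMs: number): CallLogRecord {
  return {
    requestId: ctx.requestId,
    sessionId: ctx.sessionId ?? null,
    tenantId: ctx.tenantId ?? null,
    model: response.model,
    finishReason: response.finishReason,
    durationMs,
    costUSD: response.usage.costUSD,
  };
}

const record = toLogRecord(
  { requestId: 'req-1', tenantId: 'tenant-a' },
  { model: 'gpt-4o-mini', finishReason: 'stop', text: 'private model output', usage: { costUSD: 0.0003 } },
  412,
);
```

Because the record is built field by field, new response properties stay out of the logs until someone adds them on purpose.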

Speech In Production

Speech is available through client.speak() and client.transcribe() for OpenAI batch endpoints. Keep it separate from conversation persistence: the library returns audio bytes and transcript text, but it does not store audio files or transcript history automatically.

Use PostgresUsageLogger when you need speech cost attribution. Text completions continue to use the normal usage table, while speech events are written to a sibling table named ${tableName}_speech, such as llm_usage_events_speech. Query those totals with client.getSpeechUsage() or export them with client.exportSpeechUsage().

For budgets, pass explicit durations when they cannot be derived from the payload:

  • estimatedOutputSeconds or maxOutputSeconds for text-to-speech.
  • inputAudioSeconds for compressed speech-to-text inputs.

As with text usage, usage.costUSD is the numeric field for billing logic. usage.cost is display-only.
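
A short sketch of why this distinction matters for aggregation, and of the sibling table naming described above. UsageLike, totalCostUSD, and speechTableName are hypothetical helpers, and the sample values are illustrative only.

```ts
// Hypothetical helpers; sample numbers are made up for illustration.
interface UsageLike {
  cost: string; // display-only, for example '$0.0000'
  costUSD: number; // numeric, safe for arithmetic
}

// Sum the numeric field. Parsing the display string would lose precision
// once small per-call costs are rounded for display.
function totalCostUSD(usages: UsageLike[]): number {
  return usages.reduce((sum, usage) => sum + usage.costUSD, 0);
}

// Speech events live in a sibling table named `${tableName}_speech`.
function speechTableName(tableName: string): string {
  return `${tableName}_speech`;
}

const sampleUsages: UsageLike[] = [
  { cost: '$0.0000', costUSD: 0.00004 },
  { cost: '$0.0000', costUSD: 0.00003 },
  { cost: '$0.0012', costUSD: 0.0012 },
];

const total = totalCostUSD(sampleUsages);
const speechTable = speechTableName('llm_usage_events');
```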

Testing Without Live Providers

Use LLMClient.mock() for deterministic tests.

ts
import { LLMClient } from 'unified-llm-client';

const client = LLMClient.mock({
  responses: [
    {
      content: [{ type: 'text', text: 'MOCK_OK' }],
      finishReason: 'stop',
      model: 'mock-model',
      provider: 'mock',
      raw: null,
      text: 'MOCK_OK',
      toolCalls: [],
      usage: {
        cachedTokens: 0,
        cost: '$0.0000',
        costUSD: 0,
        inputTokens: 5,
        outputTokens: 2,
      },
    },
  ],
});

// Returns the queued mock response, so the test does not depend on an external API.
const response = await client.complete({
  messages: [{ role: 'user', content: 'Ping' }],
});

Use mock clients for:

  • unit tests
  • CI checks that must not depend on external APIs
  • deterministic examples and snapshots

Keep live-provider tests opt-in and separate from the default test suite.
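
One common way to keep live tests opt-in is an explicit environment gate. The sketch below is an assumption-laden example: shouldRunLiveTests is a hypothetical helper, RUN_LIVE_LLM_TESTS is an invented flag name, and OPENAI_API_KEY is assumed to be the variable that holds your provider key.

```ts
// Hypothetical gate: live tests run only with an explicit flag AND a key present.
// RUN_LIVE_LLM_TESTS is an invented name; OPENAI_API_KEY is an assumed key variable.
function shouldRunLiveTests(env: Record<string, string | undefined>): boolean {
  const optedIn = env['RUN_LIVE_LLM_TESTS'] === '1';
  const hasKey = (env['OPENAI_API_KEY'] ?? '').length > 0;
  return optedIn && hasKey;
}

// Default CI environment: no flag, so live tests stay off even if a key leaks in.
const defaultCi = shouldRunLiveTests({ OPENAI_API_KEY: 'sk-example' });

// Explicit opt-in with a key configured.
const optedIn = shouldRunLiveTests({ RUN_LIVE_LLM_TESTS: '1', OPENAI_API_KEY: 'sk-example' });
```

With a runner such as Vitest or Jest, the result can choose between describe and describe.skip for the live suite, for example by passing process.env in a Node test setup.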

Versioning And Reuse Across Projects

You can install directly from the GitHub repository:

bash
pnpm add github:07rjain/LLMlibrary

For more stable reuse across projects, create Git tags and install a specific tagged version:

bash
pnpm add github:07rjain/LLMlibrary#v0.1.0

That gives consumers a pinned dependency instead of tracking main.

Rollout Checklist

  • Start with one provider and one model.
  • Confirm that the first complete() call path looks correct in production-like logs.
  • Add streaming only where it improves user experience.
  • Add session persistence only where continuity matters.
  • Add tool execution only when prompts alone are insufficient.
  • Add usage logging before you need billing or cost attribution.
  • Add routing rules after you have real traffic patterns to optimize against.
  • Tag versions before multiple projects begin depending on the library.
