Unified LLM Client

One TypeScript client for OpenAI, Anthropic, and Gemini

One client surface

Use the same request and response model across providers instead of rewriting app code for each SDK.

Built for product workflows

Ship one-shot completions, streaming UIs, tool calls, conversation state, and session persistence from the same library.

Responses-first OpenAI transport

OpenAI requests already go through the stateless Responses API, while library-owned history and session storage stay provider-agnostic.

Production-oriented primitives

Add budgets, routing, usage logging, speech usage logging, Postgres storage, Redis storage, and a framework-agnostic Session API when the app grows up.

Start Here

Install from GitHub:

pnpm add github:07rjain/LLMlibrary

Then create a client:

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

The fastest path through the docs is:

If you want the lower-level generated API surface, open API Reference.

If you want a concrete production deployment checklist for .env, explicit Postgres wiring, and where embeddings are persisted, read Production Setup.

If you need the providers' current live catalogs, use client.models.listRemote({ provider }) and treat the result as discovery data. It does not automatically replace the checked-in model registry.
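Because the remote catalog is discovery data only, one way to use it is to compare it against the model IDs your app already knows about. The sketch below is a minimal illustration under stated assumptions: the RemoteModel shape, the diffCatalog helper, and the sample data are hypothetical and not part of the library, and the actual return type of client.models.listRemote() may differ.

```typescript
// Hypothetical shape: the real return type of client.models.listRemote() may differ.
interface RemoteModel {
  id: string;
  provider: string;
}

// Hypothetical helper: compare a live catalog against the model IDs the app
// already tracks. It only reports differences and never mutates a registry.
function diffCatalog(remote: RemoteModel[], registryIds: ReadonlySet<string>) {
  const remoteIds = new Set(remote.map((m) => m.id));
  return {
    // Models the provider reports that the local registry does not list.
    unregistered: Array.from(remoteIds)
      .filter((id) => !registryIds.has(id))
      .sort(),
    // Registry entries the provider did not report.
    missingRemotely: Array.from(registryIds)
      .filter((id) => !remoteIds.has(id))
      .sort(),
  };
}

// Stand-in data. In an app this might instead come from something like
// `await client.models.listRemote({ provider: 'openai' })` (result shape assumed).
const remoteCatalog: RemoteModel[] = [
  { id: 'gpt-4o', provider: 'openai' },
  { id: 'gpt-4o-mini', provider: 'openai' },
];
const localRegistry = new Set(['gpt-4o', 'gpt-4-turbo']);

const catalogReport = diffCatalog(remoteCatalog, localRegistry);
console.log(catalogReport);
```

A report like this can feed a manual review step before the checked-in registry is updated, which keeps the registry the source of truth.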

Embeddings now ship in the library through client.embed() with the Google Embedding 2 path. Optional retrieval primitives are available from the package root and from unified-llm-client/retrieval. They include InMemoryKnowledgeStore for local demos and tests, PostgresKnowledgeStore for app-owned pgvector retrieval, rerank hook support in the retrievers, and active-profile / reindex helpers for rollout-safe storage. Reusable text-prep helpers now ship from unified-llm-client/chunking, and retrieval scores can be formatted for display so that they do not read as probabilities. Keep query embeddings and stored chunk embeddings on the same embedding profile; the library does not mix profiles for you.
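Since the library does not mix profiles for you, an app may want its own guard before comparing vectors. The following is a minimal sketch of that idea, not the library's implementation: the Embedded type, the scoreChunk helper, and the profile names are hypothetical, and cosine similarity is used here only as a common example metric.

```typescript
// Hypothetical shape: the library's real embedding result types may differ.
interface Embedded {
  profile: string; // identifier for the model + settings that produced the vector
  vector: number[];
}

// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical guard: refuse to score a query against a chunk that was
// embedded under a different profile, because the scores would be meaningless.
function scoreChunk(query: Embedded, chunk: Embedded): number {
  if (query.profile !== chunk.profile) {
    throw new Error(
      `Embedding profile mismatch: query=${query.profile}, chunk=${chunk.profile}`,
    );
  }
  return cosineSimilarity(query.vector, chunk.vector);
}

// Usage with made-up profile names and tiny vectors.
const queryEmbedding: Embedded = { profile: 'profile-a', vector: [1, 0] };
const storedChunk: Embedded = { profile: 'profile-a', vector: [1, 0] };
console.log(scoreChunk(queryEmbedding, storedChunk));
```

Failing loudly on a mismatch is a deliberate choice in this sketch: silently comparing vectors from different profiles tends to produce plausible-looking but unreliable rankings.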

For embeddings planning tied to the chatbot widget use case, see Embeddings Integration Report. For a cross-check of the recent follow-up review and the recommended post-v1 order, see Embeddings Review Cross-Check. For the detailed implementation plan covering lightweight stores, chunking helpers, Gemini batching, OpenAI embeddings, and extraction helpers, see Embeddings Follow-Up Fix Plan. For the broader retrieval architecture, storage model, and rollout strategy, see Embeddings And Retrieval Architecture Report. For the concrete multitenant retrieval API, safety, and scaling plan, see Retrieval API Integration Report. Embeddings implementation work is tracked in the repository root as embeddings_todo.md.

For speech-to-text and text-to-speech architecture planning, see Speech API Research Report. OpenAI batch speech is implemented through Speech with separate client.speak(), client.transcribe(), client.getSpeechUsage(), and client.exportSpeechUsage() APIs.

For provider-specific implementation planning, see Prompt Caching Report. Prompt caching work is tracked in the repository root as prompt_caching_todo.md.

For the OpenAI transport migration specifically, see OpenAI Responses Migration Report.
