One client surface
Use the same request and response model across providers instead of rewriting app code for each SDK.
Provider-agnostic completions, streaming, conversations, tools, persistence, routing, and usage tracking for real applications.
Install from GitHub:
pnpm add github:07rjain/LLMlibrary

Then create a client:

import { LLMClient } from 'unified-llm-client';

const client = LLMClient.fromEnv({
  defaultModel: 'gpt-4o',
});

The fastest path through the docs is:
If you want the lower-level generated API surface, open API Reference.
If you want a concrete production deployment checklist for .env, explicit Postgres wiring, and where embeddings are persisted, read Production Setup.
If you need the providers' current live catalogs, use client.models.listRemote({ provider }) and treat the result as discovery data. It does not automatically replace the checked-in model registry.
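One way to treat the live catalog as discovery data is to compare it against the model IDs your app already knows and report the difference, without writing anything back. The sketch below is illustrative only: the `RemoteModel` shape (an object with an `id` field), the `ModelCatalogClient` type, and both helper functions are assumptions, and a stub stands in for the real client so the example runs offline. Check the API Reference for the actual return type of `client.models.listRemote({ provider })`.

```typescript
// Hypothetical shape for one remote catalog entry; the real type may differ.
interface RemoteModel {
  id: string;
}

// Minimal structural type for the part of the client used here (an assumption).
interface ModelCatalogClient {
  models: {
    listRemote(args: { provider: string }): Promise<RemoteModel[]>;
  };
}

// Return remote model IDs that are missing from the locally known registry IDs.
function findUnregisteredModels(
  remote: RemoteModel[],
  registeredIds: string[],
): string[] {
  const known = new Set(registeredIds);
  return remote
    .map((m) => m.id)
    .filter((id) => !known.has(id))
    .sort();
}

// Fetch the live catalog and report what the checked-in registry lacks.
// Nothing is written back: the registry stays the source of truth.
async function reportNewModels(
  client: ModelCatalogClient,
  provider: string,
  registeredIds: string[],
): Promise<string[]> {
  const remote = await client.models.listRemote({ provider });
  return findUnregisteredModels(remote, registeredIds);
}

// Stub client standing in for a real LLMClient (IDs here are made up).
const stubClient: ModelCatalogClient = {
  models: {
    listRemote: async () => [{ id: 'gpt-4o' }, { id: 'example-new-model' }],
  },
};

reportNewModels(stubClient, 'openai', ['gpt-4o']).then((missing) => {
  console.log('Not in registry:', missing);
});
```

Keeping the comparison in a pure function makes it easy to run in CI as a "registry is stale" warning rather than as an automatic update.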
Embeddings now ship in the library through client.embed() with the Google Embedding 2 path.

Optional retrieval primitives are available from the package root and from unified-llm-client/retrieval, including InMemoryKnowledgeStore for local demos and tests, PostgresKnowledgeStore for app-owned pgvector retrieval, rerank hook support in the retrievers, and active-profile / reindex helpers for rollout-safe storage. Reusable text-prep helpers now ship from unified-llm-client/chunking, and retrieval scores can be formatted for display in a clearer, non-probabilistic way.

Keep query embeddings and stored chunk embeddings on the same embedding profile; the library does not mix profiles for you.
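Because the library does not reconcile profiles, an application can add its own guard before comparing vectors. The following is a minimal sketch of that idea, not the library's implementation: `EmbeddingProfile`, `EmbeddedVector`, `sameProfile`, and `cosineSimilarity` are hypothetical names, and the profile fields are assumed for illustration.

```typescript
// Hypothetical description of an embedding profile; field names are assumptions.
interface EmbeddingProfile {
  provider: string;
  model: string;
  dimensions: number;
}

// A vector tagged with the profile that produced it (hypothetical shape).
interface EmbeddedVector {
  profile: EmbeddingProfile;
  values: number[];
}

// Two profiles are comparable only if every identifying field matches.
function sameProfile(a: EmbeddingProfile, b: EmbeddingProfile): boolean {
  return (
    a.provider === b.provider &&
    a.model === b.model &&
    a.dimensions === b.dimensions
  );
}

// Cosine similarity that refuses to compare vectors from different profiles.
function cosineSimilarity(query: EmbeddedVector, chunk: EmbeddedVector): number {
  if (!sameProfile(query.profile, chunk.profile)) {
    throw new Error('Embedding profile mismatch: re-embed before comparing.');
  }
  if (query.values.length !== chunk.values.length) {
    throw new Error('Vector length mismatch.');
  }
  let dot = 0;
  let normQ = 0;
  let normC = 0;
  for (let i = 0; i < query.values.length; i++) {
    const q = query.values[i];
    const c = chunk.values[i];
    dot += q * c;
    normQ += q * q;
    normC += c * c;
  }
  const denom = Math.sqrt(normQ) * Math.sqrt(normC);
  // Zero vectors have no direction; report 0 instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Usage with made-up profiles: the second call throws instead of returning a
// number that looks meaningful but compares unrelated vector spaces.
const profileA: EmbeddingProfile = { provider: 'example', model: 'embed-a', dimensions: 2 };
const profileB: EmbeddingProfile = { provider: 'example', model: 'embed-b', dimensions: 2 };

console.log(
  cosineSimilarity(
    { profile: profileA, values: [1, 0] },
    { profile: profileA, values: [1, 0] },
  ),
);

try {
  cosineSimilarity(
    { profile: profileA, values: [1, 0] },
    { profile: profileB, values: [1, 0] },
  );
} catch (err) {
  console.log((err as Error).message);
}
```

Failing loudly on a mismatch is a deliberate choice here: a similarity score computed across profiles is still a number, so nothing else would signal that the result is meaningless.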
For embeddings planning tied to the chatbot widget use case, see Embeddings Integration Report. For a cross-check of the recent follow-up review and the recommended post-v1 order, see Embeddings Review Cross-Check. For the detailed implementation plan covering lightweight stores, chunking helpers, Gemini batching, OpenAI embeddings, and extraction helpers, see Embeddings Follow-Up Fix Plan. For the broader retrieval architecture, storage model, and rollout strategy, see Embeddings And Retrieval Architecture Report. For the concrete multitenant retrieval API, safety, and scaling plan, see Retrieval API Integration Report. Embeddings implementation work is tracked in the repository root as embeddings_todo.md.

For speech-to-text and text-to-speech architecture planning, see Speech API Research Report. OpenAI batch speech is implemented through Speech, with separate client.speak(), client.transcribe(), client.getSpeechUsage(), and client.exportSpeechUsage() APIs.

For provider-specific prompt caching implementation planning, see Prompt Caching Report. Prompt caching work is tracked in the repository root as prompt_caching_todo.md. For the OpenAI transport migration specifically, see OpenAI Responses Migration Report.