Production Setup
This page is the practical production companion to the general Production Guide.
Use it when you want a concrete answer to:
- which env vars do I need
- which ones are optional
- how should I wire the client in production
- where conversation history is saved
- where embedding vectors are saved
- what the library does not persist automatically
1. Environment Variables
The library reads credentials and database configuration from your application environment.
Recommended production .env shape:
# Provider credentials: set only the ones you actually use.
OPENAI_API_KEY=
OPENAI_ORG_ID=
OPENAI_PROJECT_ID=
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
# Shared Postgres database for sessions, usage logs, and optional retrieval storage.
# Prefer an explicit SSL mode in production.
DATABASE_URL=postgresql://USER:PASSWORD@HOST:5432/DB_NAME?sslmode=verify-full

Important rules:
- Do not commit .env files.
- Set models in code, not in .env.
- OPENAI_ORG_ID and OPENAI_PROJECT_ID are optional.
- DATABASE_URL is optional for simple stateless usage, but it is required for:
  - PostgresSessionStore
  - PostgresUsageLogger
  - PostgresKnowledgeStore
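The advice to set an explicit SSL mode can be checked at startup. The following is a minimal sketch, not part of the library: checkDatabaseUrlSsl is a hypothetical helper name, and treating only require, verify-ca, and verify-full as acceptable is an assumed policy based on the standard libpq sslmode values.

```typescript
// libpq sslmode values that refuse an unencrypted connection.
const ENCRYPTED_SSL_MODES = ['require', 'verify-ca', 'verify-full'];

type SslCheck =
  | { ok: true; sslmode: string }
  | { ok: false; reason: string };

// Hypothetical helper (not a library API): inspects the sslmode query parameter.
function checkDatabaseUrlSsl(databaseUrl: string): SslCheck {
  let parsed: URL;
  try {
    parsed = new URL(databaseUrl);
  } catch {
    return { ok: false, reason: 'DATABASE_URL is not a valid URL' };
  }
  const sslmode = parsed.searchParams.get('sslmode');
  if (sslmode === null) {
    return { ok: false, reason: 'DATABASE_URL has no explicit sslmode' };
  }
  if (!ENCRYPTED_SSL_MODES.includes(sslmode)) {
    return { ok: false, reason: `sslmode=${sslmode} does not enforce encryption` };
  }
  return { ok: true, sslmode };
}
```

A production entry point could call this with process.env.DATABASE_URL and refuse to start when ok is false. Passwords containing reserved characters need to be percent-encoded for the URL to parse.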
Minimal env combinations:
- OpenAI generation only:
  - OPENAI_API_KEY
- Anthropic generation only:
  - ANTHROPIC_API_KEY
- Gemini generation only:
  - GEMINI_API_KEY
- Durable conversations and usage logging:
  - provider key(s)
  - DATABASE_URL
- Embeddings plus retrieval:
  - GEMINI_API_KEY
  - DATABASE_URL
  - plus any additional provider key if you want a different model for final answer generation
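The combinations above can be turned into a fail-fast startup check. This is a sketch under assumptions: the Feature names and the missingEnvVars helper are hypothetical and not library APIs, and the retrieval entry assumes a Gemini embedding model, as in the list above.

```typescript
// Hypothetical feature names; the variable names come from the list above.
type Feature = 'openai' | 'anthropic' | 'gemini' | 'persistence' | 'retrieval';

const REQUIRED_VARS: Record<Feature, string[]> = {
  openai: ['OPENAI_API_KEY'],
  anthropic: ['ANTHROPIC_API_KEY'],
  gemini: ['GEMINI_API_KEY'],
  persistence: ['DATABASE_URL'],
  retrieval: ['GEMINI_API_KEY', 'DATABASE_URL'],
};

// Returns the required variables that are unset or blank, without duplicates.
function missingEnvVars(
  features: Feature[],
  env: Record<string, string | undefined>,
): string[] {
  const missing: string[] = [];
  for (const feature of features) {
    for (const name of REQUIRED_VARS[feature]) {
      const value = env[name];
      const isSet = value !== undefined && value.trim() !== '';
      if (!isSet && !missing.includes(name)) {
        missing.push(name);
      }
    }
  }
  return missing;
}
```

For example, missingEnvVars(['anthropic', 'persistence'], process.env) could run before the client is constructed, with the process exiting if the returned array is not empty.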
2. Recommended Production Wiring
The library can auto-wire Postgres persistence when DATABASE_URL is present, but production applications should usually be explicit.
Recommended pattern:
import {
  LLMClient,
  PostgresSessionStore,
  PostgresUsageLogger,
} from 'unified-llm-client';

const sessionStore = PostgresSessionStore.fromEnv();
const usageLogger = PostgresUsageLogger.fromEnv();

export const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
  defaultModel: 'claude-sonnet-4-6',
  sessionStore,
  usageLogger,
});

Why this is better than implicit wiring:
- startup configuration is obvious in code
- tests can swap stores more easily
- you decide whether persistence and analytics are enabled
- it avoids surprising behavior when DATABASE_URL is present in one environment but not another
3. What Gets Stored Where
There are three separate persistence concerns in production:
Conversation history
If you use PostgresSessionStore, conversation snapshots are stored in:
- table: public.llm_sessions by default
This table stores:
- tenant_id
- session_id
- snapshot
- message_count
- model
- provider
- total_cost_usd
- timestamps
This is conversation state, not retrieval state.
Usage analytics
If you use PostgresUsageLogger, usage events are stored in:
- table: public.llm_usage_events by default
This table stores:
- provider
- model
- token counts
- cached token counts
- estimated cost
- finish reason
- duration
- tenant and session metadata
This is analytics data, not conversation state and not retrieval vectors.
Embeddings and retrieval data
If you use PostgresKnowledgeStore, retrieval data is stored in four tables by default:
- public.knowledge_spaces
- public.embedding_profiles
- public.knowledge_sources
- public.knowledge_chunks
The actual embedding vector is stored in:
- table: public.knowledge_chunks
- column: embedding VECTOR NOT NULL
The chunk row also stores:
- tenant_id
- bot_id
- knowledge_space_id
- source_id
- embedding_profile_id
- chunk_text
- citation
- metadata
- search_document for lexical search
- source_type
- source_name
- title
- url
- scope_type
- scope_user_id
- start_offset
- end_offset
Practical meaning:
- the vector is saved in your Postgres database
- the library does not save vectors inside the provider
- the library does not save vectors inside conversation/session rows
- the library does not automatically persist vectors just because client.embed() was called
4. What client.embed() Does And Does Not Do
client.embed() only generates embeddings.
It does:
- call the configured embedding transport
- return one or more vectors
- return provider usage metadata when available
It does not:
- create knowledge spaces
- create embedding profiles
- create sources
- write chunk rows
- choose your retrieval policy
- decide how to split documents
That means this code:
const result = await client.embed({
  model: 'gemini-embedding-2',
  input: 'Refunds are available for 30 days after purchase.',
});

returns vectors in memory only.
Nothing is persisted until your app writes those vectors into a store such as PostgresKnowledgeStore.
5. Recommended Embeddings Storage Flow
For production retrieval, the intended flow is:
- Create a PostgresKnowledgeStore.
- Call ensureSchema() once during startup or ingestion bootstrap.
- Create a knowledge space.
- Create an embedding profile.
- Create a source record.
- Chunk your content.
- Call client.embed() for those chunks.
- Write the vectors with upsertKnowledgeChunk().
- Mark the source ready and activate the embedding profile.
- Query through createDenseRetriever() or createHybridRetriever().
Example:
import {
  LLMClient,
  createPostgresKnowledgeStore,
} from 'unified-llm-client';
import { chunkText, cleanText, stripHtml } from 'unified-llm-client/chunking';

const client = LLMClient.fromEnv({
  defaultEmbeddingModel: 'gemini-embedding-2',
  defaultModel: 'claude-sonnet-4-6',
});

// Create the store and make sure the retrieval tables exist.
const store = createPostgresKnowledgeStore({
  connectionString: process.env.DATABASE_URL,
});
await store.ensureSchema();

// Create the knowledge space, the embedding profile, and the source record.
await store.upsertKnowledgeSpace({
  id: 'kb-support',
  tenantId: 'tenant-1',
  botId: 'bot-1',
  name: 'Support Knowledge Base',
});

await store.upsertEmbeddingProfile({
  id: 'profile-2026-04-25',
  tenantId: 'tenant-1',
  botId: 'bot-1',
  knowledgeSpaceId: 'kb-support',
  provider: 'google',
  model: 'gemini-embedding-2',
  dimensions: 3072,
});

// The source stays in 'processing' until all of its chunks are written.
await store.upsertKnowledgeSource({
  id: 'refund-policy',
  tenantId: 'tenant-1',
  botId: 'bot-1',
  knowledgeSpaceId: 'kb-support',
  embeddingProfileId: 'profile-2026-04-25',
  sourceType: 'pdf',
  name: 'refund-policy.pdf',
  status: 'processing',
});

// Clean and chunk the content, then embed all chunks in one call.
const text = cleanText(stripHtml('<h1>Refund Policy</h1><p>Refunds last 30 days.</p>'));
const chunks = chunkText(text, { chunkSize: 900, overlap: 120 });

const embeddings = await client.embed({
  model: 'gemini-embedding-2',
  input: chunks.map((chunk) => chunk.text),
  purpose: 'retrieval_document',
});

// Persist each chunk together with its vector; embeddings are matched to chunks by index.
for (const [index, chunk] of chunks.entries()) {
  await store.upsertKnowledgeChunk({
    id: `refund-policy:${index}`,
    tenantId: 'tenant-1',
    botId: 'bot-1',
    knowledgeSpaceId: 'kb-support',
    sourceId: 'refund-policy',
    embeddingProfileId: 'profile-2026-04-25',
    chunkIndex: index,
    text: chunk.text,
    embedding: embeddings.embeddings[index]!.values,
    startOffset: chunk.startOffset,
    endOffset: chunk.endOffset,
    sourceType: 'pdf',
    sourceName: 'refund-policy.pdf',
    title: 'Refund Policy',
  });
}

// Mark the source ready, then activate the profile for retrieval.
await store.upsertKnowledgeSource({
  id: 'refund-policy',
  tenantId: 'tenant-1',
  botId: 'bot-1',
  knowledgeSpaceId: 'kb-support',
  embeddingProfileId: 'profile-2026-04-25',
  sourceType: 'pdf',
  name: 'refund-policy.pdf',
  status: 'ready',
});

await store.activateEmbeddingProfile({
  tenantId: 'tenant-1',
  botId: 'bot-1',
  knowledgeSpaceId: 'kb-support',
  embeddingProfileId: 'profile-2026-04-25',
});

activateEmbeddingProfile() is fail-closed. If the knowledge space does not exist, or if the embedding profile belongs to a different tenant, bot, or knowledge space, it throws instead of silently doing nothing.
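Because the example pairs chunks and vectors by index, it can help to verify the embedding result before any chunk row is written. The following is a minimal sketch: assertEmbeddingsMatch is a hypothetical helper, not a library API, and the { values: number[] } shape is assumed from the embeddings.embeddings[index].values access in the example.

```typescript
// Assumed shape of one embedding entry, based on the example in this section.
interface EmbeddingLike {
  values: number[];
}

// Hypothetical guard: throws if the result cannot be paired safely with the chunks.
function assertEmbeddingsMatch(
  chunkCount: number,
  embeddings: EmbeddingLike[],
  expectedDimensions: number,
): void {
  if (embeddings.length !== chunkCount) {
    throw new Error(`Expected ${chunkCount} embeddings, got ${embeddings.length}`);
  }
  embeddings.forEach((embedding, index) => {
    if (embedding.values.length !== expectedDimensions) {
      throw new Error(
        `Embedding ${index} has ${embedding.values.length} dimensions, expected ${expectedDimensions}`,
      );
    }
    if (!embedding.values.every((value) => Number.isFinite(value))) {
      throw new Error(`Embedding ${index} contains a non-finite value`);
    }
  });
}
```

In the flow above this would run as assertEmbeddingsMatch(chunks.length, embeddings.embeddings, 3072) before the write loop, so a short or mismatched response fails before the source is marked ready.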
6. Retrieval Safety Rules
Production retrieval should always stay fully scoped.
For PostgresKnowledgeStore, dense and lexical search require:
- tenantId
- botId
- knowledgeSpaceId
- embeddingProfileId
This is a deliberate fail-closed design.
Do not:
- trust tenant ids from the client
- search without all filters
- mix vectors from different embedding profiles
- reuse one profile id across different dimensions or models
Recommended rule:
- derive tenantId from auth middleware on the server
- keep one active embedding profile per knowledge space
- reindex into a new profile when model or dimensions change
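One way to apply these rules is to build the retrieval scope in a single server-side function that refuses to return a partial scope. This is a sketch with hypothetical names (AuthContext, RetrievalTarget, buildRetrievalScope), none of which are library APIs. It assumes tenantId is set by your auth middleware and that the other ids come from server-side configuration, not from the request body.

```typescript
// Hypothetical shapes: tenantId comes from auth, the rest from server-side config.
interface AuthContext {
  tenantId?: string;
}

interface RetrievalTarget {
  botId?: string;
  knowledgeSpaceId?: string;
  embeddingProfileId?: string;
}

interface RetrievalScope {
  tenantId: string;
  botId: string;
  knowledgeSpaceId: string;
  embeddingProfileId: string;
}

// Fail closed: throw unless all four scope fields are present and non-blank.
function buildRetrievalScope(auth: AuthContext, target: RetrievalTarget): RetrievalScope {
  const candidate = {
    tenantId: auth.tenantId,
    botId: target.botId,
    knowledgeSpaceId: target.knowledgeSpaceId,
    embeddingProfileId: target.embeddingProfileId,
  };
  const missing = Object.entries(candidate)
    .filter(([, value]) => value === undefined || value.trim() === '')
    .map(([key]) => key);
  if (missing.length > 0) {
    throw new Error(`Refusing to search without: ${missing.join(', ')}`);
  }
  return candidate as RetrievalScope;
}
```

Keeping the tenant id out of the function's request-derived argument makes it harder to pass a client-supplied value by accident.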
7. In-Memory Versus Postgres For Embeddings
createInMemoryKnowledgeStore() is useful for:
- local demos
- tests
- single-process development
But production retrieval should use PostgresKnowledgeStore because:
- vectors survive process restarts
- metadata and vectors stay queryable together
- you can index the embedding column with pgvector
- you can store retrieval metadata and source state in one database
If you use the in-memory store, vectors are saved only in process memory and disappear on restart.
8. Database Notes For Production
PostgresKnowledgeStore.ensureSchema() creates:
- schema if missing
- vector extension when enabled
- retrieval tables
- standard lookup indexes
You should still make deliberate production decisions about:
- backups
- connection pooling
- RLS or app-enforced tenant isolation
- HNSW indexes per active embedding profile where needed
- schema migration ownership
If you want explicit HNSW index SQL, the library also exposes:
- createPgvectorHnswIndexSql()
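To illustrate what a per-profile HNSW index can look like, here is a hand-written sketch. It is not the output of createPgvectorHnswIndexSql(), whose exact SQL this page does not show. The helper name sketchHnswIndexSql, the index name, and the choice of cosine distance are all assumptions. Because the embedding column has no fixed dimension, the sketch follows the pgvector pattern of an expression index with a cast, restricted to one profile by a partial-index predicate.

```typescript
// Hypothetical sketch; NOT the library's createPgvectorHnswIndexSql().
interface HnswSketchOptions {
  profileId: string; // value stored in knowledge_chunks.embedding_profile_id
  dimensions: number; // dimensions recorded on the embedding profile
}

function sketchHnswIndexSql({ profileId, dimensions }: HnswSketchOptions): string {
  // Only allow simple ids so the value is safe to inline into DDL.
  if (!/^[A-Za-z0-9_-]+$/.test(profileId)) {
    throw new Error(`Unsupported profile id: ${profileId}`);
  }
  if (!Number.isInteger(dimensions) || dimensions < 1 || dimensions > 4000) {
    throw new Error(`Unsupported dimensions: ${dimensions}`);
  }
  // pgvector can index `vector` up to 2,000 dimensions and `halfvec` up to 4,000.
  const type = dimensions <= 2000 ? 'vector' : 'halfvec';
  const indexName = `knowledge_chunks_hnsw_${profileId.replace(/-/g, '_')}`;
  return [
    `CREATE INDEX IF NOT EXISTS ${indexName}`,
    `ON public.knowledge_chunks`,
    `USING hnsw ((embedding::${type}(${dimensions})) ${type}_cosine_ops)`,
    `WHERE embedding_profile_id = '${profileId}';`,
  ].join('\n');
}
```

As with any expression or partial index in Postgres, a query only benefits if it uses the same cast expression and the same embedding_profile_id filter. Whether the library's retrievers issue such queries is not stated here, so check the generated SQL before relying on an index like this.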
9. Recommended Production Checklist
- Keep provider credentials in your app environment, not in source control.
- Prefer explicit production wiring for session store and usage logger.
- Use DATABASE_URL with explicit SSL settings.
- Treat client.embed() as a vector generation step, not persistence.
- Save vectors in PostgresKnowledgeStore if you need durable retrieval.
- Keep retrieval fully scoped by tenant, bot, knowledge space, and embedding profile.
- Reindex into a new profile when the embedding model or dimensions change.
- Keep live-provider tests opt-in.
- Use the mock client for CI that must not hit external APIs.