Unified LLM Client Embeddings Integration Report
Date: April 23, 2026
Current project decision update: 2026-04-24
- The active implementation scope is now Google Embedding 2 only for embeddings in v1.
- OpenAI embedding references below should be treated as deferred roadmap material, not the current delivery plan.
- Anthropic remains unsupported for embeddings.
1. Executive Summary
The chatbot widget PRD requires a knowledge-base path where users upload PDFs, URLs, and FAQs; the system indexes them; and the chatbot later retrieves only the most relevant chunks before answering. That requires embeddings, but the current unified-llm-client implementation is completion-focused and does not expose a first-class embedding API.
Recommendation:
- Add a first-class client.embed() API to unified-llm-client.
- Support google only with gemini-embedding-2, because the product PRD chooses Gemini Embedding 2 for multimodal/PDF knowledge ingestion.
- Defer openai embeddings in v1.
- Reject anthropic for embeddings unless a separate Voyage/Cohere-style provider is added later.
- Keep chunking, PDF page splitting, vector storage, BM25/reranking, citations, ingestion jobs, and tenant isolation in the chatbot application, not in the unified client library.
The unified client should generate vectors. It should not become the full RAG pipeline.
2. Why Embeddings Are Needed
We should not pass full PDFs, websites, or FAQ collections into the chat model on every message. That would clog context, increase latency/cost, and degrade answer quality as the corpus grows.
The correct flow is retrieval-augmented generation:
- Ingest-time: turn each source into searchable units, embed those units, and store vectors with metadata.
- Runtime: embed the visitor question using the same embedding model/profile.
- Search: run pgvector similarity search with strict tenant_id and bot_id filters.
- Answer: pass only the top relevant snippets/citations into the chat model.
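The search step above can be illustrated with a small in-memory sketch. It is not the production path (that is pgvector); StoredChunk, cosineSimilarity, and topKChunks are hypothetical names used only to show how the tenant/bot filter and the similarity ranking fit together.

```typescript
// Hypothetical in-memory stand-in for rows of the knowledge_chunks table.
interface StoredChunk {
  tenantId: string;
  botId: string;
  content: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Filter by tenant and bot first, then rank the remaining chunks by similarity.
function topKChunks(
  chunks: StoredChunk[],
  queryEmbedding: number[],
  tenantId: string,
  botId: string,
  k: number,
): Array<{ chunk: StoredChunk; score: number }> {
  return chunks
    .filter((c) => c.tenantId === tenantId && c.botId === botId)
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(chunk.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Filtering before ranking means a chunk from another tenant can never appear in the result, regardless of how similar its vector is.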
This matches the PRD and research docs:
- docs/Onboarding_flow.md step 3 requires "Add knowledge" with ingestion progress and "ready to answer" status.
- docs/Chatbot_widget_PRD.md describes pgvector similarity search over knowledge_chunks.
- docs/Chatbot_Widget_Research.md chooses Google Gemini Embedding 2 and Postgres + pgvector for the knowledge layer.
- CLAUDE.md explicitly says the widget uses Gemini Embedding 2 vectors in pgvector for uploaded documents.
3. Current Unified Client State
Implementation repo from CLAUDE.md:
- GitHub: https://github.com/07rjain/LLMlibrary
- Local inspection path used for this report: /tmp/LLMlibrary
- Package name: unified-llm-client
Current capabilities:
- LLMClient.complete() and LLMClient.stream()
- Conversation/session APIs
- Tool calling
- Usage logging
- Model registry and pricing
- Provider adapters for Anthropic, OpenAI, and Gemini using raw fetch
- Edge-safe build, no provider SDK dependencies in core
Missing capabilities:
- No client.embed() method.
- No embedding request/response types in src/types.ts.
- OpenAI embeddings are intentionally out of scope for the first release.
- No Gemini embeddings transport in src/providers/gemini.ts.
- No embedding model metadata in ModelInfo.
- No model-kind validation to prevent calling complete() with an embedding model.
- No embedding usage/cost pathway.
- No mock embedding support for tests.
- No package/docs examples for embeddings.
Relevant current files:
- src/types.ts already has DocumentPart, ImageBase64Part, ImageUrlPart, and AudioPart. This is useful because Gemini Embedding 2 accepts multimodal content.
- src/providers/gemini.ts already translates document, image_base64, image_url, and audio parts into Gemini inlineData/fileData.
- src/providers/openai.ts currently only calls /v1/responses.
- src/client.ts dispatches completion and streaming only.
- src/models/registry.ts only validates completion-oriented capability flags.
- src/usage.ts stores completion-shaped usage events with finish_reason.
- package.json and tsup.config.ts export existing entries only; there is no embeddings entry.
Important correction:
/tmp/LLMlibrary/docs/EMBEDDINGS_REPORT.md already exists, but it is stale where it says Gemini's stable embedding model is gemini-embedding-001 and treats Gemini as text-only. Current Google docs list gemini-embedding-2 as GA, released April 22, 2026, with multimodal inputs including PDF.
4. Provider Reality Check
Google Gemini Embedding 2
Google's current Gemini Embedding 2 model details state:
- Model ID: gemini-embedding-2
- Launch stage: GA
- Release date: April 22, 2026
- Inputs: text, images, audio, video, PDF
- Output: embeddings
- Default output size: 3072 dimensions
- Adjustable output dimensionality
- Maximum input tokens: 8,192
- PDF support: application/pdf
- PDF limit: 1 file per prompt, maximum 6 pages per file
- Supports document OCR
- Supports custom task instructions
Practical implication for the widget:
- We can embed PDF pages or small PDF page groups natively, without OCRing the full PDF ourselves first.
- We still need to split large PDFs, because the model has page and token limits.
- For best citations and progress reporting, the app should still extract/store text snippets or page labels alongside vectors.
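As a rough illustration of the splitting requirement above, the following sketch computes the page ranges an app-layer ingestion job might cut a large PDF into. PageRange and splitIntoPageRanges are hypothetical names, and the default of 6 pages is taken from the PDF limit listed above; actual page extraction would need a PDF library and is out of scope here.

```typescript
// A 1-based, inclusive range of pages to send as one embedding unit.
interface PageRange {
  firstPage: number;
  lastPage: number;
}

// Split a document of totalPages into consecutive units of at most
// maxPagesPerUnit pages (6 matches the Gemini Embedding 2 PDF limit above).
function splitIntoPageRanges(
  totalPages: number,
  maxPagesPerUnit = 6,
): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError('totalPages must be a non-negative integer');
  }
  if (!Number.isInteger(maxPagesPerUnit) || maxPagesPerUnit < 1) {
    throw new RangeError('maxPagesPerUnit must be a positive integer');
  }
  const ranges: PageRange[] = [];
  for (let first = 1; first <= totalPages; first += maxPagesPerUnit) {
    ranges.push({
      firstPage: first,
      lastPage: Math.min(first + maxPagesPerUnit - 1, totalPages),
    });
  }
  return ranges;
}
```

For a 14-page PDF this yields pages 1-6, 7-12, and 13-14. Using one page per unit instead (maxPagesPerUnit = 1) gives finer-grained citations at the cost of more embedding calls.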
Gemini API vs Vertex AI
The current unified client Gemini adapter uses GEMINI_API_KEY and the Gemini API style endpoint. The user-provided docs are Vertex AI model docs. For our demo, the Gemini API key path worked with gemini-embedding-2.
Recommended scope:
- Phase 1: implement Gemini API key support through https://generativelanguage.googleapis.com/v1beta/models/{model}:embedContent.
- Phase 2: add optional Vertex AI support if the product needs Google Cloud project/region/OAuth controls.
Do not block the product MVP on Vertex AI auth unless enterprise controls are needed immediately.
OpenAI Embeddings
OpenAI is deferred for embeddings in v1.
OpenAI supports text embeddings through POST /v1/embeddings.
Relevant behavior:
- Request accepts input, model, optional dimensions, optional encoding_format, and optional user.
- Response includes data[].embedding, data[].index, model, and token usage.
- text-embedding-3-large supports configurable dimensions and has a 3072-dimensional native size.
Practical implication:
- OpenAI remains useful as a reference point and future expansion option.
- It should not be used for native PDF/image/audio/video embeddings in this library.
- It is not part of the first embeddings release plan.
Anthropic
Anthropic should not be exposed as an embeddings provider in v1. The library should throw ProviderCapabilityError for provider: 'anthropic' in client.embed().
If a future Anthropic-adjacent embedding path is required, add a real provider such as Voyage AI. Do not fake Anthropic embeddings through Claude.
5. Recommended Public API
Add a first-class method:
const response = await client.embed({
provider: 'google',
model: 'gemini-embedding-2',
input: 'What is your return policy?',
dimensions: 3072,
taskInstruction: 'Embed this customer question for question-answering retrieval.',
tenantId: 'tenant_123',
botId: 'bot_123',
});
Recommended core types:
export type EmbeddingProvider = Extract<
CanonicalProvider,
'google' | 'mock'
>;
export type EmbeddingPurpose =
| 'retrieval_query'
| 'retrieval_document'
| 'semantic_similarity'
| 'classification'
| 'clustering'
| 'question_answering'
| 'code_retrieval';
export type EmbeddingInput =
| string
| EmbeddingContent
| Array<string | EmbeddingContent>;
export interface EmbeddingContent {
metadata?: Record<string, unknown>;
parts: EmbeddingPart[];
title?: string;
}
export type EmbeddingPart =
| TextPart
| ImageBase64Part
| ImageUrlPart
| DocumentPart
| AudioPart
| VideoPart;
export interface VideoPart {
data?: string;
mediaType: string;
type: 'video';
url?: string;
}
export interface EmbeddingRequestOptions {
botId?: string;
dimensions?: number;
input: EmbeddingInput;
model?: string;
provider?: EmbeddingProvider;
providerOptions?: EmbeddingProviderOptions;
purpose?: EmbeddingPurpose;
signal?: AbortSignal;
taskInstruction?: string;
tenantId?: string;
}
export interface EmbeddingProviderOptions {
google?: GoogleEmbeddingOptions;
}
export interface GoogleEmbeddingOptions {
taskInstruction?: string;
taskType?: string; // legacy/compat option, not the primary Gemini 2 design
title?: string;
}
export interface EmbeddingResultItem {
dimensions: number;
index: number;
metadata?: Record<string, unknown>;
values: number[];
}
export interface EmbeddingUsageMetrics {
cost?: string;
costUSD?: number;
estimated?: boolean;
inputTokens?: number;
totalTokens?: number;
}
export interface EmbeddingResponse {
embeddings: EmbeddingResultItem[];
model: string;
provider: EmbeddingProvider;
raw: unknown;
usage?: EmbeddingUsageMetrics;
}
Design notes:
- input must support single and batch inputs.
- For OpenAI, only string and string-array inputs should be accepted.
- For Gemini, multimodal EmbeddingContent should be accepted.
- taskInstruction should be first-class because Gemini Embedding 2 docs highlight task instructions.
- purpose should be canonical and provider-neutral. The adapter can map it to provider-specific behavior.
- Keep GoogleEmbeddingOptions.taskType only for backwards compatibility with older Gemini embedding APIs and current generic Gemini API docs. Do not force the whole public API around the older taskType enum.
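One way to honor the single-and-batch requirement in the design notes is to normalize every input into an array of content objects before dispatch. The sketch below uses simplified stand-in types (SketchPart, SketchContent, SketchInput); the exact shape of the library's TextPart and DocumentPart is an assumption here, and normalizeEmbeddingInput is a hypothetical helper name.

```typescript
// Simplified stand-ins for the report's EmbeddingPart / EmbeddingContent types.
// The real TextPart and DocumentPart shapes in src/types.ts may differ.
type SketchPart =
  | { type: 'text'; text: string }
  | { type: 'document'; mediaType: string; data: string };

interface SketchContent {
  parts: SketchPart[];
  title?: string;
  metadata?: Record<string, unknown>;
}

type SketchInput = string | SketchContent | Array<string | SketchContent>;

// Turn any accepted input shape into a uniform array of content items,
// wrapping bare strings in a single text part.
function normalizeEmbeddingInput(input: SketchInput): SketchContent[] {
  const items = Array.isArray(input) ? input : [input];
  return items.map((item): SketchContent =>
    typeof item === 'string'
      ? { parts: [{ type: 'text', text: item }] }
      : item,
  );
}

// True when every part of every item is plain text, which is the only
// input a text-only provider could accept.
function isTextOnly(contents: SketchContent[]): boolean {
  return contents.every((c) => c.parts.every((p) => p.type === 'text'));
}
```

With this shape, each adapter sees only arrays, and a text-only provider can call isTextOnly() once and raise a capability error instead of inspecting individual parts.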
6. Gemini Adapter Changes
File: src/providers/gemini.ts
Add:
- GeminiEmbeddingOptions
- GeminiEmbeddingResponse
- GeminiEmbeddingUsageMetadata
- embed(options: GeminiEmbeddingOptions): Promise<EmbeddingResponse>
- Request translation from EmbeddingInput to Gemini Content
- Response translation from Gemini embedding.values or embeddings[]
- Usage translation from usageMetadata when returned
Recommended endpoint:
POST /v1beta/models/{model}:embedContent
Request shape for a single item:
{
"content": {
"parts": [
{ "text": "Embed this document for question-answering retrieval." },
{
"inlineData": {
"mimeType": "application/pdf",
"data": "<base64-pdf-page>"
}
}
]
},
"outputDimensionality": 3072
}
Adapter behavior:
- Reuse the existing Gemini translateGeminiPart() logic for document, image_base64, image_url, and audio.
- Add VideoPart support if we want full Gemini Embedding 2 modality coverage.
- If taskInstruction is present, prepend it as a text part.
- If purpose is present and no explicit taskInstruction is supplied, map it to a default instruction.
- Validate Gemini PDF limits at the app layer, not in the adapter, because the adapter cannot know page count from base64 safely.
- Allow dimensions to map to outputDimensionality.
- Normalize models/gemini-embedding-2 and gemini-embedding-2 consistently, matching existing Gemini model naming behavior.
- For batch inputs, start with sequential embedContent calls for correctness. Add batchEmbedContents later as an optimization.
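A minimal sketch of the request-building rules above (prepend the task instruction, map dimensions to outputDimensionality, normalize the model ID) might look like the following. GeminiPart, GeminiEmbedRequestBody, normalizeGeminiModelId, and buildEmbedContentRequest are hypothetical names, and the parts are assumed to be already translated into Gemini's text/inlineData shape.

```typescript
// Parts as they appear in the Gemini request body shown earlier.
type GeminiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GeminiEmbedRequestBody {
  content: { parts: GeminiPart[] };
  outputDimensionality?: number;
}

// Accept both "models/gemini-embedding-2" and "gemini-embedding-2".
function normalizeGeminiModelId(model: string): string {
  return model.startsWith('models/') ? model.slice('models/'.length) : model;
}

// Build the path and JSON body for a single embedContent call.
function buildEmbedContentRequest(args: {
  model: string;
  parts: GeminiPart[];
  taskInstruction?: string;
  dimensions?: number;
}): { path: string; body: GeminiEmbedRequestBody } {
  // The task instruction, when present, goes first as a text part.
  const parts: GeminiPart[] = args.taskInstruction
    ? [{ text: args.taskInstruction }, ...args.parts]
    : [...args.parts];
  const body: GeminiEmbedRequestBody = { content: { parts } };
  if (args.dimensions !== undefined) {
    body.outputDimensionality = args.dimensions;
  }
  const modelId = normalizeGeminiModelId(args.model);
  return { path: `/v1beta/models/${modelId}:embedContent`, body };
}
```

Keeping this as a pure function makes the unit tests in the test plan straightforward: they can assert on the returned path and body without mocking fetch.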
Suggested default task instructions:
const googleTaskInstructions = {
retrieval_query:
'Embed this user question for question-answering retrieval.',
retrieval_document:
'Embed this knowledge-base document for question-answering retrieval.',
semantic_similarity:
'Embed this content for semantic similarity comparison.',
classification:
'Embed this content for classification.',
clustering:
'Embed this content for clustering.',
question_answering:
'Embed this content for question answering.',
code_retrieval:
'Embed this content for code retrieval.',
} satisfies Record<EmbeddingPurpose, string>;
7. OpenAI Adapter Changes
File: src/providers/openai.ts
Add:
- OpenAIEmbeddingOptions
- OpenAIEmbeddingResponsePayload
- embed(options: OpenAIEmbeddingOptions): Promise<EmbeddingResponse>
- Request translation to /v1/embeddings
- Response translation from data[].embedding
- Usage translation from usage.prompt_tokens and usage.total_tokens
Request shape:
{
"model": "text-embedding-3-large",
"input": ["What is your return policy?"],
"dimensions": 3072,
"encoding_format": "float"
}
Adapter behavior:
- Accept string or string-array only.
- Reject DocumentPart, image, audio, and video with ProviderCapabilityError.
- Map dimensions to OpenAI dimensions.
- Map providerOptions.openai.encodingFormat to encoding_format.
- Map OpenAI usage.prompt_tokens to inputTokens.
- Map OpenAI usage.total_tokens to totalTokens.
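Although the OpenAI adapter is deferred, the response mapping above is simple enough to sketch for future reference. The payload fields follow the response shape described in section 4; OpenAIEmbeddingPayload, SketchEmbeddingResult, and translateOpenAIEmbeddingResponse are hypothetical names, and the 'openai' provider value is an assumption since the v1 EmbeddingProvider type does not include it.

```typescript
// Subset of the /v1/embeddings response used by the mapping.
interface OpenAIEmbeddingPayload {
  data: Array<{ embedding: number[]; index: number }>;
  model: string;
  usage?: { prompt_tokens: number; total_tokens: number };
}

// Simplified stand-in for the report's EmbeddingResponse.
interface SketchEmbeddingResult {
  embeddings: Array<{ dimensions: number; index: number; values: number[] }>;
  model: string;
  provider: 'openai';
  raw: unknown;
  usage?: { inputTokens?: number; totalTokens?: number };
}

function translateOpenAIEmbeddingResponse(
  payload: OpenAIEmbeddingPayload,
): SketchEmbeddingResult {
  // Sort by index so results line up with the order of the request inputs.
  const embeddings = [...payload.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => ({
      dimensions: item.embedding.length,
      index: item.index,
      values: item.embedding,
    }));
  const result: SketchEmbeddingResult = {
    embeddings,
    model: payload.model,
    provider: 'openai',
    raw: payload,
  };
  if (payload.usage) {
    result.usage = {
      inputTokens: payload.usage.prompt_tokens,
      totalTokens: payload.usage.total_tokens,
    };
  }
  return result;
}
```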
8. Client Changes
File: src/client.ts
Add constructor options:
export interface LLMClientOptions {
defaultEmbeddingModel?: string;
defaultEmbeddingProvider?: EmbeddingProvider;
}
Add public method:
async embed(options: EmbeddingRequestOptions): Promise<EmbeddingResponse>
Add private methods:
- resolveEmbeddingRequest()
- dispatchEmbed()
- logEmbeddingUsageEvent() if usage logging is extended
Resolution rules:
- Use options.model if provided.
- Else use defaultEmbeddingModel.
- Else use provider default:
  - Google: gemini-embedding-2
  - OpenAI: text-embedding-3-large or text-embedding-3-small, depending on product cost preference
- Resolve provider from explicit options.provider, model registry metadata, or defaultEmbeddingProvider.
- Reject unknown provider/model combinations.
- Reject completion models in embed().
- Reject embedding models in complete() and stream().
The embedding provider must be independent from the chat provider. The chatbot can use Claude for generation and Gemini for embeddings in the same request lifecycle.
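The resolution rules above could be sketched as a pure function along these lines. This is an illustration under assumptions, not the library's implementation: the registry is modeled as a plain record, SketchProvider covers only the v1 providers, and 'mock-embedding' is a made-up default model ID for the mock provider.

```typescript
type SketchProvider = 'google' | 'mock';

interface SketchModelEntry {
  provider: string;
  kind?: 'completion' | 'embedding';
}

// Provider defaults; 'mock-embedding' is a hypothetical ID for illustration.
const PROVIDER_DEFAULT_MODELS: Record<SketchProvider, string> = {
  google: 'gemini-embedding-2',
  mock: 'mock-embedding',
};

function isSketchProvider(value: string | undefined): value is SketchProvider {
  return value === 'google' || value === 'mock';
}

function resolveEmbeddingRequest(
  options: { model?: string; provider?: SketchProvider },
  defaults: {
    defaultEmbeddingModel?: string;
    defaultEmbeddingProvider?: SketchProvider;
  },
  registry: Record<string, SketchModelEntry>,
): { model: string; provider: SketchProvider } {
  // Model: explicit option first, then the client-level default.
  const explicitModel = options.model ?? defaults.defaultEmbeddingModel;
  // Provider: explicit option, then registry metadata, then client default.
  const fromRegistry = explicitModel
    ? registry[explicitModel]?.provider
    : undefined;
  const provider =
    options.provider ??
    (isSketchProvider(fromRegistry) ? fromRegistry : undefined) ??
    defaults.defaultEmbeddingProvider;
  if (!provider) {
    throw new Error('Cannot resolve an embedding provider');
  }
  // Fall back to the provider default only when no model was named.
  const model = explicitModel ?? PROVIDER_DEFAULT_MODELS[provider];
  const entry = registry[model];
  if (!entry || entry.provider !== provider) {
    throw new Error(`Unknown provider/model combination: ${provider}/${model}`);
  }
  // A missing kind is treated as 'completion', so it is rejected here.
  if ((entry.kind ?? 'completion') !== 'embedding') {
    throw new Error(`${model} is not an embedding model`);
  }
  return { model, provider };
}
```

Because nothing here reads the chat provider, a client configured with Claude for generation still resolves embeddings to Google.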
9. Model Registry Changes
Files:
- src/types.ts
- src/models/registry.ts
- src/models/prices.json
- src/models/prices.ts
Extend ModelInfo:
export interface ModelInfo {
cacheReadPrice?: number;
cacheWritePrice?: number;
contextWindow: number;
embeddingDimensions?: {
default: number;
max: number;
min?: number;
recommended?: number[];
};
id: string;
inputPrice: number;
kind?: 'completion' | 'embedding';
lastUpdated: string;
maxPdfPages?: number;
outputPrice: number;
provider: CanonicalProvider;
supportedInputModalities?: Array<
'text' | 'image' | 'document' | 'audio' | 'video'
>;
supportsStreaming: boolean;
supportsTools: boolean;
supportsVision: boolean;
}
Compatibility rule:
- Existing models can omit kind, and the registry treats missing kind as 'completion'.
Add registry helpers:
assertModelKind(modelId: string, kind: 'completion' | 'embedding'): ModelInfo
Add model entries:
{
"gemini-embedding-2": {
"provider": "google",
"kind": "embedding",
"contextWindow": 8192,
"inputPrice": 0,
"outputPrice": 0,
"supportsStreaming": false,
"supportsTools": false,
"supportsVision": false,
"embeddingDimensions": {
"default": 3072,
"max": 3072,
"recommended": [3072, 1536, 768]
},
"supportedInputModalities": ["text", "image", "document", "audio", "video"],
"maxPdfPages": 6,
"lastUpdated": "2026-04-22"
}
}
Also add:
- text-embedding-3-small
- text-embedding-3-large
Do not invent pricing. Before implementation, update inputPrice from official provider pricing pages and make pnpm pricecheck pass. If pricing is not final or usage metadata is incomplete, return usage.estimated = true or leave cost undefined for that provider.
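The assertModelKind() helper and the missing-kind compatibility rule from this section could be sketched as follows. The registry is passed in explicitly as a Map to keep the example self-contained, which differs from the two-argument signature proposed above; SketchModelInfo is a trimmed stand-in for ModelInfo.

```typescript
type ModelKind = 'completion' | 'embedding';

// Trimmed stand-in for ModelInfo: only the fields this helper needs.
interface SketchModelInfo {
  id: string;
  kind?: ModelKind;
}

// Return the model's metadata, or throw if the model is unknown or is
// registered under a different kind than the caller requires.
function assertModelKind(
  registry: Map<string, SketchModelInfo>,
  modelId: string,
  kind: ModelKind,
): SketchModelInfo {
  const info = registry.get(modelId);
  if (!info) {
    throw new Error(`Unknown model: ${modelId}`);
  }
  // Entries without a kind predate embeddings and count as completion models.
  const actual = info.kind ?? 'completion';
  if (actual !== kind) {
    throw new Error(
      `Model ${modelId} is a ${actual} model and cannot be used for ${kind} requests`,
    );
  }
  return info;
}
```

complete() and stream() would call this with 'completion', and embed() with 'embedding', so both rejection rules share one code path.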
10. Usage Logging Changes
Current UsageEvent assumes completion-style requests with finishReason, input tokens, output tokens, and cached tokens.
Recommended change:
export type UsageOperation = 'completion' | 'embedding';
export interface UsageEvent extends UsageMetrics {
operation?: UsageOperation;
embeddingDimensions?: number;
finishReason?: CanonicalFinishReason;
}
Postgres schema changes:
- Add operation TEXT NOT NULL DEFAULT 'completion'
- Add embedding_dimensions INTEGER
- Make finish_reason nullable or default it to ''
- Add index (tenant_id, operation, timestamp DESC)
Compatibility:
- Existing completion events should behave the same.
- Embedding events should not fake finishReason: 'stop' just to fit the old schema.
- If schema migration is too much for the first patch, return embedding usage in EmbeddingResponse and skip persistent logging until the schema is updated.
11. Package Exports And Build Changes
Files:
- src/index.ts
- src/client.ts
- src/types.ts
- package.json
- tsup.config.ts
Minimum:
- Export embedding types from the main package surface.
- Export client.embed() through the existing unified-llm-client and unified-llm-client/client entries.
Optional:
- Add src/embeddings.ts and package export unified-llm-client/embeddings if embedding helpers become large.
Recommended first patch:
- Keep types in src/types.ts.
- Keep method on LLMClient.
- Avoid a new entry point unless needed.
This keeps the API small and avoids unnecessary package export churn.
12. Test Plan
Unit tests:
- test/client.test.ts: client.embed() resolves model/provider correctly.
- test/client.test.ts: client.embed() uses defaultEmbeddingModel.
- test/client.test.ts: complete() rejects embedding models.
- test/client.test.ts: embed() rejects completion models.
- test/client.test.ts: Anthropic embedding requests throw ProviderCapabilityError.
- test/gemini.adapter.test.ts: text embedding request maps to embedContent.
- test/gemini.adapter.test.ts: dimensions maps to outputDimensionality.
- test/gemini.adapter.test.ts: PDF DocumentPart maps to inlineData with application/pdf.
- test/gemini.adapter.test.ts: taskInstruction is prepended as a text part.
- test/gemini.adapter.test.ts: response embedding.values maps to EmbeddingResponse.embeddings[0].values.
- test/openai.adapter.test.ts: string input maps to /v1/embeddings.
- test/openai.adapter.test.ts: string array maps to a batch request.
- test/openai.adapter.test.ts: multimodal input throws provider capability error.
- test/model-registry.test.ts: kind defaults to completion.
- test/model-registry.test.ts: embedding metadata is returned correctly.
- test/usage.test.ts: embedding operation is logged/aggregated if usage schema is extended.
Mock client tests:
- Add embeddings?: Array<EmbeddingResponse | ((options) => EmbeddingResponse | Promise<EmbeddingResponse>)> to MockLLMClientOptions.
- Default mock embedding can return a deterministic small vector.
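A deterministic default for the mock client could be derived from the input text itself, so the same text always maps to the same vector and different texts usually differ. The sketch below is one possible approach, not a required design: hashText is an FNV-1a-style hash over UTF-16 code units, the per-dimension values come from a xorshift32 step, and mockEmbedding and the default of 8 dimensions are hypothetical choices.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function hashText(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Produce a repeatable pseudo-random vector with values in [-1, 1].
// The output carries no semantic meaning; it only needs to be stable.
function mockEmbedding(text: string, dimensions = 8): number[] {
  const values: number[] = [];
  // xorshift32 never leaves state 0, so substitute 1 in that rare case.
  let state = hashText(text) || 1;
  for (let i = 0; i < dimensions; i++) {
    state ^= state << 13;
    state >>>= 0;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    values.push((state / 0xffffffff) * 2 - 1);
  }
  return values;
}
```

App tests can then assert on retrieval behavior (for example, that an identical question retrieves its own chunk) without any network call.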
Live tests:
- Gated behind LIVE_TESTS=1.
- Google text embedding smoke test using GEMINI_API_KEY.
- Google PDF embedding smoke test with a tiny one-page PDF fixture.
- OpenAI text embedding smoke test using OPENAI_API_KEY.
- Assert dimensions match requested dimensions.
Quality gates:
- pnpm typecheck
- pnpm lint
- pnpm test
- pnpm build
- pnpm edgecheck
- pnpm sizecheck
- pnpm pricecheck
13. Widget App Integration
The unified client should not own ingestion. The widget app should own source management, vector storage, retrieval policy, progress, and citations.
Recommended app tables:
embedding_profiles (
id uuid primary key,
tenant_id uuid not null,
bot_id uuid not null,
provider text not null,
model text not null,
dimensions int not null,
purpose text not null,
task_instruction text,
created_at timestamptz not null default now()
);
knowledge_sources (
id uuid primary key,
tenant_id uuid not null,
bot_id uuid not null,
type text not null,
name text not null,
status text not null,
progress_percent int not null default 0,
error_message text,
embedding_profile_id uuid references embedding_profiles(id),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
knowledge_chunks (
id uuid primary key,
tenant_id uuid not null,
bot_id uuid not null,
source_id uuid not null references knowledge_sources(id),
embedding_profile_id uuid not null references embedding_profiles(id),
content text not null,
citation jsonb not null,
metadata jsonb not null default '{}',
embedding vector(3072) not null,
created_at timestamptz not null default now()
);
Every query must include:
where tenant_id = $1
and bot_id = $2
and embedding_profile_id = $3
Do not search across tenants, bots, or embedding profiles.
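To make the mandatory filters hard to forget, the app could build every retrieval query through a single helper. The sketch below assumes pgvector's cosine-distance operator (<=>) and its bracketed text format for vectors; buildKnowledgeSearchQuery and KnowledgeSearchQuery are hypothetical names, and the returned text/values pair is meant to be passed to whatever Postgres client the app uses.

```typescript
interface KnowledgeSearchQuery {
  text: string;
  values: Array<string | number>;
}

// Build a parameterized similarity query that always carries the
// tenant, bot, and embedding-profile filters.
function buildKnowledgeSearchQuery(args: {
  tenantId: string;
  botId: string;
  embeddingProfileId: string;
  queryEmbedding: number[];
  limit: number;
}): KnowledgeSearchQuery {
  if (!args.tenantId || !args.botId || !args.embeddingProfileId) {
    throw new Error('tenantId, botId, and embeddingProfileId are all required');
  }
  if (
    args.queryEmbedding.length === 0 ||
    !args.queryEmbedding.every((v) => Number.isFinite(v))
  ) {
    throw new Error('queryEmbedding must be a non-empty array of finite numbers');
  }
  if (!Number.isInteger(args.limit) || args.limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  // pgvector accepts vectors as text in the form "[v1,v2,...]".
  const vectorLiteral = `[${args.queryEmbedding.join(',')}]`;
  const text = [
    'select id, content, citation, embedding <=> $4::vector as distance',
    'from knowledge_chunks',
    'where tenant_id = $1',
    '  and bot_id = $2',
    '  and embedding_profile_id = $3',
    'order by embedding <=> $4::vector',
    'limit $5',
  ].join('\n');
  return {
    text,
    values: [
      args.tenantId,
      args.botId,
      args.embeddingProfileId,
      vectorLiteral,
      args.limit,
    ],
  };
}
```

Because the helper throws when any identifier is missing, a caller cannot accidentally issue an unfiltered search.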
14. Onboarding Flow Impact
Current onboarding step:
Add knowledge. User uploads PDFs or adds URLs/FAQs; the system indexes them and shows ingestion progress plus "ready to answer" status.
Recommended implementation:
- User uploads PDF or adds URL/FAQ.
- App creates a knowledge_sources row with status = 'queued'.
- Background job starts and sets status = 'processing'.
- For PDFs:
  - Split large PDFs into page or <=6-page units for Gemini Embedding 2.
  - Send the page PDF/document part to client.embed().
  - Extract or store display text/snippet for citations.
- For URLs:
  - Fetch readable content with the app ingestion service.
  - Chunk text.
  - Call client.embed() for each chunk.
- For FAQs:
  - Treat each Q/A pair as one or more text chunks.
  - Call client.embed().
- Store vectors in knowledge_chunks.
- Update progress after each successful chunk/page.
- Mark source ready when all chunks are stored.
- Preview chat uses search_knowledge_base to retrieve chunks and show citations.
Status model:
- queued
- processing
- ready
- failed
- needs_reindex
User-facing statuses:
- "Waiting to index"
- "Indexing page 3 of 18"
- "Ready to answer"
- "Failed: unsupported PDF or API error"
- "Needs re-indexing: embedding settings changed"
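The mapping from the status model to the user-facing strings above could be kept in one small function so that the UI and the API stay in sync. SourceStatus and describeSourceStatus are hypothetical names; the generic 'Indexing' fallback for sources without page counts (URLs, FAQs) is an assumption not specified in the list above.

```typescript
type SourceStatus =
  | 'queued'
  | 'processing'
  | 'ready'
  | 'failed'
  | 'needs_reindex';

// Translate an internal status (plus optional progress details) into
// the message shown to the user.
function describeSourceStatus(
  status: SourceStatus,
  detail: {
    currentPage?: number;
    totalPages?: number;
    errorMessage?: string;
  } = {},
): string {
  switch (status) {
    case 'queued':
      return 'Waiting to index';
    case 'processing':
      // Page counts are only known for paginated sources such as PDFs.
      return detail.currentPage !== undefined && detail.totalPages !== undefined
        ? `Indexing page ${detail.currentPage} of ${detail.totalPages}`
        : 'Indexing';
    case 'ready':
      return 'Ready to answer';
    case 'failed':
      return `Failed: ${detail.errorMessage ?? 'unsupported PDF or API error'}`;
    case 'needs_reindex':
      return 'Needs re-indexing: embedding settings changed';
  }
}
```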
15. Implementation Task Breakdown
Epic 1: Public Embedding Types
Tasks:
- Add EmbeddingProvider, EmbeddingPurpose, EmbeddingInput, EmbeddingContent, EmbeddingPart, VideoPart.
- Add EmbeddingRequestOptions, EmbeddingResponse, EmbeddingResultItem, EmbeddingUsageMetrics.
- Add provider-specific embedding option types.
- Export all types from the package root.
Acceptance criteria:
- TypeScript users can import embedding types from unified-llm-client.
- Existing completion type exports remain unchanged.
- No runtime dependency is added.
Epic 2: Model Registry Support
Tasks:
- Add optional kind to ModelInfo.
- Add embedding metadata fields.
- Add assertModelKind().
- Default missing kind to completion.
- Add Gemini and OpenAI embedding models to prices/registry.
- Update pricing freshness checks if they assume all models have output pricing.
Acceptance criteria:
- Completion models still work without edits by consumers.
- complete() rejects embedding models.
- embed() rejects completion models.
- Registry lists embedding metadata.
Epic 3: Gemini Embedding Adapter
Tasks:
- Add GeminiAdapter.embed().
- Add request translator for string and multimodal EmbeddingContent.
- Reuse Gemini part translation for document/image/audio.
- Add VideoPart translation.
- Support dimensions as outputDimensionality.
- Support taskInstruction.
- Parse embedding.values.
- Parse usageMetadata when present.
- Map Gemini errors through existing error handling.
Acceptance criteria:
- Text embedding works.
- PDF page embedding works.
- Requested dimensions are respected.
- No SDK dependency is introduced.
- Existing Gemini completion and cache tests still pass.
Epic 4: OpenAI Embedding Adapter
Tasks:
- Add OpenAIAdapter.embed().
- Add request translator for text and text batches.
- Add /v1/embeddings transport.
- Parse data[].embedding.
- Parse usage.
- Reject multimodal inputs.
Acceptance criteria:
- Single text embedding works.
- Batch text embedding works.
- Dimensions are passed through.
- Usage metrics are returned.
- Existing OpenAI completion tests still pass.
Epic 5: LLMClient Dispatch
Tasks:
- Add defaultEmbeddingModel and defaultEmbeddingProvider.
- Add client.embed().
- Add embedding request resolution.
- Add embedding dispatch switch.
- Add authentication errors for missing provider keys.
- Add mock embedding queue.
Acceptance criteria:
- LLMClient.fromEnv({ defaultEmbeddingModel: 'gemini-embedding-2' }).embed(...) works with GEMINI_API_KEY.
- Generation and embedding providers can differ.
- Mock client supports deterministic embeddings in app tests.
Epic 6: Usage And Cost
Tasks:
- Decide whether to persist embedding usage in v1 or return it only in response.
- If persistent, add operation and embedding_dimensions to usage events/schema.
- Parse OpenAI embedding token usage.
- Parse Gemini usageMetadata when returned.
- Leave cost undefined or estimated where pricing/usage is not exact.
Acceptance criteria:
- Usage logging does not break existing completion analytics.
- Embedding usage is not falsely represented as completion usage.
- Costs are marked estimated when they are estimated.
Epic 7: Docs And Examples
Tasks:
- Update README with text embedding example.
- Add Gemini PDF embedding example.
- Add widget RAG example showing ingest-time and runtime query embedding.
- Update API docs / TypeDoc.
- Update stale upstream docs/EMBEDDINGS_REPORT.md.
Acceptance criteria:
- A developer can copy a minimal client.embed() example.
- Docs clearly state that vector storage and chunking are app responsibilities.
- Docs warn that query and corpus must use the same embedding profile.
Epic 8: Widget App Integration
Tasks:
- Replace raw Gemini demo calls with unified-llm-client once embedding support lands.
- Create embedding_profiles.
- Store source status/progress.
- Store vectors in pgvector.
- Ensure every retrieval query filters by tenant, bot, and embedding profile.
- Add citations from source/page/chunk metadata.
- Add re-index flow when model/dimensions/task instruction changes.
Acceptance criteria:
- User can upload a PDF and see ingestion progress.
- User sees "ready to answer" only after vectors are stored.
- Preview chat retrieves from the uploaded document.
- Citations point to the source/page/FAQ.
- Cross-tenant retrieval leakage is prevented by both query filters and RLS.
16. Risks And Decisions
Risk: Native PDF Embedding Does Not Remove Ingestion Design
Gemini Embedding 2 can embed PDFs natively, but the product still needs page splitting, source tracking, citations, progress, retries, and vector storage. Native PDF support improves input quality; it does not replace RAG infrastructure.
Risk: Embedding Model Changes Break Search
Vectors from different models or dimensions should not be mixed in one index. The app must store an embedding_profile_id and only search matching profiles.
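A simple way to detect this situation is to compare the profile a source was indexed with against the bot's current profile. The sketch below follows the embedding_profiles columns from section 13; EmbeddingProfile and needsReindex are hypothetical names, and treating a task-instruction change as a re-index trigger follows the Epic 8 task list.

```typescript
// Mirrors the identifying columns of the embedding_profiles table.
interface EmbeddingProfile {
  provider: string;
  model: string;
  dimensions: number;
  taskInstruction?: string | null;
}

// True when vectors stored under one profile must not be searched
// with queries embedded under the other.
function needsReindex(
  stored: EmbeddingProfile,
  current: EmbeddingProfile,
): boolean {
  return (
    stored.provider !== current.provider ||
    stored.model !== current.model ||
    stored.dimensions !== current.dimensions ||
    // Treat null and undefined as the same "no instruction" value.
    (stored.taskInstruction ?? null) !== (current.taskInstruction ?? null)
  );
}
```

When this returns true, the app can move the affected sources to needs_reindex rather than silently searching with mismatched vectors.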
Risk: Vertex AI Requirements May Appear Later
The current library is API-key based for Gemini. If enterprise Google Cloud controls are required, add a Vertex-specific auth/base URL mode later.
Risk: Provider APIs Are Not Symmetric
Gemini supports multimodal embedding. OpenAI embeddings are text-oriented. The unified API should normalize common behavior but still throw capability errors when a provider cannot support an input.
17. Recommended First PR Scope
Keep the first PR focused:
- client.embed()
- Google gemini-embedding-2
- OpenAI /v1/embeddings
- Embedding model registry metadata
- Basic usage metrics in response
- Unit tests and live smoke tests
- Docs examples
Defer:
- Full Vertex AI OAuth/project/region support
- Async/batch embedding jobs inside the library
- Vector DB abstractions
- Chunking abstractions
- Reranking
- Citation assembly
- UI ingestion progress
Those deferred items belong in the chatbot app or later optimization work.
18. Sources
- Project repo from CLAUDE.md: https://github.com/07rjain/LLMlibrary
- Google Gemini Embedding 2 model docs: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/embedding-2
- Google Gemini API embeddings reference: https://ai.google.dev/api/embeddings
- OpenAI embeddings guide: https://developers.openai.com/api/docs/guides/embeddings#obtaining-the-embeddings
- OpenAI embeddings API reference: https://platform.openai.com/docs/api-reference/embeddings/create