GoAgent Source Deep Dive 09: Embedding Service — The Engineering Boundary of Vector Generation
The Problem: How Do You Turn Text Into Vectors
GoAgent's retrieval depends on vector similarity search. PostgreSQL + pgvector can do vector search, but you need vectors first. How does text become a vector?
Limitations of Existing Approaches
- CGO bindings for vector models in Go: complex compilation, difficult cross-platform builds, and an immature Go ML ecosystem.
- Python subprocess launched from Go: high latency and complex process management.
- External API (OpenAI Embedding): an external dependency with network latency and cost, not suitable for local deployment.
All three have engineering pain points. What GoAgent needs is vector generation that is decoupled from the Go main program, supports local deployment, and allows flexible backend switching.
GoAgent's Approach
Deploy embedding as an independent HTTP service:
- Python FastAPI implementation, leveraging Python's ML ecosystem.
- Go calls via HTTP, no CGO or subprocess dependency.
- Supports Ollama and SentenceTransformers backends, switchable via environment variable.
- Optional Redis cache, L2 normalization for retrieval consistency.
Architecture Naturally Emerges
Go Side: EmbeddingClient
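// EmbeddingClient calls the embedding service over HTTP.
type EmbeddingClient struct {
    baseURL    string       // root URL of the embedding service
    httpClient *http.Client // HTTP client used for every request
    retries    int          // number of retries
}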
The client exposes two core methods: Embed for a single text and EmbedBatch for a batch. Each builds a JSON request, POSTs it to /embed or /embed_batch respectively, and parses the response.
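To make the request path concrete, here is a minimal sketch of what Embed might look like. The struct fields come from the definition above; everything else is an assumption for illustration: the JSON field names (`text`, `embedding`), the retry policy (retry on transport errors and 5xx responses, no backoff), and the helpers `postOnce` and `newFakeServer`, which stands in for the Python service. The real GoAgent client may differ, and production code would likely also accept a context and back off between attempts.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// EmbeddingClient mirrors the struct shown in the article.
type EmbeddingClient struct {
	baseURL    string
	httpClient *http.Client
	retries    int
}

// Assumed wire format: the article does not give the real field names.
type embedRequest struct {
	Text string `json:"text"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// Embed POSTs one text to /embed, retrying up to c.retries extra times.
func (c *EmbeddingClient) Embed(text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Text: text})
	if err != nil {
		return nil, err
	}
	var lastErr error
	for attempt := 0; attempt <= c.retries; attempt++ {
		vec, retry, err := c.postOnce(body)
		if err == nil {
			return vec, nil
		}
		lastErr = err
		if !retry {
			break
		}
	}
	return nil, fmt.Errorf("embed failed: %w", lastErr)
}

// postOnce sends a single request; retry reports whether the failure
// (transport error or 5xx status) is worth another attempt.
func (c *EmbeddingClient) postOnce(body []byte) (vec []float32, retry bool, err error) {
	resp, err := c.httpClient.Post(c.baseURL+"/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, true, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, resp.StatusCode >= 500, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, false, err
	}
	return out.Embedding, false, nil
}

// newFakeServer is a hypothetical stand-in for the embedding service:
// it answers the first request with 503, then returns a fixed vector.
func newFakeServer() *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if n := atomic.AddInt32(&calls, 1); n == 1 || r.URL.Path != "/embed" {
			http.Error(w, "unavailable", http.StatusServiceUnavailable)
			return
		}
		json.NewEncoder(w).Encode(embedResponse{Embedding: []float32{0.6, 0.8}})
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()
	c := &EmbeddingClient{baseURL: srv.URL, httpClient: &http.Client{Timeout: 5 * time.Second}, retries: 2}
	vec, err := c.Embed("hello") // first attempt gets 503, second succeeds
	fmt.Println(vec, err)
}
```

EmbedBatch would follow the same shape against /embed_batch, with a list of texts in the request and a list of vectors in the response.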
Python Side: FastAPI Service
Three endpoints: /embed, /embed_batch, /health.
- Single text flow: check cache → generate vector → L2 normalize → write cache → return.
- Batch flow: filter out cached texts → batch-generate the uncached ones → normalize and cache each → return.
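The batch flow is the more interesting of the two, so here is a sketch of its control flow. The actual service is written in Python with FastAPI; this version is in Go only to stay consistent with the client code, and every name in it (`Cache`, `mapCache`, `Generator`, `fakeGenerator`, `EmbedBatch` as a plain function) is hypothetical. An in-memory map stands in for Redis, and passing a nil cache models running without one.

```go
package main

import (
	"fmt"
	"math"
)

// Cache is a hypothetical abstraction; the real service uses Redis.
type Cache interface {
	Get(text string) ([]float64, bool)
	Set(text string, vec []float64)
}

// mapCache is an in-memory stand-in for Redis.
type mapCache map[string][]float64

func (m mapCache) Get(t string) ([]float64, bool) { v, ok := m[t]; return v, ok }
func (m mapCache) Set(t string, v []float64)      { m[t] = v }

// Generator produces raw (unnormalized) vectors for a batch of texts.
type Generator func(texts []string) [][]float64

// l2Normalize scales v to unit length in place; a zero vector is left as is.
func l2Normalize(v []float64) {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	if sum == 0 {
		return
	}
	n := math.Sqrt(sum)
	for i := range v {
		v[i] /= n
	}
}

// EmbedBatch filters out cached texts, generates the rest in one call,
// then normalizes and caches each new vector. Output order matches input.
func EmbedBatch(texts []string, cache Cache, gen Generator) [][]float64 {
	out := make([][]float64, len(texts))
	var missTexts []string
	var missIdx []int
	for i, t := range texts {
		if cache != nil {
			if v, ok := cache.Get(t); ok {
				out[i] = v
				continue
			}
		}
		missTexts = append(missTexts, t)
		missIdx = append(missIdx, i)
	}
	if len(missTexts) == 0 {
		return out
	}
	for j, v := range gen(missTexts) {
		l2Normalize(v)
		if cache != nil {
			cache.Set(missTexts[j], v)
		}
		out[missIdx[j]] = v
	}
	return out
}

// generated counts how many texts reached the fake backend.
var generated int

// fakeGenerator returns the raw vector (3, 4) for every text.
func fakeGenerator(texts []string) [][]float64 {
	generated += len(texts)
	out := make([][]float64, len(texts))
	for i := range texts {
		out[i] = []float64{3, 4}
	}
	return out
}

func main() {
	cache := mapCache{}
	EmbedBatch([]string{"a", "b"}, cache, fakeGenerator)
	// "b" is now cached, so only "c" reaches the generator here.
	vecs := EmbedBatch([]string{"b", "c"}, cache, fakeGenerator)
	fmt.Println(vecs, "texts generated:", generated)
}
```

Tracking the original index of each miss keeps the response aligned with the request even when hits and misses are interleaved.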
Backend Switching
The EMBEDDING_BACKEND environment variable selects either Ollama or SentenceTransformers, with no code changes. Development uses sentence-transformers (zero config); production uses Ollama (high performance).
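The selection logic can be sketched as a small factory keyed on the environment variable. As before, the real service is Python, and this Go version is illustrative only. The accepted values (`ollama`, `sentence-transformers`), the fallback to sentence-transformers when the variable is unset, and the `Backend` interface with its stub implementations are all assumptions, not taken from the GoAgent source.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Backend is a hypothetical abstraction over the two vector generators.
type Backend interface {
	Name() string
	Embed(texts []string) ([][]float64, error)
}

// errStub marks the places where a real model call would go.
var errStub = errors.New("stub backend: model call omitted in this sketch")

type ollamaBackend struct{}

func (ollamaBackend) Name() string                        { return "ollama" }
func (ollamaBackend) Embed([]string) ([][]float64, error) { return nil, errStub }

type sentenceTransformersBackend struct{}

func (sentenceTransformersBackend) Name() string                        { return "sentence-transformers" }
func (sentenceTransformersBackend) Embed([]string) ([][]float64, error) { return nil, errStub }

// selectBackend maps a configuration value to a backend.
// An empty value falls back to sentence-transformers (assumed default).
func selectBackend(name string) (Backend, error) {
	switch strings.ToLower(strings.TrimSpace(name)) {
	case "ollama":
		return ollamaBackend{}, nil
	case "", "sentence-transformers":
		return sentenceTransformersBackend{}, nil
	default:
		return nil, fmt.Errorf("unknown EMBEDDING_BACKEND %q", name)
	}
}

// backendFromEnv reads EMBEDDING_BACKEND once, typically at startup.
func backendFromEnv() (Backend, error) {
	return selectBackend(os.Getenv("EMBEDDING_BACKEND"))
}

func main() {
	b, err := backendFromEnv()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("using backend:", b.Name())
}
```

Rejecting unknown values at startup, rather than silently falling back, makes a typo in the deployment configuration visible immediately.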
Vector Normalization
All vectors are L2-normalized before being returned (norm = 1). After normalization, cosine similarity equals the dot product, which is cheaper to compute. Without normalization, vectors for long texts may have larger magnitudes, which biases retrieval under magnitude-sensitive metrics such as the inner product.
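A small numeric sketch shows both claims: that the dot product of unit vectors equals cosine similarity, and that a raw dot product can favor a larger vector over a better-aligned one. The helper names (`Dot`, `Norm`, `Normalize`, `Cosine`) and the two-dimensional toy vectors are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Dot returns the inner product of two equal-length vectors.
func Dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Norm returns the L2 norm of a.
func Norm(a []float64) float64 { return math.Sqrt(Dot(a, a)) }

// Normalize returns a unit-length copy of a; a zero vector is copied as is.
func Normalize(a []float64) []float64 {
	out := make([]float64, len(a))
	copy(out, a)
	n := Norm(a)
	if n == 0 {
		return out
	}
	for i := range out {
		out[i] /= n
	}
	return out
}

// Cosine computes cosine similarity directly from the definition.
func Cosine(a, b []float64) float64 { return Dot(a, b) / (Norm(a) * Norm(b)) }

func main() {
	query := []float64{1, 0}
	aligned := []float64{1, 0} // same direction as the query, norm 1
	large := []float64{3, 4}   // different direction, norm 5

	// Raw dot product: the larger vector scores higher (3 vs 1).
	fmt.Println(Dot(query, aligned), Dot(query, large))

	// After normalization the aligned vector wins (1 vs 0.6),
	// and the dot product matches the cosine similarity.
	q, a, l := Normalize(query), Normalize(aligned), Normalize(large)
	fmt.Println(Dot(q, a), Dot(q, l), Cosine(query, large))
}
```

Because the service normalizes before returning and before caching, every stored vector already has norm 1, so the database side can use the inner product and get the same ranking as cosine similarity.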
Cache
Redis is optional. When it is enabled, the same text is not recomputed, which reduces pressure on the backend. Without Redis the service still works; it simply runs uncached.
Design Trade-offs
- Independent service vs. embedded: costs an extra network hop and one more component to deploy, but gains language decoupling, independent scaling, and a flexible backend.
- HTTP vs. gRPC: HTTP is easy to debug and language-agnostic. For request-response patterns, its overhead is acceptable.
- Optional vs. mandatory cache: zero config for development, enabled on demand in production. Service availability does not depend on Redis.
Summary
The Embedding Service is the prerequisite for GoAgent's retrieval. Deploying it independently trades some engineering complexity for operational flexibility. The architecture follows naturally from the question "how does text become vectors": the Python ecosystem for vector generation, Go for business logic, and HTTP as the bridge.