How We Simplified Our AI Stack with LiteLLM at Evaneos


The Context: When Technology Becomes a Bit Too “Varied”

At Evaneos, we love experimenting with new technologies. Over the past few months, we’ve been having a lot of fun with generative AI models: multiple use cases, multiple providers, multiple approaches… In short, we’ve multiplied our experiments!

But at some point, we ran into a little problem: a collection of API keys that started to look like a concierge’s keyring 🔑. OpenAI here, Anthropic there, a bit of Gemini, a dash of Deepgram… You get the picture?

And with that came the existential questions of the modern developer:

  • “Uh… what’s the API key for this model again?”
  • “How much are we actually spending on requests this month?”
  • “What if OpenAI goes down?”
  • “How do we track what’s being sent to the models?”

That’s when we stumbled upon LiteLLM and thought: “Well, this is exactly what we need!” Basically, it’s a proxy that unifies access to 100+ AI providers behind a single OpenAI-compatible API. One entry point, and behind it we can juggle all the models we want.

In this article, I’ll share how we set it up on our Kubernetes infrastructure, GitOps-style with FluxCD.

Our Stack: Simple But Solid

We built our setup around a few tools we know well:

  • Kubernetes (on GKE): our usual orchestrator
  • FluxCD: to manage everything in GitOps (because kubectl apply by hand is so 2015)
  • Helm: to properly package LiteLLM
  • PostgreSQL: to store configs and metrics
  • Langfuse: the magic tool to trace all our LLM requests and understand what’s really going on

Nothing revolutionary, just pragmatic choices that work well together.

The Setup: Step by Step

1. Organize Your Files (Because We Like Order)

First step: tidy everything up properly in our FluxCD repo. We created a dedicated folder with all the necessary files:

platform/components/helm/litellm/
├── config.yaml # Proxy configuration
├── helmrelease.yaml # Helm definition
├── helmrepository.yaml # OCI source
├── kustomization.yaml # Kustomize orchestration
├── ns.yaml # Dedicated namespace
└── rbac.yaml # Access permissions
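
To give you an idea, the smallest of these files, ns.yaml, boils down to a namespace declaration like this (the namespace name is just an illustration, adapt it to your conventions):

apiVersion: v1
kind: Namespace
metadata:
  name: litellm   # illustrative name, used consistently in the sketches below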

2. Point to the Right Helm Chart

LiteLLM has an official Helm chart hosted on GitHub Container Registry. It’s straightforward to declare:

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: litellm
spec:
  url: oci://ghcr.io/berriai
  interval: 1h
  type: oci

3. The Heart of the Deployment: The HelmRelease

Here, we configure the deployment with a few nice tricks to avoid headaches:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: litellm
spec:
  releaseName: litellm
  interval: 60m
  chart:
    spec:
      chart: litellm-helm   # the official chart published under ghcr.io/berriai
      sourceRef:
        kind: HelmRepository
        name: litellm
  driftDetection:
    mode: enabled
  rollback:
    cleanupOnFail: true
  values:
    pdb:
      enabled: true
      maxUnavailable: 1
    resources:
      requests:
        cpu: 200m
        memory: 1500Mi
      limits:
        memory: 4Gi

What Helped Us Out:

  • Drift Detection: If someone modifies something by hand in prod (we all know it happens 😅), FluxCD detects it
  • PodDisruptionBudget: To avoid downtime during updates. At least one pod always stays available
  • Automatic Rollback: If the deployment fails, no panic, back to the previous version
  • Well-Sized Resources: 1.5Gi of RAM nominally, up to 4Gi if needed. We learned the hard way that LLM proxies can eat a lot of memory!

4. PostgreSQL: We Externalize It (And We’re Glad We Did)

The Helm chart offers to install a little PostgreSQL database alongside LiteLLM. Spoiler: we didn’t do it. We prefer to keep our own managed Postgres instance on the side and point LiteLLM at that external database, with the credentials stored securely.
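
Concretely, that means disabling the bundled Postgres in the chart values and telling LiteLLM where to find ours. A simplified sketch based on the chart’s db.* values (check the values.yaml of your chart version; the hostname and secret names here are illustrative):

# excerpt from the HelmRelease values
values:
  db:
    deployStandalone: false          # don't install the bundled PostgreSQL
    useExisting: true                # connect to an external instance instead
    endpoint: my-postgres.internal   # illustrative hostname
    database: litellm
    secret:
      name: litellm-db-credentials   # Kubernetes Secret holding username/password
      usernameKey: username
      passwordKey: password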

Why We Did This:

  • Backups are managed with the rest of our databases
  • If we want to scale LiteLLM, we don’t need to worry about the database
  • Easier to monitor with our usual tools

In short, a separate database = fewer surprises.

5. Secrets (The Sensitive Part)

Let’s be honest, managing API keys is always a bit delicate. We use standard Kubernetes Secrets to store:

  • Database credentials
  • API keys from all our providers
  • The master key to authenticate to the proxy
  • The salt key to encrypt sensitive info

All of this sensitive information lives in Kubernetes Secrets, properly isolated from the configuration files we version in Git.
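
For illustration, the proxy’s environment Secret can look like this (the name, keys, and values are placeholders; database credentials sit in their own Secret, referenced in step 4, and in practice you’d create all of this through your secret management tooling rather than committing plain-text values to Git):

apiVersion: v1
kind: Secret
metadata:
  name: litellm-env          # placeholder name
  namespace: litellm
type: Opaque
stringData:
  PROXY_MASTER_KEY: "sk-..."        # key clients use to authenticate to the proxy
  LITELLM_SALT_KEY: "change-me"     # used to encrypt sensitive info at rest
  OPENAI_API_KEY: "sk-..."          # one entry per provider
  ANTHROPIC_API_KEY: "sk-ant-..."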

6. Proxy Configuration (The Important Stuff)

This is where the magic happens. We declare all our models in a YAML file deployed as a ConfigMap. The configuration includes:

  • Security settings: Master key, salt key for encryption
  • Data persistence: Store all configuration and request logs in the database for auditability
  • Observability integration: Hook up with tools like Langfuse to trace and understand request patterns
  • Model definitions: Map each model name to its provider and credentials
  • Provider credentials: Define API keys and connection details for each AI provider
  • Advanced features: Support for multiple modalities and flexible provider routing

The structure is clean and maintainable, allowing us to add, remove, or update models without touching any other part of the system.

Here’s the basic structure of our configuration file:

general_settings:
  master_key: os.environ/PROXY_MASTER_KEY
  salt_key: os.environ/LITELLM_SALT_KEY

litellm_settings:
  callbacks: ["langfuse"]
  json_logs: true

model_list:
  # Define your models here, mapped to providers
  - model_name: "provider/*"
    litellm_params:
      model: "provider/*"
      litellm_credential_name: provider_name

credential_list:
  # Store provider credentials
  - credential_name: provider_name
    credential_values:
      api_key: os.environ/PROVIDER_API_KEY

What This Enables:

  • Database Storage: All config is saved and versioned. Great for tracing history and rollbacks!
  • Request Logging: Super useful for debugging and understanding what’s happening in production
  • Observability Integration: Our monitoring tool helps us understand costs, latencies, and performance
  • Flexible Routing: We can use wildcards to support entire provider families without reconfiguring
  • Multi-Modal Support: The system handles different types of AI services (text, audio, etc.)

7. Kustomize to Tie It All Together

We use Kustomize to orchestrate everything and automatically generate the ConfigMap from our config file. This approach allows us to:

  • Keep all Kubernetes resources organized and properly referenced
  • Automatically inject configuration files as ConfigMaps without manual steps

The advantage? Everything is versioned in Git. Config change = pull request = review = automatic deployment. GitOps in all its glory ✨
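
For the curious, a minimal kustomization.yaml for this folder could look like the sketch below (the ConfigMap name and generator options are illustrative):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: litellm
resources:
  - ns.yaml
  - rbac.yaml
  - helmrepository.yaml
  - helmrelease.yaml
configMapGenerator:
  - name: litellm-config        # generated ConfigMap holding config.yaml
    files:
      - config.yaml
generatorOptions:
  disableNameSuffixHash: true   # keep a stable name across config changes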

The Daily Workflow

With FluxCD, we don’t touch our cluster by hand anymore. The workflow is:

  1. Commit: A dev modifies the config in Git (adding a model, changing a version…)
  2. Pull Request: Review by the team (yes, we review configs too!)
  3. Merge: Once validated, we merge
  4. FluxCD Magic: FluxCD detects the change (every hour, or instantly via webhook if we’re in a hurry; see the sketch below)
  5. Auto Deployment: FluxCD applies the changes. If it breaks? Automatic rollback
  6. Coffee: While FluxCD does the work ☕

It’s simple, traceable, and above all: reproducible.
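
A quick word on the webhook option from step 4: FluxCD’s notification-controller can react to pushes immediately through a Receiver. A minimal sketch, assuming the receiver endpoint is exposed and a webhook is configured on the Git repository (names are illustrative):

apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  name: github-receiver        # illustrative name
  namespace: flux-system
spec:
  type: github
  events:
    - "push"
  secretRef:
    name: webhook-token        # Secret holding the shared webhook token
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1
      kind: GitRepository
      name: flux-system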

What We Gain Concretely

More Clarity

Before, each project had its own API keys scattered all over the place. Now? One entry point, one config to manage. Newcomers get up to speed on the system in 5 minutes.

Less Stress

Thanks to GitOps, we always know what’s running in prod. And if we break something? A simple Git revert and we’re back. No more kubectl apply panic at 2am.

Visibility (Finally!)

With Langfuse hooked up to LiteLLM, we see everything:

  • How much each request costs
  • Which model is most used
  • Where the latencies are
  • Which prompts work best

It’s like going from driving at night to broad daylight.

Flexibility

“Let’s test Claude instead of GPT-4 on this feature?” → two lines to change in a YAML file, a PR, and it’s deployed. We can even chain several models as fallbacks.
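
Here’s roughly what that looks like in the proxy config, based on LiteLLM’s router fallback settings (model names and credentials are examples; check the LiteLLM docs for the exact syntax in your version):

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      litellm_credential_name: openai_default
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      litellm_credential_name: anthropic_default

router_settings:
  # if a call to gpt-4o fails, retry the request on claude-sonnet
  fallbacks:
    - gpt-4o: ["claude-sonnet"]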

Security Without Thinking About It

Secrets are in Kubernetes Secrets, access to the proxy is authenticated, communications are encrypted. And all without needing to be a security expert.

Concretely, What Can We Do With It?

A few generic examples of what can be passed through LiteLLM:

  • Content Generation: Texts, descriptions, suggestions…
  • Automatic Translation: Multilingual with a single entry point
  • Audio Transcription: With integrated Deepgram
  • Analysis and Classification: Sentiment, categorization…

And the best part? We can test a new model without turning everything upside down. Just a config to change.

The Little Things We Learned Along the Way

RAM Is Important

At first, we set 512Mi of RAM. Mistake. The proxy can quickly consume memory, especially if you log prompts (which we recommend). Set at least 1.5Gi; you’ll thank us later.

The Migration Job and FluxCD

Small technical detail: we set ttlSecondsAfterFinished on the database migration job. Why? Because otherwise the completed job (and its pod) sticks around in the cluster, and our PodDisruptionBudget counts it as an active pod. That completely messes up our HA! Define a cleanup duration appropriate for your context; it matters for stability 😅
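
With the official chart, that can look like the sketch below, assuming your chart version exposes a TTL value for the migration job (check its values.yaml; otherwise a Flux postRenderer patch on the Job achieves the same thing):

# excerpt from the HelmRelease values
values:
  migrationJob:
    enabled: true
    ttlSecondsAfterFinished: 120   # clean up the finished job; pick a value that fits your context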

External Database > Included Database

The chart offers to install Postgres with LiteLLM. We’d advise you to use your own instance instead. It’s more stable, easier to backup, and avoids unpleasant surprises.

Credentials Are Well-Designed

LiteLLM’s credential_name system is really handy. You define your API keys once, then reference them in all your models. Clean and DRY.
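
For example, two models from the same provider can share a single credential, following the same structure as the config shown in step 6 (the names here are illustrative):

model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      litellm_credential_name: openai_default
  - model_name: openai/gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      litellm_credential_name: openai_default

credential_list:
  - credential_name: openai_default
    credential_values:
      api_key: os.environ/OPENAI_API_KEY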

To Conclude

Setting up LiteLLM with FluxCD is a bit like installing a good plumbing system: once it’s done, you don’t think about it and it just works. Everything is versioned in Git, deployed automatically, and we can finally focus on our use cases instead of managing API keys.

If you’re in a similar situation to ours (lots of AI, lots of providers, need for clarity), this stack could really make your life easier. The initial effort is definitely worth it.

The code presented here is obviously a simplified version of our actual config (we removed the sensitive stuff), but it should give you a good starting point.

And you, how do you manage your access to LLMs? Do you also use a proxy? Other tools? We’re curious to hear from other teams about this! 🚀

This article reflects our experience at Evaneos. We don’t claim to have THE perfect solution, but it’s what works well for us. If you have questions or feedback, the comments are open!

Useful Resources: