EmbeddingServer

EmbeddingServer defines a containerized embedding model server managed by the ToolHive operator. The VirtualMCPServer optimizer references an EmbeddingServer to generate vector embeddings for tool discovery.

API: toolhive.stacklok.dev/v1beta1 · Scope: Namespaced · Short names: emb, embedding

Example

embeddingserver.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec: {}

Schema

spec

EmbeddingServerSpec defines the desired state of EmbeddingServer

Field · Type · Description
args · string[] · Args are additional arguments to pass to the embedding inference server
env · object[] · Env are environment variables to set in the container
hfTokenSecretRef · object · HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face.
image · string · Image is the container image for the embedding inference server. Images must be from Hugging Face Text Embeddings Inference (https://github.com/huggingface/text-embeddings-inference).
    default "ghcr.io/huggingface/text-embeddings-inference:cpu-latest"
imagePullPolicy · string · ImagePullPolicy defines the pull policy for the container image
    default "IfNotPresent" · enum: Always | Never | IfNotPresent
model · string · Model is the Hugging Face embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2")
    default "BAAI/bge-small-en-v1.5"
modelCache · object · ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts.
podTemplateSpec · object · PodTemplateSpec allows customizing the pod (node selection, tolerations, etc.). This field accepts a PodTemplateSpec object as JSON/YAML. To modify the container the embedding server runs in, you must specify the 'embedding' container name in the PodTemplateSpec.
port · integer · Port is the port to expose the embedding service on
    default 8080 · format int32 · min 1 · max 65535
replicas · integer · Replicas is the number of embedding server replicas to run
    default 1 · format int32 · min 1
resourceOverrides · object · ResourceOverrides allows overriding annotations and labels for resources created by the operator
resources · object · Resources defines compute resources for the embedding server
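
As a rough sketch of how the top-level fields fit together, the manifest below restates the documented defaults explicitly and uses podTemplateSpec to influence scheduling. The node label and the securityContext setting are illustrative assumptions, not values taken from this reference.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  # These values restate the documented defaults explicitly.
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
  imagePullPolicy: IfNotPresent
  model: BAAI/bge-small-en-v1.5
  port: 8080
  replicas: 1
  podTemplateSpec:
    spec:
      # Hypothetical node label, shown only to illustrate node selection.
      nodeSelector:
        example.com/pool: embeddings
      containers:
        # Container-level changes must target the 'embedding' container.
        - name: embedding
          securityContext:
            allowPrivilegeEscalation: false
```

Omitting any of the defaulted fields should yield the same result, since the schema supplies the defaults listed above.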

spec.env[]

Env are environment variables to set in the container

Field · Type · Description
name (required) · string · Name of the environment variable
value (required) · string · Value of the environment variable
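
A minimal sketch of the env list follows. The variable names are hypothetical placeholders; substitute whatever variables your embedding server image actually reads.

```yaml
spec:
  env:
    # Hypothetical variable names and values, for illustration only.
    - name: EXAMPLE_LOG_LEVEL
      value: "info"
    # value is typed as a string, so booleans and numbers must be quoted.
    - name: EXAMPLE_FEATURE_FLAG
      value: "true"
```

Both name and value are required on every entry.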

spec.hfTokenSecretRef

HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face.

Field · Type · Description
key (required) · string · Key is the key within the secret
name (required) · string · Name is the name of the secret
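
The sketch below pairs a Secret with an EmbeddingServer that references it. The Secret name (hf-token) and key (token) are hypothetical, and the Secret is placed in the same namespace as the EmbeddingServer on the assumption that the operator resolves the reference there.

```yaml
# Hypothetical Secret holding the token; name and key are illustrative.
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: default
type: Opaque
stringData:
  token: "<your-hugging-face-token>"
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  hfTokenSecretRef:
    name: hf-token   # must match the Secret's metadata.name
    key: token       # must match a key in the Secret's data
```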

spec.modelCache

ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts.

Field · Type · Description
accessMode · string · AccessMode is the access mode for the PVC
    default "ReadWriteOnce" · enum: ReadWriteOnce | ReadWriteMany | ReadOnlyMany
enabled · boolean · Enabled controls whether model caching is enabled
    default true
size · string · Size is the size of the PVC for model caching (e.g., "10Gi")
    default "10Gi"
storageClassName · string · StorageClassName is the storage class to use for the PVC. If not specified, the cluster's default storage class is used.
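
For illustration, a modelCache block that enlarges the cache volume might look like the sketch below. The size and the storage class name are assumptions chosen for the example.

```yaml
spec:
  modelCache:
    enabled: true
    # Larger than the 10Gi default; pick a size that fits your model.
    size: 20Gi
    accessMode: ReadWriteOnce
    # Hypothetical storage class; omit this field to use the cluster default.
    storageClassName: fast-ssd
```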

spec.resourceOverrides

ResourceOverrides allows overriding annotations and labels for resources created by the operator

Field · Type · Description
persistentVolumeClaim · object · PersistentVolumeClaim defines overrides for the PVC resource
service · object · Service defines overrides for the Service resource
statefulSet · object · StatefulSet defines overrides for the StatefulSet resource

spec.resourceOverrides.persistentVolumeClaim

PersistentVolumeClaim defines overrides for the PVC resource

Field · Type · Description
annotations · map<string, string> · Annotations to add or override on the resource
labels · map<string, string> · Labels to add or override on the resource

spec.resourceOverrides.service

Service defines overrides for the Service resource

Field · Type · Description
annotations · map<string, string> · Annotations to add or override on the resource
labels · map<string, string> · Labels to add or override on the resource

spec.resourceOverrides.statefulSet

StatefulSet defines overrides for the StatefulSet resource

Field · Type · Description
annotations · map<string, string> · Annotations to add or override on the resource
labels · map<string, string> · Labels to add or override on the resource
podTemplateMetadataOverrides · object · PodTemplateMetadataOverrides defines metadata overrides for the pod template

spec.resourceOverrides.statefulSet.podTemplateMetadataOverrides

PodTemplateMetadataOverrides defines metadata overrides for the pod template

Field · Type · Description
annotations · map<string, string> · Annotations to add or override on the resource
labels · map<string, string> · Labels to add or override on the resource
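
The override blocks above can be combined in one spec. In the sketch below, every label and annotation key and value is a hypothetical placeholder; only the field structure comes from the schema.

```yaml
spec:
  resourceOverrides:
    statefulSet:
      # Applied to the StatefulSet object itself.
      labels:
        team: search
      # Applied to the pod template, and therefore to the pods.
      podTemplateMetadataOverrides:
        annotations:
          example.com/scrape: "true"
    service:
      annotations:
        example.com/owner: platform
    persistentVolumeClaim:
      labels:
        team: search
```

Annotation and label values are strings, so values such as "true" need quotes.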

spec.resources

Resources defines compute resources for the embedding server

Field · Type · Description
limits · object · Limits describes the maximum amount of compute resources allowed
requests · object · Requests describes the minimum amount of compute resources required

spec.resources.limits

Limits describes the maximum amount of compute resources allowed

Field · Type · Description
cpu · string · CPU is the CPU limit in cores (e.g., "500m" for 0.5 cores)
memory · string · Memory is the memory limit in bytes (e.g., "64Mi" for 64 mebibytes)

spec.resources.requests

Requests describes the minimum amount of compute resources required

Field · Type · Description
cpu · string · CPU is the CPU request in cores (e.g., "500m" for 0.5 cores)
memory · string · Memory is the memory request in bytes (e.g., "64Mi" for 64 mebibytes)
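
A short sketch of requests and limits together. The quantities are illustrative assumptions, not sizing recommendations for any particular model.

```yaml
spec:
  resources:
    requests:
      cpu: 500m        # 0.5 cores
      memory: 512Mi
    limits:
      # cpu is typed as a string, so whole-core values are quoted.
      cpu: "2"
      memory: 2Gi
```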

status

EmbeddingServerStatus defines the observed state of EmbeddingServer

Field · Type · Description
conditions · object[] · Conditions represent the latest available observations of the EmbeddingServer's state
message · string · Message provides additional information about the current phase
observedGeneration · integer · ObservedGeneration reflects the generation most recently observed by the controller
    format int64
phase · string · Phase is the current phase of the EmbeddingServer
    enum: Pending | Downloading | Ready | Failed | Terminating
readyReplicas · integer · ReadyReplicas is the number of ready replicas
    format int32
url · string · URL is the URL where the embedding service can be accessed

status.conditions[]

Conditions represent the latest available observations of the EmbeddingServer's state

Field · Type · Description
lastTransitionTime (required) · string · lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
    format date-time
message (required) · string · message is a human readable message indicating details about the transition. This may be an empty string.
    maxLength 32768
observedGeneration · integer · observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance.
    format int64 · min 0
reason (required) · string · reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.
    pattern ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$ · minLength 1 · maxLength 1024
status (required) · string · status of the condition, one of True, False, Unknown.
    enum: True | False | Unknown
type (required) · string · type of condition in CamelCase or in foo.example.com/CamelCase.
    pattern ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$ · maxLength 316
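
To make the shape of the status block concrete, here is a purely illustrative sketch of what a healthy resource might report. Every value, including the condition type, the reason, and the URL format, is an assumption and not captured operator output.

```yaml
# Illustrative only: all values below are assumptions.
status:
  phase: Ready
  message: Embedding server is ready
  observedGeneration: 3
  readyReplicas: 1
  # Hypothetical in-cluster URL; the real format is set by the operator.
  url: http://my-embeddingserver.default.svc.cluster.local:8080
  conditions:
    - type: Ready
      # status is a string enum, so it is quoted rather than a YAML boolean.
      status: "True"
      reason: ExampleReason
      message: ""
      lastTransitionTime: "2025-01-01T00:00:00Z"
      observedGeneration: 3
```

When a condition's observedGeneration lags behind metadata.generation, that condition is stale with respect to the latest spec.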

Referenced by: