Onyx Lite uses under 1GB of memory at baseline. Disk and memory usage scale with the number of files
users upload to the system, since PostgreSQL handles file storage in Lite mode.
OpenSearch enforces a read-only block on indices when disk usage hits the flood stage watermark (default 95%),
which effectively blocks all writes. Monitor disk usage and plan capacity accordingly.
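As an operational sketch (this assumes an unauthenticated OpenSearch cluster reachable at `localhost:9200`; adjust the host and add credentials as needed), you can check per-node disk usage and, after freeing space, clear the read-only block that the flood-stage watermark placed on the indices:

```shell
# Show disk usage and shard allocation per node
curl -s "http://localhost:9200/_cat/allocation?v"

# After freeing disk space, remove the read-only block from all indices
curl -s -X PUT "http://localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```

The block is not lifted automatically on older versions, so clearing `index.blocks.read_only_allow_delete` explicitly is the safe remediation after a flood-stage event.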
For small to mid scale deployments, we recommend deploying Onyx to a single instance in your cloud provider of choice. When evaluating your instance, follow the Preferred resources in the table above.
For more efficient scaling, you can dedicate resources to each Onyx container using Kubernetes or AWS EKS. See the Onyx Helm chart's values.yaml for our default requests and limits.
| Component | CPU | Memory |
| --- | --- | --- |
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 2 Gi |
| opensearch | 2 | 4 Gi |
| nginx | 250m (1/4) | 128 Mi |
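If you deploy via the Helm chart, these figures map onto standard Kubernetes resource requests. A hypothetical values.yaml override is sketched below (the top-level key names are illustrative and may differ from the actual Onyx chart; the `resources` block itself is standard Kubernetes syntax):

```yaml
# Illustrative values.yaml fragment -- component key names are assumptions;
# check the Onyx Helm chart's values.yaml for the real structure.
api_server:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
opensearch:
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
nginx:
  resources:
    requests:
      cpu: 250m
      memory: 128Mi
```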
If you are using cloud-based embedding models (e.g. OpenAI, Cohere, etc.) instead of locally hosted ones,
the indexing_model_server and inference_model_server will use significantly less memory.
Altogether, this comes out to a total available node size of at least ~12 CPU and ~24 GB of memory.
The main driver of resource requirements for Standard mode is the number of indexed documents.
This primarily affects the search index (OpenSearch), which is responsible for storing documents
and handling search requests.
OpenSearch memory is split roughly 50/50 between the JVM heap and the OS file system cache.
Both halves are critical: the heap handles indexing and search operations, while the file system cache
keeps frequently accessed index segments in memory for fast reads.

Key rules for JVM heap sizing:
- Set Xms and Xmx to the same value, at 50% of available RAM (the other 50% goes to the OS file cache)
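For example, on a container with 16 GB of RAM you might pin the heap to 8 GB via the `OPENSEARCH_JAVA_OPTS` environment variable, which the official OpenSearch Docker images honor (the 8 GB figure is illustrative; scale it to your container):

```shell
# Pin the OpenSearch JVM heap to 8 GB on a 16 GB container.
# Setting Xms equal to Xmx avoids costly heap resizing at runtime;
# the remaining ~8 GB is left to the OS file system cache.
export OPENSEARCH_JAVA_OPTS="-Xms8g -Xmx8g"
```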
OpenSearch resource requirements scale linearly with the volume of indexed data. The exact ratio depends
on deployment size: large distributed clusters are more efficient per GB than single-node deployments
due to fixed per-node overhead (cluster management, garbage collection, segment merging).

Industry guidelines for large clusters suggest a memory-to-data ratio of around 1:16 for search-heavy
workloads. However, for the single-node deployments typical of self-hosted Onyx, the fixed overhead
per node is a much larger fraction of total resources.

Based on our experience, we recommend the following for single-node or small-cluster deployments:
| Scale | Memory per 1 GB of source docs | CPU per 1 GB of source docs |
| --- | --- | --- |
| Small (< 5 GB) | ~2 GB | ~0.25 CPU |
| Medium (5–50 GB) | ~1.5 GB | ~0.25 CPU |
| Large (50+ GB) | ~1 GB | ~0.2 CPU |
The per-GB cost decreases at larger scale because the fixed baseline overhead is amortized.
At very large scale, consider a dedicated OpenSearch cluster or a managed service.

Other factors that may affect resource requirements include:
- The embedding model and vector dimensions
- Whether you have quantization and dimensionality reduction enabled
For a deployment with 10 GB of text content, your opensearch component will need:
- Memory: 4 GB (base) + 10 × 1.5 GB = 19 GB
- CPU: 2 (base) + 10 × 0.25 = 4.5 cores
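This arithmetic can be sketched as a small helper (the per-GB ratios come from the scaling table above, and the 2 CPU / 4 GB baseline is the opensearch row of the component table; the function name is illustrative):

```python
def opensearch_resources(source_gb: float) -> tuple[float, float]:
    """Estimate (memory GB, CPU cores) for the opensearch component."""
    # Per-GB ratios from the scaling table
    if source_gb < 5:
        mem_per_gb, cpu_per_gb = 2.0, 0.25   # small
    elif source_gb <= 50:
        mem_per_gb, cpu_per_gb = 1.5, 0.25   # medium
    else:
        mem_per_gb, cpu_per_gb = 1.0, 0.2    # large
    # Baseline opensearch allocation from the component table
    base_mem_gb, base_cpu = 4.0, 2.0
    return (base_mem_gb + source_gb * mem_per_gb,
            base_cpu + source_gb * cpu_per_gb)

mem_gb, cpu = opensearch_resources(10)
print(mem_gb, cpu)  # 19.0 4.5
```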
If deploying on a single instance, this would be in addition to the base requirements. Overall, that would take us to
>= 9 CPU and >= 35 GB of memory.
Given these requirements, an m7g.2xlarge or c5.4xlarge EC2 instance would be appropriate.

If deploying with Kubernetes or AWS EKS, this would give a per-component resource allocation of:
| Component | CPU | Memory |
| --- | --- | --- |
| api_server | 1 | 2 Gi |
| background | 2 | 8 Gi |
| indexing_model_server | 2 | 4 Gi |
| inference_model_server | 2 | 4 Gi |
| postgres | 2 | 4 Gi |
| opensearch | 5 | 19 Gi |
Total available node size: ~14 CPU and ~41 GB of memory.
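As a sanity check, the totals can be tallied from the per-component allocation (numbers taken directly from the table above):

```python
# (CPU cores, memory GiB) per component, from the allocation table
allocation = {
    "api_server": (1, 2),
    "background": (2, 8),
    "indexing_model_server": (2, 4),
    "inference_model_server": (2, 4),
    "postgres": (2, 4),
    "opensearch": (5, 19),
}
total_cpu = sum(cpu for cpu, _ in allocation.values())
total_mem = sum(mem for _, mem in allocation.values())
print(total_cpu, total_mem)  # 14 41
```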