Replicante Core is designed as a distributed system that scales with user demand.
Scaling is an advanced topic that requires time and effort.
To achieve optimal value for money when considering the size of the cluster and number of tasks, a degree of familiarity with each Replicante Core component is required.
Replicante Core is at the early stages of development.
While scaling is a core feature of the platform, the limitations and requirements are not yet known. Evolution of the system is likely to lead to changes in the scaling needs and configurations.
Replicante Core stores its state out of process, using existing technologies designed to scale (databases, messaging systems, and so on). Process coordination also ensures that exclusive operations are performed safely regardless of the number of processes running.
As a result, Replicante Core processes themselves are stateless and can generally be scaled by increasing the number of processes running.
The desired number of processes depends on the user’s deployment configuration and their needs from the cluster.
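As a rough illustration of how the number of processes might be sized for one kind of workload, the sketch below estimates a task worker count from the rate of incoming tasks and the rate a single worker can sustain. The function name, its parameters, and the default headroom are hypothetical and are not part of Replicante Core; the rates would have to come from the metrics of an actual deployment.

```python
import math


def workers_needed(incoming_per_sec: float,
                   per_worker_per_sec: float,
                   headroom: float = 0.25) -> int:
    """Estimate how many worker processes keep up with incoming tasks.

    All names here are hypothetical; nothing in this sketch is a
    Replicante Core API. `headroom` is spare capacity expressed as a
    fraction of the incoming rate (0.25 means 25% extra).
    """
    if per_worker_per_sec <= 0:
        raise ValueError("per_worker_per_sec must be positive")
    if incoming_per_sec < 0 or headroom < 0:
        raise ValueError("incoming_per_sec and headroom must not be negative")
    # Processing capacity must exceed the incoming rate, otherwise
    # the task queues keep growing.
    required = incoming_per_sec * (1 + headroom) / per_worker_per_sec
    # Always keep at least one worker running.
    return max(1, math.ceil(required))


if __name__ == "__main__":
    # 40 tasks/s incoming, each worker handling about 8 tasks/s.
    print(workers_needed(40, 8))
```

An estimate like this is only a starting point: the right number still depends on the deployment configuration and on what is expected of the cluster.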
Signals of the need to scale vary for each component. The list below provides suggestions of what to look at for each component.
API components (components.grafana and components.webui): look at the number of HTTP requests and their duration. Long running HTTP requests are an indication that something is not well. If other components and the datastores are healthy, long running HTTP requests may indicate a need to scale the API components.
Discovery components (components.discovery): these components only need a single instance running at any given time. To ensure all functionality remains available, more than one instance of each service should be deployed so that if the active instance fails another can take its place. Running 3 instances of each component should provide high reliability for most situations. These components should be lightweight enough not to need scaling. If they do, vertical scaling is the only option at this time.
Task workers (components.workers): look for task queues backing up (the rate of incoming tasks is higher than the task processing rate). Scaling the number of task workers is as easy as running more instances. If scaling the worker instances is not enough, users may need to scale the task queue system.
The more complex aspect of scaling tends to be at the state layer.
In most cases this means that the documentation of the dependencies will be the primary source of information, but some Replicante-specific details are presented in these pages: