Scaling

Scaling your bot.

Run a single bot as several concurrent worker instances to handle more traffic. Opt in with one flag, start the same bot in multiple processes, and the gateway load-balances dispatches across them.

Opt in to scaling

Construct the bot with scalable: true, then run that same bot in as many processes as your plan allows. Pool membership binds to the bot handle, not the key: any valid key for the bot joins the same pool.

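import { Bot } from '@localhostdevs/sdk';

// scalable: true opts this bot into the worker pool for its handle.
const bot = new Bot({ id: 'my-handle', scalable: true });

// Each incoming 'ping' command is handled by exactly one pooled instance.
bot.cmd('ping', async (ctx) => {
  await ctx.reply({ text: 'pong' });
});

// Connecting is what adds this process to the pool.
await bot.connect();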

Start the same file on a second machine, a second terminal, a container replica — however you like. Each process that connects with scalable: true for the same handle joins the pool as another worker instance.

One dispatch, one instance

Each consumer dispatch runs on exactly one instance.

The gateway places every instance of a bot into a shared NATS queue-group, so a given command is delivered to a single worker — never fanned out to all of them. Adding instances increases throughput; it does not duplicate commands.
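To make the delivery rule concrete, here is a minimal in-memory sketch of queue-group semantics. It is not the gateway's code and does not use NATS. The QueueGroup class and its round-robin selection are assumptions made for illustration, and a real broker may choose members differently. The point it shows is that six commands sent to three workers produce six deliveries, not eighteen.

```typescript
// Hypothetical in-memory model of a queue group: every published message
// is handed to exactly one member, never to all of them.
type Handler = (msg: string) => void;

class QueueGroup {
  private members: Handler[] = [];
  private next = 0;

  join(handler: Handler): void {
    this.members.push(handler);
  }

  // Round-robin is an assumption for this sketch; real brokers may differ.
  publish(msg: string): void {
    if (this.members.length === 0) return;
    const handler = this.members[this.next % this.members.length];
    this.next += 1;
    handler(msg);
  }
}

// Three workers, each recording what it received.
const received: Record<string, string[]> = { a: [], b: [], c: [] };
const group = new QueueGroup();
for (const name of Object.keys(received)) {
  group.join((msg) => {
    received[name].push(msg);
  });
}

for (let i = 0; i < 6; i++) group.publish(`cmd-${i}`);

// Count deliveries across all workers.
const totalDeliveries = Object.values(received).reduce(
  (sum, list) => sum + list.length,
  0,
);
console.log(totalDeliveries); // 6
```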

Idempotency

Handlers should be safe to run more than once.

If an instance crashes mid-dispatch, the gateway retries that in-flight dispatch on a sibling instance. Retry behaviour is configurable per bot and defaults to 3 attempts with a 3-second timeout between them. Because a command can therefore land on a second instance, write your handlers so that running them twice is harmless: guard side effects with the dispatch id, upsert instead of insert, and avoid actions that could double-charge or double-send.

Heads up. The SDK already de-duplicates repeated msgIds within a single process via its idempotency LRU. The retry above is cross-instance, so your own side effects still need to be idempotent.
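One way to guard a side effect with the dispatch id is sketched below. Everything here is hypothetical: DedupeStore, InMemoryDedupeStore, and runOnce are not part of the SDK, and how a handler obtains the dispatch id is not shown. The in-memory store only stands in for something shared by every instance, such as a database table with a unique key on the id.

```typescript
// Hypothetical store interface. In production this must live outside the
// process (for example a table with a unique constraint on the id) so that
// every instance sees the same claims.
interface DedupeStore {
  // Resolves to true only for the first caller to claim this id.
  claim(id: string): Promise<boolean>;
}

// In-memory stand-in used only to make the sketch runnable.
class InMemoryDedupeStore implements DedupeStore {
  private seen = new Set<string>();

  async claim(id: string): Promise<boolean> {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    return true;
  }
}

// Runs the side effect only if this dispatch id has not been claimed yet.
// Returns true if the effect ran, false if it was skipped as a duplicate.
async function runOnce(
  store: DedupeStore,
  dispatchId: string,
  effect: () => Promise<void>,
): Promise<boolean> {
  const first = await store.claim(dispatchId);
  if (!first) return false;
  await effect();
  return true;
}

// Simulates the same dispatch arriving twice, as it would after a retry.
async function simulateRetry(): Promise<number> {
  const store = new InMemoryDedupeStore();
  let charges = 0;
  const charge = async () => {
    charges += 1;
  };
  await runOnce(store, "dispatch-123", charge); // first attempt
  await runOnce(store, "dispatch-123", charge); // retried on a sibling instance
  return charges;
}

simulateRetry().then((charges) => console.log(charges)); // 1
```

Note the trade-off in this sketch: the id is claimed before the effect runs, so a crash between the two leaves the effect undone. Where the store supports it, recording the id and applying the effect in one transaction avoids that gap.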

Stay stateless across commands

No particular instance is guaranteed to handle a consumer's next command, so don't rely on in-process memory between commands.

Anything durable must live outside the process: a database, a cache, object storage, an external API. Treat each command as if it could run on a fresh worker that has never seen this consumer before. In-memory counters, per-session caches, and "remember the last message" tricks will break the moment a sibling instance handles the follow-up.
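The difference can be sketched with two toy workers. The class names and the CounterStore interface are hypothetical and not part of the SDK. SharedCounterStore is an in-memory stand-in for an external database or cache, and a real store would normally be asynchronous. Each worker object plays the role of one process in the pool.

```typescript
// Hypothetical interface for state kept outside the worker process.
interface CounterStore {
  increment(key: string): number;
}

// Stand-in for a database or cache shared by every instance.
class SharedCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  increment(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Fragile: the per-consumer count lives in this worker's own memory.
class StatefulWorker {
  private counts = new Map<string, number>();

  handle(consumerId: string): number {
    const next = (this.counts.get(consumerId) ?? 0) + 1;
    this.counts.set(consumerId, next);
    return next;
  }
}

// Robust: the worker keeps no per-consumer state of its own.
class StatelessWorker {
  private store: CounterStore;

  constructor(store: CounterStore) {
    this.store = store;
  }

  handle(consumerId: string): number {
    return this.store.increment(consumerId);
  }
}

// The first command lands on worker A, the follow-up on worker B.
const fragileA = new StatefulWorker();
const fragileB = new StatefulWorker();
fragileA.handle("consumer-1");
const fragileSecond = fragileB.handle("consumer-1");

const sharedStore = new SharedCounterStore();
const robustA = new StatelessWorker(sharedStore);
const robustB = new StatelessWorker(sharedStore);
robustA.handle("consumer-1");
const robustSecond = robustB.handle("consumer-1");

console.log(fragileSecond); // 1: worker B never saw the first command
console.log(robustSecond); // 2: the count survived the switch between workers
```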

Per-plan instance caps

How many concurrent instances of one bot you can run depends on your plan.

Plan        Max instances per bot
Free        1
Pro         5
Studio      10
Business    50

Extra instances beyond your cap are rejected at connect time. Upgrade on your plan page to raise the limit.
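If you set replica counts from a deploy script, it can help to clamp the requested number to the plan cap before starting processes, rather than discovering the rejection at connect time. The helper below is a hypothetical sketch, not an SDK function. The numbers are copied from the table above and would need updating if the plan limits change.

```typescript
// Caps copied from the plan table above; not read from any API.
const MAX_INSTANCES = { Free: 1, Pro: 5, Studio: 10, Business: 50 } as const;
type Plan = keyof typeof MAX_INSTANCES;

// Hypothetical helper: returns how many of the desired instances the plan allows.
function allowedInstances(plan: Plan, desired: number): number {
  const nonNegative = Math.max(desired, 0);
  return Math.min(nonNegative, MAX_INSTANCES[plan]);
}

console.log(allowedInstances("Pro", 8)); // 5
console.log(allowedInstances("Business", 8)); // 8
```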