Scaling

Scaling your bot.

Run a single bot as several concurrent worker instances to handle more traffic. Opt in with one flag, start the same bot in multiple processes, and the gateway load-balances dispatches across them.

Opt in to scaling

Construct the bot with scalable: true, then run that same bot in as many processes as your plan allows. Pool membership is tied to the bot handle, not the key: any valid key for the bot joins the same pool.

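import { Bot } from '@localhostdevs/sdk';

// scalable: true opts this process into the worker pool for the 'my-handle' bot.
const bot = new Bot({ id: 'my-handle', scalable: true });

// Each 'ping' dispatch is handled by exactly one instance in the pool.
bot.cmd('ping', async (ctx) => {
  await ctx.reply({ text: 'pong' });
});

// Connecting is what joins the pool; run this same file in every process.
await bot.connect();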

Start the same file on a second machine, a second terminal, a container replica — however you like. Each process that connects with scalable: true for the same handle joins the pool as another worker instance.

One dispatch, one instance

Each consumer dispatch runs on exactly one instance.

The gateway places every instance of a bot into a shared NATS queue-group, so a given command is delivered to a single worker — never fanned out to all of them. Adding instances increases throughput; it does not duplicate commands.
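
To make the delivery rule concrete, here is a toy model of queue-group behaviour. It is not the gateway's code and does not talk to NATS: QueueGroup, join, deliver, and simulate are hypothetical names, and the random choice of worker is an assumption made only for illustration. What it shows is that however many workers join, the number of handled dispatches equals the number sent.

```typescript
// Toy model only: each dispatch is handed to exactly one member of the group.
// Worker selection is random here purely for illustration (an assumption).
type Handler = (dispatchId: string) => void;

class QueueGroup {
  private workers: Handler[] = [];

  // A new instance joins the shared pool.
  join(worker: Handler): void {
    this.workers.push(worker);
  }

  // Deliver to a single member, never to all of them.
  deliver(dispatchId: string): void {
    const index = Math.floor(Math.random() * this.workers.length);
    const worker = this.workers[index];
    if (!worker) throw new Error('no workers in the pool');
    worker(dispatchId);
  }
}

// Returns how many dispatches each worker handled.
function simulate(workerCount: number, dispatchCount: number): number[] {
  const group = new QueueGroup();
  const handled: number[] = [];
  for (let i = 0; i < workerCount; i++) {
    handled.push(0);
    group.join(() => {
      handled[i] = (handled[i] ?? 0) + 1;
    });
  }
  for (let d = 0; d < dispatchCount; d++) {
    group.deliver(`dispatch-${d}`);
  }
  return handled;
}
```

Summing the array returned by simulate(3, 100) always gives 100: more workers spread the load, but no dispatch is handled twice.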

Idempotency

Handlers should be safe to run more than once.

If an instance crashes mid-dispatch, the gateway retries that in-flight dispatch on a sibling instance. Retry behaviour is configurable per bot and defaults to 3 attempts with a 3-second timeout between them. Because the same command can therefore run on a second instance, write your handlers so that running them twice is harmless: guard side effects with the dispatch id, upsert instead of insert, and avoid actions that could double-charge or double-send.

Heads up. The SDK already deduplicates repeated msgIds within a single process via its idempotency LRU. The retry above is cross-instance, and a per-process cache cannot see what a sibling instance has already handled, so your own side effects still need to be idempotent.
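
One way to guard a side effect with the dispatch id is to claim the id in a shared store before acting. The sketch below is a minimal illustration under stated assumptions: DedupStore, InMemoryDedupStore, and runOnce are hypothetical names rather than SDK APIs, and how you read the dispatch id from your handler context is not shown here. The in-memory store is only a stand-in so the example runs; across instances the claim must live in something shared, such as a database unique constraint or an atomic set-if-absent in a cache.

```typescript
// Hypothetical interface: not part of the SDK.
interface DedupStore {
  // Resolves to true only for the first caller to claim this key.
  claim(key: string): Promise<boolean>;
}

// Stand-in for a shared external store. A real deployment would back this
// with a database or cache that every instance can reach.
class InMemoryDedupStore implements DedupStore {
  private seen = new Set<string>();

  async claim(key: string): Promise<boolean> {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Runs sideEffect at most once per dispatch id.
// Resolves to true if the effect ran, false if it was skipped as a repeat.
async function runOnce(
  store: DedupStore,
  dispatchId: string,
  sideEffect: () => Promise<void>,
): Promise<boolean> {
  const isFirst = await store.claim(dispatchId);
  if (!isFirst) return false;
  await sideEffect();
  return true;
}
```

Claiming before acting gives at-most-once behaviour: if the process dies between the claim and the effect, the retry skips the effect. Where the effect must not be lost, record the claim in the same transaction as the effect instead.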

Stay stateless across commands

No instance is guaranteed to handle a consumer's next command — so don't rely on in-process memory between commands.

Anything durable must live outside the process: a database, a cache, object storage, an external API. Treat each command as if it could run on a fresh worker that has never seen this consumer before. In-memory counters, per-session caches, and "remember the last message" tricks will break the moment a sibling instance handles the follow-up.
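
As an illustration of moving state out of the process, the sketch below keeps a per-consumer counter behind a store interface instead of in a module-level variable. CounterStore, InMemoryCounterStore, and makeCountHandler are hypothetical names, not SDK APIs, and the in-memory class only stands in for a real shared store so the example can run.

```typescript
// Hypothetical interface for state that must outlive any one process.
interface CounterStore {
  // Atomically adds one to the counter and resolves to the new value.
  increment(key: string): Promise<number>;
}

// Stand-in for a database or cache shared by every instance.
class InMemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async increment(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Builds a handler that holds no per-consumer state of its own.
// Every instance created with the same store sees the same counts.
function makeCountHandler(store: CounterStore) {
  return async (consumerId: string): Promise<string> => {
    const total = await store.increment(`commands:${consumerId}`);
    return `Commands seen from you: ${total}`;
  };
}
```

Two handlers built over the same store behave like two instances of one bot: whichever one receives the follow-up command continues the count where the other left off.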

Per-plan instance caps

How many concurrent instances of one bot you can run depends on your plan.

Plan	Max instances per bot
Free	1
Pro	5
Studio	10
Business	50

Instances beyond your cap are rejected at connect time. Upgrade on your plan page to raise the limit.
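
If you set replica counts in a deploy script, it can help to clamp them to the cap up front rather than have the surplus rejected on connect. The helper below is a small sketch: the numbers are copied from the table above, while MAX_INSTANCES_PER_BOT and replicasWithinCap are hypothetical names that are not provided by the SDK.

```typescript
// Caps copied from the plan table; update this if the table changes.
const MAX_INSTANCES_PER_BOT = {
  Free: 1,
  Pro: 5,
  Studio: 10,
  Business: 50,
} as const;

type Plan = keyof typeof MAX_INSTANCES_PER_BOT;

// Clamps a desired replica count to the range 0..cap for the given plan.
function replicasWithinCap(plan: Plan, desired: number): number {
  const cap = MAX_INSTANCES_PER_BOT[plan];
  return Math.max(0, Math.min(Math.floor(desired), cap));
}
```

For example, replicasWithinCap('Pro', 8) returns 5, so a deployment asking for eight Pro replicas would start only the five that can join the pool.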