Scaling is one of the most important considerations when developing custom software. Scaling is the process of increasing or decreasing the capacity of a system by changing the resources or the number of processes available to service requests. In this article, I will take a high-level look at scaling, cover some architectural design considerations, and address some common misconceptions.

Three Types of Scaling

When developing a system, there are different types of scaling to choose from. The right type depends on the project and its needs.

The first and probably most basic type is vertical scaling, or scaling up. Vertical scaling means moving from a smaller server to a bigger one: from a server with less CPU, RAM, and other resources to a server with more. We had an example of this with a client whose application server needed just two more gigabytes of RAM. We didn’t need to add more servers to the mix; that would only have caused headaches. Scaling up was an easy way to give the application more resources.

Another type of scaling is horizontal scaling, or scaling out. Horizontal scaling increases the number of servers you have running concurrently. What runs on one server is mirrored across multiple servers, and consumers are directed to different servers to split the load. If you are doing horizontal scaling, you should also incorporate a load balancing strategy.
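As a rough illustration of one load balancing strategy, here is a minimal round-robin sketch in Python. The server names are hypothetical, and a real deployment would normally rely on a dedicated load balancer rather than application code like this.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed, rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical server names, used only for illustration.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Round-robin treats every server as equal. Other strategies, such as routing to the server with the fewest open connections, may fit better when requests vary a lot in cost.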

The last type of scaling is splitting one application into multiple applications. Take the monolithic application that you have and break it up based on concerns. For example, one of our clients has a lot of email processing needs. They may send out tens of thousands or even hundreds of thousands of emails a week. If your main API server is sending emails at various hours of the day while also handling all other requests, it may not be able to keep up. Instead, introduce a new server with dedicated resources for that specific responsibility. This can add complexity: how are the main API server and the new email processing server going to communicate? That can be handled in a number of ways, such as HTTP requests or queues.

Some Architecture Considerations

One of the most important things to consider when it comes to scaling is how you design your application. Architecture is critical to implementing your scaling strategy. This list of considerations is by no means exhaustive, but these are enhancements worth considering to help with future scalability.

Queues make a workflow asynchronous and let it scale to handle growing load. Queue design requires some thought about how to keep track of which messages go to which users. Queues can fan out messages to multiple consumers via a pub/sub model, or multiple worker processes can dequeue messages independently and handle the workload concurrently. Their versatility is one reason queues should be in every developer's toolbox for scaling.
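To make the worker model concrete, here is a small sketch in which several workers pull jobs from one shared queue, loosely modeled on the email example from earlier. It uses Python's in-process queue and threads as a stand-in for a real message broker, and the send_email function is a hypothetical placeholder rather than real delivery code.

```python
import queue
import threading


def send_email(address):
    # Hypothetical placeholder for real email delivery.
    return f"sent to {address}"


def worker(jobs, results):
    while True:
        address = jobs.get()
        if address is None:
            # Sentinel value: no more work for this worker.
            jobs.task_done()
            break
        results.put(send_email(address))
        jobs.task_done()


def process_emails(addresses, worker_count=3):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for address in addresses:
        jobs.put(address)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    # Drain the results queue; order depends on worker scheduling.
    return [results.get() for _ in range(results.qsize())]


outcomes = process_emails([f"user{i}@example.com" for i in range(10)])
print(len(outcomes))
```

Handling more load in this model is mostly a matter of raising worker_count, or, with a real broker, starting more worker processes on more machines.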

Another thing to consider in your architectural design is how you serve assets. You do not want to serve all static assets from a single web server. Instead, put assets on a content delivery network (CDN) so that they are geographically closer to users, for snappier response times. With so many front ends now built as static JavaScript assets, you can serve whole front-end applications from your CDN.

Caching is another important element of architectural design. Database queries whose results rarely change can be cached with definable expirations to reduce trips to the database and speed up response time to the client. For example, businesses that list their store hours typically do not change those hours often, so why make a round trip to the database to run a SQL query when you could hand a key to a caching server and get back the corresponding value? This helps reduce the load on other systems.
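The store hours example could be sketched roughly as follows. This is an in-memory cache with a time-based expiration, standing in for a dedicated caching server; the TTLCache class, the store ID, and the load_store_hours function are all hypothetical names used for illustration.

```python
import time


class TTLCache:
    """Caches values by key until a time-to-live expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the database
        value = loader(key)  # cache miss or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_store_hours(store_id):
    # Hypothetical stand-in for running a SQL query.
    db_calls.append(store_id)
    return "9am-5pm"


cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_load("store-42", load_store_hours)
second = cache.get_or_load("store-42", load_store_hours)
print(first, second, len(db_calls))
```

The second lookup is served from the cache, so the stand-in query only runs once. The expiration bounds how stale a value can get, which is the trade-off to tune per use case.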

Database splitting, or using separate write and read-only databases, is another element of architectural design that reduces database load. For example, with replication, new values are written to one database server and then replicated to a second database server where all read transactions occur. This can help reduce contention, such as one process attempting to update a table while people are trying to read the same data. Database splitting can be a great way to reduce the chance of deadlocks.
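One way to picture the split is a small router that sends reads to a replica and everything else to the primary. This is only a sketch: the ConnectionRouter class and the server names are hypothetical, and checking the first keyword of a statement is a deliberate simplification.

```python
import random


class ConnectionRouter:
    """Picks a database server based on the kind of statement."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Simplification: treat plain SELECT statements as reads.
        # Statements such as SELECT ... FOR UPDATE would need the primary.
        if first_word == "SELECT" and self.replicas:
            return random.choice(self.replicas)
        return self.primary


# Hypothetical server names, used only for illustration.
router = ConnectionRouter("db-primary", ["db-replica-1", "db-replica-2"])
read_target = router.route("SELECT hours FROM stores WHERE id = 42")
write_target = router.route("UPDATE stores SET hours = '9-5' WHERE id = 42")
print(read_target, write_target)
```

Because replication takes time, a read sent to a replica immediately after a write may not see the new value yet, so reads that must be fully up to date are often sent to the primary as well.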

Microservices break the application into distinct processes that do not affect one another through shared hardware. This allows you to scale only the pieces of an application that need to be scaled. Implementation can vary from containers and Kubernetes to AWS Lambda and Azure Functions. Spinning up a new instance of a function could take a few seconds, including a cold start, but will be worlds faster than spinning up new server instances. However, microservices can also be overly “micro.” If updating one service forces you to update another, maybe you should not be breaking them apart.

As an application grows, there can be important database architecture considerations. When you’re first designing your application database, think big. Starting out with Int32 IDs may sound fine, but as someone once said, “Using bigint is confidence that your application will last.” How many users do you want your application to have, and how long will it last? As your application grows, continue to profile slow queries and add missing indexes. Queries that used to run fast on small amounts of data may become nightmarishly slow.
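Some quick arithmetic shows why the ID type matters. The sketch below compares the largest signed 32-bit and 64-bit values and estimates how long each would last; the rate of one million new rows per day is an assumed figure chosen only for illustration.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value
INT64_MAX = 2**63 - 1  # largest signed 64-bit (bigint) value


def years_until_exhausted(max_id, rows_per_day):
    """Rough number of years before sequential IDs run out."""
    return max_id / rows_per_day / 365


ROWS_PER_DAY = 1_000_000  # assumed insert rate, for illustration only

int32_years = years_until_exhausted(INT32_MAX, ROWS_PER_DAY)
int64_years = years_until_exhausted(INT64_MAX, ROWS_PER_DAY)

print(INT32_MAX)  # 2147483647
print(INT64_MAX)  # 9223372036854775807
print(round(int32_years, 1))
```

At that assumed rate, a 32-bit ID column runs out in under six years, while a 64-bit column would outlast any realistic application lifetime.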

Common Misconceptions About Scaling

Have you heard of some of these misconceptions when it comes to scaling? 

  • I can scale automatically because I can use the cloud.
  • Scaling just means increasing the slider bar to allocate more servers.
  • I don’t have to worry about scaling because I use language X, like Elixir or Erlang.

Certainly the cloud makes scaling easier. You can perform horizontal and vertical scaling directly from a cloud management portal. However, scaling well is never as simple as moving a slider bar or changing an instance size. You need to consider your application’s specific needs. Will the actions you can take in the cloud actually lead to a performance improvement?

Additionally, just because a language is great at scaling does not mean that the code you've written scales effectively. Developers should avoid preconceived notions of a language’s scalability. Just because WhatsApp uses Erlang to support 2 million active connections does not mean your application will reach that level of performance more or less automatically simply by using Erlang. Regardless of the language selected, it will still require a lot of effort to ensure your application is hyper-optimized. Preconceptions can go the opposite way as well. Many people say that .NET is terrible for scaling and so rule it out for large-scale applications, forgetting that Stack Exchange scales to over 66 million page loads and over 100 million API requests a day, all while using .NET.


Scaling is a critical consideration when building your application or system. While there are many types of scaling, your application should also be well designed and take into consideration your plans for future growth. No matter your choice of language, you should avoid misconceptions and focus on the key architectural decisions that will allow your application to grow into the future.