As the volume of data in your Salesforce org grows, so do the pain points around performance. This is Part 1 of a four-part series on how to optimize your Salesforce org for performance.
If you are constantly hitting your storage limits and have millions of rows of data inside your Salesforce org, you may benefit from some optimizations that will make your org more performant and your deployments smoother. Many of the tips in this series of posts are expected knowledge for any Salesforce professional seeking the Data Architecture and Management Designer certification, or for anyone working with large data volumes.
Before we start, here’s a refresher on how the Salesforce platform works at a high level. Salesforce has a whitepaper about its architecture, and I think it’s important to know how it all ties together before starting your optimization. Combining that fundamental knowledge with techniques applied at the metadata and configuration levels can solve performance issues for large Salesforce orgs.
Salesforce is a multi-tenant platform that employs different technologies under the hood than a traditional relational database does. As a multi-tenant, metadata-driven platform with polymorphic capabilities, it uses a runtime engine that surfaces application data from the metadata itself. Salesforce contains three layers: the database engine (kernel), tenant data, and the application metadata. Salesforce leverages all three to structure each client’s specific org.
Salesforce uses a Universal Data Dictionary (UDD) to store objects, fields, triggers, Visualforce pages, and Lightning components, all of which are registered solely as metadata. This is how Salesforce can handle 2 billion transactions per day while offering a relational database management system with multi-tenant-aware query optimizations on top of it. (Salesforce has implemented many optimizations: for example, Apex code is compiled, stored as metadata, and then interpreted and cached at runtime for scalability.)
When you create application schemas, the UDD keeps track of metadata concerning your objects, their fields, their relationships, and other object attributes. Meanwhile, a few large database tables store the structured and unstructured data for all virtual tables. A set of related multi-tenant indexes, implemented as simple pivot tables with denormalized data, makes the combined data set highly functional.
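To make the metadata-driven point concrete, here is a minimal sketch using Apex describe calls, which expose the same object and field definitions the runtime materializes:

```apex
// The schema is itself metadata: describe calls let you walk the
// object and field definitions the platform stores in the UDD.
Map<String, Schema.SObjectField> fields = Schema.SObjectType.Account.fields.getMap();
for (String name : fields.keySet()) {
    Schema.DescribeFieldResult dfr = fields.get(name).getDescribe();
    System.debug(dfr.getName() + ' : ' + dfr.getType());
}
```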
Salesforce also uses multiple search engines to handle usage and distribute load across the platform. To deliver better application response times, these engines support full-text indexing and search. As applications evolve, background services in these search engines create user- and tenant-specific indexes.
Traditional database systems rely on native database indexes to find specific information, but Salesforce uses techniques like a single flex column that stores the data of many fields with varying data types. To ensure your Salesforce application scales performantly, here are a few tips that can decrease operation times dramatically:
Salesforce also offers Big Objects as a way to store and manage massive amounts of data on the platform. You can archive data from other objects or bring in massive datasets from external systems. We are going to focus on optimizing standard and custom objects, but Big Objects can be populated by using the Bulk API or Batch Apex.
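If you do go the Big Object route, writes use a dedicated Apex method rather than standard DML. Below is a minimal sketch of archiving old closed Cases into a hypothetical custom big object named Case_Archive__b; the object and its fields are assumptions for illustration:

```apex
// Archive old closed Cases into a custom big object.
// Case_Archive__b and its fields are hypothetical; define them to
// mirror whatever you need to retain from the source records.
List<Case_Archive__b> archive = new List<Case_Archive__b>();
for (Case c : [SELECT Id, Subject, ClosedDate
               FROM Case
               WHERE IsClosed = true AND ClosedDate < LAST_N_YEARS:2
               LIMIT 200]) {
    archive.add(new Case_Archive__b(
        Case_Id__c = c.Id,
        Subject__c = c.Subject,
        Closed_Date__c = c.ClosedDate
    ));
}
// Big object records are written with insertImmediate, not insert.
Database.SaveResult[] results = Database.insertImmediate(archive);
```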
When organizations create custom objects, the platform tracks metadata about the objects and their fields, relationships, and other object definition characteristics. Salesforce stores the application data for all virtual tables in a few large database tables, which are partitioned by tenants and serve as heap storage. The platform’s engine then materializes virtual table data at runtime by considering the corresponding metadata.
If there are many runaway queries and performance is not optimized, you will feel it throughout your org (and especially at deployment time). Here are a few telltale signs:
Although data models are hard to change in existing orgs, designing your data model to anticipate growth and building according to best practices will effectively protect your org from the performance issues listed above.
One issue commonly happens when a parent object has more than ten thousand child records. Standard objects like Account and Opportunity have built-in data relationships and unique sharing models. A common anti-pattern is to associate one parent record with many children, which causes record locking and poor query performance when queries and searches are not selective enough. If this parent is also updated from various points, such as external APIs and manual user edits, bottlenecks and failures will grow as the record count increases. If your org relies heavily on visibility and sharing settings based on parent-child relationships, the following alternative methods will help with processing times:
To avoid data skew, divide records among multiple parents and avoid clustering too many records under one user, especially an integration or API user, as so often happens. You can also consider different approaches to role-based visibility and sharing, such as Case or Lead assignment rules.
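Here is a minimal sketch of the round-robin idea, assuming a pre-created pool of "bucket" parent Accounts (the naming convention is hypothetical):

```apex
// Distribute orphaned Contacts across a pool of parent Accounts
// instead of piling them all under one integration-owned parent.
List<Account> parentPool = [SELECT Id FROM Account
                            WHERE Name LIKE 'Integration Bucket%'];
List<Contact> children = [SELECT Id, AccountId FROM Contact
                          WHERE AccountId = null LIMIT 2000];

Integer i = 0;
for (Contact c : children) {
    // Round-robin assignment keeps child counts per parent roughly even.
    c.AccountId = parentPool[Math.mod(i, parentPool.size())].Id;
    i++;
}
update children;
```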
If you are already working in an org that has these issues, the best thing to do when starting out is to pick a dedicated user to own these child records and place that user at the top of the role hierarchy, so that sharing recalculations do not cascade through the hierarchy each time ownership changes.
When you change sharing rules, regardless of the criteria, the platform recalculates access for all relevant records in the system. Role hierarchy and sharing logic changes are also executed during deployments, increasing deployment times.
Another scenario to consider is lookup skew. Lookup skew occurs when too many child records point to the same record through a lookup field. It’s very easy to create new fields and relationships in Salesforce, but these come with performance costs if you’re dealing with a high volume of data. To pinpoint this performance issue, look carefully at objects with heavy insert and update activity.
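One way to spot candidates is an aggregate query over the lookup field. Here is a minimal sketch, where Order__c and its Dealer__c lookup are hypothetical names:

```apex
// Count children per lookup target and flag heavily skewed parents.
// On very large tables, run this via Batch Apex or the Bulk API to
// stay within query limits.
for (AggregateResult ar : [SELECT Dealer__c parentId, COUNT(Id) childCount
                           FROM Order__c
                           GROUP BY Dealer__c
                           HAVING COUNT(Id) > 10000]) {
    System.debug('Skewed lookup target: ' + ar.get('parentId') +
                 ' (' + ar.get('childCount') + ' children)');
}
```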
Salesforce’s search architecture is heavily engineered and optimized for its own data storage. There are many places where it is utilized under the hood:
Salesforce automatically indexes most text fields; a search returns results from the indexes first and then narrows them down by permissions, filters, and search limits. SOSL works in a similar fashion, and you can utilize it via the API or Apex.
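For example, here is a minimal SOSL search in Apex (the search term is arbitrary):

```apex
// SOSL hits the search indexes first, then applies sharing and filters.
List<List<SObject>> results = [FIND 'Acme*' IN NAME FIELDS
                               RETURNING Account(Id, Name),
                                         Contact(Id, Name, Email)];
Account[] accounts = (Account[]) results[0];
Contact[] contacts = (Contact[]) results[1];
System.debug(accounts.size() + ' accounts, ' + contacts.size() + ' contacts');
```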
What makes these operations different from a search function on a traditional database is that traditional performance tuning based on your org’s schema will not be effective, because the platform builds virtual database tables from metadata, data, and pivot tables.
Another important point to consider is that for data to appear in a result set, it must be indexed first. This is why you might not see new records in search results right after a large data load: the records have not been indexed yet. (There are many platform quirks like this that we want to cover in upcoming posts to ease your burden.)
The Lightning Platform Query Optimizer helps the database’s own optimizer produce effective execution plans for Salesforce queries, providing efficient data access despite the multi-tenant data model.
The Lightning Platform Query Optimizer works on the automatically generated queries that handle reports, list views, and SOQL queries, determining the best index or table from which to drive each query based on how selective its filters are.
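As a quick illustration of what the optimizer likes and dislikes, here is a sketch contrasting a non-selective filter with a selective one (Status__c is a hypothetical custom field):

```apex
// Non-selective: negative operators and leading wildcards can't use an
// index, so the optimizer may fall back to a full table scan.
List<Account> slow = [SELECT Id FROM Account
                      WHERE Status__c != 'Closed'
                        AND Name LIKE '%corp%'];

// Selective: positive filters, with at least one on an indexed field
// (Name has a standard index), let the optimizer drive from an index.
List<Account> fast = [SELECT Id FROM Account
                      WHERE Status__c = 'Open'
                        AND Name LIKE 'Acme%'];
```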
You can also apply targeted fixes to underperforming reports and dashboards in your org, such as filtering on indexed fields and reducing the amount of data each report returns.
In the next part, we’ll go into detail on how to write better queries and how to utilize indexes, skinny tables, and database statistics. Check out Part 2 of this series for more information.