Optimizations for Salesforce Orgs with Lots of Data (Part 1)

If you are constantly hitting your storage limits and have millions of rows of data inside your Salesforce org, you may benefit from some optimizations that will make your org more performant and your deployments smoother. Many of the tips in the next series of posts are expected knowledge for any Salesforce professional seeking a Data Architecture and Management Designer certification or anyone who’s working with large data volumes.

Platform Architecture

Before we start, here’s a refresher on how the Salesforce platform works at a high level. Salesforce has a whitepaper about their architecture and I think it’s important to know how it all ties together before starting your optimization. Combining that fundamental knowledge with techniques applied at the metadata and configuration levels can solve performance issues for large Salesforce orgs.

Salesforce is a multi-tenant platform which employs different technologies under the hood than a traditional relational database. As a multi-tenant and metadata driven platform with polymorphic abilities, the platform uses a runtime engine that surfaces application data from the metadata itself. Salesforce contains three different layers: the database engine (kernel), tenant data and the metadata of the application. Salesforce leverages all three to structure each client’s specific org.

Salesforce uses a Universal Data Dictionary (UDD) to store Objects, Fields, Triggers, VisualForce Pages, and Lightning components that are all registered as metadata solely. This is how Salesforce can handle 2 billion transactions per day while offering a relational database management system and multi-tenant aware query optimizations on top of it. (Salesforce has implemented many optimizations: for example, Apex code is compiled and stored as metadata and interpreted at runtime and cached for scalability.)

When you create application schemas, the UDD keeps track of metadata concerning your objects, their fields, their relationships, and other object attributes. Meanwhile, a few large database tables store the structured and unstructured data for all virtual tables. A set of related multi-tenant indexes, implemented as simple pivot tables with denormalized data, make the combined data set extremely functional.

Salesforce also uses multiple search engine optimizations to handle usage and distribute the load of the platform. For better application response times these engines support full text indexing and searches. As applications evolve, the background services in these search engines create user and tenant specific indexes.

Traditional database systems rely on native database indexes for specific information but Salesforce uses techniques like a single flex column to store the data of many fields with varying structured data types. To ensure your Salesforce application scales performantly, here are a few tips to help decrease operation times dramatically:

Use the most efficient database operation for accomplishing a given task - e.g. for speed of comparison - using Insert is better than Update which is better than Upsert.
Use optimization techniques provided by Salesforce - e.g. custom indexes, skinny tables etc.
Use admin features like Defer Sharing Maintenance or Parallel Sharing Calculation for the right scenario.
Use best practices to accommodate metadata, schema and data load changes.

Salesforce also offers Big Objects as a way to store and manage massive amounts of data on the platform. You can archive data from other objects or bring massive datasets from other systems. We are going to talk about optimizing standard and custom objects, but Big Objects can be utilized by using BULK API or Batch Apex.

When organizations create custom objects, the platform tracks metadata about the objects and their fields, relationships, and other object definition characteristics. Salesforce stores the application data for all virtual tables in a few large database tables, which are partitioned by tenants and serve as heap storage. The platform’s engine then materializes virtual table data at runtime by considering the corresponding metadata.

If there are many runaway queries and performance is not optimized, you will feel it throughout your org (and especially at deployment time). Here are a few telltale signs:

Slow data management operations, record sharing locks and end user locking
Low performance on list views and reports
API integrations and other operations timing out due to governance limits (timeout and CPU especially)
Slow SOQL and SOSL performance, inefficient deployments
Sandbox refresh length increase and inefficient change management
Slow Lightning and VisualForce page and components
Imprecise search results, some records will not show before they are processed

Data Skew

Although data models are hard to change for existing orgs, designing your data model anticipating data growth and building according to best practices will effectively cover your org from the performance issues listed above.

One issue commonly happens when you have more than ten thousand child records in a parent object. Standard objects like Account and Opportunity have built in data relationships and unique sharing models. A common anti-pattern is to have one parent object associated with many children which will cause record locking and low query performance if queries and searches are not selective enough. If this parent is also being updated by various points such as external APIs and manually by users, it will create bottlenecks and failures as the record count increases. If your org relies heavily on visibility and sharing settings based on parent-child relationships, using alternative methods will help with the processing times:

‍

Try using public, read/write sharing as an org-wide default for non-confidential data so that we don’t have to process rules for data that can be accessible for everyone in the org.
Configure child objects to be controlled by the parent wherever possible.
If you are experiencing record locking, retry logic should be in place in case a record is locked.
Process your operations with ParentID to enforce uniqueness and different threading.
Utilize batch Apex and consider timeout values, Bulk API limits, and optimization techniques.

To avoid data skew, try to divide up records among multiple parents and avoid clustering too many records on one user - especially an integration or API user as so often happens. You can also consider using different approaches to role based visibility and sharing by using Case or Lead Assignment rules.

If you are already working with an org with these issues the best thing to do when starting out is to pick a user to assign these child records and move that user above the role hierarchy so that ownership changes and processing does not happen each time.

When you change Sharing Rules, despite the criteria it will recalculate new changes for all relevant records in the system. Role hierarchy and sharing logic changes also get executed each time during deployments, increasing the deployment times.

Another scenario to consider is lookup skew. Lookup skew occurs when you have too many child records against the record you are searching. It’s very easy to create new fields and relationships in Salesforce, but these come with performance costs if you’re dealing with a high volume of data. To pinpoint this performance issue, you need to carefully look at objects with too many insert and update operations.

‍

Search Performance in Salesforce

Salesforce’s search architecture is heavily engineered and optimized for its own data storage. There are many places where it is utilized under the hood:

Web-to-lead and web-to case
Duplicate rules and matching rules
Salesforce Knowledge and Solutions
Lookup fields and find boxes
Global search
The sidebar

Salesforce automatically indexes most text fields, this returns indexed searches in the result set first and then narrows it down by permissions, filters and search limits. SOSL also works in a similar fashion where you can utilize it by API or Apex.

What makes these operations different from a search function operating on a traditional database is that if you apply traditional performance tuning based on the schema of your org, you will see that these are not effective because the platform uses virtual database tables using metadata, data and pivot tables.

Another important point to consider is that for data to be shown in the result set, it must be indexed first. This is why you might not see new records in search results if you have made a large data load operation because records are not indexed yet. (There are many platform quirks like this we want to cover and ease your burden in our next posts.)

Lightning Platform Query Optimizer

The Lightning Platform Query Optimizer helps the database’s own optimizer produce effective queries by providing efficient data access in Salesforce.

The Lightning Platform Query Optimizer works on automatically generated queries that handle reports, list views, and SOQL queries. The optimizer:

Determines the best index from which to drive the query, if possible, based on filters in the query.
Determines the best table from which to drive the query, if no good index is available.
Determines how to order the remaining tables to minimize cost.
Injects custom foreign key value tables that are required to create efficient join paths.
Influences the execution plan for the remaining joins, including sharing joins, to minimize database input and output (I/O).
Updates statistics.

For example, you can also use these specific tips to optimize underperforming reports and dashboards in your org:

Use selective filters and narrow down the data to be queried.
Utilize System Indexing (External Id) and Custom Indexing for the best use case.
Asynchronous reporting to have a report running in the background.
Empty the recycle bin often.
Use URL parameters to filter reports.
Use email delivery methods to receive long running reports.

We’re going to continue in the next part in detail on how to write better queries, utilizing indexes, skinny tables and database statistics. Check out Part 2 of this series for more information.

Table of Contents

Platform Architecture

Data Skew

Search Performance in Salesforce

Lightning Platform Query Optimizer

More like this