
Optimizations for Salesforce Orgs with Lots of Data (Part 4)

In Part 4 we cover truncation, search optimization, indexing nulls, API performance and other optimization techniques to keep your Salesforce org performant.

Published: November 8, 2021

This article wraps up the series with a final set of techniques: search optimization, indexing null values, related lists, Bulk API deletion and extraction, integration query performance, and truncation. Here are the key takeaways.

Key Takeaways

  • Use SOSL instead of repeated SOQL queries when searching across multiple objects, especially with wildcards.
  • Index a formula field that substitutes placeholder text for nulls so queries on empty values can still use an index.
  • Use the Bulk API’s hard delete and PK chunking options to delete and extract large data volumes efficiently.
  • Avoid sharing-based filtering in high-volume integration queries; run them as a user with full data access and rely on selective filters.
  • Truncate custom objects to purge test data far faster than batch deleting records.

This is Part 4 of 4 in our series on optimization tricks and techniques for large Salesforce orgs. You can find Part 3 here.

Optimizing Search

Here’s a common scenario: a customer wants to search large volumes of data across multiple objects using filter values, including wildcards. There’s a UI page where users can enter values into several fields, and the search runs a SOQL query on the combination of those fields.

Notice that there’s no mention of SOSL. You can speed these searches up considerably by using SOSL, because it searches multiple objects in a single call. With SOQL alone, you would have to query each object separately to build the complete result set. Optimization is also challenging because the WHERE clause combines multiple filters and wildcards.
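For example, a single SOSL search like this one scans multiple objects in one call (the objects and fields here are illustrative):

FIND {Acme*} IN NAME FIELDS RETURNING Account(Id, Name), Contact(Id, LastName)

The SOQL-only equivalent would need one query per object.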

To optimize this, search only essential fields: reduce the number of fields that can be searched, and restrict the fields that can be combined in a single search to the commonly indexed fields we mentioned previously.

You can also denormalize data from multiple objects into a single custom object and avoid issuing multiple queries. Finally, you can dynamically decide between SOQL and SOSL based on the number of fields searched and the types of values entered. For example, very specific filter values are a good fit for SOQL, while wildcards are better handled by SOSL.
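As a minimal Apex sketch of that decision (the Invoice__c object and the field choices are hypothetical, not part of the original scenario):

public static List<SObject> search(String term) {
    // Wildcards and broad terms are better served by SOSL's search index,
    // which scans multiple objects in one call
    if (term.contains('*') || term.contains('?')) {
        List<List<SObject>> results = [
            FIND :term IN ALL FIELDS
            RETURNING Account(Id, Name), Invoice__c(Id, Name)
        ];
        return results[1]; // Invoice__c hits; results[0] holds Accounts
    }
    // A specific filter value can use a selective SOQL query on an indexed field
    return [SELECT Id, Name FROM Invoice__c WHERE Name = :term];
}

The exact thresholds for choosing one path over the other would depend on your data and indexes.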



Indexing with Nulls

Let’s say you want to allow nulls in a field and run search queries against them. We have mentioned how single-column indexes for picklist and foreign key fields exclude rows in which the column is null, so an index cannot be used for null queries.

The best practice is to avoid nulls in the first place by storing a placeholder string instead. If you can’t do that, perhaps because records with null values already exist in the object, create a formula field that displays text for nulls, and index that formula field.

For example, assume the Status field is indexed and contains nulls. Issuing a SOQL query similar to the following prevents the index from being used:

SELECT Name FROM Object WHERE Status__c = ''

Instead, you can create a formula called Status_Value: 

Status_Value__c = IF(ISBLANK(Status__c), "blank", Status__c) 

This formula field can be indexed and used when you query for a null value:

SELECT Name FROM Object WHERE Status_Value__c = 'blank'

You can apply the same approach to multiple fields by creating a formula field for each one:

SELECT Name FROM Object WHERE Status_Value__c = 'blank' OR Email_Value__c = 'blank'

Related Lists with Large Data

Let’s say you have close to a million accounts and tens of millions of invoices, which reside in a custom object in a master-detail relationship with the account. Each account record takes a long time to display because rendering its related list is slow.

We have talked about how data skew occurs when one record has too many child records, and how data should be spread across multiple parent records instead of concentrated on a single one. To reduce the delay, split accounts that have too many children into multiple records, which keeps data skew to a minimum. You can also turn on the Enable Separate Loading of Related Lists option so the account details load first while the related list queries run separately in the background.

Deletion and Extraction with the Bulk API

The Bulk API supports a hard delete (physical delete) option, which lets records bypass the Recycle Bin and become immediately eligible for deletion. Using the Bulk API’s hard delete function is a recommended strategy for deleting large data volumes, both to free up space and to keep extraneous data from affecting performance. Note that the hard delete option is disabled by default and must be enabled by an administrator.
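As a rough sketch, creating a hard-delete job against the original XML-based Bulk API looks something like the following; the object name Invoice__c and the API version are placeholders:

POST /services/async/52.0/job
X-SFDC-Session: <session id>
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <operation>hardDelete</operation>  <!-- bypasses the Recycle Bin -->
    <object>Invoice__c</object>
    <contentType>CSV</contentType>
</jobInfo>

You then upload batches of record IDs to the job and close it, just as with a regular Bulk API delete.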

Queries are split into 100,000-record chunks by default. You can use the chunkSize header field to configure smaller chunks, or larger ones of up to 250,000 records, in your Bulk API calls. Larger chunks let you move more data per batch, but each batch takes longer to process. If your org has minimal metadata and sharing, you can benefit from large chunk sizes. At extremely high volumes (hundreds of millions of records), defining chunks by filtering on field values may not be practical: the number of rows returned may exceed the selectivity threshold of Salesforce’s query optimizer, and the result can be a full table scan with slow performance, or even outright failure. At that point you need a different strategy.

You can use PK chunking when attribute filtering doesn’t help you break the data up into smaller chunks. PK stands for primary key: an object’s record ID, which the system always indexes by default. PK chunking splits bulk queries on very large tables into chunks based on the record IDs of the queried records.

Enable PK chunking when querying tables with more than 10 million records or when a bulk query consistently times out. PK chunking is a supported feature of the Salesforce Bulk API, so it does all the work of splitting the queries into manageable chunks. Just enter a few parameters on your Bulk API job, and the platform automatically splits the query into separate chunks, executes a query for each chunk, and returns the data.

You can use PK Chunking with most standard objects. It’s supported for Account, Campaign, CampaignMember, Case, Contact, Lead, LoginHistory, Opportunity, Task, and User, as well as all custom objects. To enable the feature, specify the header Sforce-Enable-PKChunking on the job request for your Bulk API query.

To choose a chunk size, specify it in the header. For example, this header enables PK chunking with a chunk size of 50,000 records: `Sforce-Enable-PKChunking: chunkSize=50000`. Each chunk is processed as a separate batch that counts toward your daily batch limit, and its results must be downloaded separately. You can still filter while using PK chunking by including a WHERE clause in the Bulk API query; in that case, a chunk may return fewer records than the chunkSize you specified.
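A chunked extraction job differs from the hard-delete sketch above mainly in the operation and the extra header; the object name and API version are again placeholders:

POST /services/async/52.0/job
X-SFDC-Session: <session id>
Sforce-Enable-PKChunking: chunkSize=50000
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
    <operation>query</operation>
    <object>Invoice__c</object>
    <contentType>CSV</contentType>
</jobInfo>

The SOQL query itself (for example, SELECT Id, Name FROM Invoice__c) is then submitted as a batch to this job, and the platform fans it out into separate record-ID-range chunks.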



API Performance

Imagine a scenario in which a custom integration syncs Salesforce data to external applications.

This process queries Salesforce data in a given object, loads it into an external system, and then queries Salesforce again for the IDs of all the data, so the integration can determine which records have been processed and deleted.

The objects might contain several million records, and the integration might use a dedicated API user who is part of the sharing hierarchy to limit the records retrieved. Such queries can take minutes to complete. In Salesforce, sharing is a very powerful mechanism for making certain records visible to certain users, and it works very well for UI interactions. When sharing is used as a high-volume data filter in a SOQL query, however, performance can suffer: data access becomes more complex and expensive to evaluate, especially when you are filtering out records in a large data volume situation.

The solution is to give the user running the code access to all the data and to rely on selective filters to retrieve the appropriate records. For example, using an administrator as the API user provides access to all of the data and prevents sharing from being evaluated in the query.

An additional solution is to perform a delta extraction, which lowers the volume of data that needs to be processed.
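A minimal sketch of a delta extraction, again using a hypothetical Invoice__c object: filter on SystemModstamp, a standard field that Salesforce indexes by default, using the timestamp of the last successful sync:

SELECT Id, Name FROM Invoice__c WHERE SystemModstamp > 2021-11-01T00:00:00Z

Only records created or modified after that point are returned, which keeps the query selective and the result set small.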

Truncation

Truncation is a really fast way to delete records in a sandbox org’s custom objects. It removes the records but keeps the object’s metadata intact for future use.

Truncating custom objects causes some irreversible changes to the truncated object and its records. They cannot be returned to their original state.

Truncation is especially useful for test data. If you have created a custom object and filled it with mock records to test application performance realistically, you can truncate the object to purge the test records when you’re done, then deploy it to production. This is much faster than batch deleting the records.

You can enable truncation in Setup, under the User Interface settings, by selecting the permission to truncate custom objects. You can then truncate an object from its detail page by clicking Truncate; after reviewing the warning for the operation, enter the name of the object to confirm.

When you truncate a custom object, all of the object’s records are removed permanently, but the object’s definition remains. The records no longer count against your org limits. In contrast, if you delete a custom object, the object moves to the Recycle Bin for 15 days (as described above). After that, the object and its records are permanently deleted.

Conclusion

Thanks for reading. We hope these performance tips help you improve the performance and speed of your Salesforce applications!
