Best Practices

Optimizations for Salesforce Orgs with Lots of Data (Part 3)

Part 3 of our 4-part series on data optimizations for large Salesforce orgs.

Published: June 9, 2021

Custom Indexes

Welcome to Part 3 of our series on optimizing large Salesforce orgs (you can check out Part 1 here and Part 2 here). Today, we’re continuing with use cases for custom indexed fields. Custom indexes can be created on formula fields, but only when the formula is deterministic: it can’t reference other objects through lookup fields, use dynamic date and time functions such as TODAY or NOW, or reference field types that can’t be indexed, such as long text areas, binary fields, and multi-select picklists.

One caveat: if you change a formula field after a custom index has been created on it, you need to reach out to Salesforce Support to recreate the index.
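
For illustration, suppose Account has a custom formula field that only references fields on the same record (the field names and formula below are hypothetical, not from the original case):

Region_Code__c = UPPER(TEXT(Region__c))

Because the formula never crosses a lookup and never calls TODAY or NOW, it is deterministic, so Salesforce Support could create a custom index on it, and a selective filter like this one could use that index:

SELECT Id, Name FROM Account WHERE Region_Code__c = 'EMEA'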

Two Column Indexes

This type of index is an example of a composite index, a specialized feature of the platform. In traditional databases, the columns in a composite index are concatenated, and those concatenated keys are stored in the index. When you perform a search, your search keys are concatenated and matched against the keys of the composite index. If the ordering of your search keys doesn’t match the ordering of the composite index columns, the index can’t be used.

Composite indexes in Salesforce are useful for list views and situations where you want to use one field to select records to display and another field to sort them. If these two fields are usually searched together, you can reach out to Salesforce Support to create a two column index for you to increase performance. Example query below:

SELECT Name FROM Account WHERE f1__c = 'foo' AND f2__c = 'bar'

Two column indexes generally have the same restrictions as single column indexes, with one exception: a two column index can have nulls in the second column by default, whereas to enable nulls for a single column index you would have to reach out to Salesforce Support.
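
As a hedged illustration (the field names are made up), a list-view-style query that filters on one field and sorts on another could be served by a two column index created on those two fields in that order:

SELECT Id, Name FROM Account WHERE Region__c = 'EMEA' ORDER BY Tier__c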

Divisions

A rather obscure method for increasing performance is enabling Salesforce Divisions in your org. Divisions have been around for some time, and they can be useful if you want to partition your data. To use Divisions, your organization needs more than 1 million records in a single object and more than 35 licenses.

For example, Divisions could be used to create regional filters called “US” and “EMEA” or any number of other Divisions you can imagine. These smaller groups will perform better.

A few things to consider:

  • Data can be divided into three or more logical sections that do not share data.
  • Most Divisions will be roughly equal in size and no single division will be dramatically larger than the others.
  • A drawback of using Divisions is that you can’t disable them after they are enabled.
  • Shared Accounts and Opportunities cannot belong to more than one Division.
  • Not recommended if you have users who need access to data across multiple Divisions.
  • Records without a parent record are automatically routed to the org’s default global Division.
  • Divisions cannot be used for hiding records.

If you are sectioning fewer than 5,000 records into Divisions, the records are transferred immediately. If you are sectioning 5,000 or more records, you receive an email notification when the transfers are complete.
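
Once Divisions are enabled, SOQL also lets you scope a query to a single Division with the WITH DIVISION filter, so the query works against a smaller partition of the data (the Division name below is just an example):

SELECT Id, Name FROM Account WHERE Industry = 'Banking' WITH DIVISION = 'US'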

Mashups

You can also choose to host your data elsewhere and present a representation of it inside Salesforce. For example, you can integrate Google Maps with Salesforce and view mapped records right in the browser. Another example is surfacing current sales statistics that live in an ERP system on an account screen.

Similar integrations exist for Google AdWords or LinkedIn, where the mashup may call web services or invoke a Salesforce API to perform a step in a business process. The functionality is usually reduced, however, as you cannot report on this data in your org. You can also do this the other way around, using Salesforce as the client to expose data from an external system.
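
As a minimal sketch of the Salesforce-as-client direction, an Apex callout could fetch live statistics from the external system when the page loads instead of copying them into the org. The named credential ERP_Stats, the endpoint path and the response handling are all assumptions, not a reference implementation:

// Minimal mashup sketch: fetch live sales statistics from an external ERP system.
// 'ERP_Stats' is an assumed Named Credential; the path and response shape are hypothetical.
public with sharing class ErpStatsService {
    public static String getSalesStats(Id accountId) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:ERP_Stats/accounts/' + accountId + '/sales-stats');
        req.setMethod('GET');
        HttpResponse res = new Http().send(req);
        // The body is rendered by a UI component; nothing is stored in Salesforce,
        // which is also why this data cannot be used in native reports.
        return res.getBody();
    }
}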

Deleting Data

Empty your recycle bin! Salesforce uses the recycle bin metaphor the same way your PC does. When you remove data, unless you are using the Bulk API’s hard delete option, Salesforce flags the data as deleted and makes it visible in the recycle bin. Records in the bin still affect performance: deleted data stays there for 15 days or until the recycle bin reaches its size limit, and is hard deleted only when the date or size limit is reached or when the bin is emptied via the API, the UI, or Apex. Emptying the recycle bin is low hanging fruit for many organizations.
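
For small clean-ups you can even do this from Apex. The sketch below is hedged and subject to normal governor limits; for large volumes you would lean on the Bulk API hard delete option or the UI instead:

// Query soft-deleted Accounts (ALL ROWS includes rows sitting in the recycle bin)
List<Account> deletedAccounts = [SELECT Id FROM Account WHERE IsDeleted = true ALL ROWS];
// Permanently remove them so they no longer weigh on query and report performance
Database.emptyRecycleBin(deletedAccounts);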

Reporting

For reporting, you can use Divisions to partition data and minimize the number of records per object. A few more tips:

  • Use a filter value to segment your queries (e.g. filter by Country or State).
  • Reduce the number of joins and objects queried in a report.
  • Use roll-up summary fields to sum data up to the parent rather than grouping the data by parent at report time, which decreases processing time (see the example after this list).
  • Reduce the number of fields queried by removing unused fields from the selection, or move rarely used data to a separate custom object so that a commonly used report queries only the records that are relevant.
  • Use indexed fields in report filters as much as possible, including any custom indexes created on supported fields.
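
For example, instead of grouping millions of child rows at report time:

SELECT AccountId, SUM(Amount) FROM Opportunity GROUP BY AccountId

a report can read a pre-computed roll-up summary field directly from the parent (Total_Open_Pipeline__c below is a hypothetical roll-up summary field, not something from the original case):

SELECT Id, Name, Total_Open_Pipeline__c FROM Account WHERE BillingCountry = 'USA'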

Data Load

Try to use the Bulk API when you have more than a few hundred thousand records, and reduce errors in your dataset as much as possible before the load. If you hit errors during a data load, processing can fall back to one row at a time, which makes the load take much longer. Also, prefer the fastest operation the API offers: insert is faster than update, which is faster than upsert.

Try to update only the fields that have changed when you are doing a DML operation through the API. Authentication also matters, so try to authenticate once per load: Salesforce limits login calls, and if you authenticate on every request you will hit the login API limit after a few thousand records. Where possible, use a Public Read/Write sharing setting to avoid sharing calculations; with Public Read/Write the system does not evaluate sharing rules, and when Grant Access Using Hierarchies is deselected it skips hierarchy-based sharing as well.
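
A minimal Apex sketch of the "only changed fields" idea; the existing record and the incoming values are assumed to come from your load process:

// Build a sparse update containing only the fields whose values actually changed.
Account sparse = new Account(Id = existing.Id);   // 'existing' was queried earlier (assumed)
Boolean changed = false;
if (existing.Phone != incomingPhone) { sparse.Phone = incomingPhone; changed = true; }
if (existing.Website != incomingWebsite) { sparse.Website = incomingWebsite; changed = true; }
if (changed) {
    update sparse;   // rows with no changes generate no DML at all
}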

Populate roles before populating sharing rules: load users into their roles first, then load record data with owners, then configure public groups and queues.

Disable Apex triggers, workflow rules and validation rules during the data load, and verify that batch Apex is processing records as expected. Balance the batch Apex load size if you are getting timeout errors.

You can use the Bulk API to process jobs either in serial mode or in parallel mode. Processing batches serially means running them one after another, and processing batches in parallel means running multiple batches at the same time. Serial mode is an alternative that should be used when there’s a high chance of record locking. (This happens most frequently when creating new users, updating user roles, updating territories, changing ownership for records with a Private Sharing model.)

When changing child records, group those that share a parent (by ParentId) into the same batch to avoid locking conflicts. You can also use Defer Sharing Calculations and the mashup approach described above.

Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these multiple batches process (for example, in parallel) and attempt to lock the Account record. To avoid these lock contentions, organize your data by Account Id such that all AccountTeamMember records referencing the same Account Id are in the same batch.
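
Whether you build the batches in Apex or while preparing your CSV files, the grouping step looks roughly like the sketch below. The newMembers list is an assumption, and 10,000 records is the Bulk API batch cap:

// Group AccountTeamMember rows by AccountId so rows touching the same Account
// always land in the same batch ('newMembers' is assumed to be prepared elsewhere).
Map<Id, List<AccountTeamMember>> byAccount = new Map<Id, List<AccountTeamMember>>();
for (AccountTeamMember atm : newMembers) {
    if (!byAccount.containsKey(atm.AccountId)) {
        byAccount.put(atm.AccountId, new List<AccountTeamMember>());
    }
    byAccount.get(atm.AccountId).add(atm);
}

// Fill batches, starting a new one only at an Account boundary.
List<List<AccountTeamMember>> batches = new List<List<AccountTeamMember>>();
List<AccountTeamMember> current = new List<AccountTeamMember>();
for (Id accountId : byAccount.keySet()) {
    if (!current.isEmpty() && current.size() + byAccount.get(accountId).size() > 10000) {
        batches.add(current);
        current = new List<AccountTeamMember>();
    }
    current.addAll(byAccount.get(accountId));
}
if (!current.isEmpty()) {
    batches.add(current);
}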

Extracting Data from the API & Searching

You can use the getUpdated() SOAP API method to sync an external system with Salesforce, or the outbound messaging feature for more frequent syncing. This gives you more frequent data updates and decreases the total operation time for all automation and queries. Try to reduce the number of records returned by a given query: as with batch sizing, it’s better to avoid database locking by returning small sets of records frequently instead of one massive dataset to process.

Try to keep searches as specific as possible, and remember that deleted data is still queried and impacts performance. Single-object searches are generally faster than multi-object searches, and features like enhanced lookups can help.

As for deleting large amounts of data, remember to delete children first if you’re going to remove hierarchical data. You should also use the hard delete option of the Bulk API if you are going to delete more than 1 million records.
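
A hedged Apex sketch of that "children first" order is below; the object and field names are hypothetical, and at real scale these would be Bulk API hard delete jobs rather than Apex DML:

// Delete child rows before their parents (object and field names are hypothetical).
delete [SELECT Id FROM Sales_Detail__c WHERE Sales_Order__r.Ready_To_Purge__c = true];
delete [SELECT Id FROM Sales_Order__c WHERE Ready_To_Purge__c = true];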

Large Data Volume Case Studies

Reports:

Let’s say a customer creates a report that spans more than one related object: Accounts, Sales Orders, Sales Details (a child of Sales Orders), and Account Ownership, with record counts of 350,000, 750,000, 2,300,000, and 1,200,000 respectively.

  • To optimize this report, you can add additional filters to make the query more selective and ensure that the filtered fields are indexed.
  • Reduce the data in each object; remove old data or archive it to a different object.
  • Keep the recycle bin empty since it affects reports as well.
  • Keep sharing rules and logic between these objects to a minimum.

For a second example, let’s say a customer needs to aggregate regular-interval metrics using standard report types. The customer’s monthly and yearly details were stored in custom objects with 4 million and 9 million records, respectively. The reports were aggregating millions of records across two objects, and performance was less than optimal.

  • Create a custom object to store the aggregated, summarized time-interval values in the format the reports need. The reports can then run against the aggregated custom object instead of the raw detail records. (You can do this processing via batch Apex, as sketched below.)
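
A minimal batch Apex sketch of that aggregation is below. The object and field names (Monthly_Detail__c, Period__c, Amount__c, Sales_Summary__c, Total_Amount__c) are hypothetical, and the real implementation would depend on the customer’s data model:

public class PeriodSummaryBatch implements Database.Batchable<sObject>, Database.Stateful {
    // Running totals per period, carried across execute() calls via Database.Stateful
    private Map<String, Decimal> totalsByPeriod = new Map<String, Decimal>();

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            'SELECT Period__c, Amount__c FROM Monthly_Detail__c');
    }

    public void execute(Database.BatchableContext bc, List<Monthly_Detail__c> scope) {
        for (Monthly_Detail__c row : scope) {
            if (row.Amount__c == null) { continue; }
            Decimal running = totalsByPeriod.containsKey(row.Period__c)
                ? totalsByPeriod.get(row.Period__c) : 0;
            totalsByPeriod.put(row.Period__c, running + row.Amount__c);
        }
    }

    public void finish(Database.BatchableContext bc) {
        // One summary row per period; reports run against this much smaller object.
        List<Sales_Summary__c> summaries = new List<Sales_Summary__c>();
        for (String period : totalsByPeriod.keySet()) {
            summaries.add(new Sales_Summary__c(
                Period__c = period,
                Total_Amount__c = totalsByPeriod.get(period)));
        }
        insert summaries;
    }
}

You would kick it off with a batch size tuned to avoid timeouts, for example Database.executeBatch(new PeriodSummaryBatch(), 2000);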

We will be continuing with case studies for specific scenarios about optimizing search, report performance and how to leverage Bulk API by using platform quirks and features in Part 4 of this series.

