First, load the salesforcer and dplyr packages and log in, if needed.
For really large inserts, updates, deletes, upserts, and queries, you can just add api_type = "Bulk 1.0" to most functions to get the benefits of using the Bulk API instead of the SOAP or REST APIs. Switching from the REST API to the Bulk 1.0 API is as simple as adding api_type = "Bulk 1.0" to your function arguments. Next, let’s build two tbl_dfs, each with two new records to be created.
n <- 2
prefix <- paste0("Bulk-", as.integer(runif(1,1,100000)), "-")
new_contacts1 <- tibble(FirstName = rep("Test", n),
LastName = paste0("Contact-Create-", 1:n),
My_External_Id__c=paste0(prefix, letters[1:n]))
new_contacts2 <- tibble(FirstName = rep("Test", n),
LastName = paste0("Contact-Create-", 1:n),
My_External_Id__c=paste0(prefix, letters[1:n]))
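As written, new_contacts1 and new_contacts2 are built from the same prefix and letters[1:n], so they carry identical My_External_Id__c values. If you want two test sets that cannot collide, one option is to offset the letters used for the second set. The sketch below assumes a hypothetical helper, make_test_contacts(), and uses a base-R data.frame in place of tibble so it runs without extra packages; it only builds the data and does not call Salesforce.

```r
# make_test_contacts() is a hypothetical helper, not part of salesforcer.
# It builds n test contacts whose external IDs use letters offset + 1 ... offset + n,
# so sets built with non-overlapping offsets never share an external ID.
make_test_contacts <- function(n, prefix, offset = 0) {
  idx <- offset + seq_len(n)
  stopifnot(max(idx) <= length(letters))  # letters only covers 26 IDs
  data.frame(
    FirstName = rep("Test", n),
    LastName = paste0("Contact-Create-", seq_len(n)),
    My_External_Id__c = paste0(prefix, letters[idx]),
    stringsAsFactors = FALSE
  )
}

n <- 2
prefix <- paste0("Bulk-", as.integer(runif(1, 1, 100000)), "-")
contacts_a <- make_test_contacts(n, prefix)              # IDs end in a, b
contacts_b <- make_test_contacts(n, prefix, offset = n)  # IDs end in c, d
```

Because letters has only 26 entries, this approach tops out at 26 IDs per prefix; wrap the result in tibble::as_tibble() if you prefer a tbl_df.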
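There are some differences in the way the REST API returns response information compared with the Bulk 1.0 API: the column names, the column types, and the way errors are reported all differ. Note also that new_contacts2 reuses the My_External_Id__c values that the REST call has just created, so in the output below the Bulk 1.0 call reports DUPLICATE_VALUE errors instead of creating records. With distinct external IDs, both calls would make the same kind of change in Salesforce.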
# REST
rest_created_records <- sf_create(new_contacts1, object_name="Contact", api_type="REST")
rest_created_records
#> # A tibble: 2 x 3
#> id success errors
#> <chr> <lgl> <list>
#> 1 0033s000012MKVdAAO TRUE <list [0]>
#> 2 0033s000012MKVeAAO TRUE <list [0]>
# Bulk
bulk_created_records <- sf_create(new_contacts2, object_name="Contact", api_type="Bulk 1.0")
bulk_created_records
#> # A tibble: 2 x 4
#> Id Success Created Error
#> <lgl> <lgl> <lgl> <chr>
#> 1 NA FALSE FALSE DUPLICATE_VALUE:duplicate value found: My_External_Id__…
#> 2 NA FALSE FALSE DUPLICATE_VALUE:duplicate value found: My_External_Id__…
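The two outputs above differ in shape: the REST result has id, success, and an errors list column, while the Bulk 1.0 result has Id, Success, Created, and a character Error column. If you switch api_type back and forth, it can help to map both onto one layout. A minimal sketch, assuming a hypothetical helper normalize_result() (not part of salesforcer) and made-up stand-in data shaped like the printed results:

```r
# normalize_result() is a hypothetical helper, not part of salesforcer.
# It maps either result shape onto three columns: id, success, error.
normalize_result <- function(res) {
  if ("id" %in% names(res)) {
    # REST-style result: id, success, errors (a list column)
    err <- vapply(res$errors, function(e) {
      if (length(e) == 0) NA_character_ else paste(unlist(e), collapse = "; ")
    }, character(1))
    data.frame(id = as.character(res$id), success = res$success,
               error = err, stringsAsFactors = FALSE)
  } else {
    # Bulk 1.0-style result: Id, Success, Created, Error
    data.frame(id = as.character(res$Id), success = res$Success,
               error = as.character(res$Error), stringsAsFactors = FALSE)
  }
}

# Made-up stand-ins shaped like the two outputs above.
rest_like <- data.frame(id = c("id_1", "id_2"), success = c(TRUE, TRUE),
                        stringsAsFactors = FALSE)
rest_like$errors <- list(list(), list())
bulk_like <- data.frame(Id = c(NA, NA), Success = c(FALSE, FALSE),
                        Created = c(FALSE, FALSE),
                        Error = c("DUPLICATE_VALUE:duplicate value found",
                                  "DUPLICATE_VALUE:duplicate value found"),
                        stringsAsFactors = FALSE)

# Both results now share one layout and can be stacked.
combined <- rbind(normalize_result(rest_like), normalize_result(bulk_like))
```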
To show a longer example of using the Bulk 1.0 API, below is a workflow that creates 2 records, queries them, and deletes them. This is just an example; typically, you’d want to use the Bulk APIs over the REST or SOAP APIs when dealing with more than 10,000 records.
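One way to apply that rule of thumb is to pick api_type from the number of rows. This is a hypothetical helper, choose_api_type(), not a salesforcer function, and the 10,000-row default simply mirrors the figure above rather than any documented Salesforce threshold:

```r
# choose_api_type() is a hypothetical helper, not part of salesforcer. The
# 10,000-row default is a rule of thumb, not a Salesforce limit.
choose_api_type <- function(n_rows, threshold = 10000) {
  if (n_rows > threshold) "Bulk 1.0" else "REST"
}

choose_api_type(2)       # "REST"
choose_api_type(250000)  # "Bulk 1.0"

# With a live connection you would pass the result straight through, e.g.:
# sf_create(new_contacts, object_name = "Contact",
#           api_type = choose_api_type(nrow(new_contacts)))
```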
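object <- "Contact"
# build a fresh set of contacts without external IDs so nothing collides
new_contacts <- tibble(FirstName = rep("Test", n),
LastName = paste0("Contact-Create-", 1:n))
created_records <- sf_create(new_contacts, object_name=object, api_type="Bulk 1.0")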
created_records
#> # A tibble: 2 x 4
#> Id Success Created Error
#> <chr> <lgl> <lgl> <lgl>
#> 1 0033s000012MKViAAO TRUE TRUE NA
#> 2 0033s000012MKVjAAO TRUE TRUE NA
# query bulk
my_soql <- sprintf("SELECT Id,
FirstName,
LastName
FROM Contact
WHERE Id in ('%s')",
paste0(created_records$Id , collapse="','"))
queried_records <- sf_query(my_soql, object_name=object, api_type="Bulk 1.0")
queried_records
#> # A tibble: 2 x 3
#> Id FirstName LastName
#> <chr> <chr> <chr>
#> 1 0033s000012MKViAAO Test Contact-Create-1
#> 2 0033s000012MKVjAAO Test Contact-Create-2
# delete bulk
deleted_records <- sf_delete(queried_records$Id, object_name=object, api_type="Bulk 1.0")
deleted_records
#> # A tibble: 2 x 4
#> Id Success Created Error
#> <chr> <lgl> <lgl> <lgl>
#> 1 0033s000012MKViAAO TRUE FALSE NA
#> 2 0033s000012MKVjAAO TRUE FALSE NA
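The query step builds its WHERE clause by quoting each Id and joining them with ','. A sketch of that pattern as reusable functions follows; build_id_soql() and build_id_soql_chunks() are hypothetical helpers, and the default chunk size of 500 is an arbitrary choice meant to keep each statement well within Salesforce's limit on SOQL statement length.

```r
# build_id_soql() is a hypothetical helper (not part of salesforcer): it builds
# a SOQL query selecting `fields` from `object` for the given record Ids.
build_id_soql <- function(ids, fields = "Id", object = "Contact") {
  stopifnot(length(ids) > 0)
  sprintf("SELECT %s FROM %s WHERE Id IN ('%s')",
          paste(fields, collapse = ", "),
          object,
          paste0(ids, collapse = "','"))
}

# Hypothetical companion: split a long Id vector into chunks and build one
# query per chunk, so no single statement grows too long.
build_id_soql_chunks <- function(ids, chunk_size = 500, ...) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  unname(vapply(chunks, build_id_soql, character(1), ...))
}

build_id_soql(c("id_1", "id_2"), fields = c("Id", "FirstName", "LastName"))
# "SELECT Id, FirstName, LastName FROM Contact WHERE Id IN ('id_1','id_2')"
```

Each string returned by build_id_soql_chunks() could then be passed to sf_query() in turn and the results stacked.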
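One limitation of Bulk queries is that they do not support several SOQL operations and structures, including COUNT, ROLLUP, SUM, GROUP BY CUBE, OFFSET, nested SOQL queries, and compound address or geolocation fields.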
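If you build SOQL strings programmatically, a rough pre-flight check can flag a few of the constructs that Bulk queries reject (COUNT and SUM aggregates, ROLLUP, GROUP BY CUBE, OFFSET, and nested subqueries) before you send the query with api_type = "Bulk 1.0". The helper bulk_query_ok() below is hypothetical and purely textual: it can be fooled by keywords inside string literals and does not cover every unsupported construct.

```r
# bulk_query_ok() is a hypothetical, heuristic helper (not part of salesforcer).
# It returns FALSE when the SOQL text contains a construct that Bulk queries
# do not support; it does not parse SOQL, so treat TRUE as a hint only.
bulk_query_ok <- function(soql) {
  s <- toupper(soql)
  unsupported <- c("\\bCOUNT\\s*\\(", "\\bSUM\\s*\\(", "\\bROLLUP\\b",
                   "\\bGROUP\\s+BY\\s+CUBE\\b", "\\bOFFSET\\b")
  has_unsupported <- any(vapply(unsupported, grepl, logical(1),
                                x = s, perl = TRUE))
  # More than one SELECT keyword suggests a nested (sub)query.
  n_select <- lengths(regmatches(s, gregexpr("\\bSELECT\\b", s, perl = TRUE)))
  !has_unsupported && n_select <= 1
}

bulk_query_ok("SELECT Id, FirstName, LastName FROM Contact")  # TRUE
bulk_query_ok("SELECT COUNT() FROM Contact")                   # FALSE
```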
The salesforcer package also implements the Bulk 2.0 API, which is intended to be faster than the Bulk 1.0 API but sacrifices consistency in the ordering of the result records, since they are batched and processed asynchronously. In the example below we create 10 records first with the Bulk 1.0 API (in five batches of two records) and then with the Bulk 2.0 API. Looking at the results, you can see that the Bulk 1.0 output comes back in the same order as the input: after the data is split into batches, each batch's results are returned in order, which preserves the order of the rows in the returned output. This is not necessarily true of the Bulk 2.0 API, and that is the point (its asynchronicity).
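The ordering argument can be illustrated offline. The toy simulation below splits 10 rows into five batches of two, then recombines them either in submission order or in a random "completion" order. It does not call Salesforce and is not how salesforcer is implemented; it only shows why a key column is needed once batches can finish in any order.

```r
# Toy, offline simulation; it does not call Salesforce and is not how
# salesforcer is implemented internally.
input <- data.frame(test_number__c = 1:10)

# Split 10 rows into 5 batches of 2.
batches <- split(input, ceiling(seq_len(nrow(input)) / 2))

# Collecting results batch by batch, in submission order, keeps the input order.
in_order <- do.call(rbind, batches)

# If results come back in whatever order the batches finish (simulated here
# with a random permutation), the rows can be out of order ...
set.seed(42)
finished_order <- do.call(rbind, batches[sample(length(batches))])

# ... so a key column is needed to put them back in input order.
restored <- finished_order[order(finished_order$test_number__c), , drop = FALSE]
```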
Bulk 1.0
n <- 10
new_contacts <- tibble(FirstName = rep("Test", n),
LastName = paste0("Contact-Create-", 1:n),
test_number__c = 1:n)
created_records_v1 <- sf_create(new_contacts, "Contact", api_type="Bulk 1.0", batch_size=2)
created_records_v1
#> # A tibble: 10 x 4
#> Id Success Created Error
#> <chr> <lgl> <lgl> <lgl>
#> 1 0033s000012MKVnAAO TRUE TRUE NA
#> 2 0033s000012MKVoAAO TRUE TRUE NA
#> 3 0033s000012MKW7AAO TRUE TRUE NA
#> 4 0033s000012MKW8AAO TRUE TRUE NA
#> 5 0033s000012MKVsAAO TRUE TRUE NA
#> 6 0033s000012MKVtAAO TRUE TRUE NA
#> 7 0033s000012MKVxAAO TRUE TRUE NA
#> 8 0033s000012MKVyAAO TRUE TRUE NA
#> 9 0033s000012MKW2AAO TRUE TRUE NA
#> 10 0033s000012MKW3AAO TRUE TRUE NA
# query the records so we can compare the ordering of the Id field to the
# original dataset
my_soql <- sprintf("SELECT Id,
test_number__c
FROM Contact
WHERE Id in ('%s')",
paste0(created_records_v1$Id , collapse="','"))
queried_records <- sf_query(my_soql)
queried_records <- queried_records %>%
arrange(test_number__c)
# same ordering of rows!
cbind(created_records_v1 %>% select(Id), queried_records)
#> Id Id test_number__c
#> 1 0033s000012MKVnAAO 0033s000012MKVnAAO 1
#> 2 0033s000012MKVoAAO 0033s000012MKVoAAO 2
#> 3 0033s000012MKW7AAO 0033s000012MKW7AAO 3
#> 4 0033s000012MKW8AAO 0033s000012MKW8AAO 4
#> 5 0033s000012MKVsAAO 0033s000012MKVsAAO 5
#> 6 0033s000012MKVtAAO 0033s000012MKVtAAO 6
#> 7 0033s000012MKVxAAO 0033s000012MKVxAAO 7
#> 8 0033s000012MKVyAAO 0033s000012MKVyAAO 8
#> 9 0033s000012MKW2AAO 0033s000012MKW2AAO 9
#> 10 0033s000012MKW3AAO 0033s000012MKW3AAO 10
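Instead of comparing the Id columns from cbind() by eye, you can test the ordering directly. The Ids below are made-up stand-ins for created_records_v1$Id and queried_records; the check assumes, as in the example, that the rows were created in test_number__c order.

```r
# Made-up stand-ins for created_records_v1$Id and the queried records.
created_ids <- c("id_1", "id_2", "id_3")
queried <- data.frame(Id = c("id_3", "id_1", "id_2"),
                      test_number__c = c(3L, 1L, 2L),
                      stringsAsFactors = FALSE)

# Sort the queried rows by the key, then compare the Id vectors directly.
queried <- queried[order(queried$test_number__c), , drop = FALSE]
same_order <- identical(created_ids, queried$Id)
same_order  # TRUE
```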
The Bulk 2.0 API returns every field that was included in the call, so if your dataset has an identifying key, it should not be a problem to join on that key with your original data to bring in the Salesforce Id assigned to each newly created record. However, I find it wasteful to transfer all of the field information back after the operation, and I have not found a significant performance improvement of Bulk 2.0 over Bulk 1.0.
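A minimal base-R sketch of that join, using merge(): original stands in for your input data, results stands in for a Bulk 2.0 result that echoes the input fields back along with the new sf__Id, and all Ids are made up. With dplyr loaded, left_join(original, results, by = "test_number__c") performs a similar join.

```r
# original stands in for the input data; test_number__c is the identifying key.
original <- data.frame(FirstName = "Test",
                       LastName = paste0("Contact-Create-", 1:4),
                       test_number__c = 1:4,
                       stringsAsFactors = FALSE)

# Stand-in for a Bulk 2.0 result: rows in a different order, input fields
# echoed back, plus the new record Id in sf__Id (made-up values).
results <- data.frame(sf__Id = c("id_C", "id_A", "id_D", "id_B"),
                      sf__Created = TRUE,
                      test_number__c = c(3L, 1L, 4L, 2L),
                      stringsAsFactors = FALSE)

# Bring sf__Id onto the original rows by key, then restore the input order.
joined <- merge(original, results[, c("test_number__c", "sf__Id")],
                by = "test_number__c", all.x = TRUE)
joined <- joined[order(joined$test_number__c), , drop = FALSE]
```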
Bulk 2.0
The Bulk 2.0 API returns all fields so that you can still associate the records if their ordering changed during processing. In this simple case, however, the ordering is still preserved because there were not enough records to create separate batches that are processed asynchronously. Finally, note that the result column names (sf__Id, sf__Created, sf__Error) differ from those of the Bulk 1.0 API; this is the naming convention Salesforce uses for Bulk 2.0 results.
created_records_v2 <- sf_create(new_contacts, "Contact", api_type="Bulk 2.0")
created_records_v2
#> # A tibble: 10 x 6
#> sf__Id sf__Created FirstName LastName test_number__c sf__Error
#> <chr> <lgl> <chr> <chr> <dbl> <chr>
#> 1 0033s000012MKW… TRUE Test Contact-Creat… 1 <NA>
#> 2 0033s000012MKW… TRUE Test Contact-Creat… 2 <NA>
#> 3 0033s000012MKW… TRUE Test Contact-Creat… 3 <NA>
#> 4 0033s000012MKW… TRUE Test Contact-Creat… 4 <NA>
#> 5 0033s000012MKW… TRUE Test Contact-Creat… 5 <NA>
#> 6 0033s000012MKW… TRUE Test Contact-Creat… 6 <NA>
#> 7 0033s000012MKW… TRUE Test Contact-Creat… 7 <NA>
#> 8 0033s000012MKW… TRUE Test Contact-Creat… 8 <NA>
#> 9 0033s000012MKW… TRUE Test Contact-Creat… 9 <NA>
#> 10 0033s000012MKW… TRUE Test Contact-Creat… 10 <NA>
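If downstream code expects the Bulk 1.0 column names, a small renaming step can bridge the two conventions. to_bulk1_names() is a hypothetical helper; note that Bulk 2.0 results have no Success column, and the helper refuses to rename when the result already contains a column with a target name (for example, the Id column in the delete output further down).

```r
# to_bulk1_names() is a hypothetical helper, not part of salesforcer. It renames
# Bulk 2.0 result columns to the Bulk 1.0-style names used earlier.
to_bulk1_names <- function(res) {
  map <- c(sf__Id = "Id", sf__Created = "Created", sf__Error = "Error")
  new_names <- names(res)
  hit <- new_names %in% names(map)
  new_names[hit] <- map[new_names[hit]]
  if (anyDuplicated(new_names)) {
    stop("renaming would create duplicate column names")
  }
  names(res) <- new_names
  res
}

example <- data.frame(sf__Id = "id_1", sf__Created = TRUE,
                      FirstName = "Test", sf__Error = NA,
                      stringsAsFactors = FALSE)
names(to_bulk1_names(example))  # c("Id", "Created", "FirstName", "Error")
```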
For these reasons, I typically prefer the Bulk 1.0 API when creating or updating records, so that I can be confident that the order of the records returned from the process matches the order of the original dataset I passed in. Ordering does not matter when deleting records, so if you want a minor performance improvement, switch to the Bulk 2.0 API when deleting records.
sf_delete(c(created_records_v1$Id, created_records_v2$sf__Id),
object_name = "Contact", api_type="Bulk 2.0")
#> # A tibble: 20 x 4
#> sf__Id sf__Created Id sf__Error
#> <chr> <lgl> <chr> <chr>
#> 1 0033s000012MKVnAAO FALSE 0033s000012MKVnAAO <NA>
#> 2 0033s000012MKVoAAO FALSE 0033s000012MKVoAAO <NA>
#> 3 0033s000012MKW7AAO FALSE 0033s000012MKW7AAO <NA>
#> 4 0033s000012MKW8AAO FALSE 0033s000012MKW8AAO <NA>
#> 5 0033s000012MKVsAAO FALSE 0033s000012MKVsAAO <NA>
#> 6 0033s000012MKVtAAO FALSE 0033s000012MKVtAAO <NA>
#> 7 0033s000012MKVxAAO FALSE 0033s000012MKVxAAO <NA>
#> 8 0033s000012MKVyAAO FALSE 0033s000012MKVyAAO <NA>
#> 9 0033s000012MKW2AAO FALSE 0033s000012MKW2AAO <NA>
#> 10 0033s000012MKW3AAO FALSE 0033s000012MKW3AAO <NA>
#> 11 0033s000012MKWCAA4 FALSE 0033s000012MKWCAA4 <NA>
#> 12 0033s000012MKWDAA4 FALSE 0033s000012MKWDAA4 <NA>
#> 13 0033s000012MKWEAA4 FALSE 0033s000012MKWEAA4 <NA>
#> 14 0033s000012MKWFAA4 FALSE 0033s000012MKWFAA4 <NA>
#> 15 0033s000012MKWGAA4 FALSE 0033s000012MKWGAA4 <NA>
#> 16 0033s000012MKWHAA4 FALSE 0033s000012MKWHAA4 <NA>
#> 17 0033s000012MKWIAA4 FALSE 0033s000012MKWIAA4 <NA>
#> 18 0033s000012MKWJAA4 FALSE 0033s000012MKWJAA4 <NA>
#> 19 0033s000012MKWKAA4 FALSE 0033s000012MKWKAA4 <NA>
#> 20 0033s000012MKWLAA4 FALSE 0033s000012MKWLAA4 <NA>
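With 20 or more rows it is easier to check the sf__Error column than to scan the printout. A minimal sketch on a made-up stand-in result (the Ids and error text are hypothetical), assuming the sf__Error column is present as it is above:

```r
# Made-up stand-in for a Bulk 2.0 delete result; the Ids and error text are
# hypothetical.
deleted <- data.frame(sf__Id = c("id_1", "id_2", "id_3"),
                      sf__Created = FALSE,
                      sf__Error = c(NA, NA, "EXAMPLE_ERROR:hypothetical failure"),
                      stringsAsFactors = FALSE)

# Rows with a non-missing sf__Error were not deleted.
failed <- deleted[!is.na(deleted$sf__Error), , drop = FALSE]
if (nrow(failed) > 0) {
  message(nrow(failed), " record(s) were not deleted: ",
          paste(failed$sf__Id, collapse = ", "))
}
```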