# Databricks Spark TLS Connections
Connecting Databricks to a source database that requires TLS encryption with a custom root CA bundle is a little less trivial than I expected, especially using Apache Spark Connectors. To connect to AWS RDS, you need the RDS root CAs. The AWS instructions, unfortunately, didn’t contain any guidance for Spark.
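For reference, the kind of connection I'm talking about is a plain Spark JDBC read with full certificate verification. A rough sketch is below, assuming a PostgreSQL source - the endpoint, table, and secret scope are placeholders, and the `sslfactory` option tells the PostgreSQL JDBC driver to validate against the JVM's default truststore rather than `~/.postgresql/root.crt`:

```python
# Hypothetical endpoint, table, and secret names - adjust for your own setup.
df = (
    spark.read.format("jdbc")
    .option(
        "url",
        "jdbc:postgresql://mydb.abc123.eu-west-1.rds.amazonaws.com:5432/mydb"
        "?ssl=true&sslmode=verify-full"
        "&sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory",  # trust the JVM truststore
    )
    .option("dbtable", "public.my_table")
    .option("user", dbutils.secrets.get("my-scope", "db-user"))
    .option("password", dbutils.secrets.get("my-scope", "db-password"))
    .load()
)
```

With `sslmode=verify-full`, the driver checks both the certificate chain and the hostname - which only works if the RDS root CAs are actually in the truststores the cluster uses.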
There are two truststore bundles that need updating with the AWS certs:
- OS truststore bundle - e.g. `/etc/ssl/certs/ca-certificates.crt`
- Java truststore bundle - e.g. `/usr/lib/jvm/zulu17-ca-amd64/lib/security/cacerts`
This Databricks Knowledge Base article provides guidance on updating both truststores with specific certificates. This is helpful, but the AWS bundle contains around 150 certificates - you can't reasonably plug those into an init script, as it'd add over a minute to every cluster start-up (trust me, I tried).
A more sensible approach to this:
- Pre-build your authoritative bundles
- Save them onto persistent storage
- Use an init script to copy the pre-built bundles over their fresh cluster equivalents
When implemented, this reduces cluster startup overhead to a couple of seconds, even if you're working with really big custom certificate bundles.
## Required scripts
### Databricks “Bundle Build” Notebook
Contains a single cell. The notebook can be scheduled to create a fresh bundle semi-regularly to ensure updates to the imported certificates are picked up.
```bash
%sh
CERT_BUNDLE_URL="https://example.com/ca-bundle.pem"
STORAGE_PATH="/Volumes/dbx_security/certs"
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# Download bundle and install into system truststore
echo "Downloading certificate bundle ..."
curl -fSL -o /usr/local/share/ca-certificates/custom-bundle.crt "$CERT_BUNDLE_URL"
update-ca-certificates

# Import each cert into a copy of the Java truststore
echo "Importing into Java keystore ..."

cp "$JAVA_HOME/lib/security/cacerts" /tmp/cacerts

# update-ca-certificates symlinks the downloaded .crt into /etc/ssl/certs as a .pem
CERTS=$(grep -c 'END CERTIFICATE' /etc/ssl/certs/custom-bundle.pem)

for N in $(seq 0 $(($CERTS - 1))); do
  # Extract the Nth certificate from the bundle and import it under its own alias
  awk "n==$N { print }; /END CERTIFICATE/ { n++ }" /etc/ssl/certs/custom-bundle.pem | \
    keytool -noprompt -import -trustcacerts \
      -alias "custom-ca-${N}" \
      -keystore /tmp/cacerts \
      -storepass "changeit" > /dev/null 2>&1 || true
done

echo "Copying to $STORAGE_PATH ..."

cp /tmp/cacerts "$STORAGE_PATH/cacerts"
cp /etc/ssl/certs/ca-certificates.crt "$STORAGE_PATH/ca-certificates.crt"
```
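It's worth a quick sanity check that the saved bundles look right before wiring up the init script. A minimal follow-up cell, assuming the same `STORAGE_PATH` as above:

```python
# Count PEM certificates in the saved OS bundle - path assumed from STORAGE_PATH above
with open("/Volumes/dbx_security/certs/ca-certificates.crt") as f:
    print(f.read().count("END CERTIFICATE"), "certificates in saved bundle")
```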
### Cluster Init Script
```bash
#!/bin/bash

STORAGE_PATH="/Volumes/dbx_security/certs"
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# Overwrite fresh cluster bundles with pre-built versions
cp "$STORAGE_PATH/cacerts" "$JAVA_HOME/lib/security/cacerts"
cp "$STORAGE_PATH/ca-certificates.crt" "/etc/ssl/certs/ca-certificates.crt"

# Apply env vars
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
echo "export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
```
## Implementation Notes
- If you’re using Standard (née Shared) Compute, you can’t use Global init scripts - you must reference your init script on each cluster (see the sketch after this list)
- `STORAGE_PATH` can be a Databricks Volume, or any other storage available to init scripts
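For the per-cluster reference, the relevant fragment of a cluster definition (e.g. for the Clusters API or a job cluster spec) might look like the following - the Volume path and script name are assumptions, not fixed values:

```python
# Hypothetical fragment of a cluster spec referencing the init script from a Volume.
# The destination path/filename is an assumption - use wherever you saved the script.
cluster_spec = {
    "init_scripts": [
        {"volumes": {"destination": "/Volumes/dbx_security/certs/tls-init.sh"}}
    ],
}
```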


