Databricks Naming Conventions

Introduction#

Consistent naming across environments (dev, test, prod), medallion layers (bronze/silver/gold), and data domains is critical in Databricks. It prevents confusion, enforces governance, and supports automation with Unity Catalog and Delta Lake.


General Best Practices#

  • Separate dev / test / prod workspaces.
  • Apply RBAC + Unity Catalog.
  • Use modular notebooks; reuse with %run.
  • Version control all code.
  • Prefer job clusters; auto-terminate.
  • Vacuum Delta tables; use optimize + z-order.
  • Allow schema evolution only when intentional.
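The last two cluster/table bullets can be sketched as a small helper that builds the Delta maintenance statements. The table and column names below are placeholders; in a notebook each statement would be passed to spark.sql(...):

```python
def maintenance_sql(table, zorder_cols, retain_hours=168):
    """Build OPTIMIZE + ZORDER and VACUUM statements for a Delta table.

    retain_hours defaults to 168 (7 days), the Delta Lake default
    retention threshold for VACUUM.
    """
    return [
        f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

# placeholder names; run each statement with spark.sql(...) in a notebook
stmts = maintenance_sql("prod_sales.gold.orders", ["order_date", "region"])
```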

Environment‑Aware Medallion Naming#

Unity Catalog is the governance backbone: inconsistent names break access policies and automation. Use environment prefixes, clear domain names, and snake_case (cf. the Unity Catalog documentation).
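As an illustration, such a convention can be enforced with a small helper. The `<env>_<domain>.<layer>.<table>` pattern below is one possible scheme, not a Databricks requirement:

```python
def uc_table_name(env, domain, layer, table):
    """Build a three-level Unity Catalog name: <env>_<domain>.<layer>.<table>."""
    if env not in ("dev", "test", "prod"):
        raise ValueError(f"unknown environment: {env!r}")
    if layer not in ("bronze", "silver", "gold"):
        raise ValueError(f"unknown layer: {layer!r}")
    for part in (domain, table):
        # crude snake_case check: a valid, all-lowercase identifier
        if not part.isidentifier() or not part.islower():
            raise ValueError(f"not snake_case: {part!r}")
    return f"{env}_{domain}.{layer}.{table}"

# e.g. uc_table_name("dev", "sales", "bronze", "orders")
```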

Multi-module build based on sbt

import sbt._
import Keys._

// sbt.version = 1.6.2
ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing

lazy val welcome = taskKey[Unit]("welcome")

val sparkVersion = "2.4.0-cdh6.2.1"
val hiveVersion = "2.1.1-cdh6.2.1"

lazy val commonSettings = Seq(
  //organization := "com.nnz",
  version := "0.1.0-SNAPSHOT",
  welcome := { println("Welcome !")},
  scalaVersion := "2.11.12",
  javacOptions ++= Seq("-source", "1.8", "-target", "1.8"), // Java 8: required by Scala 2.11 / Spark 2.4
  libraryDependencies ++= sparkDependencies,
  resolvers += "Cloudera Versions" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
)

lazy val root = (project in file("."))
  .settings(
    name := "multimodule-project",
    commonSettings,
    update / aggregate :=  true,
  )
  .aggregate(warehouse, ingestion, processing)

lazy val warehouse = (project in file("warehouse"))
  .settings(
    name := "warehouse",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

lazy val ingestion = (project in file("ingestion"))
  .dependsOn(warehouse)
  .settings(
    name := "ingestion",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

lazy val processing = (project in file("processing"))
  .dependsOn(warehouse, ingestion)
  .settings(
    name := "processing",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

/**
 * Spark Dependencies
 */
val sparkCore = "org.apache.spark" %% "spark-core" % sparkVersion
val sparkSQL = "org.apache.spark" %% "spark-sql" % sparkVersion
val sparkHive = "org.apache.spark" %% "spark-hive" %  sparkVersion

lazy val sparkDependencies = Seq(sparkCore, sparkSQL, sparkHive)

https://gist.github.com/Non-NeutralZero/d5be154ee38962176bcc0bf49182c691

Git commands I often use

Add#

# only add files with a .scala extension
git ls-files [path] | grep '\.scala$' | xargs git add

# stash unstaged changes, keeping what is already staged
git stash --keep-index

Log & History#

# compact, visual branch graph
git log --oneline --graph --decorate --all

# search commits by message
git log --grep="keyword"

# show changes introduced by each commit
git log -p --follow -- path/to/file

# who changed what line
git blame -L 10,20 file.txt

Diff#

# diff staged changes (what's about to be committed)
git diff --staged

# diff between two branches
git diff main..feature-branch

Undo / Fix#

# undo last commit but keep changes staged
git reset --soft HEAD~1

# amend last commit (message or content)
git commit --amend --no-edit

# discard changes in a file
git checkout -- file.txt

# recover a dropped stash or deleted commit
git reflog

Branches#

# delete remote branch
git push origin --delete branch-name

# rename current branch
git branch -m new-name

# show which branch a commit is in
git branch --contains <commit-hash>

Stash#

# stash with a name
git stash push -m "wip: auth refactor"

# apply specific stash
git stash apply stash@{2}

# list stashes
git stash list

Productivity#

# find which commit introduced a bug (binary search)
git bisect start
git bisect bad        # current is broken
git bisect good v1.0  # last known good

# apply a single commit from another branch
git cherry-pick <commit-hash>

# rebase interactively (squash, reorder, edit)
git rebase -i HEAD~3

How to document your code?

How to document?#

The same principles and criteria that make good code should apply to documentation:

  • Conventional
  • Simple
  • Easy to understand

Beyond the criteria of good code, good documentation should also be:

  • Explanatory (the code's intent, business rules, clarifications of the code, warnings about the consequences of misuse, guidance for testing)
  • Non-redundant
/**
 * Returns the temperature.
 */
int get_temperature(void) {
    return temperature;
}
  • Noise-free
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

Good practices#

Introduce your code#

Describing the code's context or background is a good practice that lets readers understand the conditions under which the code was written and the objectives it serves.
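As a sketch, such an introductory comment might look like this; the function, the rounding rule, and the "BR-17" reference are all invented for the example:

```python
def prorate_fee(amount_cents, days_used, days_in_period):
    """Prorate a subscription fee for a partial billing period.

    Context: billing runs monthly, but customers may cancel mid-period.
    Business rule (hypothetical BR-17): always round *down* so a customer
    is never overcharged; hence the integer division on cents.
    """
    return amount_cents * days_used // days_in_period
```

The comment explains intent and the business rule behind the integer division, rather than restating what the code does.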

Hive

Snippets#

-- quoted-identifier support must be set to none for the regex-column
-- query below to work; set it back to column once it's done
set hive.support.quoted.identifiers = none;

-- e.g. select every column except `id` (my_table is a placeholder)
select `(id)?+.+` from my_table;

set hive.support.quoted.identifiers = column;

HIVE 3#

  • BI code typically uses db.table; in Hive 3 both parts need backticks: `db`.`table`
  • Default external table path: /warehouse/tablespace/external/hive/default.db/test_table

ACID + HIVE