Development

Multi-module build based on sbt

March 15, 2024

// build.sbt already imports sbt._ and Keys._ implicitly; they are spelled out here only for clarity.
import sbt._
import Keys._

// sbt.version = 1.6.2
ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing

lazy val welcome = taskKey[Unit]("welcome")

val sparkVersion = "2.4.0-cdh6.2.1"
val hiveVersion = "2.1.1-cdh6.2.1"

lazy val commonSettings = Seq(
  //organization := "com.nnz",
  version := "0.1.0-SNAPSHOT",
  welcome := { println("Welcome !")},
  scalaVersion := "2.11.12",
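  // javac expects a release number such as "1.8" or "15", not a full JDK version like "15.0.10".
  // Note: Spark 2.4.x targets Java 8, so "1.8" may be the safer value here.
  javacOptions ++= Seq("-source", "15", "-target", "15"),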
  libraryDependencies ++= sparkDependencies,
  resolvers ++= Seq("Cloudera Versions" at "https://repository.cloudera.com/artifactory/cloudera-repos/",
  )
)

lazy val root = (project in file("."))
  .settings(
    name := "multimodule-project",
    commonSettings,
    update / aggregate :=  true,
  )
  .aggregate(warehouse, ingestion, processing)

lazy val warehouse = (project in file("warehouse"))
  .settings(
    name := "warehouse",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

lazy val ingestion = (project in file("ingestion"))
  .dependsOn(warehouse)
  .settings(
    name := "ingestion",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

lazy val processing = (project in file("processing"))
  .dependsOn(warehouse, ingestion)
  .settings(
    name := "processing",
    commonSettings,
    Compile / scalaSource := baseDirectory.value / "src" / "main" / "scala",
    Test / scalaSource := baseDirectory.value / "src" / "test" / "scala",
  )

/**
 * Spark Dependencies
 */
val sparkCore = "org.apache.spark" %% "spark-core" % sparkVersion
val sparkSQL = "org.apache.spark" %% "spark-sql" % sparkVersion
val sparkHive = "org.apache.spark" %% "spark-hive" %  sparkVersion

lazy val sparkDependencies = Seq(sparkCore, sparkSQL, sparkHive)

https://gist.github.com/Non-NeutralZero/d5be154ee38962176bcc0bf49182c691
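The build definition above points each module at `src/main/scala` and `src/test/scala` under its own directory. A minimal sketch, assuming you start from an empty project root, that scaffolds this layout for the three modules named in the build (the loop is only an illustration, not part of the original gist):

```shell
# Create the source directories that the build definition expects for each module.
for module in warehouse ingestion processing; do
  mkdir -p "$module/src/main/scala" "$module/src/test/scala"
done

# With sbt installed, a single module can then be compiled from the root, e.g.:
#   sbt processing/compile
# Because processing dependsOn warehouse and ingestion, sbt compiles them as needed.
```

Running `sbt compile` from the root instead goes through the `aggregate` declaration and compiles all three modules.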

...

Building a website using Hugo and Hosting it on GitHub Pages

October 26, 2023

Development, Tutorials

Markdown, Development

Installations

Install Git (see the official Git website for downloads)
Install Hugo (see the official Hugo installation guide)

Configuration

To create a new Hugo website, run:

hugo new site mynewsite

Then change into the new directory:

cd mynewsite

Initialize the site as a Git repository:

git init
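Right after initializing the repository, it can help to keep Hugo's generated files out of version control. A minimal sketch, assuming Hugo's default output locations (adjust it if you plan to commit the built site, for example when deploying from a branch):

```shell
# Ignore Hugo's build output, generated asset cache, and build lock file.
cat > .gitignore <<'EOF'
public/
resources/_gen/
.hugo_build.lock
EOF
```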

Choose the Hugo theme that suits you.
Hugo offers a selection of themes developed by the community. This site, for example, was built using Hugo-Book.
Add the theme as a submodule:

# For example:
git submodule add https://github.com/alex-shpak/hugo-book themes/hugo-book

Add the theme to your site configuration file

# Could be config.toml OR config.yaml OR hugo.toml OR hugo.yaml
echo "theme = 'hugo-book'" >> config.toml
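The `echo` line above appends a new `theme` entry every time it runs, and it assumes the file is named `config.toml`. A slightly more defensive sketch, assuming a TOML configuration at the site root (the `config_file` variable is introduced here purely for illustration):

```shell
# Use hugo.toml if it exists (newer Hugo releases create it), otherwise config.toml.
config_file=config.toml
if [ -f hugo.toml ]; then
  config_file=hugo.toml
fi

# Append the theme only when no theme line is present, so re-running is harmless.
if ! grep -q '^theme' "$config_file" 2>/dev/null; then
  echo "theme = 'hugo-book'" >> "$config_file"
fi
```

For a YAML configuration the equivalent entry would be `theme: hugo-book`.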

You will be able to see a first version of your website locally by running:

hugo server --minify

Edit your configuration file

baseURL = 'http://example.org/'
languageCode = 'en-us'
title = 'My New Hugo Site'

Theme Configuration Guidelines
Theme publishers offer guidelines for configuring your website in accordance with their theme. Check your theme’s page on Hugo Themes or its GitHub repository for guidance and help.

Hosting on GitHub Pages

In your repository settings, go to Pages. You’ll be able to see your site’s link.
Choose a Build and deployment source (GitHub Actions or Deploy from a branch).
You can also choose to publish it on a custom domain.
Edit your configuration file:

baseURL = 'https://username.github.io/repository/'
languageCode = 'en-us'
title = 'My New Hugo Site'
theme = 'hugo-book'
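For a project site, the `baseURL` follows the pattern shown above: your GitHub username, then the repository name. A minimal sketch that assembles the value from two placeholder variables (`username` and `repository` are hypothetical and should be replaced with your own):

```shell
# Placeholder values; replace them with your GitHub account and repository name.
username=username
repository=repository

# Hugo's baseURL is an absolute URL: protocol, host, path, and a trailing slash.
base_url="https://${username}.github.io/${repository}/"
echo "$base_url"

# With Hugo installed, the same value can be passed at build time instead of
# being hard-coded in the configuration file:
#   hugo --minify --baseURL "$base_url"
```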

Other Great Tools For Building Static Websites

Sphinx https://www.sphinx-doc.org/en/master/index.html
VuePress https://vuepress.vuejs.org/
Read the Docs https://about.readthedocs.com/features/

Prompt Engineering

September 12, 2023

Development, Documentation

Llm, Langchain

RAG Architecture

Running PySpark & Jupyter With Docker

June 8, 2023

Development, Tutorials

Spark, Docker, Jupyter

Thanks to the Jupyter community, it’s now much easier to run PySpark on Jupyter using Docker. There are two ways to do this: (1) the “direct” way and (2) the customized way.

The “direct” way

Verify that your local setup meets the prerequisites for running this container. In short, make sure Docker is installed.
You need about 4 GB of free disk space.
...
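The two prerequisites mentioned above (Docker installed, about 4 GB free) can be checked with a short pre-flight script. This is only a sketch: it measures free space on the filesystem of the current directory, which is an assumption, since Docker may store its images on a different filesystem.

```shell
# 4 GB expressed in 1 KB blocks, the unit that `df -k` reports.
required_kb=$((4 * 1024 * 1024))

# Check that the docker command is on the PATH.
if command -v docker >/dev/null 2>&1; then
  echo "docker: found"
else
  echo "docker: not found, install it first"
fi

# df -Pk prints POSIX-format output in 1 KB blocks; column 4 is the available space.
available_kb=$(df -Pk . | awk 'NR==2 {print $4}')
if [ "$available_kb" -ge "$required_kb" ]; then
  echo "disk: at least 4 GB free"
else
  echo "disk: less than 4 GB free in $(pwd)"
fi
```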

How to document your code?

July 12, 2019

Development, Tutorials

Scala, Templates, Development

How to document?

The same principles and criteria that make good code should apply to documentation:

Conventional
Simple
Easy to understand

Beyond the criteria for good code, good documentation should also be:

Explanatory (the intent of the code, business rules, clarification of the code, warnings about the consequences of misuse, guidance for testing)
Non-redundant (counter-example: the comment below only restates what the code already says)
/**
* Returns the temperature.
*/
int get_temperature(void) {
return temperature;
}

Free of noise (counter-example: the comment below contradicts the code it describes)

/**
* Always returns true.
*/
public boolean isAvailable()
{ return false;}

Best practices

Introduce your code.

Describing the context or background of the code is a good practice: it lets readers understand the conditions under which the code was written and what it aims to achieve.

...