Multi-module build based on sbt

March 15, 2024
import sbt.{Compile, Test, *}
import Keys.{baseDirectory, libraryDependencies, *}

// sbt.version = 1.6.2
ThisBuild / trackInternalDependencies := TrackLevel.TrackIfMissing

lazy val welcome = taskKey[Unit]("welcome")

val sparkVersion = "2.4.0-cdh6.2.1"
val hiveVersion = "2.1.1-cdh6.2.1"

lazy val commonSettings = Seq(
  //organization := "com.nnz",
  version := "0.1.0-SNAPSHOT",
  welcome := { println("Welcome !") },
  scalaVersion := "2.11.12",
  javacOptions ++= Seq("-source", "15.0.10", "-target", "15.0.10"),
  libraryDependencies ++= sparkDependencies,
  resolvers ++= Seq("Cloudera Versions" at "", )
)

lazy val root = (project in file(". ...


November 29, 2023
BibTeX

Limit number of authors in IEEEtran

In the .bib file, configure your IEEEtran as follows:

@IEEEtranBSTCTL{IEEEexample:BSTcontrol,
  CTLuse_forced_etal = "yes",
  CTLmax_names_forced_etal = "3",
  CTLnames_show_etal = "2"
}

Cheat-sheets

Overleaf, Bibliography management with bibtex
Sébastien Merkel, Reference sheet for natbib usage
LaTeX/Bibliography Management. (2023, June 5). Wikibooks.
Discipline Specific Listings of BibTeX Journal Styles


November 20, 2023
Memory Usage

def memory():
    with open('/proc/meminfo', 'r') as mem:
        ret = {}
        tmp = 0
        for i in mem:
            sline = i.split()
            if str(sline[0]) == 'MemTotal:':
                ret['total'] = int(sline[1])
            elif str(sline[0]) in ('MemFree:', 'Buffers:', 'Cached:'):
                tmp += int(sline[1])
        ret['free'] = tmp
        ret['used'] = int(ret['total']) - int(ret['free'])
    return ret

No Hang Up

nohup jupyter notebook --no-browser > notebook.log 2>&1 &

Workaround: no cells output

se = time. ...
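The bookkeeping in memory() above can be tried on a machine without /proc/meminfo by feeding the same logic a string. This is a minimal sketch, not the post's code: `parse_meminfo` is a hypothetical helper, and the sample values are made up for illustration (the kernel reports these fields in kB).

```python
def parse_meminfo(text):
    """Summarise /proc/meminfo-style text; all values are in kB."""
    total = 0
    free = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MemTotal:":
            total = int(fields[1])
        elif fields[0] in ("MemFree:", "Buffers:", "Cached:"):
            # Free, buffer and page-cache memory are all counted as available.
            free += int(fields[1])
    return {"total": total, "free": free, "used": total - free}


# Made-up sample input, not real measurements.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         4000000 kB
Buffers:          500000 kB
Cached:          1500000 kB
SwapCached:            0 kB
"""

summary = parse_meminfo(SAMPLE)
print(summary)
```

On a Linux host, the same function could be called with the contents of /proc/meminfo read from disk.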

VS Code Configuration & Set-up

November 17, 2023
Configuration

Remote SSH

Host machine
    Hostname
    User user_name
    IdentityFile path/to/ssh/key

Remote SSH - SSH Tunnel

Host tunnel_machine
    Hostname
    User user_name
    IdentityFile path/to/ssh/key

Host machine_after_tunnel
    Hostname
    User user_name
    IdentityFile path/to/ssh/key
    ForwardAgent yes
    ProxyJump tunnel_machine

PC Configuration

Authorize your local Windows machine to connect to the remote machine.

$USER_AT_HOST="your-user-name-on-host@hostname"
$PUBKEYPATH="$HOME\.ssh\"
$pubKey=(Get-Content "$PUBKEYPATH" | Out-String); ssh "$USER_AT_HOST" "mkdir -p ~/.ssh && chmod 700 ~/. ...

Run plotly in JupyterLab

October 24, 2023
pip uninstall plotly
jupyter labextension uninstall @jupyterlab/plotly-extension
jupyter labextension uninstall jupyterlab-plotly
jupyter labextension uninstall plotlywidget
jupyter labextension update --all
pip install plotly==5.17.0
pip install "jupyterlab>=3" "ipywidgets>=7.6"
pip install jupyter-dash
jupyter labextension list

Useful Links

What is Right extension for Plotly in JupyterLab?
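After reinstalling, it can help to confirm from Python which versions actually ended up in the environment. The sketch below is one possible check using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, and the package list simply mirrors the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages taken from the pip install commands in this post.
for name in ("plotly", "jupyterlab", "ipywidgets", "jupyter-dash"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same interpreter that JupyterLab uses, otherwise the versions reported may belong to a different environment.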

Install python packages offline

June 20, 2023
1- Download the packages locally, either from a requirements file or as a single package

pip download -r requirements.txt

## Example - single package
python -m pip download \
    --only-binary=:all: \
    --platform manylinux1_x86_64 --platform linux_x86_64 --platform any \
    --python-version 39 \
    --implementation cp \
    --abi cp39m --abi cp39 --abi abi3 --abi none \
    scipy

2- Copy them to a temporary folder on your remote machine

3- On your machine, activate conda and then install them using pip - specify installation options ...
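When the single-package download from step 1 has to be repeated for several packages or Python versions, assembling the argument list in code avoids retyping the repeated --platform and --abi flags. This is a sketch under assumptions: `pip_download_command` is a hypothetical helper, and the optional `dest` parameter maps to pip's `--dest` option, which the original command does not use.

```python
import shlex


def pip_download_command(package, python_version, platforms, abis,
                         implementation="cp", dest=None):
    """Build the argument list for a binary-only `pip download` call."""
    cmd = ["python", "-m", "pip", "download", "--only-binary=:all:"]
    for platform in platforms:
        cmd += ["--platform", platform]
    cmd += ["--python-version", python_version,
            "--implementation", implementation]
    for abi in abis:
        cmd += ["--abi", abi]
    if dest is not None:
        # Assumption: not in the original command; sets the download folder.
        cmd += ["--dest", dest]
    cmd.append(package)
    return cmd


# Same values as the scipy example in step 1.
cmd = pip_download_command(
    "scipy",
    "39",
    platforms=["manylinux1_x86_64", "linux_x86_64", "any"],
    abis=["cp39m", "cp39", "abi3", "none"],
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on the machine that has internet access.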

Running PySpark & Jupyter With Docker

June 8, 2023
Thanks to the Jupyter community, it’s now much easier to run PySpark on Jupyter using Docker. There are two ways to do this: the “direct” way and the customized way.

The “direct” way

Verify that your local setup meets the prerequisites for running this container. Roughly speaking:
make sure Docker is installed
have about 4 GB of free space
Pull the image from Docker Hub https://hub. ...