Machine Learning

Machine Learning #


Decision Trees
Decision Trees # At each node, a decision tree asks a yes/no question and uses the answer to classify the observations under that node. The question can be a True/False test (is obs.feature = something) or a test on numeric data (is obs.feature > 10). The classification can be categorical or numerical. The first node is called the root node, the nodes in the middle are called internal nodes, and the nodes that hold the results are called leaf nodes.
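The structure described above can be sketched with a small hand-built tree. This is only an illustration, not a trained model: the `Node` and `Leaf` classes, the `predict` helper, and the `raining` / `temperature` features are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """A leaf node: holds the result for observations that reach it."""
    prediction: str


@dataclass
class Node:
    """A root or internal node: asks one yes/no question about an observation."""
    question: Callable[[dict], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def predict(node, obs):
    """Walk from the given node down to a leaf, following the yes/no answers."""
    while isinstance(node, Node):
        node = node.yes if node.question(obs) else node.no
    return node.prediction


# Internal node: a question on numeric data (is obs.feature > 10).
internal = Node(lambda o: o["temperature"] > 10, Leaf("play"), Leaf("stay in"))

# Root node: a True/False question (is obs.feature = something).
root = Node(lambda o: o["raining"] is True, Leaf("stay in"), internal)

print(predict(root, {"raining": False, "temperature": 18}))
```

A real decision tree learns which question to ask at each node from the data; here the questions are fixed by hand so that the root, internal, and leaf roles are easy to see.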
Explainability Snippets
version 2018 Why Model interpretation? # Understanding how a model makes decisions (model interpretation) has been on the front burner since the end of 2017. Decision support systems, and the models they are based on, that do not explain which features influenced their decisions are known as black boxes. Model interpretability is not only important for companies that need to fulfill legal obligations to customers. It serves a technical purpose as well.
Random Forests
Random Forest # Random forests are built from decision trees:
1. Bootstrap the original data: randomly sample it to create a new dataset of the same size as the original one. To make that possible, duplicated observations are allowed (random sampling with replacement).
2. Build a decision tree on the bootstrapped data.
3. When splitting a node, randomly select a subset of the features (typically sqrt(n_features)) to consider. This is called the random subspace method.
4. Go back to step 1 and repeat.
Does all the original data end up in the sampled subsets?
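The two sampling steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a full random forest: `bootstrap_sample` and `random_feature_subset` are hypothetical helper names, and the 1000 row indices stand in for a real dataset. The sketch also records which observations are never drawn, which bears on the closing question above.

```python
import math
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns the drawn indices and the indices that were never drawn
    (often called the out-of-bag observations).
    """
    picked = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(picked))
    return picked, out_of_bag


def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices for one split."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_rows = 1000           # hypothetical dataset size

picked, out_of_bag = bootstrap_sample(n_rows, rng)
print("sample size:", len(picked))
print("fraction never drawn:", len(out_of_bag) / n_rows)
print("features for one split:", random_feature_subset(16, rng))
```

Because sampling is done with replacement, some observations appear several times and others not at all. For large datasets the fraction left out of a single bootstrap sample tends toward 1/e, roughly 37%.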
SVM # P: We want a way to separate data into classes. S: A linear classifier can help. Its objective is to divide the data using a hyperplane. Because the data points from each class that lie closest to the classifier determine its orientation and position, we give them a fancy name (support vectors) and call our linear classifier the Support Vector Classifier.
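To make the geometry concrete, the sketch below classifies points against a hyperplane that is fixed by hand and finds the point of each class that lies closest to it. It does not train a Support Vector Classifier: the weights `w`, the offset `b`, and the toy points are assumptions made for the example, and the closest points are only the candidates that would act as support vectors if this hyperplane were the fitted one.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating line in 2D: x1 + x2 - 5 = 0
w, b = (1.0, 1.0), -5.0
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (3.0, 3.0)]

labels = [classify(w, b, p) for p in points]

# For each class, the point nearest to the hyperplane.
closest = {
    label: min(
        (p for p in points if classify(w, b, p) == label),
        key=lambda p: distance_to_hyperplane(w, b, p),
    )
    for label in (-1, 1)
}

print(labels)
print(closest)
```

A trained classifier would choose `w` and `b` so that the gap between these closest points and the hyperplane is as wide as possible; here the hyperplane is given, so only the classification and distance steps are shown.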