Full Transcript
Just like we saw in CatBoost Part 1, Ordered Target Encoding, we're going to use the training data one row at a time to build trees and calculate their output values. This is part of CatBoost's determined effort to avoid leakage like there is no tomorrow. We'll also learn how CatBoost makes predictions once the trees are made.

NOTE: This StatQuest is based on the original CatBoost manuscript...
https://arxiv.org/abs/1706.09516
...and an example provided in the CatBoost documentation...
https://catboost.ai/en/docs/concepts/algorithm-main-stages_cat-to-numberic

0:00 Awesome song and introduction
1:10 Building the first tree
6:05 Quantifying the effectiveness of the first threshold
6:56 Testing a second threshold
9:05 Building the second tree
10:21 The main idea of how CatBoost works
12:15 Making predictions
13:02 Symmetric Decision Trees
14:56 Summary of the main ideas

Corrections:
2:05 Red should have gone into bin 0 instead of bin 1.
7:23 I should have said that the cosine similarity was 0.71.
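
As a rough illustration of the row-at-a-time idea in the description, here is a minimal Python sketch. It assumes that each row's leaf output is the mean of the residuals of the earlier rows that landed in the same leaf (0 when there are none), and that a candidate threshold is scored by the cosine similarity between those outputs and the residuals. The function names, leaf assignments, and residual values are hypothetical; they are not taken from the video or from CatBoost's API.

```python
import math


def ordered_leaf_outputs(leaf_ids, residuals):
    """Return one output per row, using only the rows that came before it.

    Assumption: a row's output is the mean residual of the earlier rows in
    the same leaf, or 0.0 if it is the first row to reach that leaf. The
    row's own residual is added to the leaf only after its output is set,
    so a row never sees its own target.
    """
    sums = {}    # running sum of residuals per leaf
    counts = {}  # number of rows seen so far per leaf
    outputs = []
    for leaf, residual in zip(leaf_ids, residuals):
        seen = counts.get(leaf, 0)
        outputs.append(sums.get(leaf, 0.0) / seen if seen else 0.0)
        sums[leaf] = sums.get(leaf, 0.0) + residual
        counts[leaf] = seen + 1
    return outputs


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: which leaf each row falls into for one candidate
# threshold, and each row's residual.
leaf_ids = [0, 0, 1, 1]
residuals = [1.0, 3.0, -2.0, -4.0]

outputs = ordered_leaf_outputs(leaf_ids, residuals)
score = cosine_similarity(outputs, residuals)
print(outputs)
print(score)
```

Under these assumptions, comparing thresholds means repeating the calculation with a different `leaf_ids` assignment and keeping the threshold with the larger score.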
