A Checklist For The Right ML Algorithm

Introduction

In machine learning (ML), there is a plethora of algorithms to choose from. An algorithm is the procedure that learns from the data you give it and produces the model that turns new inputs into results. Algorithms differ in style, in how many resources they consume, and in how accurate they are, and each has its own advantages and disadvantages. But which one is the best? Which algorithm should be crowned king of all the algorithms? Which one should you use if you are tasked with solving a problem using ML? The answer is complicated and depends on many variables. However, Dr. Roi Yehoshua tries to answer this question in his article titled “Which ML Algorithm to Choose?” (the article can be found here).

Summary of Article

Dr. Yehoshua does not answer this question in the expected way, by naming a specific algorithm. Instead, he offers a set of 10 questions that can help you assess which algorithm is best for your specific project and needs. The questions are wide-ranging. They cover which types of problems the algorithm can solve, what preprocessing the data needs, the toll the algorithm takes on your computer, the amount of data needed to train the model, the risk of overfitting and ways to deal with it, how easy the results are to interpret, the number of hyperparameters the algorithm has, and much more. The article then uses those 10 questions to assess two very common algorithms: decision trees and neural networks. Finally, the article ends with an interesting fact about how popular each algorithm is among winning entries in Kaggle competitions.
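To make the checklist idea concrete, here is a minimal Python sketch of how such questions could be written down and used to shortlist an algorithm. The field names, the `CHECKLIST` table, and the `shortlist` function are my own hypothetical illustration, not the article's ten questions, and the yes/no answers are rough generalizations about decision trees and neural networks rather than results from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistEntry:
    """Simplified yes/no answers to a few checklist-style questions (hypothetical fields)."""
    needs_feature_scaling: bool
    interpretable: bool
    many_hyperparameters: bool
    needs_large_dataset: bool


# Rough, illustrative answers (assumptions, not taken from the article).
CHECKLIST = {
    "decision_tree": ChecklistEntry(
        needs_feature_scaling=False,
        interpretable=True,
        many_hyperparameters=False,
        needs_large_dataset=False,
    ),
    "neural_network": ChecklistEntry(
        needs_feature_scaling=True,
        interpretable=False,
        many_hyperparameters=True,
        needs_large_dataset=True,
    ),
}


def shortlist(requirements):
    """Return the algorithms whose answers match every stated requirement."""
    return [
        name
        for name, entry in CHECKLIST.items()
        if all(getattr(entry, key) == wanted for key, wanted in requirements.items())
    ]


# Example: the project needs results that are easy to explain.
print(shortlist({"interpretable": True}))
```

A real checklist would have more nuanced answers than yes or no, but even this simple table shows why the "best" algorithm changes as soon as the project's requirements change.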

My Take

Overall, I found this article really interesting. I especially liked that the author answered the question of which algorithm is best with a series of questions rather than by naming one definitive algorithm, because in ML the best algorithm depends heavily on the specific project and the data you have for it. The questions also work like a checklist that you can go through at any time to decide which algorithm is best for your project. I find this helpful because I used to be unsure which algorithm to choose for my projects, and this checklist makes the choice much easier. Dr. Yehoshua also mentions that in 2015 most Kaggle competitors used neural networks, either on their own or coupled with something else. This makes sense, since neural networks are so versatile, even though they take a long time to build and a long time to tune so that you aren’t overfitting the data. However, this data is from 2015, and I agree with Dr. Yehoshua that it would be really interesting to find out which algorithms are the most used now.

Conclusion

All in all, this was a really good article. It delivered on its purpose in an innovative way, and one that was true to the field. It used examples to hammer the main points home, and its content was explained in a thorough and easy-to-understand way. So, the king of all algorithms is... all of them? Whichever one fits your project. I will definitely be using the content of this article throughout my journey in this field, and I recommend you read it (the article can be found here).