Introduction
Supervised machine learning algorithms broadly fall into two categories: classification and regression. Regression models are used to predict continuous values, whereas classification algorithms assign an input to one of a set of discrete classes. Logistic regression is a fundamental classification algorithm in machine learning. It uses the sigmoid function to estimate the probability that a piece of data belongs to one group or another. An article titled “Logistic Regression with Python” uses this technique to classify brain tumors as malignant or benign (the article can be found here).
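For a concrete sense of how that works: the sigmoid function squashes any real-valued input z into the range (0, 1) via 1 / (1 + e^(-z)), so its output can be read as a probability. Here is a minimal sketch in Python (my own illustration, not code from the article):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# The output reads as a probability: above 0.5 we predict one class,
# below 0.5 the other.
print(sigmoid(0.0))   # 0.5, the decision boundary
print(sigmoid(4.0))   # ~0.98, confidently in class 1
print(sigmoid(-4.0))  # ~0.02, confidently in class 0
```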
Summary of Article
The article starts out by explaining what logistic regression is and how it is used. It then goes into the basic math behind the logistic regression equation and how the shape of the curve changes based on the particular problem and dataset at hand. After this, the article gets into the analysis of brain tumors using logistic regression. It first handles some housekeeping, such as loading libraries, loading the data, defining variables, and initializing the data. Next, the article trains the model on the training data (about 80% of the overall dataset).

The article then defines another function, the cost function, which “count[s] the sum of the metric distances between our hypothesis and real labels on the training data”. We want to minimize this cost, which comes down to an optimization problem balancing the number of parameters against the distance between the predictions and the actual values, and it is solved through differentiation. We then update the weights of our model using a hyperparameter called the learning rate, which “sets the intensity of the training process”: it scales how much the weights change on each iteration. Set the learning rate too high and you will skip over the correct weights; set it too low and you will never reach them. Finally, the article tests the model on the held-out data and reports around 90% accuracy, which is pretty good.
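To make those steps concrete, here is a rough from-scratch sketch of the kind of training loop the article walks through: a sigmoid hypothesis, a cost function that penalizes the distance between predictions and real labels, and weight updates of the form w := w - learning_rate * dJ/dw. The synthetic data, the log-loss cost, and the variable names are my own stand-ins, not the article's actual brain-tumor dataset or code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # Log-loss: sums up how far the hypothesis sigmoid(X @ w)
    # sits from the real labels y, which is what training minimizes.
    h = sigmoid(X @ w)
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def train(X, y, learning_rate=0.1, iterations=1000):
    w = np.zeros(X.shape[1])
    for i in range(iterations):
        h = sigmoid(X @ w)
        grad = X.T @ (h - y) / len(y)   # derivative of the cost w.r.t. w
        w -= learning_rate * grad       # the learning rate sets the step size
        if i % 250 == 0:
            print(f"iteration {i}: cost = {cost(w, X, y):.4f}")
    return w

# Synthetic stand-in for the tumor data: a bias column plus one feature.
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), np.r_[rng.normal(-2, 1, 100), rng.normal(2, 1, 100)]]
y = np.r_[np.zeros(100), np.ones(100)]

# Roughly 80/20 train/test split, mirroring the article.
idx = rng.permutation(200)
w = train(X[idx[:160]], y[idx[:160]])

preds = sigmoid(X[idx[160:]] @ w) >= 0.5
print("test accuracy:", (preds == y[idx[160:]]).mean())
```

Cranking learning_rate up in this sketch makes the printed cost bounce around instead of falling; turning it way down makes it crawl toward the minimum, which is exactly the trade-off described above.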
My Take
This article was very interesting. It is pitched at a slightly higher level than some of the other articles on this page, but if you are ready for a challenge you will thoroughly enjoy it. The reason I enjoyed it so much is that it not only walks you through everything the author did, with code samples to show the syntax (which really helped), but it also introduced (at least for me) some new techniques, such as balancing the number of parameters against the distance between the true and predicted values. I found this especially interesting because up until now I had only thought about minimizing that distance, never about the flip side: the amount of computation it takes to do so on bigger datasets. The solution to this problem is also very elegant, framing it as an optimization problem solved through differentiation. Another thing I really liked was the graph provided for the cost function, which further illustrates the concept and helps the reader understand it.
Conclusion
Overall this was a good article and I recommend that you read it. Its use of visuals, sample code, and clear explanation helps the reader understand and take away the key concepts it presents. It also introduced me to new ideas, such as using optimization to determine the nature of the cost function, and to the mathematical art behind these functions and how they are used. I really recommend you give it a read (the article can be found here).