AI, Crowd-Sourced: Machine Learning for Beginners
In this series, we help our readers learn Machine Learning through interesting challenges on AIcrowd. Through our open-sourced challenges and datasets, we provide users with a plethora of resources, techniques, and tools.
For this segment, we present several beginner-friendly challenges built around fundamental Machine Learning algorithms. These challenges and algorithms serve as a stepping stone toward more advanced applications of ML, and they will prepare you to explore other fields such as Computer Vision and Natural Language Processing.
The term Machine Learning was coined in 1959 by Arthur Samuel, one of the pioneers of Artificial Intelligence at IBM. Machine Learning is a core subfield of Artificial Intelligence and an essential piece in achieving “intelligence” in machines.🤖
At its core, Machine Learning is about developing computer programs that receive various types of data as input (images, text, signals, numeric tables, etc.), recognize patterns in that data, and output insights and predictions.🔮
Let us look at some hand-selected challenges from our platform that will jumpstart your ML journey!🔍
Red wine is one of the most popular drinks, and demand for it has grown in recent times. The wine industry has stepped up to meet that demand. To stand out among the competition, sellers are promoting product quality certification. But this process is expensive and time-consuming, and it is often done with the help of human experts.
Because human evaluation is somewhat subjective, this challenge urges participants to devise ways to classify the quality of wine based solely on its chemical composition.
The dataset provides 11 chemical measurements for each wine, which participants use to assign the wine a score from 1 to 10. Every row also ends with an attribute that indicates whether the wine is “good quality” (a score of 7 or higher) or not (a score below 7). With over 1600 rows of data spread across 12 columns, the dataset is also a great way to dig deeper into data analytics.
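As a small illustration of the scoring rule described above, here is a minimal sketch that turns a 1-10 score into the “good quality” flag. The helper name and the sample scores are hypothetical and are not taken from the challenge files.

```python
# Hypothetical helper: the name and signature are illustrative only.
GOOD_QUALITY_THRESHOLD = 7  # "good quality" means a score of 7 or higher


def is_good_quality(score: int) -> bool:
    """Return True when a 1-10 quality score counts as "good quality"."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return score >= GOOD_QUALITY_THRESHOLD


# Made-up example scores, converted to 1 (good quality) or 0 (not).
scores = [5, 6, 7, 8, 3]
labels = [int(is_good_quality(s)) for s in scores]
print(labels)  # [0, 0, 1, 1, 0]
```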
The AIcrowd team gets participants started with a beginner-friendly starter kit. The starter kit shows different ways to approach the prediction while also introducing users to Google Colab, Google’s free hosted notebook platform with GPU access for training ML models.
The code shows an implementation of a Support Vector Classifier (SVC) to predict the “quality” of the wine samples. SVC belongs to the Support Vector Machine family of supervised learning methods, which are used for classification, regression, and outlier detection. These models are highly versatile and have proven effective on high-dimensional datasets.
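To give a feel for what such a baseline can look like, here is a minimal sketch using scikit-learn. It is not the starter kit's code: the data is synthetic (11 numeric features and a binary label, generated with `make_classification`), and the scaling step is our own assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the wine data: 11 numeric features, binary label.
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features first, since SVMs are sensitive to feature ranges.
svc_model = make_pipeline(StandardScaler(), SVC())
svc_model.fit(X_train, y_train)

svc_predictions = svc_model.predict(X_test)
svc_accuracy = svc_model.score(X_test, y_test)
print(f"Accuracy on held-out data: {svc_accuracy:.3f}")
```

With the real dataset, you would replace the synthetic `X` and `y` with the chemical measurements and the quality label.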
Let us look at the runner-up from the original run of the challenge. Aayush Mishra’s submission found that switching the model from an SVC to a Random Forest Classifier improved performance. The Random Forest, as the name implies, is made up of a large number of individual decision trees that work together as an ensemble. Each individual tree in the random forest produces a class prediction, and the class with the most votes becomes the prediction of the model.
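The voting step described above can be sketched in a few lines of plain Python. This only illustrates the majority vote, not how the trees are trained, and the function name and votes are made up for the example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(tree_predictions).most_common(1)[0][0]


# Five hypothetical trees vote on one wine sample (1 = good quality, 0 = not).
votes = [1, 0, 1, 1, 0]
forest_prediction = majority_vote(votes)
print(forest_prediction)  # 1, since three of the five trees voted for class 1
```

In practice you would reach for a ready-made implementation such as scikit-learn's `RandomForestClassifier`, which combines its trees by averaging their predicted class probabilities instead of counting hard votes.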
Participants may play around with more such Machine Learning algorithms by importing models from scikit-learn.
Let us know which technique worked the best for you in the comments!🔮
The working class serves as the very backbone of a strong economy. But individuals in this class who are not part of any union are often exploited and neglected. Therefore, knowing the condition of such people is essential to ensuring happy and thus productive individuals.
AIcrowd presents Labor, a challenge aimed at predicting the condition of labor using data like working hours, pay rise in the last year, vacations, pension, and much more. The data contains over 40,000 individual entries spread over 17 categories. For this challenge, the participants are required to predict the condition of the workers using this data.
The starter kit from AIcrowd provides a baseline that uses an SVC to train the model. SVC belongs to the highly versatile Support Vector Machine family, one of the best-known groups of algorithms for tasks like classification, regression, and outlier detection. In this implementation, however, the algorithm is used with only a bare minimum of parameters, which can be tweaked and improved with the help of the documentation.
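One common way to go beyond the bare-minimum parameters is a small grid search. The sketch below assumes scikit-learn and uses synthetic binary data. The grid values are illustrative starting points, not tuned recommendations for this challenge.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; the real challenge data has different columns.
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=8, random_state=0
)

# Illustrative grid: these values are starting points, not recommendations.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# Try every combination with 3-fold cross-validation, scored by F1.
search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```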
Team TODO, which secured the first rank during the first run of the challenge, achieved an F1 score of 0.969 using XGBoost. XGBoost is an ensemble learning method, which means that instead of relying on a single model for predictions, it combines the predictions of many models (decision trees, each trained to correct the errors of the ones before it) to deliver a better result. Because XGBoost stores the training data in blocks that can be processed in parallel, it can also cut training time considerably.
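Here is a rough sketch of the boosted-ensemble idea and of computing an F1 score. XGBoost itself lives in the separate `xgboost` package, so to keep the example dependent on scikit-learn only, it swaps in `GradientBoostingClassifier`, a related but different boosted-tree implementation. The data is synthetic, so the score says nothing about the actual challenge.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary label.
X, y = make_classification(
    n_samples=500, n_features=16, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in for XGBoost: 100 small trees are added one after another,
# each one fitted to the errors left by the trees before it.
booster = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
booster.fit(X_train, y_train)

boosted_f1 = f1_score(y_test, booster.predict(X_test))
print(f"F1 on held-out data: {boosted_f1:.3f}")
```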
Want to look into more ML algorithms for a better understanding and better results? Head over to scikit-learn’s list of ready-made ML algorithms.
Let us know what you used!🙋🏼‍♂️
Ever since IBM’s Deep Blue defeated Garry Kasparov at chess, the first time a reigning world chess champion lost to a computer, games have served as one of the biggest benchmarks for assessing a computer’s intelligence.
Conventionally, these problems are mostly tackled using Reinforcement Learning, a method where an agent learns to achieve a goal in a complex environment through a series of decisions that have consequences. At AIcrowd, we created a different challenge that urged participants to use standard ML algorithms.
The dataset provided is the KRK dataset, which consists of positions of the White King, White Rook, and Black King. With only these pieces on the board, the only possible outcomes are a draw or a win for White. The dataset comes with over 28,000 instances described by attributes like White King file (column), White King rank (row), etc., which are used to predict the optimal depth-of-win for White in 0 to 16 moves, or a draw (-1).
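Before a standard ML model can use such positions, they need to be numeric. The sketch below assumes a raw format that the article does not confirm: files given as letters "a" to "h", ranks as integers 1 to 8, and the target as a word such as "draw" or "sixteen". All helper names are hypothetical.

```python
# Assumed raw format (not confirmed by the article): files as letters "a"-"h",
# ranks as integers 1-8, and the target as a word such as "draw" or "sixteen".
DEPTH_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen",
]
DEPTH_TO_INT = {word: depth for depth, word in enumerate(DEPTH_WORDS)}
DEPTH_TO_INT["draw"] = -1  # a draw is encoded as -1


def encode_file(file_letter: str) -> int:
    """Map a board file 'a'-'h' to a column number 1-8."""
    letter = file_letter.lower()
    if len(letter) != 1 or letter not in "abcdefgh":
        raise ValueError(f"invalid file: {file_letter!r}")
    return ord(letter) - ord("a") + 1


def encode_position(wk_file, wk_rank, wr_file, wr_rank, bk_file, bk_rank):
    """Return the six numeric features for one King-Rook-King position."""
    return [
        encode_file(wk_file), wk_rank,
        encode_file(wr_file), wr_rank,
        encode_file(bk_file), bk_rank,
    ]


features = encode_position("a", 1, "b", 3, "c", 2)
print(features, DEPTH_TO_INT["draw"])  # [1, 1, 2, 3, 3, 2] -1
```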
AIcrowd presents a starter kit for the challenge, which uses an implementation of Linear Regression and treats the task as a classification problem. Linear regression is one of the oldest and most widely used methods for problems where a relationship has to be established between the prediction and one or more input variables, of which there are 6 in this case.
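One plausible way to use a regression model for a task like this is to fit it on the numeric target and then round its continuous outputs to the nearest valid class. The sketch below does that with scikit-learn on random stand-in data. We do not know whether the starter kit works this way, and because the data is random, the example only shows the mechanics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Random stand-in data: 6 integer features in 1-8 (files and ranks)
# and an integer target in -1..16 (draw or depth-of-win).
X = rng.integers(1, 9, size=(200, 6))
y = rng.integers(-1, 17, size=200)

regressor = LinearRegression()
regressor.fit(X, y)

# Turn continuous outputs into valid class labels:
# round to the nearest integer, then clip to the range [-1, 16].
raw_predictions = regressor.predict(X)
class_predictions = np.clip(np.rint(raw_predictions), -1, 16).astype(int)
print(class_predictions[:10])
```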
Team TODO, one of the highest-scoring teams on the evaluations with an F1 score of 0.952, took an interesting approach using fastai. The team uses the tabular learner function, which inspects your data and builds a suitable model automatically. It also infers the right loss function for training.
However, if you feel like experimenting with a different approach, check out the plethora of ML algorithms available in scikit-learn.
What field would you like us to explore on our platform for you next? Comment below or tweet us @AIcrowdHQ to let us know!🗣