My First Bootcamp Experience: AMMI (AIMS Ghana) 2019/2020 Machine Intelligence Bootcamp

The two-week bootcamp organised for the first set of African Master of Machine Intelligence (AMMI) students at the African Institute for Mathematical Sciences (AIMS) Ghana, anchored by Alaeddine Ayadi, came to an end today, October 24, 2019. Common adjectives participants used to describe the experience include: intensive, rigorous and enlightening. It was entirely hands-on, consisting of class exercises after each module as well as projects.

The students were of diverse backgrounds: mathematics, computer science, physics, information systems, etc. Some students had prior knowledge of machine learning and data science through online courses, AI communities, bootcamps and the like, but others had none at all. The course modules were structured to be all-inclusive, covering areas including an introduction to Python (pandas, NumPy, Matplotlib and seaborn for visualization), algorithm analysis and data structures, machine learning fundamentals, natural language processing, deep learning and time series concepts.

The highlight of the bootcamp was the in-house Kaggle competition on wine price prediction. The competition, which lasted 6 days, further exposed the students to various techniques for tackling machine learning problems. The class was divided into 12 teams of at most 3 students each, giving 35 competitors and 298 entries. We were expected to apply the concepts we learned during the bootcamp to design models trained and evaluated on the given dataset, split in a 70/30 training/test ratio.
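
For readers new to this setup, here is a minimal sketch of a 70/30 split in Python with scikit-learn. The file and column names (`wine_train.csv`, `price`) are illustrative stand-ins, not the actual competition files.

```python
# A minimal sketch of a 70/30 train/test split, assuming a pandas DataFrame
# with a `price` target column. All names here are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

wine_df = pd.read_csv("wine_train.csv")  # hypothetical filename

X = wine_df.drop(columns=["price"])      # predictors
y = wine_df["price"]                     # target: wine price

# Hold out 30% of the rows for evaluation, train on the remaining 70%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
```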

The Kaggle competition was truly competitive, as seen in the amount of time and effort everyone put into it. Students were found in class late into the night, training models that ran for over 6 hours. It was exciting to see how students strove to reach the top of the leaderboard, evident in the close range of scores after submissions (each team was expected to select its two best submissions for final scoring). The instructor, Alaeddine, was always available, moving from one team to another to help students whenever they had questions. The tutors for the cohort, Jerry and Kossi, were always on the ground and a huge support throughout the entire bootcamp, organising tutorials and granting one-on-one sessions to interested students. At the end of the competition, the private leaderboard was opened and the winner was revealed. Team 1 (Deborah, Salomon and Nando) came top (kudos to them) with a score of 16.80090, beating the 39.58473 benchmark. They were ushered to the front of the class amid applause and loud cheers as they received their prize, a deep learning book, presented by Alaeddine as a way of encouraging the team. Everyone else was also praised for a job well done.

I (Tunde Ajayi) was paired with two other devoted students (Abubakar from Sudan and Tatiana from Cameroon) who were very committed to winning the competition. Although we didn't win, we learned a lot. We met every day, for at least 3 hours after dinner, to work on the project. The training dataset contained 175,000 rows and 14 features. Only 2 features were numeric; the rest were strings. These attributes pushed us to adopt different techniques, such as K-fold cross-validation, CountVectorizer, word embeddings, an LSTM encoder, one-hot encoding, grid search and PCA, in order to prepare the dataset for training our model. Most of these techniques were new to us at first, so we had to learn them before applying them. One interesting thing I learned was using one of Google's pre-trained models to embed a string-valued feature of our dataset, transferring the attributes and semantics captured by the pre-trained model so that the text is converted into vectors whose meanings make sense to the computer.
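
To make that idea concrete, here is a hedged sketch of the embedding trick, assuming gensim's downloadable `word2vec-google-news-300` vectors and a hypothetical text column `description` (continuing the illustrative names from the split sketch above). This is one plausible way to do it, not necessarily exactly what we ran.

```python
# Sketch: average pre-trained word vectors to turn a text column into
# fixed-length numeric vectors. The model and column names are assumptions.
import numpy as np
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")  # pre-trained Google News vectors

def embed_text(text, model, dim=300):
    """Average the vectors of the in-vocabulary words in a string."""
    words = [w for w in str(text).lower().split() if w in model]
    if not words:
        return np.zeros(dim)  # no known words: fall back to a zero vector
    return np.mean([model[w] for w in words], axis=0)

# Each description becomes one 300-dimensional vector a model can consume.
desc_vectors = np.vstack(
    [embed_text(t, w2v) for t in X_train["description"]]
)
```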

We also visualized the dataset using pairplot, jointplot, barplot and heatmap in order to gain insight into the data: how the features relate to one another and to the target feature (price). This led us to discover that, apart from most of the features being strings, some of them were redundant. We also detected some noisy data and took care of outliers in order to clean up the data.
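
The plots themselves were straightforward seaborn one-liners; a minimal sketch, again with illustrative column names (e.g. a hypothetical numeric `points` feature):

```python
# Sketch of the exploratory plots: pairplot, jointplot and a correlation
# heatmap. `wine_df` and the column names are illustrative stand-ins.
import matplotlib.pyplot as plt
import seaborn as sns

numeric = wine_df.select_dtypes(include="number")

sns.pairplot(numeric)                               # pairwise feature scatter plots
sns.jointplot(data=wine_df, x="points", y="price")  # one feature vs the target
sns.heatmap(numeric.corr(), annot=True)             # spot redundant features
plt.show()
```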

For the actual training, we tried different algorithms. We started with linear regression, just for the fun of it; we actually used it to explain some concepts to each other during our meet-ups. Of course, linear regression didn't give us the score we wanted, so we adopted other techniques like random forest, decision tree and neural network models. Training the models was another interesting phase for us. It was not uncommon to find students training models for over 6 hours. In fact, we took turns staying awake at night to keep watch while the models trained. It was then we truly discovered that the core of developing a model lies in feature engineering, not model training: how fast a model trains depends largely on what is done before training.
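
In scikit-learn terms, the progression looked roughly like the sketch below, with linear regression as a baseline and a random forest as one of the stronger models. `X_train_num` is assumed to be the engineered, all-numeric feature matrix, and the hyperparameters are illustrative, not our tuned values.

```python
# Baseline linear regression vs. a random forest, fit on already-numeric
# features (i.e. after the feature engineering described above).
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

baseline = LinearRegression().fit(X_train_num, y_train)

forest = RandomForestRegressor(
    n_estimators=500,   # more trees: slower to train, steadier predictions
    n_jobs=-1,          # use every core; hours-long runs like ours need it
    random_state=42,
).fit(X_train_num, y_train)
```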

Using root mean squared error (RMSE) to evaluate our models, the random forest gave us the best scores of our two submissions, on both the private and public leaderboards.
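
Scoring with RMSE is a one-liner per model; continuing the sketch above (with `X_test_num` as the engineered test features):

```python
# Evaluate both models on the held-out 30% with RMSE (lower is better).
import numpy as np
from sklearn.metrics import mean_squared_error

for name, model in [("linear regression", baseline), ("random forest", forest)]:
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test_num)))
    print(f"{name}: RMSE = {rmse:.5f}")
```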

The next day, all the teams presented how they got their scores. It was another learning experience for us, as we discovered the algorithms and techniques other teams adopted and how they boosted their scores. We decided to continue learning after the bootcamp and to participate in more Kaggle competitions in the future.

All in all, the bootcamp was an awesome learning experience for my team and the entire class. We could only imagine how much we would learn by the end of the programme if we could amass such a wealth of knowledge in the span of 2 weeks. The realization was both humbling and exciting. It made us all the more grateful to the Founder and Director of AMMI, Moustapha Cisse, his entire team and our amiable sponsors for the privilege of being part of the cohort, and it filled us with anticipation for more interesting learning experiences in the classes ahead.