I have just lost my first win (rank 1) in a Kaggle competition! An unexpected data leak surfaced at the end and two other teams exploited it, so we finished 3rd.

It was a 3-month contest about deep learning and hidden Markov modelling to predict the number of open channels over the course of human electrophysiological time-series signals. It required some finesse in time-series analysis: spotting the pivotal, distinctive features of the signal with systematic approaches (mainly unsupervised modelling like kNN and t-SNE), then using an ensemble of methods (RNNs, decision trees, random forests) to avoid overfitting.

leaderboard

The whole competition was a really fun and instructive journey with my teammates. Each of us wrote up a different aspect of it:

  • I wrote about the forward-backward algorithm that we used here

  • Gilles published our complete solution on Medium here

  • Zidmie wrote about the data leak here

  • My code for the best non-leak solution is here

  • Our full code is here
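
If you have never seen the forward-backward algorithm mentioned above, here is a minimal sketch of the idea. It is not our competition code. It assumes a toy hidden Markov model where the hidden state is the number of open channels and each sample is that state's mean level plus Gaussian noise. The function name `forward_backward` and all the numbers (`means`, `sigma`, the transition matrix, the tiny signal) are made up for illustration.

```python
import math

def forward_backward(obs, start, trans, means, sigma):
    """Posterior P(state at t = k | whole signal) for a Gaussian-emission HMM."""
    n_states = len(start)
    n_steps = len(obs)

    def emission(x, k):
        # Gaussian likelihood of sample x if k channels are open (toy assumption)
        z = (x - means[k]) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    # Forward pass: evidence up to and including t, rescaled at every step
    alpha = []
    for t in range(n_steps):
        row = []
        for k in range(n_states):
            if t == 0:
                prior = start[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
            row.append(prior * emission(obs[t], k))
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass: evidence from t+1 to the end, also rescaled
    beta = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        row = [
            sum(trans[k][j] * emission(obs[t + 1], j) * beta[t + 1][j]
                for j in range(n_states))
            for k in range(n_states)
        ]
        total = sum(row)
        beta[t] = [v / total for v in row]

    # Combine both passes and normalise each time step
    posterior = []
    for t in range(n_steps):
        row = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        total = sum(row)
        posterior.append([v / total for v in row])
    return posterior

# Toy usage: two states (0 or 1 open channel), all values hypothetical
signal = [0.1, -0.2, 0.9, 1.1, 1.0, 0.05]
post = forward_backward(
    signal,
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 1.0],
    sigma=0.3,
)
predicted = [max(range(len(p)), key=p.__getitem__) for p in post]
print(predicted)
```

The rescaling at each step only keeps the numbers from underflowing on long signals; since every time step is normalised at the end, it does not change the posteriors. Taking the most probable state per time step, as `predicted` does, is just one simple way to turn the posteriors into a prediction.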

Have fun!