Kaggle: Jigsaw Multilingual Toxic Comment Classification - top solutions

Before we start: two of my previous posts may be helpful for a general understanding of the top solutions of this competition. Please feel free to check them out: Knowledge Distillation clearly explained; Common Multilingual Language Modeling methods (M-Bert, LASER, MultiFiT, XLM). Jigsaw Multilingual Toxic Comment Classification: use TPUs to identify toxic comments across multiple languages. Overview of the competition: Jigsaw Multilingual Toxic Comment Classification is the 3rd annual competition organized by the Jigsaw team....

August 11, 2020 · 9 min

Kaggle: Tweet Sentiment Extraction - top solutions

Note: this post is the second part of the overall summary of the competition. The first half is here. Noteworthy ideas in the 1st place solution. Idea: First step: use transformers to extract token-level start and end probabilities. Second step: feed these probabilities to a character-level model. This step gave the team a huge improvement in the final score, since it handled the “noise” in the data properly. Last step:...
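The hand-off between the two steps above can be sketched roughly as follows: spread each token's start/end probability over the characters that token covers, producing per-character features for the second-stage model. This is a minimal illustration, not the team's actual code; the function name and the offset format (one `(start, end)` character span per token, as produced by typical subword tokenizers) are my assumptions.

```python
import numpy as np

def token_probs_to_char_probs(token_offsets, start_probs, end_probs, text_len):
    """Map token-level start/end probabilities to character level.

    token_offsets: list of (start, end) character spans, one per token
    start_probs, end_probs: per-token probabilities from the transformer
    text_len: length of the original text in characters
    """
    char_start = np.zeros(text_len)
    char_end = np.zeros(text_len)
    for (s, e), ps, pe in zip(token_offsets, start_probs, end_probs):
        # every character inside the token inherits the token's probability
        char_start[s:e] = ps
        char_end[s:e] = pe
    return char_start, char_end

# e.g. two tokens covering "abc defg" minus the space:
cs, ce = token_probs_to_char_probs([(0, 3), (4, 8)], [0.9, 0.1], [0.2, 0.8], 8)
```

The resulting `char_start`/`char_end` arrays (possibly alongside raw character embeddings) would then be the input features of the character-level model.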

July 2, 2020 · 14 min

Kaggle: Tweet Sentiment Extraction - common methods

Note: this post is the first part of the overall summary of the competition. The second half is here. Before we start: I attended two NLP competitions in June, Tweet Sentiment Extraction and Jigsaw Multilingual Toxic Comment Classification, and I’m happy to be a Kaggle Expert from now on :) Tweet Sentiment Extraction. Goal: the objective in this competition is to “Extract support phrases for sentiment labels”. More precisely, this competition asks Kagglers to construct a model that can figure out what word or phrase in a given tweet best supports the labeled sentiment....

July 1, 2020 · 11 min

Kaggle: Google Quest Q&A Labeling - my solution

Kaggle: Google Quest Q&A Labeling summary. General part: congratulations to all winners of this competition. Your hard work paid off! First, I have to thank the authors of the following three published notebooks: https://www.kaggle.com/akensert/bert-base-tf2-0-now-huggingface-transformer, https://www.kaggle.com/abhishek/distilbert-use-features-oof, https://www.kaggle.com/codename007/start-from-here-quest-complete-eda-fe. These notebooks showed awesome ways to build models, visualize the dataset, and extract features from non-text data. Our initial plan was to feed the question title, question body, and answer all into a BERT-based model....

April 4, 2020 · 7 min