The ultimate data science challenge: Experience of being a finalist
By: Jobin Wilson
Principal R&D Architect - Data Science
KDD Cup overview
KDD Cup is an annual, global, interdisciplinary competition in the fields of Knowledge Discovery and Data Mining. It is organised by ACM SIGKDD (the Special Interest Group on Knowledge Discovery and Data Mining, a leading professional organization of data miners) in conjunction with the KDD Conference. The KDD Cup is a highly regarded competition in data science. A complex new challenge is set every year, and data scientists from top universities and industry research labs around the world compete to win it.
Flytxt representation in KDD Cup 2016 challenge and conference
It was a privilege to represent Flytxt in the KDD Cup 2016 and to attend the ACM KDD 2016 conference. The Flytxt team had three members: Ram Mohan, Muhammad Arif and me. We worked under the guidance of Prof. Santanu Chaudhury and Dr. Brejesh Lall from the Dept. of Electrical Engineering, IIT Delhi.
This year's challenge was to accurately forecast the impact of institutions at different conferences, measured by the number of accepted papers, using any public data on the web. The contest consisted of 3 phases and spanned 3 calendar months, starting from 27th February 2016. Out of the 554 teams that participated, the Flytxt team was among the 12 finalists invited to the KDD Cup 2016 Workshop to present a paper outlining their approach to the ranking problem.
The workshop was part of the 2016 KDD Conference, which was held in San Francisco, California, from Aug 13th to Aug 17th, 2016. Fellow KDD Cup finalists included teams from NTT DOCOMO (Japan), Trend Micro (Taiwan), Intel (USA), Alibaba (China), CoreLogic (USA), Georgia Tech (USA) and IISc Bangalore (India), among others. The pre-print version of our paper is available at https://docs.com/alex-wade/1504/ranking-academic-institutions-on-potential-paper.
Enriching experience @ KDD 2016 conference
A few gems from the KDD 2016 conference
The Turing Lecture by Whitfield Diffie, ACM Turing Award winner for the year 2015, was enlightening and took us through the history of cryptography, from Caesar ciphers all the way to homomorphic encryption. The key intuitions behind his solution to the public key cryptography problem (the work he is best known for), and how he arrived at it through a misunderstanding of his own, were fascinating. I particularly liked his quote “Misunderstanding is the seed of invention”.
The current popularity of deep learning research was reflected in the conference, with the keynote talk by Nando de Freitas (“Learning to learn and compositionality with deep recurrent neural networks”) and the plenary panel “Is Deep Learning the New 42?” by Andrei Broder et al. Microsoft also had a tutorial session by Amit Agarwal and Frank Seide on CNTK, Microsoft’s open-source deep learning toolkit. There were several interesting research papers on areas related to deep learning as well.
Large-scale data mining was, as usual, another key theme. Jeff Stribling from Verizon gave a talk titled “Large Scale Machine Learning at Verizon: Theory and Applications”, covering how Verizon applies large-scale machine learning to massive real-world data sets to support new revenue-generating products and services. LinkedIn’s talk, “Business Applications of Predictive Modeling at Scale” by Yan Liu, was also interesting and covered the challenges, key technologies and lessons learned from deploying predictive models at scale at LinkedIn.
Streaming analytics and temporal evolution were another popular theme. A tutorial session titled “Streaming Analytics” by Ashish Gupta from LinkedIn covered building streaming systems using open source technologies. A tutorial on IoT Big Data Stream Mining by Albert Bifet (Télécom ParisTech) covered data stream learners for classification, regression, clustering and frequent pattern mining, as well as the scalability issues associated with IoT applications.
The hands-on Spark 2.0 tutorial from Databricks was also stimulating and provided an overview of the core Spark 2.0 APIs, such as DataFrames, Datasets, SQL, streaming and machine learning pipelines.
Many of the KDD 2016 conference talks have been published on videolectures.net as well as on the KDD 2016 YouTube channel, and can be viewed at your convenience.
Overall, attending KDD 2016 was a fantastic learning experience. It provided great exposure and an opportunity to follow the latest developments in knowledge discovery and data mining closely, directly from renowned researchers and practitioners from across the globe.