SHUBHAM JAIN
LinkedIn
github.com/shubhamjn1
EDUCATION
Indian Institute of Technology (BHU), Varanasi
B.Tech in Ceramic Engineering (CGPA : 8.15)
Varanasi, India
July 2014 - July 2018
Central Academy, CBSE Class XII
Percentage : 88.4%
Jodhpur, India
April 2014
PROFESSIONAL EXPERIENCE
Data Science Associate Consultant, ZS Associates
Pune, India (July 2018 - Present)
• Revenue Forecasting: Project revenue for financial teams to enable better planning
- Designed an automated forecasting pipeline generating 21k ensemble models using statistical & ML modeling
- Selected the best models by evaluating error with rolling-quarter-based validation to avoid overfitting
- Projected monthly revenue for a $3.2B business with MAPE of 6.8% as compared to 15.1% for finance forecast
• Intelligent Document Information Retrieval
- Developed a pipeline which leverages NLP models to extract relevant key-value pairs from the Rate Loading docs
- Fine-tuned RoBERTa-based NER models and used them in combination with a Graph CNN model
- Achieved 75% overall automation in the process with an accuracy of 95%
• Customer Lapse & Lifetime Value Prediction
- Developed a two-staged LSTM model capturing temporal information to predict customer’s lapse or lifetime
- Utilized impairment-based claims clustering and statistical distribution to estimate customer’s lifetime value
- Beat the in-house algorithm by achieving 86% recall for 0-3 months lapsers; 88% overall recall
• Lead Scoring and Prioritization for optimized sales targeting
- Developed and deployed an ML model ecosystem to improve prioritization of 10M leads over a year for 13 sales
teams across the world
- Model ecosystem consists of a combination of lead scoring (binary classification model) and deal size prediction
(weighted multi-class classifier) using XGBoost
- Achieved an overall AUC of 88%, resulting in $30M+ incremental revenue in 6 months of deployment
• Automated QA engine to answer queries from RFPs
- Designed an AI framework to enable the hospitality client to respond to RFP questions automatically
- Leveraged a combination of semantic models, topic modeling, and key entity scoring using TF-IDF weights to
generate NLG-based responses
- Achieved a precision of 96% and a recall of 60%, reducing human effort by 10x overall
• B2B Pricing Recommender: Streamlining pricing operations for a US steel manufacturer
- Developed a Customer-SKU pricing recommendation model using Hedonic Mixed-Modeling regression techniques
- Improved profitability by presenting an uplift of 2-9% in overall projected revenue through recommended pricing bounds
Data Scientist Intern, Analytics Vidhya
Gurugram, India (April 2017 - July 2017)
• Blog Recommendation System: Designed a system to recommend personalized blogs to readers using a collaborative filtering approach
• Technical Blogs: Wrote technical tutorial blogs on Regression models, Genetic Algorithm, Multi-label classification,
NLP processing, visualization, etc., overall accounting for 1M+ views [LINK]
Data Analyst Intern, Reliance Industries - Analytics and Strategic Initiatives
Mumbai, India (Dec 2016)
• Forecasting Consumer Price Index: Utilized regression models and vanilla NNs to forecast inflation rates on a
QoQ basis
CONFERENCE AND PAPER ACCEPTANCE
Elsevier’s 5th World Research Summit for Tourism and Hospitality
Orlando, FL (Dec 2019)
- Presented our research work on Bayesian Behavioural Recommender: Personalize search ranking by utilizing customer-attribute-level willingness-to-pay using heterogeneous choice data at the conference [LINK]
- Full research paper is published in Elsevier’s International Journal of Hospitality Management (IJHM)
[LINK]
HACKATHONS/COMPETITIONS
• ZS Young Data Scientist 2017: Finished as second runner-up in the final on-site round and ranked in the top 0.2%
(Rank 20 out of 9,000) in the first round in the ZS Young Data Scientist Challenge [LINK]
• Kaggle: Secured Bronze Medal (Rank 181 out of 2,157) in Recruit Restaurant Visitor Forecasting Challenge, 2018
[LINK]
• BrainWaves 2017-18: Qualified for the on-site round by securing a rank in the top 1% (Rank 60 out of 5,000) in BrainWaves
Machine Learning Challenge hosted by Societe Generale [LINK]
• Yes Bank Datathon 2018: Qualified for the on-site round and finished in the top 21 solutions (out of 1,700) by creating
time-series clustering of their customers for effective campaign design and targeting [LINK]
TECHNICAL SKILLS
Programming Languages: Python, R, RShiny, SQL, PySpark, HTML, CSS
Libraries & Technologies: Pandas, NumPy, Seaborn, Keras, PyTorch, spaCy, TensorFlow, Sklearn
Area of Expertise: Machine Learning, Deep Learning, NLP, Forecasting, MLOps, Docker
Cloud Technologies: Azure (AMLS, Synapse, Databricks), AWS (SageMaker, Lambda, S3, CloudWatch)
EXTRA CURRICULAR ACTIVITIES
• Organizer, DataHack Summit: Part of the organizing team of DataHack Summit 2017 hosted by Analytics Vidhya,
spanning 3 days with 30+ speakers and 10 hack sessions focused on AI and ML
• Technical Head, Spardha’16 (Sports fest): Responsible for creating and managing the website for online registration
of 1000+ participants
• Senior Advisor, Departmental Fest: Part of a 6-member team responsible for conducting a series of lectures &
demonstrations