400 Nagle Street, College Station, TX




Seeking full-time opportunities in Data Analytics, Machine Learning and Operations Research starting May 2017.

EDUCATION

Texas A&M University, College Station, Texas, USA

Master of Science in Industrial Engineering (Focus on Data Science and Analytics), Aug 2015-May 2017
GPA: 3.88/4.00, with In-state Scholarship
Courses: Applied Multivariate Statistical Analysis, Engineering Data Analysis, R and Big Data Applications, Theory of Statistics, Linear Programming & Optimization, Time Series Analysis, Nonlinear and Dynamic Programming

Indian Institute of Technology (IIT Roorkee), India
Bachelor of Technology, First Division, Merit Scholar, May 2010-May 2014

CORE COMPETENCIES
Programming: R (3 years), Python (NumPy, SciPy, Pandas, Scikit-learn, NLTK) (3 years), SQL (1 year), Java
Visualization: ggplot2, Matplotlib, Seaborn, Tableau
Big Data tools: Spark (Spark SQL, MLlib), Hadoop, Sqoop, Flume, Hive, Pig, MapReduce (1 year)
Statistics and Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, Boosting, Neural Networks, K-Means, Hierarchical Clustering, kNN
Strong fundamentals in Computer Science, Algorithms & Data Structures

PROFESSIONAL EXPERIENCE

Data Scientist Co-op, Danaher Labs, Santa Clara, CA
Aug 2016-Dec 2016
Part of a three-member data science team predicting customer retention rate for Nobel Biocare, a Zurich-based dental company.
Provided actionable insights for sales and marketing teams using customer characteristics such as market segmentation, geographical location and buying behaviour.
Used NLP tools such as NLTK (Python) and the tm package (R) to classify customer feedback into positive and negative signals and to predict customer satisfaction.
Visualized data with Matplotlib and ggplot2, and built models predicting the probability of customer retention using Random Forests, boosted trees and logistic regression.

Data Science Intern, Optimal Asset Management, Los Altos, CA
June 2016-Aug 2016

Built predictive models to optimize financial returns and constructed portfolios using factor analysis, handling highly multicollinear data with Principal Component Analysis (PCA) and other methods. Used R scripts to generate reports containing portfolio visualizations and performance metrics.

Used data provided by financial firms such as BlackRock to build and maintain SQL databases.

Wrote, reviewed and managed production-level code using Git.

Data Analyst Engineer, Reliance Industries Limited
July 2014-July 2015
Performed time series analysis and forecasting (including ARIMA models) to predict crude oil prices.
Handled large data sets, including data in unusual formats, transforming them into usable form and aggregating them as needed with tools including Python and R.
Wrote MySQL scripts and built indices to extract data efficiently from remote databases.

Data Analyst Intern, Reliance Industries Limited
May 2013-July 2013
Developed a framework for supplier management in the strategic sourcing of spare parts, considering multiple strategic and operational factors.

PROJECTS

Semiconductor Fabrication Testing - Classification, Texas A&M University
April 2016-May 2016
Worked on fault prediction in semiconductor manufacturing using feature selection techniques and random forests with cross-validation, taking causal relationships into account, to identify the key features that enable an increase in process throughput.

Fraud Detection in Credit Card Transactions, Capital One
March 2016-April 2016
Predicted fraud in credit card transactions and built robust models using machine learning algorithms.
Used Extreme Gradient Boosting (based on decision trees) and then performed a cost-benefit analysis to set the classification threshold, using domain knowledge and historical transaction data provided by Capital One.

Data Science Project Group Leader - Drug Repositioning using Microarray Data Analysis, TAMU
Jan 2016-April 2016
Performed data cleaning, statistical analysis, modeling and testing of gene microarray expression data using R in a Linux environment to identify potential new disease indications for target drugs, based on drug-disease information obtained from public databases.

Source code for projects:

CERTIFICATIONS
“Big Data Basics: Hadoop, MapReduce, Hive, Pig & Spark”, Udemy Course Certificate No. UC-M1FVUGW4
“Advanced Databases and SQL Querying”, Udemy Course Certificate No. UC-SXKMPL4V
“Machine Learning in Python”, Udemy Course Certificate No. UC-S22KPO0O

AWARDS
Merit scholarship with full fee waiver at IIT Roorkee (undergraduate university)
Test score of 333 out of 340 in the Graduate Record Examination (GRE)
Only student awarded the In-state Scholarship among 140 students in Industrial Engineering at Texas A&M University
Secured 99.66 percentile in the All India Engineering Entrance Examination (rank 1723 out of 500,000 students) in IIT-JEE 2010