Experience working with Python or R; Java or C# experience is a plus. Working knowledge of Jupyter notebooks, Spyder and Google Colab is necessary. Sound knowledge of statistics and linear algebra, including how they provide the basis for most machine learning algorithms. Understanding of data pre-processing techniques such as feature extraction and dimensionality reduction. Solid understanding of machine learning techniques and algorithms such as k-nearest neighbours, Naive Bayes, k-means clustering and SVM. Experience with common data science modules such as NumPy, scikit-learn, Matplotlib and pandas. Good understanding of relational databases such as MySQL, as well as NoSQL (big data) systems such as MongoDB and HBase (preferred). Understanding of ETL pipelines and protocols. Data engineering skills would be a plus. Good understanding of artificial neural networks such as autoencoders, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Some experience working with the Keras and TensorFlow APIs. Good analytical and programming skills. The candidate must have a Bachelor's degree in Computer Science or a related discipline, and preferably a Master's degree in Computer Science, Data Science, Artificial Intelligence or a related field. Minimum of 1 year of relevant experience in software development or a database-related domain. Data analytics or business intelligence experience is a plus. Relevant certifications in any of the tools mentioned above would be a plus.
Key Responsibilities
In-depth knowledge of SQL and other database solutions
Data engineers need to understand database management, so in-depth knowledge of SQL is required. Familiarity with other database solutions, such as MongoDB, Cassandra or Bigtable, is also valuable.
Data warehouse architecture and ETL tools
Data warehousing and ETL experience is essential to this position. Familiarity with data warehousing solutions such as Redshift or Panoply, and with ETL tools such as StitchData or Segment, is highly valuable. Experience with data storage and retrieval is equally vital, because the volume of data involved is very large.
Hadoop-based analytics (HBase, Hive, MapReduce, etc.)
A strong understanding of Apache Hadoop-based analytics is a very common requirement in this space, and knowledge of HBase, Hive and MapReduce is often considered essential.
Coding
Expertise in Python, C/C++, Java, Perl, Golang or similar languages is required.
Machine learning
Although machine learning is mainly the focus of data scientists, some understanding of how to act on this data is invaluable for data engineers, so some knowledge of statistical analysis and the basics of data modelling is highly valuable. Knowledge in this area helps you build solutions that your data science colleagues can use, and it makes you extremely marketable, since being able to “wear both hats” makes you a formidable asset.
Preferred Qualifications:
1+ years of industry experience as a Data Scientist or Machine Learning Engineer, or in business intelligence or a related field, with experience in both software engineering and machine learning. Be adept with the basics of Python. Fluency in most of the following topics: probability and statistics; Python; manipulation of large data sets; data visualization techniques; algorithms; machine learning. Teaching: data basics, sampling, study design, exploratory data analysis, descriptive statistics, statistical inference, hypothesis testing, supervised and unsupervised machine learning, association rule mining, principal component analysis, predictive modelling, univariate/multivariate regression, decision trees, random forests, XGBoost, clustering (k-means, k-modes, DBSCAN), etc. Hands-on experience building production models with Python, Jupyter, Keras, Matplotlib, NumPy, pandas, Plotly, PyTorch, scikit-learn, SciPy, Seaborn, TensorFlow and XGBoost.
Skills:
Knowledge of database handling is required.
At least 1 year of hands-on Python programming and
Must have good hands-on experience with:
BS/MS in Computer Science / Data Science or demonstrated commitment to learning about AI
Experience building AI models in platforms such
Must be hands-on with Linux OS.
Strong foundation in a statistical platform such
Working experience with Hadoop is a big plus.
Demonstrated proficiency in multiple programming