
Mirza Mujtaba Baig

Big Data Developer


Mobile#: 6024611706
Email: mujtaba961989@gmail.com

Professional Summary:
• Over 6 years of IT experience, including 5+ years in Big Data technologies (Hadoop, Hive, Pig, Spark, Sqoop, Java), 1+ year in Oracle technologies, and 8+ months in Java development
• Seeking a challenging position in the software development industry that calls for innovation, creativity, and dedication, and that enables me to keep working in a demanding, fast-paced environment, applying my current knowledge and pursuing new learning opportunities.
• Problem-solving capability with the main design goals of ER modeling for OLTP and OLAP, implementing the solutions best suited to the business needs. Developed core modules in large cross-platform applications using Java, J2EE, and the JVM
• Repurposed Python scripts to Java/Python/AWS Lambda environments.
• Monitoring the Cassandra cluster using Splunk.
• Hands-On Experience in Hadoop/HDFS, MapReduce, Hive, HBase, Pig, Sqoop,
Amazon Elastic Map Reduce (EMR), Spark, Cloudera (CDH 3, 4, 5) sandbox
environments
• Rich experience in big data, data warehousing, database, and business intelligence work using Oracle Database (10g, 11g). Involved in project activities spanning requirements gathering, systems analysis and design, code generation, testing, implementation, support, and maintenance.
• Experience in developing and automating Big Data applications using UNIX shell scripting and MapReduce programming for batch processing of jobs on an HDFS cluster, along with Hive and Pig.
• Developed real-time Big Data solutions using NoSQL databases (HBase, Cassandra, MongoDB, CouchDB) capable of handling petabytes of data at a time.
• Worked in Spark and Spark SQL environments using Scala programming; involved in Spark Streaming, Spark SQL, Scala development, and performance tuning
• Created RDDs and performed DataFrame and Dataset operations for the use case.
• Worked with the HBase shell, CQL, and the HBase API, developing ingestion and clustering frameworks around Kafka, ZooKeeper, YARN, Spark, and Mesos.
• Captured data from existing databases that provide SQL interfaces using Sqoop and processed stream data using Kafka, Spark Streaming, and Flume (a minimal streaming sketch follows this summary).
• Hands-on experience setting up ZooKeeper to provide high availability to clusters. Hands-on programming with Oozie and good knowledge of handling log data with Apache Flume.
• Developed a Python-based API for converting files to key-value pairs so they could be sourced to the Splunk forwarder.
• Developed a fully automated continuous integration system using Git, Jenkins, Splunk,
Hunk, Oracle and custom tools developed in Python and Bash.
• Strong RDBMS experience with Oracle 10g and SQL Server, including PL/SQL programming, schema development, and Oracle performance tuning.
• Active participation in resolving Tomcat, web server, and Oracle problems (killing hung instances, debugging server and application logs).
• Wrote SQL queries and stored procedures and modified existing database structures as required for the addition of new features.
• Designed and developed Enterprise Eligibility business objects and domain objects with
Object Relational Mapping framework such as Hibernate.
• Experienced in design and development of various web and enterprise applications
using J2EE technologies like JSP, Servlets, JSF, EJB, JDBC, Hibernate, Spring MVC,
XML, JSON, AJAX, ANT and Web Services (SOAP, REST, WSDL).
• Experienced in web and GUI development using HTML, DHTML, XHTML, CSS, JavaScript, JSP, AngularJS, jQuery, and AJAX technologies.
• Working knowledge of Spring and Hibernate framework.
• Experience utilizing best practices for getting data into Splunk and the Common
Information Model.
• Led team to plan, design, and implement applications and software.
• Collaborated with business analysts, developers, and technical support teams to define
project requirements and specifications. Designed, developed, and managed map-
reduce-based applications, integrating with databases, establishing network
connectivity, and developing programs.
• Experience in provisioning Amazon Web Services (AWS) like EC2, ELB, S3, EBS, VPC,
RDS, DynamoDB, IAM, SNS, SQS, SWF, Route 53, Auto Scaling, Lambda, CloudFront,
CloudWatch, CloudFormation, Security Groups, ACL, NACL.
• Good knowledge of SOAP/WSDL and RESTful interfaces in Java. Created and executed both load and functional tests for web services.
• Assisted project manager in defining project scope, time & effort estimates and
deliverable management.
• Developed a proof of concept for using Spark and Kafka to store and process data
• Capturing data from existing databases that provide SQL interfaces using Sqoop.
• Importing and exporting data in HDFS and Hive using Sqoop
• Knowledge of machine learning, data science, and artificial intelligence platforms and algorithms within the Big Data Python environment using TensorFlow.
• Familiar with supervised (classification, regression), unsupervised (dimensionality reduction, clustering), and reinforcement (real-time decision making) learning algorithms.
• Involved in research in the areas of cloud computing, data science, machine learning, artificial intelligence, robotics, and automation
• Involved in research in the areas of autonomous vehicles and robotic programming
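
The Kafka/Spark Streaming bullet above refers to a common ingestion pattern. Below is a minimal, illustrative PySpark Structured Streaming sketch of that pattern only; the broker address, topic name, schema, and HDFS paths are hypothetical placeholders, not project values.

# Sketch: consume a Kafka topic with Spark Structured Streaming and land the
# parsed records in HDFS as Parquet. All names below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

schema = (StructType()
          .add("event_id", StringType())
          .add("event_type", StringType())
          .add("payload", StringType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "events_topic")                  # placeholder topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")              # placeholder output path
         .option("checkpointLocation", "hdfs:///chk/events")
         .start())
query.awaitTermination()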

Education Details:

• Ph.D. in Computer Science and Information Systems, University of the Cumberlands, currently pursuing
• Master’s in Computer Science and Information Systems, University of Michigan-Flint, 2014
• Bachelor’s in Computer Science, Birla Institute of Technology, Pilani – Dubai, 2011
Technical Skills:

Big Data / Hadoop    Cloudera CDH 5.1.3, Hortonworks HDP 2.0, Hadoop, HDFS,
                     MapReduce (MRv1, MRv2/YARN), HBase, Pig, Hive, Sqoop, Flume,
                     ZooKeeper, Oozie, Lucene, Cassandra, CouchDB, MongoDB, Kafka,
                     Scala, R, Python, shell scripting
Languages            Java, C, HTML, SQL, PL/SQL, Scala
OS                   Windows 8/7/XP/98, UNIX/Linux, Mac OS
Databases            Oracle 9i/10g/11g (SQL/PL-SQL), SQL Server, MySQL, MS Access,
                     Teradata, NoSQL
Web Technologies     HTML, DHTML, XML, WSDL, SOAP, Joomla, Apache Tomcat
Build Tools          Ant, Maven
Development Tools    Adobe Photoshop, Adobe Illustrator, Eclipse, Linux/Mac OS
                     environment, MS Visio, Crystal Reports
Business Domains     Distributed Systems, Online advertising, Social media advertising
Data Analytics       Python, R
ETL Tools            Talend, Informatica

Professional Experience:

Accenture/Johnson Control Inc. Sep’18 – till date
Atlanta, GA
Big Data Application Developer

Anthem, Inc. is an American health insurance company previously known as WellPoint Inc. It is
the largest for-profit managed health care company in the Blue Cross Blue Shield Association. It
was formed when Anthem Insurance Company acquired WellPoint Health Networks Inc., with the
combined company adopting the name WellPoint Inc.; trading on the NYSE for the combined
company began under the WLP symbol. In 2014, the company changed its corporate name to
Anthem Inc., and its NYSE ticker changed from WLP to ANTM.

Job Responsibilities:
• Worked with Oracle, Teradata, and Cloudera (CDH) for data storage, processing, and migration operations.
• Involved in developing processing of files stored in HDFS for analytical purposes.
• Performed optimizations of SQL, PL/SQL, and HiveQL scripts, Python scripts, shell scripts, and cron schedules.
• Performed performance, unit, load, functional, and automated testing of the HiveQL, Spark-SQL, and PySpark scripts developed, covering performance, scalability, reliability, availability, and maintainability.
• Created RDDs and performed DataFrame and Dataset operations for the use case (see the sketch after this list)
• Proficient in Bash shell scripting.
• Worked on parallel processing of data using built-in functions within shell scripts in the UNIX environment
• Explored applications of data science algorithms on the big data platform within the Spark environment.
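
As referenced above, here is a minimal PySpark sketch of the RDD-to-DataFrame/Dataset flow for a use case. The file path, delimiter, column names, and target Hive table are illustrative assumptions, not the actual project schema.

# Sketch: start from a raw text file in HDFS as an RDD, clean it, promote it to a
# DataFrame, and query it with Spark SQL. All names below are assumptions.
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("claims-usecase-sketch")
         .enableHiveSupport()          # assumes Hive is available to Spark
         .getOrCreate())
sc = spark.sparkContext

lines = sc.textFile("hdfs:///landing/claims/part-*")          # placeholder path
rows = (lines
        .map(lambda l: l.split("|"))
        .filter(lambda f: len(f) == 3)                          # drop malformed records
        .map(lambda f: Row(claim_id=f[0], member_id=f[1], amount=float(f[2]))))

claims_df = spark.createDataFrame(rows)
claims_df.createOrReplaceTempView("claims_stage")

summary = spark.sql("""
    SELECT member_id, COUNT(*) AS claim_count, SUM(amount) AS total_amount
    FROM claims_stage
    GROUP BY member_id
""")
summary.write.mode("overwrite").saveAsTable("analytics.claims_summary")  # assumed Hive DB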
Environment:
Apache Hadoop, Apache Hive, Cloudera (CDH 5), Ubuntu, HDFS, MapReduce, Amazon Web
Services (AWS), Python, Splunk, Spark, Teradata, Oracle.
Servers: Redhat Enterprise Linux

Walmart Inc. Apr’18 – Aug’18
Bentonville, AR
Big Data Engineer

Wal-Mart Stores, Inc. doing business as Walmart, is an American multinational retailing
corporation that operates as a chain of hypermarkets, discount department stores, and grocery
stores. Walmart's operations are organized into four divisions: Walmart U.S., Walmart
International, Sam's Club and Global eCommerce. The company offers various retail formats
throughout these divisions, including supercenters, supermarkets, hypermarkets, warehouse
clubs, cash-and-carry stores, home improvement, specialty electronics, restaurants, apparel
stores, drugstores, convenience stores, and digital retail.

Job Responsibilities:

• Interacted with clients to elicit architectural and non-functional requirements such as performance, scalability, reliability, availability, and maintainability.
• Created tables and views in Teradata using SQL, Python, and shell scripting according to the requirements.
• Involved in Spark Streaming, Spark SQL, and Scala programming, including RDD creation and DataFrame and Dataset operations for the use case
• Built filters, parameters, and visualizations, published customized reports and dashboards, and scheduled reports using Tableau Server.
• Experience in Cassandra database modeling, query development using CQL, and administration.
• Good conceptual understanding and experience in cloud computing applications using
Amazon Web Services (AWS)-EC2, S3, EMR and Amazon RedShift platforms.
• Experience in managing multi-tenant Hadoop clusters on public cloud environment -
Amazon Web Services (AWS)-EC2.
• Performed keyspace creation, table creation, secondary index creation, and user creation and access administration (illustrated in the sketch after this list).
• Performed query tuning and performance tuning on the cluster and suggested best practices for developers.
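
The keyspace, table, and secondary-index bullet above is illustrated below with the DataStax Python driver for Cassandra. The contact point, keyspace, table, and replication settings are example values only, not the project's actual configuration.

# Sketch of the keyspace / table / secondary-index creation workflow via CQL.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1"])          # placeholder contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS retail_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS retail_ks.orders (
        order_id   uuid PRIMARY KEY,
        store_id   text,
        order_date date,
        total      decimal
    )
""")

# Secondary index to allow lookups by store_id in addition to the partition key.
session.execute("CREATE INDEX IF NOT EXISTS ON retail_ks.orders (store_id)")

cluster.shutdown()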

Environment: Apache Hadoop, Apache Hive, Apache Pig, Cloudera (CDH 5), MapR, Ubuntu,
HDFS, MapReduce, Amazon Web Services(AWS), Python, Splunk, Elastic-Search, Logstash,
Kibana (ELK), Tableau, Spark, Cassandra, Teradata, Oracle.
Servers: Redhat Enterprise Linux, Mainframe

HCL / Northern Trust May’17 – Mar’18
Chicago, IL
Sr. Hadoop/Splunk Developer
Northern Trust is a private financial services company serving the world's most sophisticated clients, from sovereign wealth funds and the wealthiest individuals and families to the most successful hedge funds and corporate brands.

Job Responsibilities:
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
• Worked on installing and configuring Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in development and production environments.
• Performed volumetric analysis for 43 feeds (current approximate data size: 70 TB), based on which the size of the production cluster was decided.
• Involved in loading data from UNIX file system to HDFS.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
• Responsible for implementation and ongoing administration of Hadoop infrastructure
• Imported and exported data from relational databases such as MySQL into HDFS and HBase using Sqoop.
• Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
• In-depth knowledge of LDAP and Identity & Access management products.
• Designed LDAP Schemas, DITs to implement enterprise wide centralized repository.
• Hands on Experience in configuring LDAP, SSL, SSO and Digital Signatures.
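
The data-cleaning MapReduce bullet above was implemented in Java on the project; purely as an illustration in this resume's scripting language, here is a comparable Hadoop Streaming mapper sketch in Python. The delimiter, field layout, and validation rules are assumptions.

#!/usr/bin/env python
# mapper.py -- illustrative Hadoop Streaming mapper for a data-cleaning step.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) != 4:
        continue                      # drop malformed records
    acct, name, date, balance = [f.strip() for f in fields]
    if not acct or not balance.replace(".", "", 1).isdigit():
        continue                      # drop rows with a missing key or a bad amount
    # Emit the account id as the key so a downstream reducer can aggregate per account.
    print("%s\t%s,%s,%s" % (acct, name, date, balance))

Such a mapper would typically be submitted through the Hadoop Streaming jar together with a reducer that performs the per-key aggregation.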

Environment: Hadoop, MapReduce, HDFS, HBase, HDP Horton, Sqoop, Data Processing
Layer, HUE, AZURE, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Cloudera, Lily HBase, Cron.

American Express Jan’16 – Apr’17
Phoenix, AZ
Splunk and Big Data Developer
The American Express Company, also known as Amex, is an American multinational financial
services corporation headquartered in Manhattan's Three World Financial Center in New York
City, United States. The company is best known for its credit card, charge card, and traveler's
cheque businesses. Amex cards account for approximately 24% of the total dollar volume of
credit card transactions in the US.

Job Responsibilities:
• Participate in business and system requirements sessions
• Provided inputs on solution architecture based on solution alternatives, frameworks,
products
• Enhanced Search Query Performance based on Splunk Search Queries
• Performance optimizations based on python scripts, shell scripts and CRON schedule
• Involved in Resolving technical issues during development, deployment, and support
• Performed testing activities related to performance testing, Unit Testing, Load Testing,
Functional Testing, Automated testing for the python scripts developed
• Requirements elicitation and translation to technical specifications
• Actively involved in mounting file systems, installing software, and establishing connectivity from the WAS, JBoss, and IaaS servers to the integration systems' databases (Oracle, Mainframe)
• Actively involved in monitoring server health using the Splunk monitoring and alerting tool and the Tivoli alerting tool
• Anchor proof of concept (POC) development to validate proposed solution and
reduce technical risk.
• Perform performance optimizations on Java/JVM frameworks and UNIX Shell Scripts
• Engaged multiple teams for sourcing the data files from the databases (Oracle,
Mainframe) to the servers involved in the platform
• Involved in configuring Load Balancer Configuration on the servers
• Involved in setting up Kafka and Zookeeper Producer-Consumer components for the
Big Data Environments
• Got Certified as Splunk Certified Power User
• Used Java Collection Framework for developing Map-Reduce applications and APIs for
NoSQL databases
• Used Python and shell scripting to create key-value pairs and mask PII data fields (a minimal sketch follows this list)
• Developed Spark scripts by using Scala shell commands as per the requirement.
• Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
• Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, and pair RDDs.
• Worked on migrating MapReduce programs into Spark transformations using Spark and Scala
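
A minimal sketch of the key-value pair creation with PII masking mentioned above. The sensitive field list, masking policy, and file names are illustrative assumptions rather than the project's actual rules.

# Sketch: convert CSV records into key="value" events for a Splunk forwarder,
# masking assumed PII columns with a one-way hash.
import csv
import hashlib

PII_FIELDS = {"ssn", "card_number", "email"}      # assumed sensitive columns

def mask(value):
    """Replace a sensitive value with a stable one-way hash prefix."""
    return "MASKED-" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]

def to_splunk_kv(in_path, out_path):
    with open(in_path, newline="") as src, open(out_path, "w") as dst:
        for record in csv.DictReader(src):
            pairs = []
            for field, value in record.items():
                if field.lower() in PII_FIELDS:
                    value = mask(value)
                pairs.append('%s="%s"' % (field, value))
            # One key=value event per line, ready for a Splunk Universal Forwarder input.
            dst.write(" ".join(pairs) + "\n")

if __name__ == "__main__":
    to_splunk_kv("transactions.csv", "transactions_kv.log")   # placeholder file names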

Environment: Apache Hadoop, Apache Hive, Apache Pig, Cloudera (CDH 5), MapR, Ubuntu,
HDFS, MapReduce, Amazon Web Services(AWS), Python, Splunk, Supervisor, Monit,
Hazelcast, HAProxy, Kafka, Zookeeper, Elastic-Search, Logstash, Kibana (ELK), Servers:
JBOSS, WAS, IAAS, E-PAAS, Redhat Enterprise Linux, Talend, Microsoft Azure

Walmart, Inc. Jul’15 – Dec’15
Bentonville, AR
Big Data Developer
Wal-Mart Stores, Inc. doing business as Walmart, is an American multinational retailing
corporation that operates as a chain of hypermarkets, discount department stores, and grocery
stores. Walmart's operations are organized into four divisions: Walmart U.S., Walmart
International, Sam's Club and Global eCommerce. The company offers various retail formats
throughout these divisions, including supercenters, supermarkets, hypermarkets, warehouse
clubs, cash-and-carry stores, home improvement, specialty electronics, restaurants, apparel
stores, drugstores, convenience stores, and digital retail.

Job Responsibilities:
• Worked as a Dev-Ops Engineer.
• Involved in the activities of Release Planning
• Involved in the activities of deployments, developments, Change Request Creations,
Environment Readiness
• Performed activities on the development and production clusters
• Documented Design Documents for Big Data Analytics & Reporting
• Involved in the activities of daily standups and scrum planning
• Worked using Azure Data Lake Store for analyzing the data stored on YARN and
HDFS including multiple access methods related to Spark, Hive, HBase
• Analyzed the different kinds of structured and unstructured data including the
processing of files within the data stored in the Data Lake.
• Worked on App Engine and Amazon AWS back ends as well as front ends
• Migrated Hadoop metadata to Docker container
• Involved in Amazon EMR and S3 activities, including setting up connectivity using a VPC connection
• Performed map-reduce operations using Amazon EMR (see the sketch after this list)
• Experienced in Data Modelling in SQL and NoSQL Databases
• Hands on experience in NOSQL databases like HBase, Cassandra, MongoDB.
• Worked using the tools of JIRA and Jenkins within the project.
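
The Amazon EMR/S3 bullets above are illustrated below with a small boto3 sketch that stages input data in S3 and submits a streaming (map-reduce) step to an already-running EMR cluster. The bucket name, region, cluster id, and script locations are placeholders.

# Sketch: upload input data to S3 and add a Hadoop Streaming step to an EMR cluster.
import boto3

s3 = boto3.client("s3")
emr = boto3.client("emr", region_name="us-east-1")          # assumed region

# Stage the input data set and mapper script the step will use.
s3.upload_file("daily_sales.csv", "example-analytics-bucket", "input/daily_sales.csv")
s3.upload_file("mapper.py", "example-analytics-bucket", "scripts/mapper.py")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",                             # placeholder cluster id
    Steps=[{
        "Name": "daily-sales-aggregation",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hadoop-streaming",
                     "-files", "s3://example-analytics-bucket/scripts/mapper.py",
                     "-mapper", "mapper.py",
                     "-reducer", "aggregate",
                     "-input", "s3://example-analytics-bucket/input/",
                     "-output", "s3://example-analytics-bucket/output/"],
        },
    }],
)
print(response["StepIds"])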

Environment: Apache Hadoop, Apache Hive, Ubuntu, HDFS, MapReduce, Shell Scripting,
Python, HBase, Mongo DB, Couch DB, JIRA, Jenkins

Nike, Inc. Feb’15 – Jun’15
Beaverton, OR
Big Data Developer

Nike, Inc. is an American multinational corporation that is engaged in the design, development,
manufacturing and worldwide marketing and sales of footwear, apparel, equipment, accessories
and services. The company is headquartered near Beaverton, Oregon, in the Portland
metropolitan area. It is one of the world's largest suppliers of athletic shoes and apparel.

Job Responsibilities:
• Worked with the QA team to test components of the Big Data environment, leveraging the capabilities of the existing scripts on the servers and automating script execution.
• Worked with Data Analytics team for meeting the testing requirements involved with the
Hive & Pig scripts for different Use-Cases in Hadoop.
• Documented Design Documents for Big Data Analytics & Reporting
• Performed Unit Testing for the python scripts
• Performed automation testing for the Java-based script development involved.
• Involved in the operations of Cloudera, Hortonworks, MapR environments
• Performed End-to-End testing for the scripts execution in the big-data clusters
• Documented test results and verified actual results against expected results for SQL and HiveQL queries
• Analyzed large data sets by running Hive queries and Pig scripts.
• Worked with the Data Science team to gather requirements for various data mining projects.
• Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
• Involved in loading data from LINUX file system to HDFS and then to Amazon S3.
• Responsible for creating and managing the HBase data store (see the sketch after this list).
• Hands on experience in NOSQL databases like HBase, Cassandra, MongoDB.
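
The HBase bullet above is sketched below using the Thrift-based happybase Python client, one common route to HBase from Python. The Thrift host, table name, column family, and row keys are example values, not the project's actual schema.

# Sketch: create an HBase table, write a row, and scan by row-key prefix.
import happybase

connection = happybase.Connection("hbase-thrift-host")      # placeholder host

# Create a table with a single column family if it does not already exist.
if b"product_events" not in connection.tables():
    connection.create_table("product_events", {"d": dict(max_versions=3)})

table = connection.table("product_events")

# Row key combines product id and date; column qualifiers live under family 'd'.
table.put(b"sku123|20150601", {b"d:event": b"view", b"d:region": b"us-west"})

for key, data in table.scan(row_prefix=b"sku123"):
    print(key, data)

connection.close()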

Environment: Apache Hadoop, Apache Hive, Apache Pig, Cloudera (CDH 5), Ubuntu, Auto-
CAD, Sqoop, HDFS, MapReduce, NoSQL, HBase, CouchBase, Oozie, Amazon Web
Services(AWS), Spark, Storm, Flume, Python, Shell Scripting

University of Michigan, Flint Sep’12 – Dec’14
Flint, MI
Big Data Graduate Student Research Assistant
UM-Flint faculty from over 100 areas of study have gained an international reputation for their
dedication to engaged learning. Professors pour their expertise and creativity into the
development of research and service-learning projects that match course curriculum with the
world’s most-pressing issues. These projects bring learning to life, address community needs,
and fulfill students’ desires to contribute to the common good.
I worked on developing and configuring cluster management and big data using Hadoop and the Hive data warehousing tool, performing various types of queries on selected tables and data from different fields, and developed reports related to financial and hospital environments. I also developed a website using Joomla to coordinate the big data functionality on the Internet.
Job Responsibilities:
• Provided consulting services, solutions and training around Big Data ecosystem
(Hadoop, NoSQL, Cloud).
• Mentored an intern working on recommendation engine using Hadoop/ Mahout.
• Built a scalable, cost-effective, and fault-tolerant data warehouse system on the Amazon EC2 cloud. Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports; the heuristics were used to improve campaign targeting and efficiency.
• Worked on multiple virtual machines such as Cloudera and Ubuntu.
• One of the demonstrations of my work is shown on http://ehps.weebly.com
• Implemented Map-Reduce Programming on Classical and YARN MapReduce daemons
• Worked on Big Data using the Hive data warehousing tool and developed a website to coordinate with the Big Data.
• Developed 2D and 3D designs using Auto-CAD.

Environment:

Cloudera (CDH 3, 4, 5), Apache Hadoop, Linux, HDFS, Hive, Pig, Sqoop, Flume, ZooKeeper,
HBase, Oozie, HortonWorks, MongoDB, Java, MapReduce, Amazon EC2 infrastructure,
Amazon Elastic MapReduce (EMR), MySQL, shell scripts.
