Rashmi

630-345-5239
hadoopdevep24+rashmi@gmail.com
SUMMARY
7+ years of software development experience, including 4 years with Big Data technologies such as
Hadoop, Hive, Pig, Sqoop, HBase and Flume.
Expert in working with the Hive data warehouse tool: creating tables, distributing data through
partitioning and bucketing, and writing and optimizing HiveQL queries.
Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
Good understanding of NoSQL Databases.
Worked on Windows and UNIX/Linux platforms with different technologies such as SQL, PL/SQL, XML,
HTML, CSS, JavaScript and Core Java.
Experience in Hadoop administration activities such as installation and configuration of clusters using
Apache and Cloudera distributions.
Experience in using IDEs like Eclipse and NetBeans.
Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class
Diagrams.
Working knowledge of databases such as Oracle 10g.
Experience in writing Pig Latin scripts.
Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop,
performed structural modifications using MapReduce and Hive, and analyzed data using
visualization/reporting tools.
Hands-on experience in configuring and working with Flume to load data from multiple sources
directly into HDFS.
Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
Clear knowledge of rack awareness topology in the Hadoop cluster.
Experience in using shell scripting to automate tasks.
Familiar with Core Java, with a strong understanding and working knowledge of Object Oriented Concepts
as well as Collections, Multithreading, Data Structures, Algorithms, Exception Handling and Polymorphism.
Basic knowledge of application design using Unified Modeling Language (UML), Sequence diagrams,
Use Case diagrams, Entity Relationship Diagrams (ERD) and Data Flow Diagrams (DFD).
Extensive programming experience in developing web based applications using Core Java, J2EE, JSP
and JDBC.

Comprehensive knowledge of Software Development Life Cycle coupled with excellent
communication skills.
TECHNICAL SKILLS
Languages / Scripting Languages: JDK 1.6/1.7, J2EE 1.5, C, Java, SQL, Pig Latin, Bash scripting, JSON
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, Zookeeper, Flume
Operating Systems: Windows 2008/XP/8.1, UNIX, Linux, CentOS
RDBMS: Oracle 10g, SQL Server 2005/2008, MS-Access, MySQL, NoSQL
Modeling Tools: UML on Rational Rose 4.0
Web Technologies: HTML, XML, JSP, CSS, Ajax, jQuery
Web Services: WebLogic, WebSphere, Apache Cassandra, Tomcat
IDEs: Eclipse, NetBeans, WinSCP
Methodologies: SDLC
Familiar GUIs: MS Office Suite, MS Project
Servers: Apache Tomcat

EDUCATION
Bachelor of Technology, Computer Science and Engineering, JNTU, India
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Infinity Health Care, Milwaukee, WI
Jul 2014 - Present
Infinity Health Care is a premier provider of emergency department management and a wide spectrum of
healthcare services. The project aimed to develop an analytical platform for big data analytics. The
setup involved creating a base Hadoop cluster with Pig, Hive and other such packages for analysis. The
project set up the ecosystem for importing the data and pre-processing it to clean it and extract
information for the clients.
Responsibilities:
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for
data cleaning and preprocessing.
Experience in installing, configuring and using Hadoop Ecosystem components.
Experience in Importing and exporting data into HDFS and Hive using Sqoop.
Load and transform large sets of structured, semi structured and unstructured data.
Worked on different file formats like Sequence files, XML files and Map files using Map Reduce
Programs.
Responsible for managing data coming from different sources.
Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
Gained good experience with NoSQL databases.
Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally
as MapReduce jobs.
Responsible for building scalable distributed data solutions using Hadoop.
Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
Experience in managing and reviewing Hadoop log files.
Involved in loading data from LINUX file system to HDFS.
Implemented test scripts to support test driven development and continuous integration.
Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
Worked on tuning the performance of Pig queries.
Mentored the analyst and test teams in writing Hive queries.
Installed Oozie workflow engine to run multiple MapReduce jobs.
Processed data from different sources using multiple input formats with Generic Writable and Object
Writable.
Provided cluster coordination services through Zookeeper.
Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related
properties and the Thrift server in Hive.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades
as required.
Worked with the Data Science team to gather requirements for various data mining projects.
Environment: Cloudera CDH 4, HDFS, Hadoop 2.2.0 (YARN), Flume 1.5.2, Eclipse, MapReduce, Hive 1.1.0,
Pig Latin 0.14.0, Java, SQL, Sqoop 1.4.6, CentOS, Zookeeper 3.5.0 and NoSQL database.
Senior Hadoop Developer
Pacific Life, TX
Sep 2013 - Jun 2014
Pacific Life is a leading diversified international group of companies and one of the top providers of life
insurance in the United States. E-Commerce is a web application that lets customers get a fast quote
online, originate a policy and service an account. The company has many regional offices in the business of
home and insurance. Each regional office sends the details of monthly transactions to the central system. Most
of the feed is in raw text format and the data is in fixed line format. The company's business needs include
assessing current and new users geographically and identifying which types of packages are more lucrative to
customers in each location.
Responsibilities:
Involved in defining job flows, managing and reviewing log files.
Supported MapReduce programs running on the cluster.
As a Big Data Developer, implemented solutions for ingesting data from various sources and processing
the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase,
Hive, Oozie, Flume, Sqoop etc.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Imported Bulk Data into HBase Using Map Reduce programs.
Developed Apache Pig and Hive scripts to process the HDFS data.
Performed analytics on time series data stored in HBase using the HBase API.
Designed and implemented Incremental Imports into Hive tables.
Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying
on the log data.
Wrote multiple Java programs to pull data from HBase.
Involved with file processing using Pig Latin.
Involved in creating Hive tables, loading them with data and writing Hive queries that run internally
as MapReduce jobs.
Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and
other sources.
Worked on debugging, performance tuning of Hive & Pig Jobs.
Used Hive to find correlations between customer's browser logs in different sites and analyzed them to
build risk profile for such sites.
Created and maintained Technical documentation for launching HADOOP Clusters and for executing
Hive queries and Pig Scripts.
Environment: Java, Hadoop 2.1.0, MapReduce 2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1,
Eclipse, AWS EC2, and Cloudera CDH 4.
Hadoop Developer
Symetra, Seattle, WA
Jan 2013 - Aug 2013
Symetra is a United States-based family of companies providing retirement plans, employee benefits,
annuities and life insurance through independent distributors nationwide. Symetra elevates people's lives
through retirement, employee benefits, and life insurance products that help provide security and confidence.
Responsibilities:
Developed Big Data Solutions that enabled the business and technology teams to make data-driven
decisions on the best ways to acquire customers and provide them business solutions.
Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop
and Flume.
Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for
processing the data.
Worked on Importing and exporting data from different databases like MySQL, Oracle into HDFS and
Hive using Sqoop.
Worked on Writing Hive queries for data analysis to meet the business requirements.
Responsible for loading and managing unstructured and semi-structured data coming into the Hadoop
cluster from different sources using Flume.
Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data
sources and to perform joins on the Map side using distributed cache.
Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.

Created internal and external tables with properly defined static and dynamic partitions for efficiency.
Implemented Hive custom UDFs to achieve comprehensive data analysis.
Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to
parse the contents of streamed log data.
Developed Pig scripts for advanced analytics on the data for recommendations.
Experience in writing Pig UDFs and macros.
Exported the business required information to RDBMS using Sqoop to make the data available for BI
team to generate reports based on data.
Developed generic Shell scripts to automate Sqoop job by passing parameters for data imports.
Implemented daily workflow for extraction, processing and analysis of data with Oozie.
Responsible for troubleshooting MapReduce jobs by reviewing the log files.
Environment: Hadoop, MapReduce, Hive, Oozie, Sqoop, Flume, JAVA, LINUX, CentOS
Hadoop Developer
Digilant, Boston, MA
Oct 2011 - Dec 2012
Digilant builds software that automates media buying, making big data actionable for brand marketers.
Digilant uses programmatic advertising to connect brands to their next customers by incorporating valuable
first-party data about behaviors, actions and interests demonstrated by consumers across web and mobile touch
points. The company receives log data from consumer touch points; the data is then aggregated based on user
interests and social engagement to support targeted advertising.
Responsibilities:
Supported MapReduce Programs running on the cluster.
Delivered a proof of concept (POC) of Flume to handle real-time log processing for attribution reports.
Evaluated business requirements and prepared detailed specifications that follow project guidelines
required to develop written programs.
Exported the result set from Hive to MySQL using Sqoop after processing the data.
Used Oozie workflow engine to run multiple Hive and Pig jobs.
Hands-on experience working with Sequence files, Avro and HAR file formats and compression.
Used Hive to partition and bucket data.
Analysed the partitioned and bucketed data and computed various metrics for reporting.
Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
Worked on Hive for exposing data for further analysis and for transforming files from
different analytical formats to text files.
Wrote HiveQL to create Hive tables that store the processed results in a tabular format.
Wrote script files for processing data and loading it into HDFS.
Developed Pig scripts for advanced analytics on the data for recommendations.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
Deployed and configured Flume agents to stream log events into HDFS for analysis.
Analysed the data by performing Hive queries and running Pig scripts to study customer behaviour.
Worked on improving performance of existing Pig and Hive Queries.
Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and
uploaded it to a database.
Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
Environment: Hadoop, CDH4, PIG, HIVE, Sqoop, Flume, SQL, Oozie, MapReduce, Java.
Java Developer
Karur Vysa Bank, India
Apr 2010 - Sep 2011
This project was developed for one of the top clients in the financial sector. It was developed with a flexible
design, which provides the platform to offer on-line integrated financial services to the bank customers. Using
this web based application customers can conduct activities like on-line retail banking, secure messaging,
Credit Card Payments, Shopping, Corporate banking and even payment to third party individuals and
institutes.
Responsibilities:
Maintained the UI screens using web technologies like HTML, JavaScript, JQuery and CSS.
Involved in requirements analysis, design, development and testing.
Designed, deployed and tested Multi-tier application using the Java technologies.
Involved in front end development using JSP, HTML & CSS.
Documented the changes for future development projects.
Involved in code deployment, unit testing and bug fixing.
Prepared design documents for code modified and ticket maintenance.
Implemented multithreading concepts in Java classes to avoid deadlocking.
Used MySQL database to store data and execute SQL queries on the backend.
Used Apache Ant for the build process.
Involved in developing JSP pages for client data presentation and client-side data validation within the
forms.
Actively involved in code review and bug fixing for improving the performance.
Documented application for its functionality and its enhanced features.
Used JDBC connections to store and retrieve data from the database.
Environment: Java, HTML, CSS, XML, JavaScript, JQuery, Apache Tomcat, Ant, SQL, PL/SQL and Shell
scripting
Java Developer
Credit Information Bureau India Ltd., India
Jun 2008 - Mar 2010
Credit Information Bureau India Ltd. is one of the leading and secured financial institutions in India. Loan
Approval and Payment system is an automated multi-application system by which customers of the bank can
have quick processing of their loan applications and set up one-time or recurring payments. The customers can
use the User Interface to keep track of all aspects of their loans and their payment details.
Responsibilities:
Analyzed the system and gathered the system requirements.
Created design documents and reviewed with team in addition to assisting the business analyst / project
manager in explanations to line of business.
Developed the web tier using JSP to show account details and summary.
Designed and developed the UI using JSP, HTML and JavaScript.
Utilized JPA for Object/Relational Mapping purposes for transparent persistence onto the SQL Server
database.
Used Tomcat web server for development purposes.
Used Oracle as the database and Toad for query execution; also wrote SQL scripts and
PL/SQL code for procedures and functions.
Developed application using Eclipse.
Used Log4j to print logging, debugging, warning and info messages to the server console.
Interacted with Business Analyst for requirements gathering.
Environment: Java, J2EE, JUnit, XML, JavaScript, Log4j, CVS, Eclipse, Apache Tomcat, and Oracle.