
 

 
The Crowdsourced Guide to the 
KPMG Virtual Internship  
Link to the Virtual Internship:
https://www.insidesherpa.com/virtual-internships/theme/m7W4GMqeT3bh9Nb2c/KPMG-Data-Analytics-Virtual-Internship

Introduction

Module 1: Drafting an email

Guide to module 1
Questions about Module 1 from students:
Glossary for Module 1:

Module 2: Prepare a detailed approach for completing the analysis

Guide to module 2
Student Questions about module 2
Glossary for Module 2:

Module 3: Data Insights and Presentation

Guide to module 3
Student questions about module 3
Glossary for Module 3:

Techniques and processes to analyse the data

Pivot Tables
Regression Analysis
More coming soon

Software and tools to analyse data

Questions from the community

Recruitment and landing a job at KPMG or in Data Analytics


Introduction
Congratulations on starting the KPMG Virtual Internship!

Hi Everyone! This is the crowdsourced guide to the KPMG Virtual Internship. It’s a long-term
way for the students of the program to help current and future students out.

Why Read this?


This is basically a guide on how to take the next step in learning from and completing the Virtual
Internship. There’s “no right answer” for most cases, engagements or client work, but there are
ways to get to better thought-out answers. This guide has an index and a rough outline of the
steps you could take to reach a well-thought-out answer and opinion for each module.

Can I contribute?
Yes! This document is in the ‘comment and suggest mode’ - so everyone can propose a
change. Our team will go over any proposals and approve them accordingly!

What do I get out of contributing?


Firstly - helping other students succeed is an awesome thing! It’s actually a really good sign that
someone’s going to be a good manager (especially if it’s backed up with technical skills).
Secondly - we’ll make a note of the contributors and save it to your profile on
InsideSherpa! KPMG will see this when they look over your profiles on the Virtual Internship.

Module 1: Drafting an email

Guide to module 1
<Feel free to add notes or suggestions here - we’ve written module 2 up first>

This module is very important as it determines how useful our final insights will be for the
client. Some data quality issues are more obvious to spot than others.

If you are using Excel, one way to identify invalid data is the filter tool, which lets us see all
the different values in a column. This method works best for categorical data. To identify
invalid numerical data, you can use the sorting tool to find values that are infeasible (e.g.
impossible ages or sales amounts). The COUNTIF function is also useful for counting the
values that are below or above a certain amount. It works like this:

e.g. =COUNTIF(B2:B5,">55")
This will count the number of values in cells B2 to B5 that are greater than 55.
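
If you prefer Python to Excel, the same three checks (distinct values, infeasible numbers, and a COUNTIF-style count) can be sketched roughly as below. This is only a minimal illustration: the sample values, the 0 to 120 age range and the helper name count_if are all assumptions made up for the example, not part of the internship data.

```python
from collections import Counter

# Hypothetical sample columns; in practice you would load these from the data file.
ages = [34, 61, 178, 45, 58, -3]
genders = ["F", "M", "Female", "M", "U", "F"]


def count_if(values, predicate):
    """Count the values that satisfy the predicate, similar in spirit to COUNTIF."""
    return sum(1 for v in values if predicate(v))


# Same idea as =COUNTIF(range, ">55")
over_55 = count_if(ages, lambda v: v > 55)

# Flag numeric values outside an assumed feasible range of 0 to 120 years.
infeasible = [v for v in ages if v < 0 or v > 120]

# Like the Excel filter tool: list every distinct value and how often it occurs,
# which makes inconsistent labels (e.g. "F" vs "Female") easy to spot.
distinct_values = Counter(genders)

print(over_55, infeasible, dict(distinct_values))
```

Anything that shows up in infeasible or as an unexpected label in distinct_values is a candidate data quality issue to raise in the email.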

Questions about Module 1 from students:

Glossary for Module 1:


In the task there
Module 2: Prepare a detailed approach for completing the analysis

Guide to module 2
This module is where the data analytics gets tougher! It’s okay to feel overwhelmed with what
they ask for in this module - it reflects what happens in the real world well. (Sometimes
managers just 10x the difficulty of tasks without thinking about it twice haha).

In essence, this task is asking you to map out HOW you will find the next 100 customers for the
client to prioritise. The question of HOW requires you to look at the process of analysing the
data, what data you need and how you’ll present to a client.

In the task, they mention “Data Exploration; Model Development and Interpretation” - we’ll get to
that after some of the starting tasks that you should do.

At the end, you want to create a presentation that reassures your client about how you’ll find the
next 100 customers, so that they can feel comfortable about what you’re about to do (and what
they’re essentially paying you for).

To get to “HOW” you have to do some analysis yourself and start to form an opinion on how
to find the top 100 customers. Recall, consultants are hired because they have well-formed
opinions on how to serve and help their clients achieve their goals.

If I were to re-summarise the task into actionable steps for a junior to do this, this is what
I’d say:

1. Given you’ve looked at the data in Module 1, I’d write down some questions about the
data you have in a document or on a piece of paper somewhere
a. Questions could include “Is there a relationship between age and number of
purchases?” or “If I added in data about postcode or the average wealth of a
suburb, will I find a correlation with transactions?”
b. Take some time to think about relationships that the data might have to the
metric of success here

2. Now that you have some questions, your next task is to create a list of the data you need
to answer them
a. For example:
i. “Is there a relationship between age and number of purchases?” - the data
I need is the age of each customer and the transactions they’ve made
ii. “Is there a relationship

3. One thing you’ll start to realise is that, even though you might have the data to
answer your questions, it isn’t structured right for easy analysis!
a. So this is where you want to modify
b. Huge thing to note: if you’re using Excel, it’s a VERY BAD IDEA TO EDIT THE
RAW DATA ITSELF.
c. Huge hint: the easiest place to start making useful data is to combine the
transactions data with the customer demographic data.

i. Here’s an example link on how to do that in Excel/Google Sheets:
https://docs.google.com/spreadsheets/d/14S_RNVLBlqbZjC0eeZBt9lYiSak3LayC4ly3X8OReJU/edit#gid=2098604483

ii. If you’re working with database software, it’d be a smart idea to JOIN these two
tables on the unique key of customer ID - strongly suggest this method if
you’re going to be handling this data as a data analyst would in industry
1. Something like:
a. SELECT *
FROM customer_transactions AS transactions
JOIN customer_demographics AS demographics
ON transactions.customer_id = demographics._id;

iii.

4. The step after this to build a

5. Now, these might seem like a lot of steps, but these steps are necessary to give you an
idea of the data you are working with, and you should start to understand the patterns
that might be in the data

6. Now onto answering HOW you’ll help the client find the next customers they should
target.

7. Now if you’re going to recommend a method of analysis to a client, you are going to want
to show them WHY a certain method is better than another method. This is where
“Model Development” and “Interpretation” come in.

a. The next question you’re going to ask is “What is a model?” in the context of
this Virtual Internship. A “model” is basically a method of processing
the data to get an outcome. A lot of the time, a model will be an algorithm or
approach.
b. Check out this link for a good explanation of what a model is:
https://datascience.stackexchange.com/questions/12909/definition-of-a-model-in-machine-learning
c. Here are some links with lots of data science techniques:
i. https://www.datasciencecentral.com/profiles/blogs/40-techniques-used-by-data-scientists
ii. https://towardsdatascience.com/the-10-statistical-techniques-data-scientists-need-to-master-1ef6dbd531f7
iii. https://deparkes.co.uk/2017/01/13/top-10-data-science-techniques/
d. Look at “Techniques and processes to analyse the data” to find potential
techniques and a guide for them (when this is made)
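
To make steps 1 to 3 a bit more concrete, here is a rough Python sketch that combines the two tables on customer ID and then checks one of the example questions (“Is there a relationship between age and number of purchases?”). Everything in it is an assumption for illustration: the rows are made up, the column names customer_id, age and amount are hypothetical, and a Pearson correlation is just one simple way to look at a relationship, not the method the task requires.

```python
from collections import Counter
from math import sqrt

# Hypothetical rows standing in for the two tables; column names are assumptions.
demographics = [
    {"customer_id": 1, "age": 25},
    {"customer_id": 2, "age": 40},
    {"customer_id": 3, "age": 60},
]
transactions = [
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 20.0},
    {"customer_id": 2, "amount": 35.0},
    {"customer_id": 3, "amount": 10.0},
    {"customer_id": 3, "amount": 80.0},
    {"customer_id": 3, "amount": 15.0},
    {"customer_id": 4, "amount": 99.0},  # no matching demographic row
]

# Index demographics by customer_id so each transaction can look up its customer.
by_id = {row["customer_id"]: row for row in demographics}

# Inner join: keep only transactions whose customer appears in demographics.
# The raw lists are left untouched; the joined rows are new dictionaries.
joined = [
    {**t, **by_id[t["customer_id"]]}
    for t in transactions
    if t["customer_id"] in by_id
]

# Number of purchases per customer in the joined data.
purchases = Counter(row["customer_id"] for row in joined)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ages = [by_id[cid]["age"] for cid in purchases]
counts = [purchases[cid] for cid in purchases]
r = pearson(ages, counts)
print(f"age vs number of purchases: r = {r:.3f}")
```

Note that the transaction for customer 4 drops out of the inner join because there is no demographic row for it. Rows that fail to match like this are worth counting and mentioning to the client as a data quality point.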

Now given the

Student Questions about module 2

Glossary for Module 2:


Module 3: Data Insights and Presentation
<Feel free to add notes or suggestions here - we’ve written module 2 up first>

Guide to module 3

Student questions about module 3

Glossary for Module 3:


Software and tools to analyse data
<Will write up a more detailed explanation - if you have suggestions, that’d be great>
Excel, Power BI, Tableau, Python (with the maths packages like NumPy, Matplotlib etc.),
MATLAB

Microsoft Excel/Google Sheets

Tableau

Python Data Science Stack (Python/NumPy/SciPy/Matplotlib)


Techniques and processes to analyse the data
Generally, these are listed in rough order of technical difficulty, easiest first.

Pivot Tables

Regression Analysis

Excel tutorial for regression on the document


Clustering
Discriminant Analysis
Decision Tree

More coming soon


Questions from the community
“For the analysis in Module 2, do we need to convert some of the qualitative data into quantitative
data? If yes, any suggestion on how should I do this?

Thank you! I have ideas to convert some of the fields, for example, I am planning to use the distance
to CBD for postcode and address and probably find the average salary for each job industry
category. The confusing field for me is the wealth segment. I don't really get how can I convert this to
quantitative data. Thank you”
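
One possible way to approach the wealth segment part of this question, offered as a sketch rather than an official answer: a categorical field can be turned into numbers either with an ordinal encoding (if you are willing to assume the categories have a natural order) or with a one-hot encoding (if you are not). The category labels and their ordering below are assumptions, so check them against the actual values in your dataset before using anything like this.

```python
# Assumed category labels, listed from lowest to highest wealth.
# Both the labels and the ordering are assumptions for illustration.
WEALTH_ORDER = ["Mass Customer", "Affluent Customer", "High Net Worth"]


def ordinal_encode(value, order):
    """Map a category to its position in an assumed ordering (0, 1, 2, ...)."""
    return order.index(value)


def one_hot_encode(value, categories):
    """Map a category to a list of 0/1 flags, one flag per category."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical column values
segments = ["Affluent Customer", "Mass Customer", "High Net Worth"]

ordinal = [ordinal_encode(s, WEALTH_ORDER) for s in segments]
one_hot = [one_hot_encode(s, WEALTH_ORDER) for s in segments]

print(ordinal)
print(one_hot)
```

The ordinal version keeps a single column but implies that the gap between each pair of neighbouring segments is the same size, which may not be true. The one-hot version avoids that assumption at the cost of one column per category.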
Recruitment and landing a job at KPMG or in Data Analytics
Soft Skills suggested for when firms are looking to hire data analysts
List of Contributors

This crowdsourced guide for the KPMG Virtual Internship wouldn’t be anything without the
comments, notes and docs written by people in the group.

Content Contributors
● David Kontrobarsky
● Alex Luo
● Jez Grunfeld
● Andrew Snow

Comment Contributors
● Shan Baiyi Yang

Question Contributors
