
I'm a self-taught analyst who has worked in various analytical roles, and I'm currently in a role at Zynga in which data analytics is a critical part of my daily decision making. Two years ago, I wrote an article on how someone should teach themselves business intelligence skills. It's dated, opinionated, and long, but the Excel and SQL sections are still relevant today: http://john.marsland.org/blog/bu...

If you want to learn how to be a data analyst, follow these steps:

1. Master Microsoft Excel
2. Learn Basic SQL
3. Learn Basic Web Development
4. Dive into a Concentration

Here's how you should start:

1. Master Microsoft Excel

First and foremost, master Microsoft Excel. Excel is the most versatile and common business tool for data analytics. While many data analysts graduate to other function-specific paths and tools (data mining, visualization, statistical applications, etc.), almost all paths start with Excel, and most analysts likely still use it.

Start by learning the basic navigational components and concepts (workbooks, worksheets, formula bar, ribbon). Learn a couple of basic formulas (IF, VLOOKUP, TEXT, DATE) and then graduate to the more powerful ones (SUMPRODUCT, GETPIVOTDATA, MATCH/INDEX).

As you begin to get more comfortable, begin mastering the keyboard shortcuts. Start by learning how to navigate within a workbook and between workbooks. Then learn the shortcuts for formatting, inserting charts/tables, and hiding/unhiding/grouping columns/rows.

Biased Note: If possible, you should learn Excel using a Windows operating system. The Mac OS version is limited by design and doesn't allow you to learn the traditional shortcuts, which will slow you down substantially.

You know you've learned enough shortcuts when you can perform 80% of the tasks you need using only the keyboard instead of the mouse.

Learn the Data->Pivot->Presentation method for designing scalable Excel templates. This article has a good introduction: http://www.databison.com/index.p...

Learn how to build different models/presentations for different analytics applications. Build a model for your Fantasy Sports team. Download a financial statement, and try to predict the next quarter's revenues. Download data from the US Census (http://www.census.gov/main/www/a...) and learn more about the demographic profile of the US.

Excel is good at most analytical tasks as a generalist tool, but not great at any one of them. However, concentrating on Excel will expose you to various analytics concepts which you can later master in other applications.

If you have questions, watch the videos or post in the forums at MrExcel. It's the Quora/Stack Exchange for Excel.

2. Learn Basic SQL

Excel allows you to slice and dice data, but it assumes you have the data readily available. As you become a more seasoned analyst, you'll find the best way to get at data is to pull it directly from the source, and that often requires pulling data from a relational database, which likely supports some derivative of SQL.

You should master SQL next. Here's a general guide:

- Buy a book, find a good web tutorial (try W3Schools for a light tutorial: http://www.w3schools.com/sql/sql... or Big Data University for a more involved one: http://bigdatauniversity.com), or ask an analyst friend to show you the basics for an hour. I showed a friend last week--you'll pick it up quickly.
- Skip all the stuff about tables and go straight to learning how to pull data. Learn the big 6 reserved keywords: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
- Following that, learn how to join to other tables. Know the difference between an inner and an outer join.
- Then take a deep dive and learn the concepts behind relational databases. You should know why databases have IDs/keys, the difference between a fact and a dimension, why indexes are useful, and at least remember reading about the 1st/2nd/3rd Normal Forms.
- If you like the design stuff, pick up a copy of Kimball's The Data Warehouse Toolkit: http://www.amazon.com/The-Data-W... It's a great overview of dimensional modeling with an emphasis on different verticals. Then breeze through Kimball's The Data Warehouse ETL Toolkit afterward.
- Graduate to learning how to create temporary tables and indexes. Then create a view and move on to figuring out how to create, insert into, and update tables.
- If you're really hungry, download a copy of MySQL Community Server, set up a database server for yourself, and go at it.
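To make the "big 6" keywords and the inner/outer join distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module, so nothing needs to be installed. The tables, columns, and rows are made up for illustration; any SQL database would behave similarly for these basic queries.

```python
import sqlite3

# Hypothetical toy schema: table names, columns, and rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (1, 1, 50.0, 'paid'), (2, 1, 70.0, 'paid'),
        (3, 2, 20.0, 'paid'), (4, 2, 90.0, 'refunded');
""")

# All six keywords in one query: total paid amount per customer,
# keeping only customers whose total exceeds 30, largest total first.
big_six = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer_id
    HAVING SUM(amount) > 30
    ORDER BY total DESC
""").fetchall()

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# LEFT OUTER JOIN keeps every customer; one with no orders gets a count of 0.
outer_rows = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(big_six)
print(inner_rows)
print(outer_rows)
```

The customer with no orders (Cy) appears only in the outer join result, which is the practical difference to remember: an inner join silently drops rows without a match.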

3. Learn Basic Web Development

This may seem like an oddball for #3, but it's a natural next step and an added bonus (or requirement), especially if you want to work at consumer internet companies. Knowing how to read--or at least becoming vaguely familiar with--common web technologies/languages/concepts such as HTML and JavaScript will enable you to become a better analyst in a world that is becoming more web based. Get a WordPress blog, and mess around with it a bit. Add some Google Analytics tracking to it. Learn about tracking pixels, server-side tags, and GET vs. POST.

4. Dive into a Concentration

If you nail #1 and #2 and get exposed to #3, you will have learned the foundations of a basic data analyst. There are a ton of paths to choose from once you've nailed the basics. Each of these has its own set of technologies, tools, and careers. A few are highlighted below:

- Collection+Storage. Focus is on optimal methods to collect, store, and make data accessible for various applications. Could mean learning Unix, web servers, and regular expressions for mining log files. Could mean learning how to design a star schema or create a NoSQL database, as well as determining the optimal solution for inserting, updating, deleting, and extracting data.

- Analytics. Focus here is on learning how to better slice and dice data. Could mean learning Excel VBA for automation. Could mean picking up a tool for better visualization of data (Tableau) or statistical analysis (R, SPSS, SAS).

- Presentation. Focus is on data presentation. Dashboards, Reports, Alerts, Data Tables--you name it. You could learn how to use tools that are made for visual analytics (such as Tableau), focus on developing catchy infographics, use an existing SaaS tool for distributing visuals, or dive into programming and create your own set of visuals using jQuery or the Google Charts API. As Anon User wrote, the Edward Tufte books are a fantastic way to get lightly exposed to this concentration.

- Programming. SQL is a declarative language--you tell the query engine what you want and the engine figures out the rest. Most other programming languages are procedural languages--you tell them how to get what you want. The latter is a lot harder to learn, but at some point, you may want to do more than just write SQL code to influence your analytics, whether it be for back-end or front-end analytics applications. If you've had no formal programming experience, take a class. If you know some basics or can simply geek out with the best of 'em, try your luck at one of the more commonly available programming languages. Here are some to consider:
  - Excel VBA. Old, but if you just want to automate spreadsheets, there's no lower-friction way to start.
  - PHP. Dead simple. Tons of examples. Used all over the web.
  - Python. Second to PHP. Beautiful. Increasingly common for scripting, although its web frameworks are less common than PHP's.
  - JavaScript. Plenty of examples, and great if you want to do front-end visualizations.

- Math/Statistics. Finding patterns in data relies on at least a rudimentary understanding of statistics. Becoming a seasoned data miner or developing sophisticated forecasting/stock-trading/bidding heuristics will require much more. If you're interested in this component, mess around with R (or SAS/SPSS) to start.

- Enterprise. There are plenty of companies that have made top dollar selling full-fledged enterprise reporting solutions. Cognos, Business Objects, MicroStrategy, and Hyperion are a few. You can have a well-paid career mastering these tools and consulting for companies that need them maintained. These tools address all parts of the data analyst spectrum, but they are increasingly less common in consumer internet data analytics because they're expensive and require teams to maintain them.

Lastly, as you become a more seasoned analyst, don't ignore the "business" component. Business Intelligence is not a technical problem. It's a data socialization problem. How do you get the right information to the right decision makers (human or machine) at the right time to affect a desired outcome? Data Hackers add value to a business by surfacing information that turns into business outputs, and the more exposure you have to a particular business, the more empowered you will be to affect the end product.
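The declarative versus procedural distinction from the Programming concentration above is easier to see side by side. This is a rough sketch with invented sales figures: the same per-region total is computed once by describing the result in SQL (via Python's built-in sqlite3 module) and once by spelling out the steps in a plain Python loop.

```python
import sqlite3

# Hypothetical data: (region, revenue) pairs invented for this example.
sales = [("west", 100), ("east", 40), ("west", 60), ("east", 10)]

# Declarative: describe the result you want; the query engine decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
declarative = dict(
    conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region").fetchall()
)

# Procedural: walk through the rows and keep the running totals yourself.
procedural = {}
for region, revenue in sales:
    procedural[region] = procedural.get(region, 0) + revenue

print(declarative == procedural)
```

Both approaches arrive at the same totals. The procedural version takes more code here, but it is also the one you can bend to logic that is awkward to express in a query.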

When I was on the analytics team at Airbnb (2011-2012), we looked for the following:

1. Research design/methodology
- Ability to set up experiments properly, with careful attention to control groups and confounding variables
- Knowledge of basic statistics techniques and concepts (regressions, t-tests, significance, etc.)
- Ability to delve into open-ended problems and find trends in huge sets of data
- Understanding of all the caveats and complications of research without getting so bogged down in them that it takes months to get results

2. Tools to manipulate data (programming ability, SQL, statistics tools, etc.)
- Python, Ruby, or another similar programming language
- R, Stata, SAS, or some other statistical programming language for analyzing data
- SQL or a similar querying/manipulation language, with an understanding of fairly complex joins, nested queries, etc.
- Excel can be useful, but details can probably be learned as needed (personally, I don't think I've ever used a pivot table in my job because I use other tools to combine data)
- Hive, Hadoop, etc. are really useful, albeit not essential for getting hired (but would mean a lot more than detailed knowledge of Excel, which I would assume any smart person could pick up as needed)

3. Ability to interpret and summarize results broadly for technical and non-technical audiences

4. Any other special skills, such as data visualization, machine learning, advanced statistical techniques, etc.

At ClassDojo, we are at an earlier stage, and the data is stored as JSON rather than in a SQL database, so there is even more need for everyone to program. I spend 90% of my time getting data in place in Python before I can do any actual analysis or visualization (in Python or R).
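As an illustration of what "getting data in place in Python" can look like when the source is JSON, here is a small sketch using only the standard library. The event records and field names are invented, and this is just one plausible way to do it, not a description of any company's actual pipeline: nested records are flattened into rows with dotted column names and written out as CSV text.

```python
import csv
import io
import json

# Hypothetical event log: one JSON object per line, with invented field names.
raw = """
{"user": {"id": 1, "country": "US"}, "event": "signup", "props": {"plan": "free"}}
{"user": {"id": 2, "country": "DE"}, "event": "purchase", "props": {"plan": "pro", "amount": 9.5}}
"""

def flatten(record, prefix=""):
    """Turn nested dicts into one level with dotted keys, e.g. user.id."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

rows = [flatten(json.loads(line)) for line in raw.splitlines() if line.strip()]

# Records can carry different fields, so the header is the union of all keys.
columns = sorted({key for row in rows for key in row})

# Write CSV text that a spreadsheet or statistics tool can load;
# fields missing from a record are left empty.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
table = buffer.getvalue()

print(table)
```

Once the data is tabular, the analysis itself can happen in whatever tool you prefer.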

I'd like to add four things that really helped me get the right data and the right analysis:

1. Get familiar and socialize with the people that actually own and manage the data. In most cases these are database administrators (DBAs) or support engineers of all sorts of databases and systems. I work for a large company with many data stores, most of them with authorization required. I needed all kinds of permissions to get access to the data, and it helped a lot to talk to the DBAs, hang around a bit, and explain to them what I do and why I need the information. Eventually it was much easier to get access to the right data, reports or queries.

2. On most occasions you should focus not only on the answer or evidence you were looking for but also on finding new insights. Browsing through all this data, what can you learn from it? Are there connections between data sources that can help you spot new trends or gain better insights? Just take some extra time to dive deeper into the pile of data. This will create additional value for your role and guard you from being the upper management's reporting robot :-)

3. Visualization is essential. Data is just data. Gathering insights is what matters, and insights will have a higher impact when they are presented properly. When you are preparing for a big board presentation and you have a designer in your team or company, make sure you involve him or her in the visualization of your data. Designers usually like these kinds of small favours.

4. Some people mentioned programming tools to work with data. I use small Unix tools such as 'awk', 'sed', 'grep' etc. a lot to prepare the data before importing it into Excel. They are easier to install and work with than most programming languages. They work on Windows and Macs as well.

And if you'd like to read more on data analysis, then I also recommend Ian Ayres' book 'Super Crunchers'.

There are two classes of skills that a successful data analyst has: soft and technical. The core workflow for a data analyst has several steps. Once a problem has been defined and a hypothesis is ready to be tested, the data must be drawn out and then analyzed. The resulting analysis is written up and communicated to the interested stakeholder. Doing this requires several hard and soft skills.

Technical Skills:

1. Statistics, anywhere from a basic knowledge to a rigorous understanding of machine learning. Most consumers of analysis will not look at more than descriptive analysis (means, medians, significance).
2. Computer skills. Useful ones are a querying language (SQL, Hive, Pig), a scripting language (Python, Matlab), a statistical language (R, SAS, SPSS), and a spreadsheet (Excel).

Soft Skills:

1. Defining the problem. Narrowing the analysis down often requires a lot of soft skills. Balancing the demands on your time to limit infinite what-if scenarios and understanding the requester's needs requires good communication and an understanding of the business. Avoid agreeing to deliver information that will not be useful for solving the core issues.
2. Knowing the audience. A PM and a CEO require different presentations. As a Data Analyst, you will often be required to answer to both. A typical PM will want a more collaborative interaction with more scenarios spelled out and a less polished presentation. A CEO will often be looking for a specific recommendation in a small, polished presentation.
3. Delivery. A wonderfully accurate predictive model that has been backtested to deliver a low RMSE, or an A/B test that can increase conversion 15% without reducing sales price, is a great result. However, without a great presentation, key findings may be left out of product road maps and sit in the backlog for months or years.
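For readers who have not met the RMSE mentioned under Delivery: it is the root mean squared error, the square root of the average squared gap between predicted and actual values, so lower means the predictions track reality more closely. A minimal sketch follows; the backtest numbers are invented purely for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical backtest: both series are made up for this example.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
error = rmse(actual, predicted)

print(round(error, 3))
```

Because RMSE is in the same units as the thing being predicted, it is usually easier to explain to a non-technical audience than most other model metrics.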