Required Skills
- Microsoft Excel
- Programming Skills
Tools to manipulate data (programming ability, SQL, statistics tools, etc.)
- Python, Ruby, or another similar programming language
- R, STATA, SAS, or some other statistical programming language for analyzing data
- SQL or a similar querying/manipulation language, including an understanding of fairly complex joins, nested queries, etc.
- Hive, Hadoop, etc. are really useful, albeit not essential for getting hired
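To give a feel for what "fairly complex joins" and "nested queries" can mean in practice, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and exist only for illustration. The query joins the two tables, then uses a nested query to keep only customers whose total spend is above the average customer total.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders (customer_id, amount)
        VALUES (1, 50), (1, 70), (2, 20), (3, 40);
        """
    )
    return conn


def top_spenders(conn):
    """Return (name, total) for customers spending more than the average customer."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id      -- join: match orders to customers
        GROUP BY c.id, c.name
        HAVING SUM(o.amount) > (
            -- nested query: the average of the per-customer totals
            SELECT AVG(customer_total)
            FROM (
                SELECT SUM(amount) AS customer_total
                FROM orders
                GROUP BY customer_id
            )
        )
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(top_spenders(build_demo_db()))
```

With the sample rows above, the per-customer totals are 120, 20, and 40, so the average is 60 and only Ada is returned. Real interview or on-the-job queries tend to involve more tables, but the pattern of joining, aggregating, and filtering against a subquery is the same.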
Resources
Here are some books I've found recommended in various locations. I can't speak to their quality/usefulness, but others have found them helpful. I've added them here for future reference and your convenience:
It starts by defining data mining in the current business context and then summarizes some of the best practices in data mining.
This book does not deal with statistical equations or complex algorithms. Instead, it describes how some of the world's leading companies are using analytics to outsmart their competition.
Quick Reference
Q: What is SAS?
A: SAS is an integrated system of software solutions that enables you to perform the following tasks: data entry, retrieval, and management; report writing and graphics design; and statistical and mathematical analysis.
Q: What is R?
A: R is a free programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. Polls and surveys of data miners show that R's popularity has increased substantially in recent years.
Q: What is SQL?
A: SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems.
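As a rough illustration of what "communicating with a database" looks like, the sketch below sends three kinds of SQL statements (create a table, insert rows, select rows) to an in-memory SQLite database through Python's built-in sqlite3 module. The `employees` table and its contents are made up for the example, and other database systems would need their own driver in place of sqlite3.

```python
import sqlite3


def analysts_by_name():
    """Send a few SQL statements to a throwaway database and return the query result."""
    conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory

    # Define the structure of the data.
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")

    # Add some rows; the ? placeholders keep the values separate from the SQL text.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Eli", "Analytics", 72000),
            ("Dana", "Analytics", 68000),
            ("Frank", "Sales", 55000),
        ],
    )

    # Ask a question of the data.
    rows = conn.execute(
        "SELECT name FROM employees WHERE department = ? ORDER BY name",
        ("Analytics",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(analysts_by_name())
```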
Q: What is Python?
A: Python is a high-level general-purpose programming language. Find out more information about Python here.
Q: What is Ruby?
A: Ruby is a dynamic, reflective, object-oriented, general-purpose programming language. Find out more information about Ruby here.
Q: What is Hadoop?
A: Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware.
Q: What is Hive?
A: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
Other, More In-depth Articles
This article discusses the importance of the field and the high demand for data analysts. The pay and benefits are worth considering.
This article lays out an in-depth curriculum for becoming a Data Analyst.
Finally, this article, written by someone working for a fairly large company, gives plenty of detail on a suggested path to becoming a Data Analyst.