2. Different Profiles of Analytics
DATA SCIENCE
• Data scientist, chief scientist, senior analyst, director of analytics, Etc.
• Industries like Digital analytics, search technology, marketing, fraud detection,
astronomy, energy, Healthcare, social networks, finance, forensics, security
(NSA), mobile, telecommunications, weather forecasts, and fraud detection.
• Projects Taxonomy creation (text mining, big data), clustering applied to big
data sets, recommendation engines, simulations, rule systems for statistical
scoring engines, root cause analysis, automated bidding, forensics, exo-planets
detection, and early detection of terrorist activity or pandemics.
• Main components are Machine to Machine communications, Automation.
• Overlaps with Computer Science, Statistics, Machine Learning, Data Mining,
Operational Research, Business Intelligence.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
3. Different Profiles of Analytics
MACHINE LEARNING - Very popular computer science
discipline
• Part of data science and closely related to Data Mining. Machine learning is
about designing algorithms (like data mining), but emphasis is on prototyping
algorithms for production mode, and designing automated systems
• Python is now a popular language for ML development Projects.
• Core algorithms include clustering and supervised classification, rule systems,
and scoring techniques
• A sub-domain, close to Artificial Intelligence is deep learning.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
4. Different Profiles of Analytics
DATA MINING
• Designing algorithms to extract insights from rather large and potentially
unstructured data (text mining), sometimes called Nugget Discovery.
• Techniques include pattern recognition, feature selection, clustering,
supervised classification and encompasses a few statistical techniques.
• Data mining thus have some intersection with statistics, and it is a subset of
Data science.
• Data miners use open source and software such as Rapid Miner.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
5. Different Profiles of Analytics
PREDICTIVE MODELING
• Predictive modeling projects occur in all industries across all disciplines.
• Aim at predicting future based on past data, usually but not always based on
statistical modeling.
• Predictions often come with confidence intervals.
• Roots of predictive modeling are in statistical science.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
6. Different Profiles of Analytics
STATISTICS
• Loosing ground to data science, industrial statistics, operations research, data
mining, machine learning -- where the same clustering, cross-validation and
statistical training techniques are used, albeit in a more automated way and on
bigger data.
• Many professionals who were called statisticians 10 years ago, have seen their
job title changed to data scientist or analyst in the last few years.
• Modern sub-domains include statistical computing, statistical learning(closer to
machine learning), computational statistics(closer to data science), data-driven
(model-free) inference, sport statistics, and Bayesian statistics
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
7. Different Profiles of Analytics
INDUSTRIAL STATISTICS
• Statistics frequently performed by non-statisticians (engineers with good
statistical training), working on engineering projects such as yield optimization
or load balancing (system analysts). They use very applied statistics, and their
framework is closer to six sigma, quality control and operations research, than
to traditional statistics.Also found in oil and manufacturing industries.
• Techniques used include time series, ANOVA, experimental design, survival
analysis, signal processing(filtering, noise removal, deconvolution), spatial
models, simulation, Markov chains, risk and reliability models.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
8. Different Profiles of Analytics
ACTUARIAL SCIENCES
• Just a subset of statistics focusing on insurance (car, health, etc.)
• using survival models: predicting when you will die, what your health
expenditures will be based on your health status (smoker, gender, previous
diseases) to determine your insurance premiums.
• Also predicts extreme floods and weather events to determine premiums.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
9. Different Profiles of Analytics
HPC
• High performance computing, not a discipline per se, but should be of concern
to data scientists, big data practitioners, computer scientists and
mathematicians, as it can redefine the computing paradigms in these fields.
• HPC should not be confused with Hadoop and Map-Reduce: HPC is hardware-
related, Hadoop is software-related (though heavily relying on Internet
bandwidth and servers configuration and proximity).
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
10. Different Profiles of Analytics
OPERATIONS RESEARCH(OR)
• It is about decision science and optimizing traditional business projects:
inventory management, supply chain, pricing. They heavily use Markov Chain
models, Monter-Carlo simulations, queuing and graph theory, and software
such as AIMS, Matlab or Informatica.
• Big, traditional old companies use OR, new and small ones (start-ups) use data
science to handle pricing, inventory management or supply chain problems.
• Car traffic optimization is a modern example of OR problem, solved with
simulations, commuter surveys, sensor data and statistical modeling.
• OR has a significant overlap with six-sigma, also solves econometric problems,
and has many practitioners/applications in the army and defense sectors
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
11. Different Profiles of Analytics
ECONOMETRICS
• Econometrics is heavily statistical in nature, using time series models such as
auto-regressive processes.
• Also overlapping with operations research (itself overlapping with statistics!)
and mathematical optimization (simplex algorithm).
• Econometricians like ROC and efficiency curves.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
12. Different Profiles of Analytics
DATA ENGINEERING
• Performed by software engineers (developers) or architects (designers) in large
organizations (sometimes by data scientists in tiny companies)
• A sub-domain currently under attack is data warehousing, as this term is
associated with static, siloed conventational data bases, data architectures, and
data flows, threatened by the rise of NoSQL, NewSQL and graph databases.
• Transforming these old architectures into new ones (only when needed) or
make them compatible with new ones, is a lucrative business
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
13. Different Profiles of Analytics
BUSINESS INTELLIGENCE
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
14. Different Profiles of Analytics
BUSINESS INTELLIGENCE
• Focuses on dashboard creation, metric selection, producing and scheduling data
reports (statistical summaries) sent by email or delivered/presented to
executives, competitive intelligence (analyzing third party data), as well as
involvement in database schema design (working with data architects) to collect
useful, actionable business data efficiently.
• Typical job title is Business Analyst
• some are more involved with marketing, product or finance (forecasting sales and
revenue).
• Some have learned advanced statistics such as time series, but most only use
(and need) basic stats, and light analytics, relying on IT to maintain databases
and harvest data.
• BI and market research(but not competitive intelligence) are currently
experiencing a decline.
• Part of the decline is due to not adapting to new types of data (e.g. unstructured text)
that require engineering or data science techniques to process and extract value
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
15. Different Profiles of Analytics
DATA ANALYTICS
• This is the new term for Business Statistics since at least 1995, and it covers a
large spectrum of applications including fraud detection, advertising mix
modeling, attribution modeling, sales forecasts, cross-selling optimization
(retails), user segmentation, churn analysis, computing long-time value of a
customer and cost of acquisition, Etc.
• Except in big companies, data analyst is a Junior role; these practitioners have
a much more narrow knowledge and experience than data scientists
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
16. Different Profiles of Analytics
BUSINESS ANALYTICS
• Same as data analysis, but restricted to business problems only.
• Tends to have a bit more of a financial, marketing or ROI flavor.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
17. DATA SCIENTIST vs. DATA ANALYST
Resource: http://www.edureka.co/blog/difference-between-data-scientist-and-data-analyst/
18. DATA SCIENTIST vs. DATA ANALYST
• “Data Analyst” focuses on the movement and interpretation of
data, typically with a focus on the past and present.
• Alternatively, a “Data Scientist” may be primarily responsible for
summarizing data in such a way as to provide forecasting, or an
insight into future based on the patterns identified from past and
current data.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
19. DATA SCIENTIST vs. DATA ANALYST
CRISP-DM Process Model
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
20. DATA SCIENTIST vs. DATA ANALYST
• Business understanding – Determine Business Objectives, Assess Situation,
Determine Data Mining Goals, Produce Project Plan
• Data understanding – Collect Initial Data, Describe Data, Explore Data,Verify
Data Quality
• Data preparation – Select Data, Clean Data, Construct Data, Integrate Data
• Modeling – Select modeling technique, Generate Test Design, Build Model, Assess
Model
• Evaluation - Evaluate Results, Review Process, Determine Next Steps
• Deployment – Plan Deployment, Plan Monitoring and Maintenance, Produce
Final Report, Review Project
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
21. DATA SCIENTIST vs. DATA ANALYST
• A Data Scientist is often heavily involved in the cleaning and manipulation
of data to support their modeling needs as well as the building and
evaluating of model designs which are intended to help guide changes in
business decisions.
• On the other hand, a Data Analyst may spend their time exploring data to
support troubleshooting efforts or to generate ideas for useful reports to
pitch to the customer.
• In general, while Data Analysts tend to be more Business focused, Data
Scientists are often Mathematically focused.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
22. DATA SCIENTIST vs. DATA ANALYST
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
23. DATA SCIENTIST vs. DATA ANALYST
• Data Analysts typically perform data migration and visualization roles that
focus on describing the past; while Data Scientists typically perform roles
manipulating data and creating models to improve the future.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
24. DATA SCIENTIST vs. DATA ANALYST
• Situation:
• A large provider of streaming entertainment and data services wants to
improve call center performance and extract tactical business value from
call center data
• Project Domain:
• Logged performance data from the firm’s proprietary hardware platform
and call center data tied to specific customers and device IDs
• In the above case, the Data Analyst and Scientist would both use data
but, in different ways. The Data Analyst would be concerned with
reporting metrics, such as average call time; while the Data Scientist
would be concerned with using the historical data to predict the future,
such as predicting future months call volumes. Both roles are equally
important to the operation of the call center and help find solutions for
the center to run smoothly. The figure below details some solutions each
role creates.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
26. Current Analytics Scenario
•2013
• Significant rise in Data Analytics and Big Data initiatives across all sectors
in India. Finance, telecom, ecommerce and retail sectors showed higher
investments in data related tools and technologies.
• Industries like healthcare, auto and manufacturing also increased their
data related spend.
• Hiring picked up considerably but the demand supply lag was still
considerable considering the sheer lack of availability of trained analytics
professionals
28. So…..
Tech
Term
Stat/
OR
Source of
Data
Size of
data
Typical Software
DataAnalysis May be Manual
Usually
Small
SPSS, SYSTAT etc.
Business
Intelligence
No
Business
process
Usually
Large
Business Objects, Micro
strategy, Pentaho etc.
Data Mining May be
Business
process
Very
Large
Clementine, e-miner etc.
Analytics Yes
Business
process
Large SAS, R etc.
• Hence, a hallmark of Analytics is application of statistics/OR in industry
setting using business process data.
• SAS is the widely used tool. R (open source) is FAST growing
30. Current Analytics Scenario
•2013 – Key Facts
• Though SAS remained a popular tool, trends show that R has been
increasing its share of the market. Several analytics companies use both
SAS and R, depending on project and client preferences.
• Big Data tools Hadoop and Map Reduce were industry favourites.
Companies are investing in building this capability , but are not using it
effectively yet.
• Hadoop skills were in demand coupled with SAS and R.
• Web analytics, text analytics and social media analytics began to gain
popularity
31. Current Analytics Scenario
•2014
• Demand for analytics is currently driven by businesses and
organizations that want to up-skill their employees. We predict that
in 2014, MBA colleges will place more emphasis on analytics as part
of their regular curriculum to cater to the demand-supply gap for
analytics professionals.
• A number of specialized analytics courses (such as the ones offered
by the Great Lakes Institute of Management, Chennai and the
Indian School of Business , Hyderabad) have already gained
popularity in 2013. We predict that many more such courses will be
launched in 2014.
36. Current Analytics Salary
• Average entry level salaries have increased 27% since 2013, from Rs.
520 thousand to Rs 660 thousand per annum.
• Typically, there is a 250% increase in salary from entry level analyst
to manager.
• Managers in analytics command an annual salary upward of Rs 1500
thousand.
• At senior levels, annual salaries are upward of Rs. 2500 thousand
which is more than a 60% increase from a managers salary
37. Let’s Conclusion for today
• Salaries will continue to increase and we will see professionals from
other sectors honing their analytics skills and switching careers.
• The scope of analytics will so permeate India Inc., that 15 years from
now, we predict that those professionals with no analytics and big
data skills will have no scope for growth.
• This seems harsh, but that is how dynamic data analytics is and how
pivotal data backed decision making will become.
38. In God,WeTrust... All Others must bring the "Data"
Srikanth Ayithy
about.me/srikanthayithy