(Big) Data projects need a cross-functional, comprehensive approach to exploit the value hidden in datasets both small and huge.
This presentation provides an open canvas tool for focusing on a project's specific requirements (in terms of resources, skills and procedures), in order to avoid common mistakes and maximize the “success” rate.
3. AGENDA
i. A few words about us (TOP-IX & BIG DIVE course).
ii. BIG DATA opportunities, beyond the buzzword.
iii. Open challenges in applied Data Science.
iv. A canvas “Ring” to rule them all…
4. 80+ Members (15 in 2003)
NON-PROFIT CONSORTIUM
PUBLIC & PRIVATE PARTICIPATION
MISSION: TO FOSTER INNOVATION BY LEVERAGING INFRASTRUCTURE ASSETS
EDUCATION
START-UP
CORPORATE INNOVATION
CIVIC TECH
FUNDED PROJECTS
IX NORTH-WEST ITALY
DP
7 collaborators
16 employees
2 directors
TOP-IX CONSORTIUM
10. WHAT’S “NEW” ABOUT DATA
DATA
SKILLS TO EXTRACT INFORMATION ARE NOW MORE ACCESSIBLE
INFRASTRUCTURE AS A COMMODITY
/ Cloud
/ HPC & HPN
/ Frameworks
CULTURE & APPROACH
/ Complexity science
/ Network thinking
/ Open Innovation
DATA AVAILABILITY
/ Exponential growth
/ Machine VS human
/ Structured VS unstructured
11. BIG DATA + ML = The NEW STACK
Big Data technologies are used to handle core data engineering challenges, and machine learning is used to extract value from the data.
12. COMMON OPEN CHALLENGES
/THE DATA
/THE SKILLS
/FROM PROTOTYPE TO…
/THE RESULTS INTERPRETATION AND THE EXPLAINABILITY ISSUE
/“GREY ZONES” IN DATA EXPLOITATION
/THE PURSUIT OF INNOVATION
14. DATA REMAINS THE STARTING POINT
Metadata
“Data” that provides information about other data. {Descriptive, Structural, Administrative}
Feature Selection
The process of extracting useful information (or features) from existing data.
Volume
The effective amount of usable data. There are no objective a-priori parameters: validation in the field is required.
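As a rough illustration of the metadata and volume points on this slide, the sketch below records the three metadata categories for a dataset and counts the rows that are actually usable. Everything in it is an assumption made for the example: the `DatasetMetadata` class, the `usable_rows` helper, the field names and the sample rows are hypothetical, and "usable" is reduced here to "all required fields are filled in", which a real project would have to validate in the field.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical container for the three metadata categories."""
    descriptive: dict = field(default_factory=dict)     # what the data is about
    structural: dict = field(default_factory=dict)      # how the data is organised
    administrative: dict = field(default_factory=dict)  # owner, licence, dates


def usable_rows(rows, required_fields):
    """Keep only rows where every required field is present and non-empty."""
    return [
        row for row in rows
        if all(row.get(name) not in (None, "") for name in required_fields)
    ]


# Hypothetical sample data: two of the four rows are incomplete.
rows = [
    {"sensor_id": "a1", "temperature": 21.5},
    {"sensor_id": "a2", "temperature": None},
    {"sensor_id": "", "temperature": 19.0},
    {"sensor_id": "a4", "temperature": 22.1},
]

meta = DatasetMetadata(
    descriptive={"title": "Hypothetical sensor readings"},
    structural={"fields": ["sensor_id", "temperature"]},
    administrative={"owner": "hypothetical", "licence": "unknown"},
)

usable = usable_rows(rows, meta.structural["fields"])
print(f"{len(usable)} usable rows out of {len(rows)}")
```

The gap between the raw row count and the usable row count is the practical meaning of "effective amount of usable data" in this sketch.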
15. ABOUT FEATURES…
FROM SOURCE DATA TO RELEVANT DATA
Feature “reduction”
Noisy or redundant data make it more difficult to discover meaningful patterns.
High-dimensional datasets require more complex models/algorithms and more computational power.
Data augmentation
Enriching existing data with open data or through third-party data providers.
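The two ideas on this slide can be sketched in a few lines of plain Python. This is only one possible reading, not the presenters' method: feature "reduction" is approximated by dropping any column that is almost perfectly correlated with a column already kept, and data augmentation by joining records with an external lookup table. The function names (`pearson`, `drop_redundant`, `augment`), the 0.95 threshold and all sample values are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


def augment(records, external, key):
    """Enrich each record with the external attributes found for its key, if any."""
    return [{**record, **external.get(record[key], {})} for record in records]


# Hypothetical feature columns: temp_f is a linear transform of temp_c.
columns = {
    "temp_c": [10.0, 15.0, 20.0, 25.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0],
    "humidity": [80.0, 60.0, 75.0, 50.0],
}
reduced = drop_redundant(columns)
print(list(reduced))

# Hypothetical internal records and external (open or third-party) data.
records = [{"city": "A", "sales": 120}, {"city": "B", "sales": 200}]
open_data = {"A": {"population": 1000}}
augmented = augment(records, open_data, key="city")
print(augmented)
```

In this sketch the order of the columns decides which of two redundant features survives, and records with no match in the external source are left unchanged rather than discarded.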
31. Project name: Designed by: Date: Version:
(Data) output
Data
Infrastructure
GOAL(S)
SKILLS
PROCESS
VALUE CREATION
TOOLS
Data input
Implementation
Tuning
Interpretation
VALUE
Execution
Planning
Data strategy
Data skills
Other skills
Benchmark
Metrics
Budget & timing
Outsourcing
Data governance
Exploration
Hypothesis
Data preparation
Data processing
Validation
Iteration
Accessibility
Format
Metadata
Features
Datalake Framework
Storage
Computing
Coding
Data engineering
(Applied) Data science
Dataviz
Business
Legal
Social science
Sector expertise
PROCESS