Because every organization produces and propagates data as part of its day-to-day operations, data trends are becoming increasingly important in the mainstream business world’s consciousness. For many organizations across industries, though, comprehension of this development begins and ends with buzzwords: “Big Data,” “NoSQL,” “Data Scientist,” and so on. Few realize that every solution to their business problems, regardless of platform or technology, relies to a critical extent on the data model supporting it. As such, Data Modeling is not an optional task for an organization’s data effort, but rather a vital activity that facilitates the solutions driving your business.
Instead of the technical minutiae of Data Modeling, this webinar will focus on its value and practicality for your organization. In doing so, we will:
Address fundamental Data Modeling methodologies, their differences and various practical applications, and trends around the practice of Data Modeling itself
Discuss abstract models and entity frameworks, as well as some basic tenets for application development
Examine the general shift from segmented Data Modeling to more business-integrated practices
Discuss fundamental Data Modeling concepts based on “The DAMA Guide to the Data Management Body of Knowledge” (DAMA DMBOK)
Data-Ed Webinar: Data Modeling Fundamentals
1. Peter Aiken, Ph.D.
Data Modeling Fundamentals
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with
Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Peter Aiken, Ph.D.
• 33+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced w/ 500+ data
management practices
• Multi-year immersions:
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– … PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
The Case for the
Chief Data Officer
Recasting the C-Suite to Leverage
Your Most Valuable Asset
Peter Aiken and
Michael Gorman
Copyright 2018 by Data Blueprint Slide #
4. Data Modeling Approaches
NoSQL
Relaxed Normalization
schema implied by structure
fields may be empty, duplicate, or missing
Relational
Required Normalization
schema enforced by DB
same fields in all records
• Minimize data inconsistencies (one item = one location)
• Reduced duplicated data
• Preserve storage resources
• Optimized based on access patterns
• Flexible, based on application requirements
• Supports clustered architecture
• Reduced server overhead
6. Couchbase - The Data Platform Architecture
COUCHBASE LITE: Lightweight embedded NoSQL database with full CRUD and query functionality.
SYNC GATEWAY: Secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.
COUCHBASE SERVER: Highly scalable, highly available, high performance NoSQL database server.
Tiers: Client, Middle Tier, Storage (connected over WAN and LAN)
Security
Built-in enterprise level security throughout the entire stack includes user authentication, user and role based data access control (RBAC), secure transport (TLS),
and 256-bit AES full database encryption.
7. Couchbase Server Cluster Service Deployment
Figure: a seven-node Couchbase Server cluster. Each node runs a Cluster Manager with a managed cache and storage holding shards. Servers 1-3 run the Data Service, Servers 4-5 the Query Service, and Servers 6-7 the Index Service. Applications connect through the SDK.
9. Properties of Real-World Data
• Rich structure
• Attributes, Sub-structure
• Relationships
• To other data
• Value evolution
• Data is updated
• Structure evolution
• Data is reshaped
Customer
Name
DOB
Billing
Connections
Purchases
10. Modeling Data in Relational World
Entities: Customer, Contacts, Billing, Connections, Purchases
Rich structure: Normalize & JOIN Queries
Relationships: JOINS and Constraints
Value evolution: INSERT, UPDATE, DELETE
Structure evolution: ALTER TABLE (Application Downtime, Application Migration, Application Versioning)
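The relational approach on this slide can be sketched in a few lines of Python with the standard-library sqlite3 module. This is only an illustration: the table names, column names, and the added email column are assumptions, not part of the original slides. It shows rich structure split across flat tables, a JOIN to reassemble it, and ALTER TABLE for structure evolution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Rich structure is normalized into separate flat tables (names are illustrative).
conn.execute(
    "CREATE TABLE customer ("
    " customer_id TEXT PRIMARY KEY, name TEXT NOT NULL, dob TEXT)"
)
conn.execute(
    "CREATE TABLE billing ("
    " customer_id TEXT NOT NULL REFERENCES customer(customer_id),"
    " type TEXT, cardnum TEXT, expiry TEXT)"
)

# Value evolution: plain INSERT / UPDATE / DELETE statements.
conn.execute("INSERT INTO customer VALUES ('CBL2015', 'Jane Smith', '1990-01-30')")
conn.executemany(
    "INSERT INTO billing VALUES (?, ?, ?, ?)",
    [("CBL2015", "visa", "5827…", "2019-03"),
     ("CBL2015", "master", "6274…", "2018-12")],
)

# Relationships: a JOIN reassembles the customer with its billing rows.
rows = conn.execute(
    "SELECT c.name, b.type FROM customer c"
    " JOIN billing b ON b.customer_id = c.customer_id"
    " ORDER BY b.type"
).fetchall()

# Structure evolution: the schema itself has to be altered.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

In a production system the ALTER TABLE step is where the downtime, migration, and versioning concerns listed on the slide tend to appear.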
12. Flexibility from JSON
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
},
{
"type" : "master",
"cardnum" : "6274-2842-2847-3909",
"expiry" : "2019-03"
}
],
"address" :
{
"Street" : "10, Downing Street",
"City" : "San Francisco",
"State" : "California",
"zip" : 94401
}
}
• Document is self describing
• Fields can be added or can be missing
• Data types can change
• Arrays give you flexibility in number of
items in an attribute
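A small Python sketch of what "fields can be added or can be missing" means for application code. The second document and its email field are hypothetical, added only for illustration; the point is that readers of the data tolerate absent fields instead of relying on a fixed schema.

```python
import json

# Two customer documents in the same collection. The second omits "Billing"
# and adds an "email" field (hypothetical, for illustration only).
docs = [
    json.loads('{"Name": "Jane Smith", "DOB": "1990-01-30", '
               '"Billing": [{"type": "visa"}, {"type": "master"}]}'),
    json.loads('{"Name": "Joe Smith", "email": "joe@example.com"}'),
]


def card_types(doc):
    """Return the card types in a document, tolerating a missing Billing array."""
    return [card.get("type") for card in doc.get("Billing", [])]


summary = {doc["Name"]: card_types(doc) for doc in docs}
```

The trade-off is that the schema knowledge moves from the database into the code that reads the documents.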
13. Using JSON to Store Data
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
},
{
"type" : "master",
"cardnum" : "6274-2842-2847-3909",
"expiry" : "2019-03"
}
],
"Connections" : [
{
"CustId" : "XYZ987",
"Name" : "Joe Smith"
},
{
"CustId" : "PQR823",
"Name" : "Dylan Smith"
}
],
"Purchases" : [
{ "id":12, "item": "mac", "amt": 2823.52 },
{ "id":19, "item": "ipad2", "amt": 623.52 }
]
}
Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30

Billing
CustomerID | Type | Cardnum | Expiry
CBL2015 | visa | 5827… | 2019-03
CBL2015 | master | 6274… | 2018-12

Connections
CustomerID | ConnId | Name
CBL2015 | XYZ987 | Joe Smith
CBL2015 | SKR007 | Sam Smith

Purchases
CustomerID | item | amt
CBL2015 | mac | 2823.52
CBL2015 | ipad2 | 623.52
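The mapping between one JSON document and several flat tables can be sketched as a "disassembly" step. The function name and the table layout below are assumptions chosen to mirror the slide, not a prescribed API.

```python
def disassemble(customer_id, doc):
    """Split one customer document into rows for four flat tables."""
    return {
        "customer": [(customer_id, doc["Name"], doc["DOB"])],
        "billing": [(customer_id, b["type"], b["cardnum"], b["expiry"])
                    for b in doc.get("Billing", [])],
        "connections": [(customer_id, c["CustId"], c["Name"])
                        for c in doc.get("Connections", [])],
        "purchases": [(customer_id, p["item"], p["amt"])
                      for p in doc.get("Purchases", [])],
    }


doc = {
    "Name": "Jane Smith",
    "DOB": "1990-01-30",
    "Billing": [{"type": "visa", "cardnum": "5827-2842-2847-3909",
                 "expiry": "2019-03"}],
    "Connections": [{"CustId": "XYZ987", "Name": "Joe Smith"}],
    "Purchases": [{"id": 12, "item": "mac", "amt": 2823.52},
                  {"id": 19, "item": "ipad2", "amt": 623.52}],
}

tables = disassemble("CBL2015", doc)
```

A relational application performs this split on every write and the reverse (JOINs) on every read; a document store keeps the structure in one piece.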
14. Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly / disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; N1QL (supports ANSI JOIN)
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; manual change (disruptive) | Flexible; dynamic change
Data Modeling Fundamentals
• Data Management Overview
• Motivation
– of Systems/components
– Data is not a well-understood substructure
• Why data modeling & what is it?
– Model represents our understanding of the
– Fundamental, foundational system
characteristics
– Shared between system and human
• Fundamentals
– The power of the purpose statement
– Understanding data centric thinking
– Data modeling complements other architecture/
engineering techniques, as well as
– Challenges beyond data modeling
• Take Aways, References & Q&A
Uses, Reuses
What is data management?
Sources
Data
Engineering
Data
Delivery
Data
Storage
Specialized Team Skills
Data Governance
Understanding the current
and future data needs of an
enterprise and making that
data effective and efficient in
supporting
business activities
Aiken, P, Allen, M. D., Parker, B., Mattia, A.,
"Measuring Data Management's Maturity:
A Community's Self-Assessment"
IEEE Computer (research feature April 2007)
Data management practices connect
data sources and uses in an
organized and efficient manner
• Engineering
• Storage
• Delivery
• Governance
When executed,
engineering, storage, and
delivery implement governance
Note: does not well-depict data reuse
16. What is data management?
Sources
Data
Engineering
Data
Delivery
Data
Storage
More Specialized Team Skills
Resources
(optimized for reuse)
Data Governance
Analytic Insight
17. You can accomplish
Advanced Data Practices
without becoming proficient
in the Foundational Data
Management Practices
however this will:
• Take longer
• Cost more
• Deliver less
• Present
greater
risk
(with thanks to Tom DeMarco)
Data Management Practices Hierarchy
Advanced
Data
Practices
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Foundational Data Management Practices
Data Platform/Architecture
Data Governance Data Quality
Data Operations
Data Management Strategy
Technologies
Capabilities
DMM℠ Structure of
5 Integrated
DM Practice Areas
Data architecture
implementation
Data
Governance
Data
Management
Strategy
Data
Operations
Platform
Architecture
Supporting
Processes
Maintain fit-for-purpose data,
efficiently and effectively
Manage data coherently
Manage data assets professionally
Data life cycle
management
Organizational support
Data
Quality
18. Data Strategy is often
the weakest link
Figure: the same five DM practice areas shown above, annotated with scores of 3, 3, 3, 3, and 1.
Data Management
Body of
Knowledge
Data
Management
Functions
20. Data
Architecture
and
Data Models
http://www.architecturalcomponentsinc.com
• Architecture is higher level of abstraction
– Understanding/integration focused
• Models more downward facing
– Implementation/detail focused
Models are literally the translation
between systems and people
21. Data Models are about ...
• Things that someone cares
to keep information about
– Entities: persons, places, things
• The characteristics of the things
– Attributes: color, size, sequence
media code, product descriptions, quantity ordered
• How the entities interact
– Relationships: accomplished
by cooperating (sharing key
information)
An order is placed by one
and only one customer
What do we teach knowledge workers about data?
What percentage of them deal with it daily?
22. What do we teach IT professionals about data?
• 1 course
– How to build a
new database
• What
impressions do IT
professionals get
from this
education?
– Data is a technical
skill that is needed
when developing
new databases
• Slender, elegant and graceful
• World's 3rd longest suspension span
• Opened on July 1, 1940; collapsed in a windstorm on
November 7, 1940
• "The most dramatic failure in
bridge engineering history"
• Changed forever how engineers
design suspension bridges leading
to safer spans today.
Tacoma Narrows Bridge/Gallopin' Gertie
Similarly data failures cost organizations
minimally 20-40% of their IT budget
Repeat 100s, thousands, millions of times ...
24. Death by 1000 Cuts
• How does maltreated data cost money?
• Consider the opposite question:
– Were your systems explicitly designed to
be integrated or otherwise work together?
– If not then what is the likelihood that they
will work well together?
• Organizations spend 20-40% of their IT
budget evolving data - including:
– Data migration
• Changing the location from one place to another
– Data conversion
• Changing data into another form, state, or product
– Data improving
• Inspecting and manipulating, or re-keying data to prepare it for
subsequent use - John Zachman
Lack of data coherence is a hidden expense
25. Bad Data Decisions Spiral
Bad data decisions
Technical decision
makers are not
data knowledgeable
Business decision
makers are not
data knowledgeable
Poor organizational outcomes
Poor treatment of
organizational data
assets
Poor
quality
data
26. How much data,
by the minute!
For the entirety of 2017,
every minute of every day:
• (almost) Seventy
thousand hours of Netflix
• (almost) a half million
tweets
• 15+ million texts
• 3.5+ million Google
searches
• 103+ million spam emails
https://www.domo.com/learn/data-never-sleeps-5
As articulated by Micheline Casey
There will
never be less
data than
right now!
27. USS Midway
& Pancakes
What is this excellent
engineering example?
• It is tall
• It has a clutch
• It was built in 1942
• It is still in regular use!
You cannot architect after implementation!
31. Families of Modeling Notation Variants
Eventually One, More
Eventually One
Exactly One
Zero, or More
One or More
Zero or One
Information Engineering
Pick one!
What is a Relationship?
• Natural associations between two or more entities
32. Ordinality & Cardinality
• Defines mandatory/optional relationships using minimum/
maximum occurrences from one entity to another
An order is
placed by one
and only one
customer
A customer
places zero
or more
orders
A product is contained on zero
or more orders
An order
contains
one or more
products
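The four cardinality statements above can be sketched as a schema, here with Python's standard-library sqlite3 module. The table and column names are illustrative assumptions. A NOT NULL foreign key expresses "one and only one customer"; the "zero or more" sides need no constraint; and the minimum of one product per order cannot be declared in this simple schema, so it is checked with a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY);
-- Exactly one customer per order: a mandatory (NOT NULL) foreign key.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- Many-to-many between orders and products via a junction table.
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1), (2);    -- customer 2 places zero orders
INSERT INTO product  VALUES (10), (11);  -- product 11 is on zero orders
INSERT INTO orders   VALUES (100, 1);
INSERT INTO order_line VALUES (100, 10);
""")


def order_without_customer_rejected():
    """Try to store an order with no customer; the schema should refuse it."""
    try:
        conn.execute("INSERT INTO orders VALUES (101, NULL)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = order_without_customer_rejected()

# Minimum cardinality ("at least one product") is verified by query here.
orders_missing_products = conn.execute(
    "SELECT o.order_id FROM orders o"
    " LEFT JOIN order_line l ON l.order_id = o.order_id"
    " WHERE l.order_id IS NULL"
).fetchall()
```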
Q: What is the proper relationship for these entities?
33. A: a relationship for these entities
Q: What is an Attribute?
34. A: Attribute Definition
• Attributes describe an entity and attribute values describe
“instances of business things”
Rigid Data Structure
Person Job Class
Position
BR1) One EMPLOYEE
can be associated with one
PERSON
BR2) One EMPLOYEE can be
associated with one POSITION
Manual
Job Sharing
Manual
Moon Lighting
Employee
35. Flexible data structure
Person Job Class
Employee Position
BR1) Zero, one, or more
EMPLOYEES can be associated
with one PERSON
BR2) Zero, one, or more EMPLOYEES
can be associated with one POSITION
Job Sharing
Moon Lighting
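A minimal Python sketch of the flexible structure, assuming EMPLOYEE acts as a record linking exactly one PERSON to exactly one POSITION. The names Alice, Bob, and the position titles are hypothetical. Because a person or a position may appear in zero, one, or many employee records, job sharing and moonlighting need no structural change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """One employment record linking exactly one person to exactly one position."""
    person: str
    position: str


employees = [
    Employee("Alice", "Receptionist"),   # job sharing: two people,
    Employee("Bob", "Receptionist"),     # one position
    Employee("Alice", "Night Auditor"),  # moonlighting: one person, two positions
]


def people_in(position):
    """People associated with a position (zero, one, or more)."""
    return sorted(e.person for e in employees if e.position == position)


def positions_of(person):
    """Positions associated with a person (zero, one, or more)."""
    return sorted(e.position for e in employees if e.person == person)
```

Under the rigid structure, where an employee maps to one person and one position, both cases would have to be handled manually outside the system.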
Everyone Shares Understanding
Data structures must be specified prior to
software development/acquisition
(Requires 2 structural loops more
than the more flexible data structure)
More flexible data structure Less flexible data structure
36. Understanding
• Definition:
– 'Understanding an architecture'
– Documented and articulated as a digital blueprint
illustrating the
commonalities and
interconnections
among the
architectural
components
– Ideally the understanding
is shared by systems and humans
Modeling Procedures
1. Identify entities
2. Identify key for each
entity
3. Draw rough draft of
entity relationship
data model
4. Identify data
attributes
5. Map data attributes
to entities
37. Models Evolution is good, at first ...
Figure: Relative use of time allocated to tasks during modeling.
Phases: preliminary activities, modeling cycles, wrap-up activities.
Tasks: evidence collection & analysis, project coordination requirements, target system analysis.
Modeling cycle focus: activity, refinement, collection, analysis, validation.
Trends: declining coordination requirements; increasing amounts of target system analysis.
38. Don’t Tell Them You Are Modeling!
• Just write some stuff down
• Then arrange it
• Then make some appropriate
connections between your
objects
39. Each model has a purpose
Data Models are Developed in Response to Organizational Needs
Organizational Needs
become instantiated
and integrated into an
Data Models
Information System
Requirements
authorizes and
articulates
satisfy specific organizational needs
40. Standard definition reporting does not provide conceptual context
Bed
Something you sleep in
Bed
Entity: BED
Purpose: This is a substructure within the room
substructure of the facility location. It
contains information about beds within rooms.
Attributes: Bed.Description
Bed.Status
Bed.Sex.To.Be.Assigned
Bed.Reserve.Reason
Associations: >0-+ Room
Status: Validated
Keep them focused on data model purpose
• The reason we are locked in
this room is to:
– Mission: Understand formal
relationship between soda and
customer
• Outcome: Walk out the door with a
data model of this relationship
– Mission: Understand the
characteristics that differ
between our hospital beds
• Outcome: We will walk out the door
when we identify the top three traits that
represent the brand.
– Mission: Could our systems
handle the following business
rule tomorrow?
– "Is job-sharing permitted?"
• Outcomes: Confirm that it is possible to
staff a position with multiple employees
effective tomorrow
selects and pays for / given to
Soda
Customer
selects
can be filled by zero or 1
Employee Position
has exactly 1
How does our
perspective change:
the primary means of
tracking a patient
42. Data Modeling Example #2
fuel
rent-rate
phone-rate
phone-call
rental
agreement
customer
auto
repair
history
phone-unit
Source: Chikofsky 1990
Interpretations:
1. Car rental company
2. Rental agreement is central
3. No direct connection between
customer and contract
4. Contract must have a customer
5. Nothing structural prevents
autos from being rented to
multiple customers
6. Phone units are tied to rentals
Model Purpose Statement:
This model codifies the official
vocabulary to be used when
describing aspects of any of the
following organizational concepts:
– fuel
– customer
– auto
– rental agreement
– rent-rate
– phone-call
– phone-rate
– phone-unit
– repair history
It is documentation shown
during the on-
boarding process
Data Modeling
Example #3
SALESPERSON: salesperson name, commission rate
INVOICE: invoice #, amount, date paid
ORDER: order #, date, customer #, customer name, address
LINE ITEM: item #, order #, quantity, price
CATALOG: item #, supplier, description, quantity on hand, cost
• Sales commission-based pricing information
• Difficult to change a customer address
• Easy to implement variable pricing - difficult to implement
standard pricing - is standard pricing implemented?
• Sales person information is not directly tied to the order
• Price not included in the catalog
• Do sales people sell things that are shipped quickly so they get
their commission quicker?
• Nothing prohibits a sale from having multiple
sales persons
• Multiple invoices are allowed for a single order
• Partial shipment is allowed
• Database cannot tell what part of an order the
invoice pertains to
Model Purpose Statement:
This model codifies the official
vocabulary and specific
operational rules to be used when
describing aspects of any of the
following organizational concepts:
– salesperson
– invoice
– order
– line item
– catalog
43. DISPOSITION Data Map
Model Purpose Statement:
This model codifies the official
vocabulary to be used when
describing disposition related organizational concepts:
– user
– admission
– discharge
– encounter
– facility
– provider
– diagnosis
Data Model #4: DISPOSITION
• At least one but possibly more system USERS enter the
DISPOSITION facts into the system.
• An ADMISSION is associated with one and only one
DISCHARGE.
• An ADMISSION is associated with zero or more
FACILITIES.
• An ADMISSION is associated with zero or more
PROVIDERS.
• An ADMISSION is associated with one or more
ENCOUNTERS.
• An ENCOUNTER may be recorded by a system USER.
• An ENCOUNTER may be associated with a PROVIDER.
• An ENCOUNTER may be associated with one or more
DIAGNOSES.
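The mandatory and optional associations in these rules can be sketched as a small Python check. The class layout, the function name, and the sample codes ("HOME", "DX-1") are hypothetical, chosen only to illustrate which rules are mandatory (one DISCHARGE, at least one ENCOUNTER) and which are optional ("may be", "zero or more").

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Encounter:
    recorded_by: Optional[str] = None   # may be recorded by a system USER
    provider: Optional[str] = None      # may be associated with a PROVIDER
    diagnoses: List[str] = field(default_factory=list)  # optional DIAGNOSES


@dataclass
class Admission:
    discharge: Optional[str]            # rule: one and only one DISCHARGE
    encounters: List[Encounter]         # rule: one or more ENCOUNTERS
    facilities: List[str] = field(default_factory=list)  # zero or more
    providers: List[str] = field(default_factory=list)   # zero or more


def violations(admission):
    """Return the mandatory rules that an ADMISSION record breaks."""
    problems = []
    if admission.discharge is None:
        problems.append("ADMISSION needs exactly one DISCHARGE")
    if not admission.encounters:
        problems.append("ADMISSION needs at least one ENCOUNTER")
    return problems


ok = Admission(discharge="HOME", encounters=[Encounter(diagnoses=["DX-1"])])
bad = Admission(discharge=None, encounters=[])
```

Writing the rules down this way makes it easy to ask business questions of the model, such as whether every possible outcome of an admission has a disposition code.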
ADMISSION Contains information about patient admission
history related to one or more inpatient episodes
DIAGNOSIS Contains the International Disease Classification
(IDC) of code representation and/or description
of a patient's health related to an inpatient code
DISCHARGE A table of codes describing disposition types
available for an inpatient at a FACILITY
ENCOUNTER Tracking information related to inpatient
episodes
FACILITY File containing a list of all facilities in regional
health care system
PROVIDER Full name of a member of the FACILITY team
providing services to the patient
USER Any user with access to create, read, update,
and delete DISPOSITION data
Death must be a disposition code!
44. Two Brilliant Einstein Quotes
• "The significant
problems we
face cannot be
solved at the
same level of
thinking we were
at when we
created them."
– Albert Einstein
IT Project or Application-Centric Development
Original articulation from Doug Bagley @ Walmart
Data/
Information
IT
Projects
Strategy
• In support of strategy, organizations
implement IT projects
• Data/information are typically
considered within the scope of IT
projects
• Problems with this approach:
– Ensures data is formed to the
applications and not around the
organization-wide information
requirements
– Processes are narrowly formed around
applications
– Very little data reuse is possible
45. Data-Centric Development
Original articulation from Doug Bagley @ Walmart
IT
Projects
Data/
Information
Strategy
• In support of strategy, the organization
develops specific, shared data-based
goals/objectives
• These organizational data goals/
objectives drive the development of
specific IT projects with an eye to
organization-wide usage
• Advantages of this approach:
– Data/information assets are developed from an
organization-wide perspective
– Systems support organizational data needs and
complement organizational process flows
– Maximum data/information reuse
theDataDoctrine.com
We are uncovering better ways of developing
IT systems by doing it and helping others do it.
Through this work we have come to value:
Data programmes preceding software development
Stable data structures preceding stable code
Shared data preceding completed software
Data reuse preceding reusable code
That is, while there is value in the items on
the right, we value the items on the left more.
• "Everything should be
made as simple as
possible, but no
simpler."
– Albert Einstein
Two Brilliant Einstein Quotes
47. Typically Managed Architectures
• Process Architecture
– Arrangement of inputs -> transformations = value -> outputs
– Typical elements: Functions, activities, workflow, events, cycles, products, procedures
• Systems Architecture
– Applications, software components, interfaces, projects
• Business Architecture
– Goals, strategies, roles, organizational structure, location(s)
• Security Architecture
– Arrangement of security controls in relation to IT Architecture
• Technical Architecture/Tarchitecture
– Relation of software capabilities/technology stack
– Structure of the technology infrastructure of an enterprise, solution or system
– Typical elements: Networks, hardware, software platforms, standards/protocols
• Data/Information Architecture
– Arrangement of data assets supporting organizational strategy
– Typical elements: specifications expressed as entities, relationships, attributes,
definitions, values, vocabularies
Modeling in Various Contexts
Figure: reverse engineering of existing (as-is) assets and forward engineering of new (to-be) assets, with metadata.
Existing: As-Is Information Requirements Assets, As-Is Data Design Assets, As-Is Data Implementation Assets.
New: To-Be Requirements Assets, To-Be Design Assets, To-Be Data Implementation Assets.
Options: O1 Recreate Data Implementation, O2 Recreate Data Design, O3 Recreate Requirements, O4 Reconstitute Data Design, O5 Reconstitute Requirements, O6 Redesign Data, O7 Redevelop Requirements, O8 Redesign Data, O9 Reimplement Data.
48. Information Architecture Component Reengineering Options
O-1 data implementation (e.g., by recreating descriptions of implemented file
layouts);
O-2 data designs (e.g., by recreating the logical system design layouts); or
O-3 information requirements (e.g., by recreating existing system specifications and
business rules).
O-4 data design assets by examining the existing data implementation (when
appropriate O-1 can facilitate O-4); and
O-5 system information requirements by reverse engineering the data design O-4.
(Note: if the data design doesn't exist O-4 must precede O-5.)
O-6 transforming as is data design assets, yielding improved to be data designs that
are based on reconstituted data design assets produced by O-2 or O-4 and
(possibly O-1);
O-7 transforming as is system requirements into to be system requirements that are
based on reconstituted system requirements produced by O-3 or O-5 and
(possibly O-2);
O-8 redesigning to be data design assets using the to be system requirements
based on reconstituted system requirements produced by O-7; and
O-9 re-implementing system data based on data redesigns produced by O-6 or O-8.
Model Evolution Framework
Conceptual Logical Physical
Goal
Validated
Not Validated
Every change can
be mapped to a
transformation in
this framework!
49. Model Evolution (better explanation)
[Figure: as-is and to-be models at two levels, technology independent (logical) and technology dependent (physical), related by abstraction, alongside other logical as-is data architecture components.]
• "Concern for man and his fate must always form the chief interest of all technical endeavors. Never forget this in the midst of your diagrams and equations."
– Albert Einstein
50. Data Models Used to Support Strategy
• Flexible, adaptable data structures
• Cleaner, less complex code
• Ensure strategy effectiveness measurement
• Build in future capabilities
• Form/assess merger and acquisition strategies
[Figure: entity diagram relating Employee Type, Employee, Sales Person, Manager, Manager Type, Staff Manager, and Line Manager.]
Adapted from Clive Finkelstein, Information Engineering: Strategic Systems Development, 1992
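One reading of the figure is that employee and manager types are held as data rather than hard-coded, which is what makes the structure flexible. The sketch below illustrates that idea only; the class names, the `parent` link, the `is_a` helper, and the sample employee are assumptions made for illustration, not Finkelstein's notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmployeeType:
    """A type is a record, so new types are added as data, not as new code."""
    name: str
    parent: Optional["EmployeeType"] = None


@dataclass(frozen=True)
class Employee:
    name: str
    employee_type: EmployeeType


def is_a(employee, type_name):
    """True if the employee's type, or any ancestor type, has the given name."""
    current = employee.employee_type
    while current is not None:
        if current.name == type_name:
            return True
        current = current.parent
    return False


manager = EmployeeType("Manager")
line_manager = EmployeeType("Line Manager", parent=manager)
staff_manager = EmployeeType("Staff Manager", parent=manager)
sales_person = EmployeeType("Sales Person")

# Hypothetical employee record for illustration.
pat = Employee("Pat", line_manager)
print(is_a(pat, "Manager"))       # True
print(is_a(pat, "Sales Person"))  # False
```

Adding a new kind of manager here means creating one more `EmployeeType` record; no existing structure or lookup code has to change.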
How do Data Models Support Organizational Strategy?
• Consider the opposite question:
– Were your systems explicitly designed to be integrated or otherwise work together?
– If not, then what is the likelihood that they will work well together?
– In all likelihood, your organization is spending between 20% and 40% of its IT budget compensating for poor data structure integration
– They cannot be helpful as long as their structure is unknown
• Two answers
– Achieving efficiency and effectiveness goals
– Providing organizational dexterity for rapid implementation
51. Data Modeling Ensures Interoperability
[Figure: Programs A and D through I across application domains 2 and 3, shown with their data models. Callouts mark "Typical focus of a database modeling effort" and "Typical focus of a software engineering effort".]
Data Model Focus has Great Potential Business Value
• How are decisions about the range and scope of common data usage made?
• Analysis scope is on use of data to support a process
• Problems caused by data exchange or interface problems
• Goals often connect strategic and operational
• One data model is ideal
52. Data Modeling Fundamentals
• Data Management Overview
• Motivation
– of Systems/components
– Data is a not well understood substructure
• Why data modeling & what is it?
– Model represents our understanding of the
– Fundamental, foundational system characteristics
– Shared between system and human
• Fundamentals
– The power of the purpose statement
– Understanding data centric thinking
– Data modeling complements other architecture/engineering techniques, as well as
– Challenges beyond data modeling
• Take Aways, References & Q&A
Use Models to
• Store and formalize information
• Filter out extraneous detail
• Define an essential set of information
• Help understand complex system behavior
• Gain information from the process of developing and interacting with the model
• Evaluate various scenarios or other outcomes indicated by the model
• Monitor and predict system responses to changing environmental conditions
53. Data Modeling for Business Value
• Goal must be shared IT/business understanding
– No disagreements = insufficient communication
• Data sharing/exchange is largely and highly automated and thus dependent on successful engineering
– It is critical to engineer a sound foundation of data modeling basics (the essence) on which to build advantageous data technologies
• Modeling characteristics change over the course of analysis
– Different model instances may be useful for different analytical problems
• Incorporate motivation (purpose statements) in all modeling
– Modeling is a problem-defining as well as a problem-solving activity; both are inherent to architecture
• Use of modeling is much more important than selection of a specific modeling method
• Models are often living documents
– They adapt easily to change
• Models must have modern access/interface/search technologies
– Models need to be available in an easily searchable manner
• Utility is paramount
– Adding color and diagramming objects customizes models and allows for a more engaging and enjoyable user review process
Inspired by: Karen Lopez http://www.information-management.com/newsletters/enterprise_architecture_data_model_ERP_BI-10020246-1.html?pg=2
Why Modeling
• Would you build a house without an architecture sketch?
– A model is the sketch of the system to be built in a project.
• Would you like to have an estimate of how much your new house is going to cost?
– Your model gives you a very good idea of how demanding the implementation work is going to be!
• If you hired a set of constructors from all over the world to build your house, would you like them to have a common language?
– A model is the common language for the project team.
• Would you like to verify the proposals of the construction team before the work gets started?
– Models can be reviewed before thousands of hours of implementation work are done.
• If it was a great house, would you like to build something rather similar again, in another place?
– It is possible to implement the system on various platforms using the same model.
• Would you drill into a wall of your house without a map of the plumbing and electric lines?
– Models document the system built in a project. This makes life easier for support and maintenance!
54. Upcoming Events
Enterprise Data World 2018 (San Diego)
The First Year as a CDO
April 24, 2018 @ 1:30 PM ET
May Webinar:
Implementing the Data Maturity Model
May 8, 2018 @ 2:00 PM ET/11:00 AM PT
June Webinar:
Data Governance Strategies
June 12, 2018 @ 2:00 PM ET/11:00 AM PT
DGIQ 2018 (San Diego)
Keeping the Momentum Going in your Data Quality Program
June 11, 2018 @ 1:30 PM (PT)
Sign up for webinars at: www.datablueprint.com/webinar-schedule
Join in the discussion - questions?
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter now!
55. 10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056