A tiered storage architecture offers a cost-efficient solution for today’s enterprise storage needs. Instead of storing all data in one class of storage, in a tiered architecture, hot data can reside in a high-performance tier to ensure low access latencies, while colder data can be stored in tiers with lower cost per gigabyte. This white paper discusses how enterprises can take full advantage of tiered storage architecture.
The world is becoming smarter
all the time. Smartphones,
tablets and smart TVs are already
common household items, and
with the Internet of Things (IoT), more
and more everyday items are
becoming smart in one way or
another. While smart devices
are making life more convenient,
they are also putting enormous
pressure on enterprise storage.
The generation and consumption
of data are the cornerstones of
connected devices, so that data must
be stored, analyzed and
made quickly accessible
to realize the full potential that
connected devices offer.
With more Internet-connected
devices in use, the amount of
data that is generated on a yearly
basis is growing exponentially,
and by 2020 the data universe is
expected to reach 44 zettabytes
with growth forecast to continue
at 40% year over year. According
to analyst firm IDC, however,
about 90% of the world’s
data is considered to be “cold
data”, which is only accessed
infrequently. In other words, we
only access about 10% of the
world’s data on a regular basis.
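
The growth and hot/cold figures above reduce to simple arithmetic. The following is a minimal sketch, assuming the 40% rate compounds annually from the 44-zettabyte forecast for 2020 and that the 90/10 cold/hot split stays constant over time. The function names and the idea of projecting other years are illustrative assumptions, not part of the forecast itself.

```python
import math

BASE_YEAR = 2020
BASE_ZB = 44.0        # forecast size of the data universe in 2020 (from the text)
ANNUAL_GROWTH = 0.40  # 40% year-over-year growth (from the text)
COLD_SHARE = 0.90     # share of data considered cold (from the text)


def projected_zettabytes(year: int) -> float:
    """Project total data volume, assuming growth compounds annually."""
    return BASE_ZB * (1.0 + ANNUAL_GROWTH) ** (year - BASE_YEAR)


def hot_cold_split(total_zb: float) -> tuple[float, float]:
    """Split a total into (hot, cold) zettabytes, assuming a constant split."""
    cold = total_zb * COLD_SHARE
    return total_zb - cold, cold


def doubling_time_years() -> float:
    """Years needed for the data volume to double at the assumed growth rate."""
    return math.log(2.0) / math.log(1.0 + ANNUAL_GROWTH)


if __name__ == "__main__":
    hot, cold = hot_cold_split(projected_zettabytes(BASE_YEAR))
    print(f"{BASE_YEAR}: {hot:.1f} ZB hot, {cold:.1f} ZB cold")
    print(f"Doubling time: {doubling_time_years():.2f} years")
```

Under these assumptions the data volume doubles roughly every two years, while only about a tenth of it needs fast storage at any given time.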
Facebook is an excellent real-
world example of hot versus cold
data. When a photo is uploaded
to Facebook, it is pushed to
friends’ and followers’ timelines,
making the photo “hot” as it’s
accessed by hundreds, even
thousands, of users within
minutes. The photo will also
generate associated data in the
form of likes, comments and
tags, but eventually the photo will
fade away from people’s timelines
and become just another photo
in one’s profile. The photo may
still be viewed by a user every
now and then, but basically the
photo and its associated data
that was once hot and frequently
accessed has now become cold,
infrequently accessed data.
Given the different access
frequencies, hot and cold data
obviously have different storage
requirements. Because hot
data is accessed frequently and
possibly by thousands of people
at the same time, it needs to
be stored in a storage device
that is capable of providing high
performance and low latency.
However, high-performance
storage, such as PCIe NVMe
SSDs, costs more per
gigabyte. For this reason, it is not
cost-efficient to store all data in
the same type of device if only a
fraction is accessed regularly.
A tiered storage architecture
offers a cost-efficient solution
for today’s enterprise storage
needs. Instead of storing all data
in one class of storage, tiered
architecture provides several
different tiers of storage with
each having unique performance
and cost characteristics. Tiering
provides the best performance
and lowest cost by storing data
in the appropriate tier based on
the access frequency of the data.
In other words, hot data can
reside in a high-performance tier
to ensure low access latencies,
whereas colder data can be
stored in tiers with lower cost
per gigabyte.
Smart World: The Challenge for Enterprise Storage
[Figure: Data in zettabytes, 2010 to 2020. Callouts: data growing at a 40% compound annual rate; 10% hot data, 90% cold data; identifying “hot” and “cold” data is key to tiering storage. Source: Oracle 2012]
Computer memory architecture
has always consisted of several
tiers. A modern CPU alone has
three levels of SRAM caches
inside (L1, L2 & L3), which are
accompanied by a system-wide
DRAM cache sitting on the DDR
interface. It’s natural to extend
the tiered architecture to storage
as well, because modern storage
devices span multiple
latency tiers. PCIe NVMe SSDs
offer the highest performance
and lowest latency, but SATA
6Gbps SSDs provide lower cost
per gigabyte. Hard disk drives
(HDDs) sit at the bottom of
the latency hierarchy and are roughly
two orders of magnitude slower than even
SATA 6Gbps SSDs, but the cost
per gigabyte is the lowest of all
by a significant margin.
When speaking of caching and
tiering, it’s important to make
a clear distinction between the
two. Caching comes in three
forms (write-around,
write-through and write-back
caching) and is typically
employed in memory (SRAM
caching DRAM) and in storage (SSDs
caching HDDs), but it can be used
between other classes of storage as
well. In each case, the faster form
of memory or storage serves
as a temporary data cache
that improves read and/or write
performance, depending on the
chosen cache type.
• Write-around cache is
effectively a read-only
cache, as all data is first
written to the slower tier
and the caching algorithms
then determine what data
is accessed frequently and
copies it to the faster tier for
lower latency access.
• Write-through cache writes
to both fast and slow storage
tiers simultaneously, and
a write operation is
considered complete only when
it has been written to both
tiers. Write performance is
therefore still determined
by the slower tier, so
write-through cache also
improves only reads. The
idea behind write-through
caching is that the
data that was written
most recently is also the
most likely to be read next.
• Write-back cache is the
only one that improves write
performance because data is
first written to the faster tier
and then later moved to the
slower tier.
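
The three cache types above can be contrasted in a small toy model. This is a minimal sketch under simplifying assumptions: the `TwoLevelStore` class and its method names are hypothetical, the cache never evicts anything, and a read miss copies data into the cache immediately rather than waiting for an algorithm to decide that the data is accessed frequently.

```python
class TwoLevelStore:
    """Toy model of a fast cache in front of a slow backing store."""

    def __init__(self, policy: str) -> None:
        if policy not in ("write-around", "write-through", "write-back"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fast: dict[str, bytes] = {}  # cache tier (temporary copies)
        self.slow: dict[str, bytes] = {}  # backing tier (permanent home)
        self.dirty: set[str] = set()      # keys not yet flushed (write-back only)

    def write(self, key: str, value: bytes) -> str:
        """Store a value and return the tier that gates write completion."""
        if self.policy == "write-around":
            self.slow[key] = value
            self.fast.pop(key, None)      # drop any stale cached copy
            return "slow"
        if self.policy == "write-through":
            self.fast[key] = value
            self.slow[key] = value        # must also land on the slow tier
            return "slow"
        self.fast[key] = value            # write-back: acknowledged by the fast tier
        self.dirty.add(key)
        return "fast"

    def read(self, key: str) -> tuple[bytes, str]:
        """Return (value, tier served from); a miss copies data into the cache."""
        if key in self.fast:
            return self.fast[key], "fast"
        value = self.slow[key]
        self.fast[key] = value            # copied, not moved: slow tier keeps the data
        return value, "slow"

    def flush(self) -> None:
        """Write dirty entries back to the slow tier (write-back only)."""
        for key in self.dirty:
            self.slow[key] = self.fast[key]
        self.dirty.clear()
```

In this model only the write-back policy returns `"fast"` from `write`, which mirrors the point that it is the only form that improves write performance. With write-through, the first read after a write is already served from the fast tier.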
Tiering is fundamentally different
from caching because there
are no temporary caches - just
different tiers of permanent
storage. Whereas caching
copies data from the slower tier
to a faster one for improved read
latency, in tiering, data is never
copied – it is always moved
in full from one tier to another.
That creates a space efficiency
advantage because the capacity
of all storage tiers is available to
the host, whereas in caching,
only the capacity of the slower
tier is accessible because data is
copied and not moved.
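
The space-efficiency argument can be written as two one-line functions. This is a minimal sketch: the function names and the example drive sizes (an 800GB fast device and an 8,000GB slow device) are illustrative assumptions.

```python
def usable_capacity_tiering(tier_capacities_gb: list[float]) -> float:
    """Tiering moves data, so every tier adds to the usable capacity."""
    return sum(tier_capacities_gb)


def usable_capacity_caching(cache_gb: float, backing_gb: float) -> float:
    """Caching copies data, so only the slower backing tier counts."""
    return backing_gb


if __name__ == "__main__":
    fast_gb, slow_gb = 800.0, 8000.0  # hypothetical device sizes
    print("Tiering:", usable_capacity_tiering([fast_gb, slow_gb]), "GB")
    print("Caching:", usable_capacity_caching(fast_gb, slow_gb), "GB")
```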
Another fundamental difference
between tiering and caching is
that tiering supports more than
two tiers of storage. Because
data is moved between tiers
based on access frequency,
there is practically no upper limit
on the number of tiers that a
tiered storage architecture can
have. Caching architectures
typically only work with two
tiers, as the faster tier is used
to accelerate the slower main
tier where all data is ultimately
stored long-term. A multi-tier
architecture enables higher
performance and lower
cost because the whole
storage architecture can be
designed to best fit a specific
workload, which may consist
of varying levels of data access
frequencies.
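
Moving data between an arbitrary number of tiers based on access frequency can be sketched as follows. This is a minimal sketch, assuming that per-object access counts for some period are available and that each tier is defined by a minimum access count. The `TieredStore` class, its thresholds, and the placement of new data in the fastest tier are hypothetical illustrations, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    # Minimum accesses per period needed to qualify for each tier, fastest first.
    # The last threshold must be 0 so that every object has a home.
    thresholds: list[int]
    placement: dict[str, int] = field(default_factory=dict)  # object -> tier index

    def write(self, name: str) -> None:
        """Assumption: new data lands in the fastest tier (index 0)."""
        self.placement[name] = 0

    def tier_for(self, accesses: int) -> int:
        """Return the fastest tier whose threshold the access count meets."""
        for tier, minimum in enumerate(self.thresholds):
            if accesses >= minimum:
                return tier
        raise ValueError("the last threshold must be 0")

    def rebalance(self, access_counts: dict[str, int]) -> list[tuple[str, int, int]]:
        """Move objects to the matching tier; return (name, old, new) moves."""
        moves = []
        for name, current in list(self.placement.items()):
            # Objects with no recorded accesses are treated as cold.
            target = self.tier_for(access_counts.get(name, 0))
            if target != current:
                self.placement[name] = target  # moved in full: one tier holds it
                moves.append((name, current, target))
        return moves
```

Because each object lives in exactly one tier, adding a fifth or sixth tier only means adding another threshold to the list.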
Understanding the Difference Between Tiering and Caching

Type             Latency                    Typical Size
L1 Cache         1-3 ns                     32KB per CPU core
L2 Cache         3-10 ns                    256KB per CPU core
L3 Cache         10-20 ns                   2-20MB per CPU package
DRAM             30-60 ns                   2-32GB per module
PCIe NVMe SSD    20,000-100,000 ns          400-3,200GB per drive
SATA SSD         40,000-110,000 ns          120-3,840GB per drive
HDD              3,000,000-10,000,000 ns    500-8,000GB per drive

Latencies in Modern Computer Architecture
A nanosecond (ns) is one billionth (10^-9) of a second
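
The gaps between the latency tiers in the table can be quantified with a short calculation. This is a minimal sketch that compares the best-case (lowest) latency of each type. Using the lower bound of each range is an assumption made for illustration, and the function names are hypothetical.

```python
import math

# Latency ranges in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 Cache": (1, 3),
    "L2 Cache": (3, 10),
    "L3 Cache": (10, 20),
    "DRAM": (30, 60),
    "PCIe NVMe SSD": (20_000, 100_000),
    "SATA SSD": (40_000, 110_000),
    "HDD": (3_000_000, 10_000_000),
}


def slowdown(slower: str, faster: str) -> float:
    """Ratio of best-case latencies between two storage types."""
    return LATENCY_NS[slower][0] / LATENCY_NS[faster][0]


def orders_of_magnitude(slower: str, faster: str) -> float:
    """The same ratio expressed as a base-10 logarithm."""
    return math.log10(slowdown(slower, faster))


if __name__ == "__main__":
    for slower, faster in [("HDD", "SATA SSD"), ("SATA SSD", "PCIe NVMe SSD")]:
        print(f"{slower} vs {faster}: {slowdown(slower, faster):.0f}x slower")
```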
Generally speaking, the
cost of storage is dictated
by performance: the more
performance a drive provides,
the higher the price per gigabyte.
Once variation in data access
frequencies is added to the
equation, it does not make sense
to use just one tier of storage. If
only high-performance storage
were used, the cost would be
prohibitive, and because the
array would end up storing mostly
cold data, the return on investment
(ROI) would be very poor. Similarly,
if only low-cost storage were used,
performance would be very
limited, which would degrade the
user experience and could limit the
growth of the business.
Performance and capacity per
dollar are the key metrics in
tiering. Each type of storage has
its unique performance and cost
characteristics, so a multi-tiered
storage architecture enables the
highest cost efficiency as each
storage type can be used in its
appropriate tier.
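
The cost argument can be illustrated by computing a blended cost per gigabyte. This is a minimal sketch: the `Tier` class is hypothetical, and the prices (1.00 and 0.05 per GB) are placeholders chosen only to show the calculation, not market figures. The 10%/90% capacity split follows the hot/cold ratio cited earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_gb: float
    cost_per_gb: float  # placeholder price for illustration, not a market figure


def total_cost(tiers: list[Tier]) -> float:
    """Acquisition cost of all tiers combined."""
    return sum(t.capacity_gb * t.cost_per_gb for t in tiers)


def blended_cost_per_gb(tiers: list[Tier]) -> float:
    """Average cost per gigabyte across all tiers."""
    return total_cost(tiers) / sum(t.capacity_gb for t in tiers)


# Hypothetical 100,000GB data set stored two ways.
all_fast = [Tier("fast only", 100_000, 1.00)]
tiered = [
    Tier("fast (hot data)", 10_000, 1.00),
    Tier("slow (cold data)", 90_000, 0.05),
]

if __name__ == "__main__":
    print(f"All fast: {blended_cost_per_gb(all_fast):.3f} per GB")
    print(f"Tiered:   {blended_cost_per_gb(tiered):.3f} per GB")
```

With these placeholder prices the tiered layout costs about 0.145 per gigabyte against 1.00 for the all-fast layout, while the hot data still sits on the fast tier.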
The most frequently accessed
data resides in Tier 1, and all
incoming writes land there first,
resulting in very high read and
write activity. Because PCIe
NVMe SSDs provide the highest
performance per dollar and per watt,
using them in Tier 1 is the most
cost-efficient solution. It would
take multiple SATA SSDs or
thousands of HDDs to match
the performance of a single PCIe
NVMe SSD, and that many drives
would be far costlier to acquire
even though PCIe NVMe SSDs
command a higher cost per gigabyte.
That many drives
would also consume significantly
more power, so a PCIe NVMe
SSD also provides lower total
cost of ownership (TCO) by
reducing electricity expenses.
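
The "how many drives does it take" comparison can be expressed as a small calculation. This is a minimal sketch: the function names are hypothetical, and every IOPS and price value in the example is a placeholder chosen for illustration, not a measured or published figure.

```python
import math


def drives_to_match(target_iops: float, iops_per_drive: float) -> int:
    """Number of drives needed to reach a target performance level."""
    return math.ceil(target_iops / iops_per_drive)


def cost_to_match(target_iops: float, iops_per_drive: float,
                  price_per_drive: float) -> float:
    """Acquisition cost of enough drives to reach the target performance."""
    return drives_to_match(target_iops, iops_per_drive) * price_per_drive


if __name__ == "__main__":
    # Placeholder values only: substitute real specifications and prices.
    nvme_iops, nvme_price = 700_000, 2_000.0
    hdd_iops, hdd_price = 200, 250.0

    hdd_count = drives_to_match(nvme_iops, hdd_iops)
    hdd_cost = cost_to_match(nvme_iops, hdd_iops, hdd_price)
    print(f"HDDs needed to match one NVMe SSD: {hdd_count}")
    print(f"HDD acquisition cost: {hdd_cost:,.0f} vs NVMe: {nvme_price:,.0f}")
```

The same functions can be reused with per-drive power draw in place of price to compare electricity consumption.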
2-bit MLC SATA SSDs, such as
the Samsung SM863 Series,
are the optimal choice for Tier 2
because they still provide high
performance, but offer lower
cost per gigabyte than PCIe
NVMe SSDs do. Compared to
3-bit MLC, 2-bit MLC also has
higher write endurance, which is
beneficial since the data in the
upper tiers is more likely to be
modified than the static, cold
data that has already reached
Tier 3. On the other hand, 3-bit
MLC offers read performance
similar to 2-bit MLC but at a
lower cost per gigabyte, making
it ideal for Tier 3 where the data
is mostly read-only.
While SSDs offer superior
performance, density and power
efficiency, HDDs are priced
noticeably lower per gigabyte.
HDDs may therefore
still have a place at the bottom
of the storage hierarchy, especially in
scenarios where large amounts
of cold data need to be stored.
In such scenarios, the acquisition
cost of SSD-only storage would
be very high: even though
SSDs offer higher density (fewer
racks and less space required) and
better power efficiency (lower electricity
cost), the total may still be too high
if hundreds or thousands of
petabytes of storage are needed.
For deep archives of very large
datasets, even tape can be used
below the HDD tier for the data
that is almost never accessed,
but still needs to be retained for
possible future use.
Tiering Storage for Performance and TCO
For more details about the
differences between 2-bit
and 3-bit MLC, please refer
to the “Evaluating MLC vs TLC
vs V-NAND for Enterprise
Applications” white paper.
Storage Options for Each Tier
Tier 1: PCIe NVMe SSD
Tier 2: 2-bit MLC SSD (SM863)
Tier 3: 3-bit MLC SSD (PM863)
Tier 4: HDDs
[Figure axes: Performance per $, Capacity per $]