This presentation was given at Daho.am, a developer conference in Munich. It tells the story of how Cloudinary built and scaled a service for developers the bootstrapped way, with zero external funding.
The presentation shares some of our insights and behind-the-scenes details, including the evolution of Cloudinary's internal architecture and some interesting numbers.
Cloudinary provides a cloud-based service for image and video management: uploading media files to the cloud directly from the browser or a mobile device, performing image manipulation and video transcoding on the fly using a URL-based API, and delivering optimized media content to your users via a fast CDN.
2. Initial Vision (Late 2011)
Eliminate all image-related R&D work
(Upload, storage, administration, manipulation, optimization, delivery)
Cloud-based API for image upload
URL-based API for on-the-fly image manipulation
http://res.cloudinary.com/demo/image/upload/w_180,h_140,c_thumb,g_face/bike.jpg
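The URL above encodes the whole transformation as comma-separated parameters between `upload/` and the image name. A minimal Ruby sketch of how such a URL could be assembled from an options hash, assuming the letter mapping read off the example (w = width, h = height, c = crop, g = gravity); `transformation_url` and `PARAM_KEYS` are hypothetical names, not Cloudinary's SDK:

```ruby
# Hypothetical mapping, inferred from the example URL (not Cloudinary's SDK).
PARAM_KEYS = { width: "w", height: "h", crop: "c", gravity: "g" }.freeze

# Builds a delivery URL; parameters appear in the order the options were given.
def transformation_url(cloud_name, public_id, **options)
  transformation = options.map do |key, value|
    prefix = PARAM_KEYS.fetch(key) { raise ArgumentError, "unsupported option: #{key}" }
    "#{prefix}_#{value}"
  end.join(",")

  segments = ["http://res.cloudinary.com", cloud_name, "image", "upload"]
  segments << transformation unless transformation.empty? # no options: original image
  segments << public_id
  segments.join("/")
end

puts transformation_url("demo", "bike.jpg", width: 180, height: 140, crop: :thumb, gravity: :face)
# prints http://res.cloudinary.com/demo/image/upload/w_180,h_140,c_thumb,g_face/bike.jpg
```

Because the transformation lives in the URL itself, any client that can build a string can request a new image variant without a separate API call.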
6. First Architecture (Early 2012)
A single virtual server on AWS (Amazon Web Services) EC2 (Elastic Compute Cloud)
Don't reinvent the wheel.
Use existing solutions. Use open source solutions.
Use cloud services. Use AWS.
Keep it simple. 80/20 rule. Always. Unless it is fun…
7. First Architecture (Early 2012)
A single virtual server:
Ruby on Rails: application (business logic, image processing)
MySQL: database
nginx + Passenger: web server
AWS services:
AWS S3: storage
AWS CloudFront: CDN
http://res.cloudinary.com/demo/image/upload/w_133,h_133,c_thumb,g_face/bike.jpg
8. The Importance Of Caching
Multiple cache layers protect your application server (on-the-fly image processing) from tons of user requests:
CDN as the main cache layer
Fast nginx with S3 access
Generated images are stored in S3
Image manipulation performance is relevant only to the first user requesting each image
Don't optimize performance yet
(and no need for GPU acceleration…)
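Why processing speed matters only for the first request: once a derived image is stored, every later request for the same URL is served from the cache. A minimal sketch of that idea, where `DerivedImageStore` is a hypothetical name, an in-memory Hash stands in for S3, and the block stands in for the actual image processing; the CDN and nginx layers in front work on the same principle:

```ruby
# Hypothetical sketch: a Hash stands in for S3 (or any cache layer), and the
# block stands in for the expensive on-the-fly image processing.
class DerivedImageStore
  attr_reader :generated_count

  def initialize
    @store = {}
    @generated_count = 0
  end

  # Returns the cached derived image for `url`, generating it only on a miss.
  def fetch(url)
    @store.fetch(url) do
      @generated_count += 1 # only the first request pays this cost
      @store[url] = yield
    end
  end
end

store = DerivedImageStore.new
url = "http://res.cloudinary.com/demo/image/upload/w_133,h_133,c_thumb,g_face/bike.jpg"
3.times { store.fetch(url) { "thumbnail bytes" } }
puts store.generated_count # prints 1
```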
9. Getting Initial Traction
Nice initial vision
Implemented a cool service
Cloud-based API
Publicly available
What now?
10. What Do Developers Want?
We started with a Ruby GEM for Ruby on Rails web developers
Developers love minimal code changes
Developers look for simple SDKs
A public API is not enough
11. Developers Want One-Liners
Instead of manually building URLs:
http://res.cloudinary.com/demo/image/upload/w_150,h_150,c_fill/bike.jpg
Rails developers are used to image_tag:
<%= image_tag("bike.jpg", :width => 150, :height => 150) %>
Providing a similar view helper method:
<%= cl_image_tag("bike.jpg", :width => 150, :height => 150, :crop => :fill) %>
Additional manipulation options, everything is done in the cloud:
<%= cl_image_tag("bike.jpg", :width => 150, :height => 150, :crop => :thumb, :gravity => :face) %>
12. Developers Love One-Liners
Original code, without Cloudinary (local/S3 storage, local RMagick/MiniMagick processing, regular URLs):
class PictureUploader < CarrierWave::Uploader::Base
  process convert: 'jpg'

  version :standard do
    process resize_to_fill: [100, 150, :north]
  end

  version :thumbnail do
    process resize_to_fit: [80, 80]
  end
end

class Post < ActiveRecord::Base
  ...
  mount_uploader :picture, PictureUploader
  ...
end
With Cloudinary, adding one line inside PictureUploader (cloud image storage, cloud image processing, CDN URLs):
include Cloudinary::CarrierWave
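How can a single `include` change where files are stored and how URLs are built? One general Ruby mechanism that makes this possible is module inclusion: an included module is placed between a class and its superclass in method lookup, so its methods win over the inherited ones. A minimal sketch with hypothetical classes and a placeholder CDN host (this is not CarrierWave's or Cloudinary's actual implementation):

```ruby
# Hypothetical classes illustrating Ruby's module lookup order;
# NOT CarrierWave's or Cloudinary's actual implementation.
class BaseUploaderSketch
  def storage
    :local_disk
  end

  def url(name)
    "/uploads/#{name}"
  end
end

# Overrides storage and URL generation for any class that includes it.
module CloudStorageSketch
  def storage
    :cloud
  end

  def url(name)
    "https://cdn.example.com/#{name}"
  end
end

class PictureUploaderSketch < BaseUploaderSketch
  include CloudStorageSketch # the single added line
end

p PictureUploaderSketch.ancestors.first(3)
# prints [PictureUploaderSketch, CloudStorageSketch, BaseUploaderSketch]
p PictureUploaderSketch.new.url("bike.jpg") # prints "https://cdn.example.com/bike.jpg"
```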
13. Developer Religions
Need to support all frameworks, with tight integrations
And specific sub-frameworks…
But jQuery-free JavaScript for Angular people…
14. Developer Religions
Each framework ("religion") has different conventions and rules
Ruby GEM: 330,000 downloads, 500+ paying customers
But still…
Wrong… (callback before the options, and the callback receives only the result):
cloudinary.uploader.upload("my_picture.jpg",
  function(result) { … },
  { crop: "limit", width: 2000 }
);
Great… (options first, then an error-first callback, as Node.js developers expect):
cloudinary.uploader.upload("my_picture.jpg",
  { crop: "limit", width: 2000 },
  function(error, result) { … }
);
16. Documentation and Blog
SDKs must be accompanied by detailed documentation
API documentation, framework-specific guides
Technical posts, tutorials, cookbook recipes
Great SEO impact: most of Cloudinary's customers arrive thanks to technical content on the website
17. Documentation and Blog
Hard work, but very much appreciated. Or not…
Having great technical support does the rest of the work
From developers to developers
1,000 support tickets last month
18. Updated Architecture (Early 2013)
Multiple EC2 instances (Auto Scaling group), separated UI and API clusters
Ruby on Rails: application (business logic, image processing, background processing)
nginx + Passenger: web server
AWS ELB: load balancer for API requests
AWS SQS: queuing service
AWS RDS (Multi-AZ): hosted MySQL database
AWS S3: storage
Akamai: CDN
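A queuing service lets API servers accept a request quickly and hand slow work to background processing. A minimal sketch of that hand-off, with Ruby's in-process `Queue` standing in for AWS SQS; the `start_workers` helper, the `STOP` sentinel, and the job names are hypothetical:

```ruby
# Hypothetical sketch: Ruby's in-process Queue stands in for AWS SQS.
STOP = :stop # sentinel that tells a worker to exit

# Starts `count` worker threads that pull jobs until they see STOP.
def start_workers(jobs, results, count)
  Array.new(count) do
    Thread.new do
      loop do
        job = jobs.pop                # blocks until a job arrives
        break if job == STOP
        results << "processed:#{job}" # placeholder for real background work
      end
    end
  end
end

jobs = Queue.new
results = Queue.new
workers = start_workers(jobs, results, 2)

# The API side only enqueues and can respond immediately.
%w[a.jpg b.jpg c.jpg].each { |name| jobs << name }

workers.size.times { jobs << STOP }
workers.each(&:join)
puts results.size # prints 3
```

With a real queue service, producers and workers run on different machines, so the worker pool can scale independently of the API cluster.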
19. Switched to Akamai CDN
CloudFront was great, but:
It charges per request, and with a 15KB average image size, request costs exceeded bandwidth costs
Customers demanded Akamai
Akamai: wider global presence, better cache invalidation
And… surprisingly, has better pricing
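The arithmetic behind the first point: a CDN that bills both per request and per byte charges the same request fee for every small image, so for 15KB images the request fee dominates. A worked sketch with placeholder prices (assumptions, not quoted CloudFront or Akamai rates): at $0.0075 per 10,000 requests and $0.02 per GB, one billion 15KB requests cost about $750 in request fees versus about $300 in bandwidth, and the two only break even at about 37.5KB per object. The function names are hypothetical:

```ruby
# Hypothetical prices for illustration only; not actual CloudFront or Akamai rates.
# Sizes use decimal units (1 GB = 1_000_000_000 bytes).
def cdn_costs(requests:, avg_bytes:, price_per_10k_requests:, price_per_gb:)
  gigabytes = requests * avg_bytes / 1_000_000_000.0
  {
    requests: requests / 10_000.0 * price_per_10k_requests,
    bandwidth: gigabytes * price_per_gb
  }
end

# Object size (bytes) at which the per-request and bandwidth charges are equal.
def break_even_bytes(price_per_10k_requests:, price_per_gb:)
  (price_per_10k_requests / 10_000.0) / (price_per_gb / 1_000_000_000.0)
end

costs = cdn_costs(requests: 1_000_000_000, avg_bytes: 15_000,
                  price_per_10k_requests: 0.0075, price_per_gb: 0.02)
puts costs # request fees of roughly 750 vs. bandwidth of roughly 300
puts break_even_bytes(price_per_10k_requests: 0.0075, price_per_gb: 0.02) # roughly 37500
```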
20. The Auto Scaling Challenge
Handle bursts of image requests
Handle a single customer's website bursts
Handle growth of customers & traffic
Scale up as quickly as possible
Define correct scale sensors (CPU or IO)
Keep enough extra servers up
Don't waste money on too many extra servers…
21. Better Auto Scaling
AWS Auto Scaling Group
Always have enough extra servers
Multiple availability zones
Optimized AMI with minimal setup on load
Scale-up triggers: high average CPU utilization, high number of active Passenger processes
Reserve instances to cut costs
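A minimal sketch of the scale-up rule implied by the two triggers, assuming average CPU utilization and the share of busy Passenger processes as inputs. The function name, thresholds, step size and instance floor are all hypothetical; in AWS such a rule would typically be a set of CloudWatch alarms attached to Auto Scaling policies, with the Passenger count published as a custom metric:

```ruby
# Hypothetical thresholds and step size; not Cloudinary's actual settings.
# Decides a new desired instance count from the two scale-up signals.
def desired_instances(instances:, avg_cpu_percent:, active_passengers:,
                      passengers_per_instance:, cpu_high: 70.0, busy_high: 0.8,
                      step: 2, min_instances: 4)
  capacity = instances * passengers_per_instance
  busy_ratio = active_passengers.to_f / capacity
  target = instances
  # Scale up by a whole step at once to absorb bursts quickly.
  target += step if avg_cpu_percent > cpu_high || busy_ratio > busy_high
  [target, min_instances].max # keep a floor of spare capacity
end

puts desired_instances(instances: 4, avg_cpu_percent: 85, active_passengers: 10,
                       passengers_per_instance: 10) # prints 6
```

Scaling down is deliberately left out: it is usually done more slowly than scaling up, so a short lull does not remove servers needed for the next burst.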
22. Handling Single Customer Bursts
A single customer should not break the system for others, specifically during bursts while launching new sites
Limit the number of Rails (Passenger) processes per customer until the system auto scales up…
And cache errors in S3 for faster delivery by nginx
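Limiting processes per customer amounts to a counting semaphore keyed by customer. A minimal in-process sketch, where `PerCustomerLimiter` is a hypothetical name; a deployment spread over many servers would need a shared counter instead, and a rejected request would get an error response rather than `nil`:

```ruby
# Hypothetical in-process limiter; a multi-server deployment would need a
# shared counter instead. Not Cloudinary's actual implementation.
class PerCustomerLimiter
  def initialize(max_per_customer)
    @max = max_per_customer
    @active = Hash.new(0)
    @lock = Mutex.new
  end

  # Runs the block if the customer has a free slot; returns nil otherwise
  # (where a real server would respond with an error instead).
  def run(customer_id)
    acquired = @lock.synchronize do
      next false if @active[customer_id] >= @max
      @active[customer_id] += 1
      true
    end
    return nil unless acquired

    begin
      yield
    ensure
      @lock.synchronize { @active[customer_id] -= 1 } # always release the slot
    end
  end

  def active(customer_id)
    @lock.synchronize { @active[customer_id] }
  end
end

limiter = PerCustomerLimiter.new(1)
limiter.run("acme") do
  p limiter.run("acme") { :second }   # prints nil: acme is at its limit
  p limiter.run("other") { :allowed } # prints :allowed
end
```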
23. System Monitoring
Our service is at the heart of our customers' production services
The system must always work perfectly
Monitoring is critical
For example: delivering Cloudinary-based image ads to viewers of the largest, most popular news sites worldwide
24. System Monitoring
Leveraging 3rd-party and in-house monitoring services: AWS CloudWatch, log analysis, application monitoring and more
And… to make sure we wake up at night: automatic phone calls to the team
25. Battle Tested
So many possible cases. Endless potential errors.
Many image formats. Many color profiles. Many frameworks.
Many devices. Many versions. Many resolutions. Many operating systems.
Analyzing all errors. Thousands of automatic tests.
26. Advanced Architecture (Late 2014)
EC2 strong SSD instances (Auto Scaling group), separated UI and API clusters
Ruby on Rails: application (business logic, image processing, background processing)
nginx + Passenger: web server
Redis
Scala daemon
AWS ELB: load balancer for API requests
AWS SQS: queuing service
MySQL clusters: sharded databases
Cassandra: NoSQL DB cluster
ElasticSearch: indexing cluster
LogStash cluster: log monitoring
3rd-party add-ons
AWS S3: storage
Akamai: CDN
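Sharding means each customer's data lives in one of several database clusters, so every query first has to pick the right shard. A minimal sketch of one common way to do that, hashing the customer ID onto a fixed list of shards; the shard names and the `shard_for` helper are hypothetical, since the slide does not describe Cloudinary's actual scheme:

```ruby
require "zlib"

# Hypothetical shard names; the real topology isn't described beyond "sharded MySQL clusters".
SHARDS = %w[mysql-shard-1 mysql-shard-2 mysql-shard-3 mysql-shard-4].freeze

# Maps a customer to one shard deterministically, so all of that customer's
# rows live in (and are queried from) the same database.
def shard_for(customer_id, shards = SHARDS)
  shards[Zlib.crc32(customer_id.to_s) % shards.size]
end

puts shard_for("demo") # the same shard on every call
```

Plain modulo hashing remaps most customers when a shard is added; a lookup table from customer to shard avoids that at the cost of an extra lookup.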
27. AWS - Cost Effective
Not as expensive as its reputation
Reserved instances
Powerful services
Don’t build by yourself…
Very flexible
Quick-start boost
28. Performance Analysis
Time to improve processing performance and reduce CPU costs
Tracked sampled transformation CPU usage
Stored data in Cassandra database
Displayed interactive results using D3 charts
The main suspects…: multiple thumbnails, image rotation, face detection
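A minimal sketch of sampled CPU tracking, assuming 1-in-N sampling and an in-memory array standing in for Cassandra; `TransformationSampler` is a hypothetical name. `Process.times` reports CPU time for the Ruby process itself and for child processes that have already been waited for, which is where external image tools would show up:

```ruby
# Hypothetical sketch: samples 1 in N transformations and records their CPU time.
# Not thread-safe; a real version would need a lock or per-thread counters.
class TransformationSampler
  attr_reader :samples

  def initialize(sample_every)
    @sample_every = sample_every
    @count = 0
    @samples = [] # stand-in for the Cassandra table
  end

  # Runs the block; every Nth call also records its CPU usage.
  def measure(name)
    @count += 1
    return yield unless (@count % @sample_every).zero?

    before = Process.times
    result = yield
    after = Process.times
    own = (after.utime - before.utime) + (after.stime - before.stime)
    # cutime/cstime include only children that have exited and been waited for
    children = (after.cutime - before.cutime) + (after.cstime - before.cstime)
    @samples << { name: name, cpu_seconds: own + children }
    result
  end
end

sampler = TransformationSampler.new(2)
4.times { |i| sampler.measure("thumbnail") { i } }
puts sampler.samples.size # prints 2
```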
29. Performance Analysis Result
The culprit: Ruby execution of external image processing commands
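Original code - Very slow:
require "open3"
require "timeout"

# Runs an external command, feeding options[:input] on stdin.
# Returns [stdout, stderr], or nil on failure or timeout.
def run_external(command, options={})
  Open3.popen3(command) do |stdin, stdout, stderr, wait_thr|
    stdin.binmode
    stdout.binmode
    stderr.binmode
    stdin.write(options[:input]) if options[:input]
    stdin.close
    begin
      Timeout.timeout(options[:timeout]) do
        # Waits for the process to exit before reading its output
        return nil unless wait_thr.value.success?
        return [stdout.read, stderr.read]
      end
    rescue Timeout::Error
      Process.kill("TERM", wait_thr.pid)
      return nil
    end
  end
end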
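Updated code - Much faster (the posix-spawn gem starts the command with posix_spawn() rather than fork/exec, avoiding the overhead of forking a large Ruby process):
require "posix/spawn"

# Same contract: [stdout, stderr] on success, nil on failure or timeout.
def run_external(command, options={})
  # Runs the command to completion, feeding :input and enforcing :timeout
  child = POSIX::Spawn::Child.new(command, options.slice(:timeout, :input))
  if child.success?
    return [child.out, child.err]
  else
    Rails.logger.warn("Failed: #{child.out} #{child.err}")
    nil
  end
rescue POSIX::Spawn::TimeoutExceeded
  Rails.logger.warn("#{command} - Timeout exceeded")
  nil
end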
31. The Bootstrapped Way
Worked well for us :-)
Small team of great people
Gradual architecture
Great service. Great support
Grassroots marketing. Content-oriented
Gradual features. Customer first
32. And it's just the beginning…
Today's numbers:
55,000 developers signed up
2,000 paying customers from 65 countries
From startups to Fortune 500 companies
20 million new images daily
1.5 billion images delivered daily*
1,000 CPU cores
* Not including the CDN layer of customers…
What's next?
Architecture keeps evolving. Scale challenges are getting complex.
Being customer focused: new features every week…
33. One more thing…
Debated since day one. Glad we didn’t implement it too early :-)
The market is ready now. Customers requested it.
Same capabilities as for images, now also for video
http://res.cloudinary.com/demo/video/upload/w_200,h_200,c_fill,g_north/dog.mp4
Longer processing times. Heavier processing.
Significant architecture changes. New clusters…