SlideShare uma empresa Scribd logo
1 de 48
Baixar para ler offline
GPU Accelerated
Libraries for
Ruby
Prasun Anand
About me
● SciRuby Contributor
● Google Summer of Code 2016, 2017
● Genenetwork project
● Ruby Grant 2017
● Projects:
○ JRuby port of NMatrix
○ ArrayFire gem
○ RbCUDA
Scientific Computing
Why should Python
programmers have all the
fun ?
Arrays / Matrices
BLAS and LAPACK
GPU Computing is not easy !
CUDA and OpenCL
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
[2] pry(main)> b = a + a
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
2.0000 6.0000
4.0000 8.0000
=> #<ArrayFire::Af_Array:0x000000020625c8>
[1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10]
No Name Array
[3 3 1 1]
1.0000 4.0000 -5.0000
4.0000 11.0000 8.0000
6.0000 2.0000 10.0000
=> #<ArrayFire::Af_Array:0x000000014e56c8>
[2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8]
No Name Array
[3 2 1 1]
1.0000 10.0000
0.0000 -11.0000
8.0000 8.0000
=> #<ArrayFire::Af_Array:0x00000001591db0>
[3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
No Name Array
[3 2 1 1]
-39.0000 -74.0000
68.0000 -17.0000
86.0000 118.0000
=> #<ArrayFire::Af_Array:0x000000016136f8>
VALUE arf_init(int argc, VALUE* argv, VALUE self)
{
afstruct* afarray;
Data_Get_Struct(self, afstruct, afarray);
dim_t ndims = (dim_t)NUM2LONG(argv[0]);
dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
dim_t count = 1;
for (size_t index = 0; index < ndims; index++) {
dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
count *= dimensions[index];
}
double* host_array = (double*)malloc(count * sizeof(double));
for (size_t index = 0; index < count; index++) {
host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index));
}
af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);
return self;
}
static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val, VALUE left_prop_val, VALUE
right_prop_val){
afstruct* left;
afstruct* right;
afstruct* result = ALLOC(afstruct);
Data_Get_Struct(left_val, afstruct, left);
Data_Get_Struct(right_val, afstruct, right);
af_mat_prop left_mat_prop = arf_mat_type_from_rbsymbol(left_prop_val);
af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val);
af_matmul(&result->carray, left->carray, right->carray, left_mat_prop, right_mat_prop);
return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}
BLAS functionalities
● Matmult
● Transpose
LAPACK functionalities
● Det
● Inverse
● Norm
● Qr
● Cholesky
● Svd
● lu
Statistics
● Mean
● Median
● Variance
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
10 X
Faster than NMatrix-Ruby-Lapack
10,000 X
Faster than NMatrix-Ruby
100,000 X
Faster than NMatrix-Ruby-BLAS
RbCUDA
GPU Array
● Generic pointer used to handle an array of elements on the GPU.
● Memory copying from CPU to GPU and vice-versa.
● Interfaced with NMatrix and NArray
vadd_kernel_src = <<-EOS
extern "C" {
__global__ void matSum(int *a, int *b, int *c)
{
int tid = blockIdx.x;
if (tid < 100)
c[tid] = a[tid] + b[tid];
}
}
EOS
f = compile(vadd_kernel_src)
RbCUDA::Driver.run_kernel(f.path)
● CuBLAS
● CuSolver
● CuRand
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
1,000,000 X
Faster than NMatrix-Ruby-BLAS
Future Work
● Image Processing APIs and Indexers
● Multiple dtypes
● RbCUDA is under active development.
● Project RbCUDA is being funded by Ruby Association
(Ruby Grant 2017)
Hobby Projects
1. Synapse : A Ruby first Deep Learning Library
2. Ninjaplot : A plotting library
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/rbcuda
● https://github.com/prasunanand/synapse
● https://github.com/prasunanand/ninjaplot
Contributions are Welcome!
Acknowledgements
1. Pjotr Prins
2. Pradeep Garigipati
3. Kenta Murata
Thanks!
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com
Questions

Mais conteúdo relacionado

Mais procurados

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & howlucenerevolution
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵Wanbok Choi
 
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたスマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたTaro Matsuzawa
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in Rmickey24
 
2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchersSamuel Harrold
 
Statistical Schema Induction
Statistical Schema InductionStatistical Schema Induction
Statistical Schema InductionJohanna Voelker
 
Unleash your inner console cowboy
Unleash your inner console cowboyUnleash your inner console cowboy
Unleash your inner console cowboyKenneth Geisshirt
 
What they don't tell you about JavaScript
What they don't tell you about JavaScriptWhat they don't tell you about JavaScript
What they don't tell you about JavaScriptRaphael Cruzeiro
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 
Javascript Arrays
Javascript ArraysJavascript Arrays
Javascript Arraysshaheenakv
 
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvasdeanhudson
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介Akira Maruoka
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
Python 101 language features and functional programming
Python 101 language features and functional programmingPython 101 language features and functional programming
Python 101 language features and functional programmingLukasz Dynowski
 

Mais procurados (19)

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & how
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
 
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたスマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in R
 
Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09
 
2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers
 
Statistical Schema Induction
Statistical Schema InductionStatistical Schema Induction
Statistical Schema Induction
 
Unleash your inner console cowboy
Unleash your inner console cowboyUnleash your inner console cowboy
Unleash your inner console cowboy
 
Python GC
Python GCPython GC
Python GC
 
What they don't tell you about JavaScript
What they don't tell you about JavaScriptWhat they don't tell you about JavaScript
What they don't tell you about JavaScript
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Javascript Arrays
Javascript ArraysJavascript Arrays
Javascript Arrays
 
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvas
 
JS OO and Closures
JS OO and ClosuresJS OO and Closures
JS OO and Closures
 
D3.js workshop
D3.js workshopD3.js workshop
D3.js workshop
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
Python 101 language features and functional programming
Python 101 language features and functional programmingPython 101 language features and functional programming
Python 101 language features and functional programming
 

Semelhante a Rubyconfindia2018 - GPU accelerated libraries for Ruby

High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018Prasun Anand
 
High performance GPU computing with Ruby
High performance GPU computing with RubyHigh performance GPU computing with Ruby
High performance GPU computing with RubyPrasun Anand
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on JrubyPrasun Anand
 
Coscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageCoscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageWayne Tsai
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]RootedCON
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with ScalaJorge Paez
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for SpeedYung-Yu Chen
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014alex_perry
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
SP-First-Lecture.ppt
SP-First-Lecture.pptSP-First-Lecture.ppt
SP-First-Lecture.pptFareedIhsas
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov
 
PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007Damien Seguy
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Mark Smith
 

Semelhante a Rubyconfindia2018 - GPU accelerated libraries for Ruby (20)

High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018
 
High performance GPU computing with Ruby
High performance GPU computing with RubyHigh performance GPU computing with Ruby
High performance GPU computing with Ruby
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on Jruby
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
 
Coscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageCoscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usage
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with Scala
 
Python lecture 05
Python lecture 05Python lecture 05
Python lecture 05
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Eta
EtaEta
Eta
 
Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Chp4(ref dynamic)
Chp4(ref dynamic)Chp4(ref dynamic)
Chp4(ref dynamic)
 
SP-First-Lecture.ppt
SP-First-Lecture.pptSP-First-Lecture.ppt
SP-First-Lecture.ppt
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
 
PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016
 

Último

OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 

Último (20)

OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 

Rubyconfindia2018 - GPU accelerated libraries for Ruby

  • 2. About me ● SciRuby Contributor ● Google Summer of Code 2016, 2017 ● Genenetwork project ● Ruby Grant 2017 ● Projects: ○ JRuby port of NMatrix ○ ArrayFire gem ○ RbCUDA
  • 4. Why should Python programmers have all the fun ?
  • 7. GPU Computing is not easy !
  • 9.
  • 10. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 11. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 12. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 13. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 14. [2] pry(main)> b = a + a No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 2.0000 6.0000 4.0000 8.0000 => #<ArrayFire::Af_Array:0x000000020625c8>
  • 15. [1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10] No Name Array [3 3 1 1] 1.0000 4.0000 -5.0000 4.0000 11.0000 8.0000 6.0000 2.0000 10.0000 => #<ArrayFire::Af_Array:0x000000014e56c8> [2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8] No Name Array [3 2 1 1] 1.0000 10.0000 0.0000 -11.0000 8.0000 8.0000 => #<ArrayFire::Af_Array:0x00000001591db0>
  • 16. [3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE) No Name Array [3 2 1 1] -39.0000 -74.0000 68.0000 -17.0000 86.0000 118.0000 => #<ArrayFire::Af_Array:0x000000016136f8>
  • 17. VALUE arf_init(int argc, VALUE* argv, VALUE self) { afstruct* afarray; Data_Get_Struct(self, afstruct, afarray); dim_t ndims = (dim_t)NUM2LONG(argv[0]); dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t)); dim_t count = 1; for (size_t index = 0; index < ndims; index++) { dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index)); count *= dimensions[index]; } double* host_array = (double*)malloc(count * sizeof(double)); for (size_t index = 0; index < count; index++) { host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index)); } af_create_array(&afarray->carray, host_array, ndims, dimensions, f64); return self; }
  • 18. static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val, VALUE left_prop_val, VALUE right_prop_val){ afstruct* left; afstruct* right; afstruct* result = ALLOC(afstruct); Data_Get_Struct(left_val, afstruct, left); Data_Get_Struct(right_val, afstruct, right); af_mat_prop left_mat_prop = arf_mat_type_from_rbsymbol(left_prop_val); af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val); af_matmul(&result->carray, left->carray, right->carray, left_mat_prop, right_mat_prop); return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result); }
  • 19. BLAS functionalities ● Matmult ● Transpose LAPACK functionalities ● Det ● Inverse ● Norm ● Qr ● Cholesky ● Svd ● lu
  • 21. Benchmarks ● AMD FX 8350 octacore processor ● Nvidia GTX 750Ti GPU ● Double dtype
  • 22.
  • 23. 10 X Faster than NMatrix-Ruby-Lapack
  • 24.
  • 25.
  • 26.
  • 27. 10,000 X Faster than NMatrix-Ruby
  • 28.
  • 29.
  • 30.
  • 31. 100,000 X Faster than NMatrix-Ruby-BLAS
  • 32.
  • 33.
  • 35. GPU Array ● Generic pointer used to handle an array of elements on the GPU. ● Memory copying from CPU to GPU and vice-versa. ● Interfaced with NMatrix and NArray
  • 36. vadd_kernel_src = <<-EOS extern "C" { __global__ void matSum(int *a, int *b, int *c) { int tid = blockIdx.x; if (tid < 100) c[tid] = a[tid] + b[tid]; } } EOS f = compile(vadd_kernel_src) RbCUDA::Driver.run_kernel(f.path)
  • 38. Benchmarks ● AMD FX 8350 octacore processor ● Nvidia GTX 750Ti GPU ● Double dtype
  • 39.
  • 40. 1,000,000 X Faster than NMatrix-Ruby-BLAS
  • 41.
  • 42.
  • 43. Future Work ● Image Processing APIs and Indexers ● Multiple dtypes ● RbCUDA is under active development. ● Project RbCUDA is being funded by Ruby Association (Ruby Grant 2017)
  • 44. Hobby Projects 1. Synapse : A Ruby first Deep Learning Library 2. Ninjaplot : A plotting library
  • 45. ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/rbcuda ● https://github.com/prasunanand/synapse ● https://github.com/prasunanand/ninjaplot Contributions are Welcome!
  • 46. Acknowledgements 1. Pjotr Prins 2. Pradeep Garigipati 3. Kenta Murata