Talk @ RubyConfIndia 2012. Ruby is a pure object-oriented language, and a beautiful one to learn and practice.
But most of us do not bother to know or care about what happens behind the scenes when we write Ruby code, say when creating a simple Array, Hash, class, module, or any other object. How does this map internally to C code?
The Ruby interpreter is implemented in C, and I will talk about the interpreter API that we as Ruby developers
should be aware of. The main purpose of the presentation is to understand the effort and complexity behind
the simplicity offered. I would also like to touch upon how the implementation of some core data structures differs
between Ruby versions. Having seen part of the C implementation behind Ruby, I would also like to shed some light on when and why we would need to write Ruby extensions in C.
20. RObject, RBasic and RClass
struct RObject {
    struct RBasic basic;
    union {
        struct {
            long numiv;
            VALUE *ivptr;
            struct st_table *iv_index_tbl;
        } heap;
    } as;
};
struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};
struct RBasic {
    VALUE flags;
    VALUE klass;
};
Ruby Conf India 2012
21. Instance specific behavior
my_obj = Object.new
def my_obj.hello
  p "hello"
end
my_obj.hello
#=> "hello"
Object.new.hello
# NoMethodError: undefined method `hello' for #<Object:0x5418467>
22. Conceptual sketch
Before defining my_obj.hello:
    my_obj (klass) -> Object (*m_tbl)
After defining my_obj.hello:
    my_obj (klass) -> 'my_obj (*m_tbl: hello; *super) -> Object (*m_tbl)
23. #class.c
VALUE
make_singleton_class(VALUE obj)
{
    VALUE orig_class = RBASIC(obj)->klass;
    VALUE klass = rb_class_boot(orig_class);
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    return klass;
}
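The effect of make_singleton_class can be observed from Ruby itself. The following is a minimal sketch, assuming Ruby 1.9.2 or later (where Object#singleton_class is available); it only shows the visible behavior and says nothing about how the interpreter stores it.

```ruby
my_obj = Object.new

# Defining a method on one object makes Ruby create a singleton class for it.
def my_obj.hello
  "hello"
end

singleton = my_obj.singleton_class

# The singleton class owns the new method ...
singleton_methods = singleton.instance_methods(false)

# ... and sits between the object and its original class.
original_class = singleton.superclass

# Object#class skips the singleton class and still reports Object.
reported_class = my_obj.class
```

Other instances of Object do not see the method, since their klass pointer still refers to Object directly.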
24. Am I an Immediate Object or a Pointer?
VALUE
25. typedef unsigned long VALUE
C type for referring to arbitrary Ruby objects
Stores the immediate values of:
Fixnum
Symbol
true
false
nil
undef
Bit test:
If the LSB is 1, it is a Fixnum.
If the VALUE is equal to 0, 2, 4, or 6, it is a special constant: false, true, nil, or undef, respectively.
If the lower 8 bits are equal to 0xe, it is a Symbol.
Otherwise, it is an object reference.
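The bit test above can be modeled in plain Ruby. This is a sketch of the tagging scheme exactly as the slide describes it (the 1.9-era layout); the helper names fixnum_to_value, value_to_fixnum, and classify_value are hypothetical and are not interpreter functions, and later Ruby versions use different constants.

```ruby
# Special constants as listed on the slide: VALUE => meaning.
SPECIAL_CONSTANTS = { 0 => :false, 2 => :true, 4 => :nil, 6 => :undef }.freeze
SYMBOL_FLAG = 0x0e

# A Fixnum is stored by shifting left one bit and setting the LSB.
def fixnum_to_value(n)
  (n << 1) | 1
end

# Shifting right drops the tag bit and recovers the number.
def value_to_fixnum(value)
  value >> 1
end

# Apply the tests in the order given on the slide.
def classify_value(value)
  return :fixnum if value & 1 == 1
  return SPECIAL_CONSTANTS[value] if SPECIAL_CONSTANTS.key?(value)
  return :symbol if value & 0xff == SYMBOL_FLAG
  :object_reference
end
```

For example, the Fixnum 5 is stored as the VALUE 11, and any VALUE whose low byte is 0x0e is treated as a Symbol.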
26. RString
# 1.8.7
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};
# 1.9.3
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};
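The RSTRING_EMBED_LEN_MAX macro is plain arithmetic, so its value can be worked out for a given platform. The sketch below mirrors the macro in Ruby; the function name rstring_embed_len_max is hypothetical, and the sizes passed in (8 bytes for VALUE on a 64-bit build, 4 on a 32-bit build) are assumptions about the platform.

```ruby
# Mirrors ((sizeof(VALUE) * 3) / sizeof(char) - 1) from the 1.9.3 header.
def rstring_embed_len_max(sizeof_value, sizeof_char = 1)
  (sizeof_value * 3) / sizeof_char - 1
end

embed_max_64bit = rstring_embed_len_max(8)  # 64-bit build: 23
embed_max_32bit = rstring_embed_len_max(4)  # 32-bit build: 11
```

This is where the limit of 23 characters for an embedded string on a 64-bit build comes from.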
31. Shared Strings
str = "This is a very very very very very long string"
str2 = String.new(str)
#str2 = str.dup
Heap: "This is a very very very very very long string"
RString str2: char *ptr -> heap data; long len = 46; VALUE shared
RString str: char *ptr -> same heap data; long len = 46
33. Copy on Write
str = "This is a very very very very very long string"
str2 = str.dup
str2.upcase!
Heap: "This is a very very very very very long string"
Heap: "THIS IS A VERY VERY VERY VERY VERY LONG STRING"
RString str: char *ptr -> original heap data; long len = 46
RString str2: char *ptr -> new upper-case copy; long len = 46
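From the Ruby side, sharing and copy on write are invisible except through their guarantee: modifying the duplicate never changes the original. A minimal sketch of that visible behavior follows; it does not inspect the underlying RString structures.

```ruby
str  = "This is a very very very very very long string"
str2 = str.dup

# Same content, but two distinct Ruby objects.
same_content     = (str2 == str)
distinct_objects = !str2.equal?(str)

# Writing to str2 forces it to get its own copy of the data.
str2.upcase!
```

After upcase!, str still holds the original mixed-case text while str2 holds the upper-case version.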
35. Embedded Strings
str = "This is a very very very very very long string"
str2 = str[0..3]
# str2 == "This"
Heap: "This is a very very very very very long string"
RString str: char *ptr -> heap data; long len = 46
RString str2: long len = 4; char ary[] = "This" (embedded, no heap allocation)
37. Shared Strings with slice
str = "This is a very very very very very long string"
str2 = str[1..-1]
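# str2 = str[22..-1]
# 0 <= start_offset < 46-23
Heap: "This is a very very very very very long string"
RString str: char *ptr -> first character of the heap data ("T"); long len = 46; VALUE shared
RString str2: char *ptr -> second character of the same heap data ("h"); long len = 45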
39. String.new("learning")
Creating a string of 23 characters or fewer is fastest.
Creating a substring that runs to the end of the target string is also fast.
When two strings share the same string data, both memory and execution time are saved.
Creating any other long substring or string, of 24 or more bytes, is slower.
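The rules above can be sketched as a small decision function. This is a model of the slide's claims for a 64-bit 1.9.3 build, not the interpreter's code; substring_strategy and EMBED_LEN_MAX are hypothetical names, and the value 23 assumes an 8-byte VALUE.

```ruby
EMBED_LEN_MAX = 23  # assumed 64-bit build

# Predicts how a substring of `length` bytes starting at `start`
# would be stored, given a source string of `source_length` bytes.
def substring_strategy(source_length, start, length)
  return :embedded if length <= EMBED_LEN_MAX          # fits inside the RString itself
  return :shared if start + length == source_length    # runs to the end of the source
  :copied                                              # needs its own heap allocation
end
```

With the 46-byte example string, str[0..3] would be embedded, str[1..-1] and str[22..-1] would be shared, and str[23..-1] (23 bytes) would be embedded again, which matches the bound 0 <= start_offset < 46-23 on the earlier slide.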
41. RHash 1.8.7
st_table: num_entries = 4, num_bins = 5, **bins
hash buckets (slots), each pointing to a chain of st_table_entries:
key1 value -> key3 value -> x
key2 value -> x
key4 value -> x
42. RHash 1.9.3
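st_table: num_entries = 4, num_bins = 5, **bins, *head, *tail
hash buckets (slots), each pointing to a chain of st_table_entries (key, value)
Each entry is additionally linked to the entries inserted before and after it:
head -> key1 <-> key2 <-> key3 <-> key4 <- tail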
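The 1.9.3 layout combines chained buckets with an insertion-order list. The following is a toy model of that idea in Ruby, not a port of st.c: the class name TinyOrderedTable is hypothetical, and the field names fore and back are only borrowed loosely from the C structure.

```ruby
class TinyOrderedTable
  # next_in_bin chains entries within one bucket;
  # fore/back link entries in insertion order.
  Entry = Struct.new(:key, :value, :next_in_bin, :fore, :back)

  def initialize(num_bins = 5)
    @bins = Array.new(num_bins)  # each slot: first entry of a chain, or nil
    @head = nil
    @tail = nil
  end

  def []=(key, value)
    entry = find(key)
    if entry
      entry.value = value        # existing key keeps its position
      return value
    end
    index = key.hash % @bins.size
    entry = Entry.new(key, value, @bins[index], nil, @tail)
    @bins[index] = entry         # push onto the front of the bucket chain
    @tail.fore = entry if @tail  # append to the insertion-order list
    @head ||= entry
    @tail = entry
    value
  end

  def [](key)
    entry = find(key)
    entry && entry.value
  end

  # Walk the insertion-order list from head to tail.
  def keys
    result = []
    entry = @head
    while entry
      result << entry.key
      entry = entry.fore
    end
    result
  end

  private

  def find(key)
    entry = @bins[key.hash % @bins.size]
    entry = entry.next_in_bin while entry && !entry.key.eql?(key)
    entry
  end
end
```

Lookups go through the buckets, while iteration follows the head-to-tail links, which is why a Hash in Ruby 1.9 and later enumerates its keys in insertion order.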
44. C Extensions: why and when?
Performance
Using C libraries from Ruby applications
Using Ruby gems with native C extensions
e.g. mysql, nokogiri, eventmachine, RedCloth, RMagick, libxml-ruby, etc.
Since the Ruby interpreter is implemented in C, its API can be used directly from extension code
45. My fellow Rubyist
Patrick Shaughnessy
47. Thank you all for being patient
and hearing me out!
Hope this helps you!
Editor's Notes
Around 300 C files and around 100 .h header files.
Some objects are fully specified by a VALUE, eliminating the need to create an actual object in object space. This saves a lot of processing cycles and does not functionally compromise the object model. These object types are Fixnum, Symbol, true, false, and nil.
VALUE as an Immediate Object
As we said above, immediate values are not pointers: Fixnum, Symbol, true, false, and nil are stored directly in VALUE. Fixnum values are stored as 31-bit numbers (or 63-bit on wider CPU architectures) that are formed by shifting the original number left 1 bit and then setting the least significant bit (bit 0) to "1". When VALUE is used as a pointer to a specific Ruby structure, it is guaranteed always to have an LSB of zero; the other immediate values also have LSBs of zero. Thus, a simple bit test can tell you whether or not you have a Fixnum.
There are several useful conversion macros for numbers as well as other standard datatypes shown in Table 17.1 on page 174. The other immediate values (true, false, and nil) are represented in C as the constants Qtrue, Qfalse, and Qnil, respectively. You can test VALUE variables against these constants directly, or use the conversion macros (which perform the proper casting).
When two strings share the same string data, memory is saved since there is only one copy of the string data, and execution time is saved since there is no need to call malloc a second time to allocate more memory from the heap.