Running a Ruby process and an ActiveRecord data set on multiple cores

You know what? In one of our (tekSymmetry LLC) projects, we have a lot of background calculations,
which usually take many hours to complete. Ever since we introduced those processes,
we have had problems with their execution time. Sometimes it really gets on our nerves.

As you know, a single Ruby process (on the standard MRI interpreter) can use only one processor core at a time.
This is probably one of the reasons why the Ruby on Rails community picked
a multi-process deployment strategy.

Anyway, these days our servers have more than one core! More precisely,
in our case each production server has an 8-core Intel Xeon processor.

So the question arose: if we could run those long-running, expensive processes across multiple cores,
our system would have a better chance of getting faster!

Well, this blog post shows you the technique we used to do that in Ruby on Rails.

For better understanding, here are a few hints so you can get the context:

  • we have a database table with a large number of rows!
  • processing a single row doesn’t require anything else from the same database table.
  • we are using Linux (in our case Debian lenny)

So here is how we did it:

  1. we took the total row count for the main query
  2. and divided it by the number of cores we have
  3. then we forked one child process per core, each with its own subset of the rows
  4. and ran the logic and related stuff in each child
  5. in the parent process, we started a loop that keeps checking the status of the forked processes
  6. once all the pid files (which are created by the forked children) have been removed,
    the parent process flags the run as completed and ends the loop.
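
Steps 1 to 3 boil down to a bit of arithmetic. Here is a minimal sketch of it in plain Ruby, with no Rails involved. The name row_slices is made up for this illustration and is not part of our helper. It assumes we round the per-core row count up, so the leftover rows from an uneven division still land in the last slice.

```ruby
# Hypothetical helper (not part of the original code): split total_rows
# into at most `cores` contiguous [offset, limit] pairs.
def row_slices(total_rows, cores)
  # Ceiling division, so remainder rows are not dropped
  per_core = (total_rows + cores - 1) / cores
  slices = []
  cores.times do |i|
    offset = i * per_core
    break if offset >= total_rows   # nothing left for the remaining cores
    slices << [offset, [per_core, total_rows - offset].min]
  end
  slices
end

p row_slices(10, 4)  # => [[0, 3], [3, 3], [6, 3], [9, 1]]
```

Each pair maps directly to the :offset and :limit options of the query that a forked child would run.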

So you see, it is damn simple :) and it is working for us :)
It has made our execution 8x faster, thanks to the 8 cores in the new server.

Here is the Ruby code for how we did it. (We created a helper, “multicore_execution_helper.rb”, and included it in the model, so execute_in_multicores became usable.)

require 'fileutils'

module MulticoreExecutionHelper

  def execute_in_multicores(
      p_cores, p_total_rows, p_model, p_conditions = {}, &block)

    # Fall back to 2 processes when no usable core count is given
    p_cores = p_cores.to_i
    p_cores = 2 if p_cores == 0
    # Round up so the remainder rows of an uneven division are not skipped
    total_items_per_core = (p_total_rows + p_cores - 1) / p_cores
    logger.info "[BATCH-PROCESS-LOG] Total processes - #{p_cores}, " +
                "total rows - #{p_total_rows} [#{total_items_per_core} / 1 core]"

    # Create job id for each process
    job_ids = p_cores.times.collect { |i| rand.to_s }

    # Fork process for each core and execute the block
    p_cores.times do |offset|
      Process.fork do
        logger.info "[BATCH-PROCESS-LOG] Starting process - #{offset} " +
                    "assigned # #{job_ids[offset]}"

        # Keep track of the job through the pid file of the created process.
        pid_file = File.join(RAILS_ROOT, 'tmp/pids/', "#{job_ids[offset]}.pid")
        File.open(pid_file, 'w') { |f| f.puts Process.pid.to_s }

        # A forked process starts with a copy of the parent process's
        # memory, so we need to reconnect all live connections.
        begin
          ActiveRecord::Base.connection.reconnect!

          # Retrieve this process's subset of rows through the defined
          # offset and limit
          teams = p_model.find(
              :all, {
                  :offset => (offset * total_items_per_core),
                  :limit => total_items_per_core}.merge(p_conditions))

          block.call(teams)
        rescue => e
          logger.error "[BATCH-PROCESS-LOG] Exception raised during " +
                       "execution - #{e.inspect}"
        ensure
          # Remove pid since we are done here! (also on failure, so the
          # parent never waits forever)
          FileUtils.rm(pid_file)
        end
      end
    end

    # Monitor whether the processes are completed or still in progress.
    # Don't return from this method until all the forked processes have
    # completed their job.
    sleep(2)

    loop do
      # We are done once none of the pid files exist any more
      fully_completed = job_ids.none? do |job_id|
        File.exist?(File.join(RAILS_ROOT, 'tmp/pids/', "#{job_id}.pid"))
      end

      break if fully_completed
      sleep(2)
      logger.debug '[BATCH-PROCESS-LOG] again...'
    end
  end

end

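
The parent loop in the helper polls for pid files. As a side note, here is a sketch of a different way to do the waiting part, which is not what we did: the parent keeps the child pids returned by Process.fork and blocks on them with Process.wait2. The method name run_in_children and the summing work inside the child are made up for this illustration, and plain arrays stand in for the ActiveRecord rows.

```ruby
# Hypothetical sketch: fork one child per slice and wait on the pids
# directly instead of polling pid files.
def run_in_children(slices)
  pids = slices.map do |slice|
    Process.fork do
      slice.inject(0) { |sum, n| sum + n }  # placeholder for the expensive work
      exit!(0)                              # leave the child without running the parent's at_exit hooks
    end
  end

  # Block until every child has exited and collect each Process::Status
  pids.map { |pid| Process.wait2(pid).last }
end

statuses = run_in_children([[1, 2, 3], [4, 5, 6], [7, 8]])
puts statuses.all? { |status| status.success? }
```

With this approach the parent also gets each child's exit status, so it can tell a crashed child from a finished one, which the pid file check alone cannot do.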

Here is the usage code:

execute_in_multicores(p_total_cores, SomeStuff.count, SomeStuff) do |some_stuffs|
  # Do whatever you want with the stuff here! This is going to run on multiple cores!
end
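
The usage code takes p_total_cores as given. In case you don't want to hard-code it, here is one possible way to get the number, as a sketch only: it relies on Etc.nprocessors, which ships with Ruby 2.2 and later, so it is newer than the Ruby this post was written against. The method name detect_core_count is made up for this illustration.

```ruby
require 'etc'

# Hypothetical helper: number of processes to fork. Falls back to 2
# (the same default the helper uses) when Etc.nprocessors is unavailable.
def detect_core_count
  Etc.respond_to?(:nprocessors) ? Etc.nprocessors : 2
end

p_total_cores = detect_core_count
puts p_total_cores
```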

See, it is really simple! :) If you like it, let me know how much you like it :)
You can find the code on GitHub.

Best wishes!

Ruby on Rails demo application presentation picked by SlideShare’s editor

This morning I was informed by email that my slides on SlideShare were picked by their editor for their featured slides list.

It was a really great thing for me. Kudos to the SlideShare guys!

You can check out the slides here -

Here is the moment, which I captured in a screenshot!
