20 threads * 100,000 iterations
Ruby 1.9 = 1.54 s.
Ruby Enterprise = 3.01 s.
JRuby 1.1.2 = 5.82 s.
Jython 2.2.1 = 11.86 s.
Python 2.5.2 = 12.32 s.
Ruby 1.8.7 = 22.68 s.
Since Ruby really wasn’t all that much slower than Python when we tested it as a crawler, it could be really interesting to see what happens with Ruby 1.9.
The blog post about the test (it’s in Polish)
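For anyone who wants to try the same shape of test at home, here is a minimal sketch in Python. The original benchmark code is not shown here, so the loop body (a plain counter) and the names `worker` and `run_benchmark` are assumptions; only the 20 threads and 100,000 iterations come from the numbers above.

```python
import threading
import time

THREADS = 20            # from the benchmark description
ITERATIONS = 100_000    # from the benchmark description


def worker(counts, index):
    # Assumed workload: a plain counting loop. The real test body is unknown.
    total = 0
    for _ in range(ITERATIONS):
        total += 1
    counts[index] = total


def run_benchmark(threads=THREADS):
    # Start all threads, wait for them to finish, and time the whole run.
    counts = [0] * threads
    pool = [threading.Thread(target=worker, args=(counts, i)) for i in range(threads)]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, sum(counts)


if __name__ == "__main__":
    elapsed, total = run_benchmark()
    print(f"{THREADS} threads * {ITERATIONS:,} iterations = {elapsed:.2f} s.")
```

Timings from a sketch like this will not be comparable to the list above, since the workload, hardware, and interpreter versions all differ.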
Latest Stats:
35.6 URLs per second
3.073 Million URLs per day!
What’s most promising is that the new fat pipe is still the bottleneck, which means that if anybody really wants to party, all we need to do is lay down some greenbacks and an OC-12 will show us mass terabyte pleasure.
You can also hear a ticking sound. That is my new 1 TB drive. It makes these weird ticking noises even when it’s not in use. It really sounds like the arm hitting something it’s not supposed to hit. Hope it’s not defective.
* Complex code that is difficult to maintain and difficult to set up on a server
* Memory leakage
* Configurability
So the latest design is just 192 lines of Python in a single file, has a single configuration file, and takes about 5 minutes to set up on a standard Linux machine. I ran it last night and was delighted with the results:
Test Run
Tested 139,740 urls
Completed in 2 hrs, 13 mins
3.6 GB of html
Average filesize: 25.05 KB
Averaging
18.2 urls/second
1.572 million urls/day
Hardware and Environment
3-year-old Dell PowerEdge SC240
Pentium 4
3.5 GB of RAM
Average CPU load: 0.16
Average physical RAM used: 950 MB
OS: Ubuntu 7.10 (Gutsy Gibbon)
Filesystem: ReiserFS 3
Network connection:
Residential cable modem, 5 Mbps down (100% of which is consumed when it’s running, so it is likely to be faster on a fatter pipe)
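The per-day figure and the load on the pipe can be checked with a few lines of arithmetic. This is a rough sketch, not part of the crawler: the function names are made up for illustration, and the bandwidth estimate assumes 1 KB = 1024 bytes and counts only the HTML bodies, so headers, TCP overhead, and failed requests are left out and the real line usage will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def urls_per_day(urls_per_second):
    # Scale a per-second crawl rate up to a full day.
    return urls_per_second * SECONDS_PER_DAY


def payload_mbps(urls_per_second, avg_kb):
    # Assumption: 1 KB = 1024 bytes, HTML payload only (no protocol overhead).
    return urls_per_second * avg_kb * 1024 * 8 / 1_000_000


if __name__ == "__main__":
    print(f"{urls_per_day(18.2) / 1e6:.3f} million urls/day")      # 1.572 million urls/day
    print(f"{payload_mbps(18.2, 25.05):.2f} Mbps of HTML payload")
```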
Even better, this code is infinitely extensible. We’ll spread it across as many machines as necessary to download the entire internet.
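How the work gets split across machines is not decided here, so treat the following as one possible sketch rather than the actual design. It assumes each machine is given a number from 0 to N-1 and that URLs are assigned by hashing the hostname, which keeps every page of a site on the same machine. The names `machine_for` and `partition` are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def machine_for(url, machine_count):
    # Hash the hostname so all URLs from one site land on the same machine.
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machine_count


def partition(urls, machine_count):
    # Build one URL list per machine (assumed layout: machines numbered 0..N-1).
    buckets = [[] for _ in range(machine_count)]
    for url in urls:
        buckets[machine_for(url, machine_count)].append(url)
    return buckets


if __name__ == "__main__":
    seeds = [
        "http://example.com/",
        "http://example.com/about",
        "http://example.org/",
    ]
    for number, bucket in enumerate(partition(seeds, 4)):
        print(number, bucket)
```

One tradeoff with plain modulo hashing is that changing the number of machines reshuffles most hosts, so a real deployment might prefer a scheme that moves less work around when a machine is added.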