Benchmarking several ways to read a file in Perl (猪圈)

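Perl offers several ways to read a file, but I had never measured how they compare in speed. I now have to rewrite a configuration module, so this is a good occasion to benchmark them.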

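Besides Perl's built-in open, I also tested AnyEvent::IO, which offers a non-blocking (asynchronous) interface, and File::Slurp, a module for slurping whole files.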

The source code and the results follow:

#!/usr/bin/perl -w
use strict;
use 5.010;
use AnyEvent::IO qw(:DEFAULT :flags);
use File::Slurp qw(read_file);
use Benchmark qw(:all);

my $file = 'test.txt';

# A negative count makes Benchmark run each sub for at least that many CPU seconds.
my $r = timethese( -5, {
    # a: AnyEvent::IO step by step: open, stat for the size, then read
    a => sub {
        aio_open $file, O_RDONLY, 0, sub {
            my ($fh) = @_
              or return AE::log error => "Can't open file: $!";

            # now stat the file to get the size
            aio_stat $fh, sub {
                @_
                  or return AE::log error => "Can't stat file: $!";
                my $size = -s _;

                aio_read $fh, $size, sub {
                    my ($data) = @_
                      or return AE::log error => "Can't read file: $!";

                    $size == length $data
                      or return AE::log error => "short read, file changed?";
                };
            };
        };
    },
    # b: File::Slurp
    b => sub { read_file($file); },
    # c: built-in open, slurping the whole file by undefining $/
    c => sub {
        local $/ = undef;
        open( my $fh, '<', $file ) or return 0;
        my $contents = <$fh>;
        close($fh);
    },
    # d: built-in open, reading line by line in a while loop
    d => sub {
        open( my $fh, '<', $file ) or return 0;
        while (<$fh>) { }
        close($fh);
    },
    # e: built-in open, reading all lines into an array
    e => sub {
        open( my $fh, '<', $file ) or return 0;
        my @reads = <$fh>;
        close($fh);
    },
    # f: AnyEvent::IO, loading the whole file with a single aio_load call
    f => sub {
        aio_load $file, sub {
            my ($data) = @_ or return AE::log error => "$!";
        };
    },
});

cmpthese $r;

The results:

Benchmark: running a, b, c, d, e, f for at least 5 CPU seconds...
         a: 6 wallclock secs ( 2.79 usr + 2.75 sys = 5.54 CPU) @ 5274.47/s (n=29210)
         b: 6 wallclock secs ( 1.37 usr + 3.63 sys = 5.01 CPU) @ 4693.03/s (n=23498)
         c: 5 wallclock secs ( 1.54 usr + 3.71 sys = 5.26 CPU) @ 8347.54/s (n=43883)
         d: 6 wallclock secs ( 3.23 usr + 2.17 sys = 5.40 CPU) @ 4483.79/s (n=24199)
         e: 6 wallclock secs ( 3.74 usr + 1.45 sys = 5.20 CPU) @ 4099.52/s (n=21297)
         f: 6 wallclock secs ( 2.04 usr + 3.35 sys = 5.40 CPU) @ 6791.37/s (n=36653)
     Rate    e    d    b    a    f    c
e  4100/s   --  -9% -13% -22% -40% -51%
d  4484/s   9%   --  -4% -15% -34% -46%
b  4693/s  14%   5%   -- -11% -31% -44%
a  5274/s  29%  18%  12%   -- -22% -37%
f  6791/s  66%  51%  45%  29%   -- -19%
c  8348/s 104%  86%  78%  58%  23%   --
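
Each cell in the comparison table relates the rate of the row to the rate of the column. As a minimal sketch of that arithmetic, the snippet below recomputes two cells from the rates printed above; relative_pct is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the timethese output above.
my %rate = ( c => 8347.54, e => 4099.52 );

# Percentage by which rate $x is higher (positive) or lower (negative)
# than rate $y. relative_pct is a hypothetical helper, not part of Benchmark.
sub relative_pct {
    my ( $x, $y ) = @_;
    return sprintf '%.0f%%', 100 * ( $x / $y - 1 );
}

print "c relative to e: ", relative_pct( $rate{c}, $rate{e} ), "\n";    # 104%
print "e relative to c: ", relative_pct( $rate{e}, $rate{c} ), "\n";    # -51%
```

So "104%" in row c, column e means that c completes slightly more than twice as many reads per second as e.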

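The results show that test c is the fastest: it undefines the input record separator ($/), so the whole file is read as a single record.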
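
As a minimal sketch of approach c outside the benchmark, the helper below slurps a file with a lexical filehandle and a scoped change to $/. The name slurp_file is hypothetical, and dying on failure is an assumption about how errors should be handled.

```perl
use strict;
use warnings;

# slurp_file is a hypothetical helper name, not part of the benchmark script.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";

    # With $/ undefined, <$fh> returns everything up to end of file as one
    # record. "local" restores the previous value when the sub returns.
    local $/;
    my $contents = <$fh>;
    close $fh;
    return $contents;
}
```

A call such as my $text = slurp_file('test.txt'); then returns the whole file as a single string.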

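Next comes AnyEvent::IO. Note that the backend in use here is actually the pure-Perl AnyEvent::IO::Perl, not the IO::AIO-based AnyEvent::IO::IOAIO; using the latter might do better. Test f reads the file with a single aio_load call; it still trails c, but it is clearly faster than the step-by-step test a. File::Slurp (b), the commonly recommended module, performs roughly on par with test a, and it becomes slower still when the file is read as UTF-8.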
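
To see which backend a given installation actually picks, AnyEvent::IO exposes the backend's package name in $AnyEvent::IO::MODEL. The sketch below prints it; io_backend is a hypothetical helper name, and the eval guard is only there so the script still runs where AnyEvent is not installed.

```perl
use strict;
use warnings;

# io_backend is a hypothetical helper name. It returns undef when
# AnyEvent::IO cannot be loaded (assumption: AnyEvent may not be installed).
sub io_backend {
    return unless eval { require AnyEvent::IO; 1 };
    no warnings 'once';    # the variable is mentioned only once in this file
    return $AnyEvent::IO::MODEL;
}

my $backend = io_backend();
print defined $backend
  ? "AnyEvent::IO backend: $backend\n"
  : "AnyEvent::IO is not available\n";
```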

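The remaining two, d and e, are the approaches we use most often: reading line by line in a while loop (d) and reading all lines into an array (e).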
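
The difference between the two is the context in which the filehandle is read. A minimal sketch, with hypothetical helper names and die-on-error as an assumption:

```perl
use strict;
use warnings;

# count_lines_streaming and read_all_lines are hypothetical helper names.

# Style d: readline in scalar context inside a while loop,
# so only one line is held in memory at a time.
sub count_lines_streaming {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

# Style e: readline in list context, so every line is loaded at once.
sub read_all_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

Style d suits files that should be processed as a stream, while style e is convenient when the lines are needed as an array afterwards.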

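Overall, if data safety and conflicts are not a concern, undefining $/ is without doubt the fastest and most convenient approach. If you need a combination of performance and safety, AnyEvent::IO may be a good choice.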

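Addendum:

Later I ran the same test on my own laptop, and the results changed considerably: c, d, e and f end up within about 10% of one another, a is somewhat behind, and File::Slurp (b) is by far the slowest: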

Benchmark: running a, b, c, d, e, f for at least 5 CPU seconds...
         a: 6 wallclock secs ( 2.42 usr + 2.86 sys = 5.27 CPU) @ 12946.90/s (n=68269)
         b: 5 wallclock secs ( 2.06 usr + 3.34 sys = 5.40 CPU) @ 7890.70/s (n=42594)
         c: 5 wallclock secs ( 2.01 usr + 3.23 sys = 5.24 CPU) @ 14962.42/s (n=78433)
         d: 5 wallclock secs ( 1.98 usr + 3.04 sys = 5.02 CPU) @ 15614.77/s (n=78433)
         e: 6 wallclock secs ( 2.29 usr + 2.93 sys = 5.23 CPU) @ 14105.45/s (n=73701)
         f: 6 wallclock secs ( 1.89 usr + 3.31 sys = 5.19 CPU) @ 14387.10/s (n=74741)
      Rate    b    a    e    f    c    d
b   7891/s   -- -39% -44% -45% -47% -49%
a  12947/s  64%   --  -8% -10% -13% -17%
e  14105/s  79%   9%   --  -2%  -6% -10%
f  14387/s  82%  11%   2%   --  -4%  -8%
c  14962/s  90%  16%   6%   4%   --  -4%
d  15615/s  98%  21%  11%   9%   4%   --

System configuration: i7 2.4 GHz, Perl 5.16
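
The post does not say how large test.txt was, and the ranking may well depend on it. When comparing machines, it can help to generate the input file instead, so that every run reads the same data. A minimal sketch follows; write_test_file is a hypothetical helper, and the line format and line count are illustrative assumptions, not the contents of the original test.txt.

```perl
use strict;
use warnings;

# write_test_file is a hypothetical helper. The "keyN = valueN" line format
# is an illustrative assumption, not the author's actual test data.
sub write_test_file {
    my ( $path, $lines ) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} "key$_ = value$_\n" for 1 .. $lines;
    close $fh or die "Can't close $path: $!";
    return -s $path;    # size of the written file in bytes
}
```

Calling write_test_file('test.txt', 1000) before the benchmark would give each machine an identical input.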
