Snakemake 性能分析
1. Prerequisites
Section titled “1. Prerequisites”Install Snakemake
Section titled “Install Snakemake”Snakemake is primarily used on Linux, but also supports macOS and Windows (via WSL).
Linux / macOS
Section titled “Linux / macOS”# Using pippip install snakemake
# Using conda (recommended for reproducibility)conda install -c conda-forge snakemakeWindows
Section titled “Windows”Snakemake is not natively supported on Windows. Use WSL (Windows Subsystem for Linux):
-
Install WSL:
Terminal window wsl --install -
Install Snakemake inside WSL:
Terminal window pip install snakemake# orconda install -c conda-forge snakemake
Install Perl (for benchmark script)
Section titled “Install Perl (for benchmark script)”The benchmark summary script requires Perl, which is pre-installed on most Unix systems.
# Debian / Ubuntusudo apt install perl
# CentOS / RHELsudo yum install perlPerl is pre-installed on macOS. If needed, you can install via Homebrew:
brew install perlWindows (WSL)
Section titled “Windows (WSL)”Perl is included in most WSL distributions. If not:
sudo apt install perl2. Usage
Section titled “2. Usage”This script finds files matching *.benchmark and summarizes their metrics:
- Total Execution Time
- Total I/O (Input/Output)
- Maximum Memory Consumption (Max RSS)
Example
Section titled “Example”# Summarize benchmarks in a specific directory❯ workflow/scripts/sum_benchmark.pl benchmark/01.QCFinding file(s) named "*.benchmark" under benchmark/01.QC and summarizing...Total Execution Time: 02 hours 38 minutes 11 secondsTotal I/O In: 182.58G, Total I/O Out: 43.28G, Total I/O: 225.86GMax_rss: 2.09G
# Summarize benchmarks in the current directory (default)❯ workflow/scripts/sum_benchmark.plFinding file(s) named "*.benchmark" under current workdir and summarizing...Total Execution Time: 10 hours 04 minutes 44 secondsTotal I/O In: 315.35G, Total I/O Out: 178.44G, Total I/O: 493.80GMax_rss: 21.00G3. Source Code
Section titled “3. Source Code”Save the following code as sum_benchmark.pl and make it executable (chmod +x sum_benchmark.pl).
#!/usr/bin/perluse strict;use warnings;
my $path=$ARGV[0];$path||=".";
if (!$ARGV[0]) { print "Finding file(s) named \"*.benchmark\" under current workdir and summarizing...\n"; }else { print "Finding file(s) named \"*.benchmark\" under $path and summarizing...\n"; }
my ($total_sec,$io_in,$io_out,$max_rss)=(0,0,0,0);my $files=`find $path -name "*.benchmark"`;chomp($files);my @lines=split(/\n/,$files);foreach my $file (@lines) { open(IN,$file); while (<IN>) { if ($_ !~ /^s/) { my @arr=split/\t/; if ($arr[0] =~ /[0-9]/) { $total_sec+=$arr[0]; } if ($arr[6] =~ /[0-9]/) { $io_in+=$arr[6]; } if ($arr[7] =~ /[0-9]/) { $io_out+=$arr[7]; } if ($arr[2] =~ /[0-9]/) { if ($arr[2] > $max_rss) { $max_rss=$arr[2]; } } } } close IN;}
my $total_io=$io_in+$io_out;$io_in=&unit($io_in);$io_out=&unit($io_out);$total_io=&unit($total_io);$max_rss=&unit($max_rss);
if ($total_sec) { printf ("Total Execution Time: %02d hours %02d minutes %02d seconds\n",(gmtime($total_sec))[2,1,0]); print "Total I/O In: $io_in, Total I/O Out: $io_out, Total I/O: $total_io\n"; print "Max_rss: $max_rss\n";}else { print "Maybe there was no \"*.benchmark\" file?\n"; }
sub unit { my $input=shift; my $value; if($input >= 1024000 ) { $value=sprintf("%.2f",($input/1024000) )."T"; } elsif ($input >= 1024 ) { $value=sprintf("%.2f",($input/1024) )."G"; } else { $value=sprintf("%.2f",$input)."M"; } return($value);}