4.5. Smoothing raw tag count data across the genome¶
In this example, we generate smoothed density signal by binning the genome into 20 bp intervals and counting the number of non-paired-end tag reads falling within 75 bp of each interval. A simple follow-on script marks up results to wig or bigWig format for loading into a track of a local UCSC Genome Browser.
4.5.1. BEDOPS tools in use¶
For this script, we use bam2bed to convert a BAM file to BED, then we use bedmap to run a sliding density window over input genomic regions. Finally starch compresses the results.
4.5.2. Script¶
#!/bin/tcsh -ef
# author : Richard Sandstrom
if ( $#argv != 5 ) then
printf "Wrong number of arguments\n"
printf "<bam-file> <out-file> <window-size> <step-size> <chromosome-file>\n"
printf " where <chromosome-file> contains whole chromosome BED items for the\n"
printf " genome, e.g., sort-bed formatted output from the UCSC hg19.chromInfo table.\n"
exit -1
endif
# BAM file
set inBam = $argv[1]
# resulting density file
set outDensity = $argv[2]
# +/- window for counting read 5' ends
set window = $argv[3]
# step size across genome
set binI = $argv[4]
# chromosome file for organism of interest
set chromsfile = $argv[5]
set outDir = $outDensity:h
mkdir -p $outDir
set tmpDir = /tmp/`whoami`/scratch/$$
if ( -d $tmpDir ) then
rm -rf $tmdDir
endif
mkdir -p $tmpDir
# clip tags to single 5' end base
bam2bed < $inBam \
| awk '{if($6=="+"){s=$2; e=$2+1}else{s=$3-1; e=$3}print $1"\t"s"\t"e}' \
| sort-bed --max-mem 2G - \
>! $tmpDir/tags.bed
# create genome-wide bins and count how many tags fall within range of each
awk -v binI=$binI -v win=$window \
'{ \
i = 0; \
for(i = $2; i <= $3-binI; i += binI) { print $1"\t"i"\t"i + binI } \
# end of chrome may include a bin of size < binI \
if ( i < $3 ) { print $1"\t"i"\t"$3; } \
}' $chromsfile \
| bedmap --faster --range $window --echo --count --delim "\t" - $tmpDir/tags.bed \
| starch - \
>! $outDensity
rm -rf $tmpDir
exit 0