4.4. Finding the subset of SNPs within DHSes¶
In this example, we would like to identify the set of SNPs that are within a DHS, printing out both the SNP element and the DHS it is contained within.
4.4.1. BEDOPS tools in use¶
We use bedmap to answer this question, as it traverses a reference BED file (in this example, SNPs), and identifies overlapping elements from the mapping BED file (in this example, DHSs).
4.4.2. Script¶
SNPs are in a BED-formatted file called SNPs.bed
sorted lexicographically with sort-bed. The DNase-hypersensitive sites are stored in a sorted BED-formatted file called DHSs.bed
. These two files are available in the Downloads section.
bedmap --skip-unmapped --echo --echo-map SNPs.bed DHSs.bed \
> subsetOfSNPsWithinAssociatedDHS.bed
4.4.3. Discussion¶
The output of this bedmap statement might look something like this:
chr1 10799576 10799577 rs12.4.418 Systolic_blood_pressure Cardiovascular|chr1 10799460 10799610 MCV-1 9.18063
The output is delimited by pipe symbols (|
), showing the reference element (SNP) and the mapped element (DHS).
If multiple elements are mapped onto a single reference element, the mapped elements are further separated by semicolons, by default.
4.4.4. Downloads¶
SNP
elementsDNase-hypersensitive
elements
The bedmap tool can operate directly on Starch-formatted archives. Alternatively, use the unstarch tool to decompress Starch data files to sorted BED format.