Beginner's guide

Basics

Welcome to Enhort, an analysis tool for genomic positions. Enhort accepts .bed files in which each line specifies one genomic position (e.g. chr7 1000 1001). You can upload your .bed file on the index page or through the Guided Analysis. Enhort then tests your genomic positions against a set of annotations. In the Guided Analysis you can select annotations based on your assembly and cell line; see the Guided Analysis section further down this page. The quickstart upload uses a curated list of annotations, containing genes and gene-related features, to test your sites against.
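For illustration, reading such a file could look like the following minimal Python sketch. It assumes whitespace-separated columns with chromosome, start and end in the first three fields, and it skips empty lines and header lines starting with "#", "track" or "browser". The function name parse_bed is hypothetical and not part of Enhort.

```python
from typing import Iterable, List, Tuple

Position = Tuple[str, int, int]  # (chromosome, start, end)

def parse_bed(lines: Iterable[str]) -> List[Position]:
    """Parse the first three columns of BED-formatted lines."""
    positions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        positions.append((fields[0], int(fields[1]), int(fields[2])))
    return positions

example = ["track name=sites", "chr7 1000 1001", "chr2\t500\t501"]
print(parse_bed(example))
```

Any additional columns (name, score, strand) are ignored in this sketch.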

After you submit your data, Enhort tests your positions and an equally sized random set of positions against each annotation. The results page shows a list of significant annotations, ordered by effect size. Each entry corresponds to an annotation track and can be clicked to show detailed information about integration counts and a description of the data. The colored bars give a visual comparison of the integration of your sites (blue) and the background sites (grey).
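The comparison against one annotation can be pictured with a small Python sketch. It is not Enhort's implementation: it assumes a single chromosome, sorted non-overlapping half-open intervals, and a 2x2 table (sites/background by inside/outside) fed into a Pearson chi-square statistic. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

def count_inside(positions: Sequence[int], intervals: List[Interval]) -> int:
    """Count positions that fall inside any of the sorted, non-overlapping intervals."""
    starts = [start for start, _ in intervals]
    inside = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < intervals[i][1]:
            inside += 1
    return inside

def chi_square_statistic(sites_in: int, sites_out: int, bg_in: int, bg_out: int) -> float:
    """Pearson chi-square statistic for the assumed 2x2 table."""
    table = [[sites_in, sites_out], [bg_in, bg_out]]
    total = sites_in + sites_out + bg_in + bg_out
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in table]
    col_sums = [sites_in + bg_in, sites_out + bg_out]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    return stat

genes = [(100, 200), (500, 600)]   # toy annotation track
sites = [120, 150, 550, 700]       # uploaded positions
background = [50, 300, 510, 900]   # equally sized random set
s_in = count_inside(sites, genes)
b_in = count_inside(background, genes)
print(s_in, b_in, chi_square_statistic(s_in, len(sites) - s_in, b_in, len(background) - b_in))
```

A larger statistic means the inside/outside split of the sites differs more strongly from that of the background.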

In the basic mode, the background sites are adjusted to fall inside the contigs and outside of blacklisted regions as specified by ENCODE. As an advanced option, you can further alter how the background sites are created. For example, if you already know that your sites prefer genes, you can select "known genes" as a covariate using the checkbox on the right of the entry on the results page. After you click the "Run again" button on the left, the background model is recreated so that it has the same integration counts for genes as your sites. The genes annotation then appears in the table on the right. Multiple annotations can be selected as covariates to create a specific background model for your sites.
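The idea of a covariate can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Enhort's algorithm: one contig, one inside/outside covariate track with sorted non-overlapping half-open intervals, and no blacklist handling. The names matched_background, sample_within and complement are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), sorted and non-overlapping

def is_inside(pos: int, intervals: List[Interval]) -> bool:
    return any(start <= pos < end for start, end in intervals)

def complement(intervals: List[Interval], contig_length: int) -> List[Interval]:
    """Return the gaps between the intervals on a contig of the given length."""
    gaps, prev_end = [], 0
    for start, end in intervals:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = end
    if prev_end < contig_length:
        gaps.append((prev_end, contig_length))
    return gaps

def sample_within(intervals: List[Interval], n: int, rng: random.Random) -> List[int]:
    """Draw n positions uniformly from the union of the intervals."""
    if n == 0:
        return []
    if not intervals:
        raise ValueError("cannot sample from an empty region")
    lengths = [end - start for start, end in intervals]
    chosen = rng.choices(intervals, weights=lengths, k=n)
    return [rng.randrange(start, end) for start, end in chosen]

def matched_background(sites: Sequence[int], covariate: List[Interval],
                       contig_length: int, seed: int = 0) -> List[int]:
    """Random background with the same inside/outside counts as the sites."""
    rng = random.Random(seed)
    n_inside = sum(is_inside(p, covariate) for p in sites)
    n_outside = len(sites) - n_inside
    inside = sample_within(covariate, n_inside, rng)
    outside = sample_within(complement(covariate, contig_length), n_outside, rng)
    return sorted(inside + outside)

genes = [(100, 200), (500, 600)]
print(matched_background([150, 550, 300], genes, contig_length=1000, seed=1))
```

Because the background now matches the sites on the covariate, a test against that same track no longer shows a difference, which is why the selected annotation moves to a separate table.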

If you would like to see some example results, use the sample data link. A more comprehensive tutorial is available here.

Guided Analysis

The Guided Analysis walks you through the steps to analyze your position data. On the first page, upload your .bed file. On the second page, you select annotations based on the cell line you used and the questions you have about the data. For example, if your positions were sequenced from an hESC cell line, all annotations specific to this cell line can be selected by clicking the cell line name on the left of the table. Additionally, you can select cell-line-independent annotations such as genes or CpG islands by clicking the blue labels in the "Unknown" row, as well as any other annotation you wish to examine. The selected annotations are then used to compare a random background model against your uploaded positions, as explained above.

FAQ

How is the fold change calculated?


If both inside counts or both outside counts are 0, the fold change is set to 0.

Otherwise a pseudocount is added to each count. The fold change is then the larger of two ratios: the maximum inside count divided by the minimum inside count, and the maximum outside count divided by the minimum outside count.
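One possible reading of this calculation is sketched below in Python. Two points are assumptions: the pseudocount value of 1, which is not stated on this page, and the second ratio being the maximum outside count over the minimum outside count. The function name fold_change is hypothetical.

```python
def fold_change(sites_in: int, sites_out: int, bg_in: int, bg_out: int,
                pseudocount: float = 1.0) -> float:
    """Fold change between sites and background counts for one annotation."""
    # Both inside counts or both outside counts are 0: defined as 0.
    if (sites_in == 0 and bg_in == 0) or (sites_out == 0 and bg_out == 0):
        return 0.0
    # Add the pseudocount to every count (its value of 1 is an assumption).
    s_in, b_in = sites_in + pseudocount, bg_in + pseudocount
    s_out, b_out = sites_out + pseudocount, bg_out + pseudocount
    inside_ratio = max(s_in, b_in) / min(s_in, b_in)
    outside_ratio = max(s_out, b_out) / min(s_out, b_out)
    return max(inside_ratio, outside_ratio)

# 30 of 100 sites inside vs. 10 of 100 background positions inside.
print(fold_change(30, 70, 10, 90))
```

Taking the maximum over the minimum in each ratio makes the value symmetric: enrichment and depletion of the same magnitude give the same fold change, which is always at least 1 unless the zero rule applies.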

If a scored track is set as a covariate, the p-value is not always exactly 1


In contrast to basic inside/outside tracks, background model creation for scored tracks works with an expected value rather than exact counts, so the results can deviate slightly. In most cases, however, the p-value is still above the 5% significance level, as expected.

Why is the p-value sometimes exactly 0.0?


The p-value is calculated as a double value using the chi-square test from the Apache Commons API. Values too small to be represented at double precision are set to 0.
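The effect can be reproduced with a short Python sketch. It assumes one degree of freedom and a p-value formed as one minus the cumulative probability; both are assumptions made for illustration and are not taken from Enhort's code. For one degree of freedom the chi-square CDF equals erf(sqrt(x / 2)), so only the standard library is needed.

```python
import math

def p_value_one_minus_cdf(chi2: float) -> float:
    """p-value for 1 degree of freedom, computed as 1 - CDF."""
    cdf = math.erf(math.sqrt(chi2 / 2.0))
    return 1.0 - cdf  # becomes exactly 0.0 once the CDF rounds to 1.0

def p_value_upper_tail(chi2: float) -> float:
    """The same p-value computed directly from the upper tail."""
    return math.erfc(math.sqrt(chi2 / 2.0))

for stat in (3.841, 50.0, 100.0):
    print(stat, p_value_one_minus_cdf(stat), p_value_upper_tail(stat))
```

For a large test statistic the cumulative probability is indistinguishable from 1 in double precision, so the difference is reported as 0.0 even though the true p-value is a tiny positive number. A p-value of 0.0 should therefore be read as "smaller than the precision allows".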

Hotspots bar


The plot above the results table uses colors to show integration hotspots across the genome. The hotspots are calculated with a sliding window that moves across the genome and counts the integration sites inside each window.

Colors in the plot represent the integration frequency: blue bars mark the regions with the highest integration counts, while white bars mark regions with no or very few integration sites.

High contrast indicates the presence of hotspots; evenly distributed colors indicate their absence.
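The sliding-window count can be sketched in Python as follows. The window size and step are hypothetical parameters (the values Enhort uses are not stated here), and the genome is treated as a single linear coordinate for simplicity.

```python
from bisect import bisect_left
from typing import List, Sequence

def window_counts(positions: Sequence[int], genome_length: int,
                  window: int, step: int) -> List[int]:
    """Count positions in each window [start, start + window) along the genome."""
    ordered = sorted(positions)
    counts = []
    for start in range(0, genome_length, step):
        lo = bisect_left(ordered, start)           # first position >= start
        hi = bisect_left(ordered, start + window)  # first position >= window end
        counts.append(hi - lo)
    return counts

sites = [5, 12, 14, 15, 90]
print(window_counts(sites, genome_length=100, window=20, step=10))
```

Mapping each count to a color intensity then gives a bar like the one described above: windows with many sites appear dark, windows with few or none appear light.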

How to download background positions?


After running an analysis click on 'Backgrounds' or go to the 'Data export' tab to download the data.

Which algorithm creates the control integration sites?


The application uses the Mersenne Twister Algorithm.
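As an illustration, Python's random module is also built on the Mersenne Twister, so drawing control positions with a seeded generator can be sketched as below. The contig lengths are toy values and the function name random_positions is hypothetical; this shows the general technique, not Enhort's code.

```python
import random
from typing import Dict, List, Tuple

def random_positions(contig_lengths: Dict[str, int], n: int,
                     seed: int) -> List[Tuple[str, int]]:
    """Draw n positions uniformly across all contigs using a seeded generator."""
    rng = random.Random(seed)  # Mersenne Twister generator
    names = list(contig_lengths)
    weights = [contig_lengths[name] for name in names]
    positions = []
    # Pick contigs in proportion to their length, then a position within each.
    for name in rng.choices(names, weights=weights, k=n):
        positions.append((name, rng.randrange(contig_lengths[name])))
    return positions

lengths = {"chr1": 1000, "chr2": 500}  # toy contig lengths, not real assembly sizes
print(random_positions(lengths, n=5, seed=42))
```

Using the same seed reproduces the same set of positions, which is useful when a background needs to be regenerated for comparison.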

Where does the name/logo come from?


The name is an alteration of Eihort, a monster from the Cthulhu Mythos that grew out of the work of H.P. Lovecraft. Eihort itself was created by the author Ramsey Campbell.

Eihort (The Pale Beast) is a huge, pale, gelatinous, oval-shaped monstrosity covered in myriads of eyes and supported by thousands of bony, fleshless legs.

Whenever Eihort encounters a human, it makes a bargain with them. If the human declines, Eihort kills the human. If the human accepts, Eihort implants an undeveloped brood into their body. When the brood hatches it will kill the human host. According to the Revelations of Glaaki, after the fall of humanity, Eihort's brood will be born into light and replace humanity.

From http://lovecraft.wikia.com/wiki/Eihort

The base image is taken from here.

Credit goes to Draguunthor.

Cite


There is no published paper yet.

However, there was a poster presentation at GCB 2016. A .pdf version of the poster can be found here: GCB 2016 Poster