
Signal Sensors

The most basic signal sensor is a simple consensus sequence, or an expression that describes a consensus sequence along with its allowable variations, such as a PROSITE expression [66,2]. More sensitive sensors can be designed by replacing the consensus with a weight matrix, in which each position in the pattern can match any residue, but a different cost is associated with matching each residue at each position [64,67,66,3,12]. The score that a weight matrix sensor returns for a candidate site is the sum of the costs of the individual residue matches over that site. If this score exceeds a given threshold, the candidate site is predicted to be a true site. Such sensors have a natural probabilistic interpretation: the score is a log-likelihood ratio under a simple statistical model in which each position in the site is characterized by its own distribution over possible residues, independent of the other positions. A mathematically equivalent interpretation is that the score is the discrimination energy for site recognition [3].
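
The weight matrix scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`build_weight_matrix`, `score_site`, `scan`), the uniform background distribution, the pseudocount, and the toy training sites are all invented for the example and do not come from the cited work. Each matrix entry is a log-odds value, so the summed score is a log-likelihood ratio of the site model against the background.

```python
import math

ALPHABET = "ACGT"

def build_weight_matrix(sites, background=None, pseudocount=1.0):
    """Estimate one log-odds column per position from aligned example sites."""
    if background is None:
        # Assumption: uniform background over the four nucleotides.
        background = {r: 0.25 for r in ALPHABET}
    matrix = []
    for pos in range(len(sites[0])):
        # Pseudocounts keep unseen residues from getting a score of -infinity.
        counts = {r: pseudocount for r in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {r: math.log2((counts[r] / total) / background[r]) for r in ALPHABET}
        )
    return matrix

def score_site(matrix, candidate):
    """Sum the per-position scores: positions are treated as independent."""
    return sum(column[residue] for column, residue in zip(matrix, candidate))

def scan(matrix, sequence, threshold):
    """Return (start, score) for every window whose score exceeds the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_site(matrix, sequence[start:start + width])
        if score > threshold:
            hits.append((start, score))
    return hits

# Invented toy data: four aligned examples of a hypothetical six-base signal.
known_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
matrix = build_weight_matrix(known_sites)
hits = scan(matrix, "GGGTATAATGGG", threshold=0.0)
print(hits)
```

With this toy data, only the window starting at offset 3 (the embedded `TATAAT`) scores above the threshold of 0. In practice the threshold would be tuned to trade off false positives against missed sites.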

Weight matrices can also be viewed as a simple type of neural network, sometimes called a perceptron [67,66]. Many investigators have also applied more complex neural networks, such as multi-layer feed-forward networks and time delay networks, to various DNA signal recognition problems [8,19,49,53,54,46,32]. Multi-layer nets can capture statistical dependencies between the residues at different positions in a site, an ability that perceptrons (and hence weight matrices) lack. Time delay neural networks also allow insertions and deletions while evaluating a match to a prospective site, whereas weight matrices and feed-forward neural networks do not [56]. Other statistical and pattern models besides neural networks have also been used as biosequence signal sensors, including nonhomogeneous Markov models (a weight matrix in which the distribution at position i depends on the residue at position i-1, sometimes called "WAM" models), decision trees, quadratic discriminant functions, and graphical models [37,76,63,15,58,1]. In general, the penalty for these more sophisticated models is that they need much more training data to estimate the many parameters they contain, so they are unsuitable when relatively few verified examples of the site to be modeled are known.
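
As a rough illustration of the nonhomogeneous Markov ("WAM") idea mentioned above, the sketch below conditions the distribution at each position on the residue at the previous position. The names `train_wam` and `wam_score`, the pseudocount, the uniform background, and the toy data are assumptions made for this example only, not an implementation taken from the cited references.

```python
import math

ALPHABET = "ACGT"

def train_wam(sites, pseudocount=0.5):
    """Estimate P_0(x_0) and, for each later position i, P_i(x_i | x_{i-1})."""
    first_counts = {r: pseudocount for r in ALPHABET}
    for site in sites:
        first_counts[site[0]] += 1
    total = sum(first_counts.values())
    first = {r: first_counts[r] / total for r in ALPHABET}

    conditionals = []  # conditionals[i - 1][prev][cur] = P_i(cur | prev)
    for i in range(1, len(sites[0])):
        counts = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        conditionals.append(
            {p: {c: counts[p][c] / sum(counts[p].values()) for c in ALPHABET}
             for p in ALPHABET}
        )
    return first, conditionals

def wam_score(model, candidate, background=0.25):
    """Log-likelihood ratio of the candidate against a uniform background."""
    first, conditionals = model
    score = math.log2(first[candidate[0]] / background)
    for i in range(1, len(candidate)):
        prob = conditionals[i - 1][candidate[i - 1]][candidate[i]]
        score += math.log2(prob / background)
    return score

# Invented toy data: the two positions always agree (AA or TT, never AT).
sites = ["AA", "TT"] * 10
model = train_wam(sites)
print(wam_score(model, "AA"), wam_score(model, "AT"))
```

On this toy data the model scores `AA` well above `AT`. A weight matrix trained on the same sites would score the two identically, because each column on its own sees A and T equally often. The cost is the one noted above: for DNA, each position after the first carries 12 free parameters (four conditional distributions with three free values each) instead of the 3 per position in a weight matrix.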


David Haussler
10/14/1998