
Contents
- 4.1 Pattern Discovery in Protein Sequences
  - 4.1.1 Preliminaries
  - 4.1.2 Discovery Algorithm
    - Subphase A of Phase 1
    - Subphase B of Phase 1
    - Phase 2
  - 4.1.3 Optimization Heuristics
  - 4.1.4 Experiments and Results
- 4.2 Classification of DNA Sequences
  - 4.2.1 Pattern-Based Classifier
  - 4.2.2 Fingerprint-Based Classifier
    - Building Fingerprint Files
    - Algorithm for Scoring
  - 4.2.3 Experiments and Results
- 4.3 Generalizations and Future Work
- Further Information
Chapter 4 Pattern Discovery and Classification in Biosequences
Published: December 1999
Abstract
As the volume of biosequence data grows, it becomes increasingly important to develop new techniques for extracting “knowledge” from the data. Pattern discovery is a fundamental operation in such applications. It attempts to find patterns in biosequences that can help scientists analyze the properties of a sequence or predict the function of a new entity. The discovered patterns may also help to classify an unknown sequence, that is, to assign the sequence to an existing family. In this chapter, we show how to discover active patterns in a set of protein sequences and how to classify an unlabeled DNA sequence. We use protein sequences as an example to illustrate our discovery algorithm, though the algorithm applies to sequences of any sort, including both protein and DNA.
The patterns we wish to discover within a set of sequences are regular expressions of the form *X1 * X2 * ... . Here X1, X2, ... are segments of a sequence, that is, subsequences made up of consecutive letters, and * represents a variable length don’t care (VLDC). In matching the expression *X1 * X2 * ... with a sequence S, the VLDCs may substitute for zero or more letters in S at zero cost. The dissimilarity measure used in comparing two sequences is the edit distance, that is, the minimum cost of the edit operations needed to transform one subsequence into the other after an optimal, zero-cost substitution for the VLDCs, where the edit operations are insertion, deletion, and change of one letter to another (Wagner and Fischer, 1974; K. Zhang et al., 1994).
That is, we find a one-to-one mapping from each VLDC to a subsequence of the data sequence, and from each pattern subsequence to a subsequence of the data sequence, such that the following two conditions are satisfied: (i) the mapping preserves the left-to-right ordering (if a VLDC at position i in the pattern maps to a subsequence starting at position i1 and ending at position i2 in the data sequence, and a VLDC at position j in the pattern maps to a subsequence starting at position j1 and ending at position j2 in the data sequence, and i < j, then i2 < j2).
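The matching rule described in the abstract can be illustrated with a small dynamic program. The sketch below is not the chapter's algorithm; it is a minimal illustration that assumes unit cost for each insertion, deletion, and change, writes the pattern as a plain string in which the character `*` stands for a VLDC, and uses a hypothetical function name `vldc_distance`. It extends the classic edit-distance recurrence with one extra case: a VLDC may absorb zero or more letters of the data sequence at zero cost.

```python
def vldc_distance(pattern: str, seq: str) -> int:
    """Edit distance between `pattern` and `seq`, where each '*' in the
    pattern is a VLDC that matches zero or more letters of `seq` for free.
    Assumption: every insertion, deletion, and change costs 1."""
    n = len(seq)
    # Row for the empty pattern prefix: seq[:j] needs j insertions.
    prev = list(range(n + 1))
    for p in pattern:
        if p == "*":
            # A VLDC can match the empty string (prev[j]) or absorb
            # one more letter of seq at no cost (cur[j - 1]).
            cur = [prev[0]]
            for j in range(1, n + 1):
                cur.append(min(prev[j], cur[j - 1]))
        else:
            # Ordinary letter: standard delete / insert / change choices.
            cur = [prev[0] + 1]
            for j in range(1, n + 1):
                cur.append(min(
                    prev[j] + 1,                       # delete pattern letter
                    cur[j - 1] + 1,                    # insert sequence letter
                    prev[j - 1] + (p != seq[j - 1]),   # match or change
                ))
        prev = cur
    return prev[n]


if __name__ == "__main__":
    # The segment ABC occurs exactly, so the VLDCs absorb the rest.
    print(vldc_distance("*ABC*", "XXABCYY"))
    # The closest occurrence of ABC differs in one letter.
    print(vldc_distance("*ABC*", "XXABDYY"))
```

Without any `*` in the pattern, the function reduces to the ordinary edit distance between two strings, which makes it easy to check against known values.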