Chinaunix首页 | 论坛 | 博客
  • 博客访问: 208319
  • 博文数量: 35
  • 博客积分: 2691
  • 博客等级: 少校
  • 技术积分: 527
  • 用 户 组: 普通用户
  • 注册时间: 2008-01-11 09:42
文章分类

全部博文(35)

文章存档

2012年(5)

2010年(6)

2009年(2)

2008年(22)

我的朋友

分类: C/C++

2008-03-11 14:29:21

 

/* Extending Shift-And
 * 2008/3/11
 * By Yao Jianming
 */

#ifndef _EXTENDING_SHIFT_AND_H
#define _EXTENDING_SHIFT_AND_H

#include <string>
#include <vector>

struct element {
    int ch;
    int an;
    int bn;
};

class strmatch {
private:
    std::string str;
    int mask[256], mask_di, mask_df;
    int max_len;
    std::vector<element> vec;

    void makemasktable();

public:
    int count;

    strmatch(const char* p) : str(p),mask_di(0),mask_df(0),count(0),max_len(0){makemasktable();}

    void search(const void* haystack, size_t haystack_len);
};

#endif

#include "strmatch.h"

//pattern like: a-b-c-x(1,3)-d-e,

//could match like: abcfde, abcffde, abcfffde

void strmatch::makemasktable()
{
    char* patt = strdup(str.c_str());
    char* p = strtok(patt, "-");
    do {
        struct element el;
        if (*p == 'x' && *(p+1) == '(') {
            el.ch = 0;
            el.an = *(p+2) - '0';
            el.bn = *(p+4) - '0';
            max_len += el.bn - el.an + 1;
        } else {
            el.ch = *p;
            max_len += 1;
        }
        vec.push_back(el);
    }while((p = strtok(NULL, "-")) != NULL);

    for(int i=0; i<256; ++i)
        mask[i] = 0;
    int s = 1, i = 0;
    while(i < vec.size()) {
        if (vec[i].ch == 0) {
            mask_di |= s>>1;
            mask_df |= s<<(vec[i].bn - vec[i].an);
            for(int j=0; j<256; ++j) {
                for(int k=0; k<vec[i].bn; ++k) {
                    mask[j] |= s<<k;
                }
            }
            s <<= vec[i].bn;
        } else {
            mask[vec[i].ch] |= s;
            s <<= 1;
        }
        ++i;
    }
}

void strmatch::search(const void* haystack, size_t haystack_len)
{
    const char* ptr = (const char*)haystack;
    const char* ptr_end = ptr + haystack_len;

    int d = 0;
    while(ptr < ptr_end) {
        d = (((d<<1) | 0x1) & (mask[*ptr]));
        d |= ((mask_df - (d & mask_di)) & ~mask_df);
        if(d == (0x1<<(max_len-1))) ++count;
        ptr += 1;
    }
}

#include <stdio.h>
#include <sys/time.h>
#include "strmatch.h"

int main()
{
    struct timeval start, end;
    gettimeofday(&start, 0);

    strmatch str = "a-b-c-x(1,3)-d-e";
    char text[] = "abcabcfffdee";
    str.search(text, strlen(text));
    printf("size:%d\n", str.count);

    gettimeofday(&end, 0);
    long ltime = (end.tv_sec - start.tv_sec) * 1000000 +
                    end.tv_usec - start.tv_usec;
    printf("ltime: %ld\n", ltime);

    return 0;
}

 
Extending Shift-And

We augment the representation of Shift-And by adding the -transitions. We call "gap-initial" states those states i from which an -transition leaves. For each gap-initial state i corresponding to a gap x(a, b), we define its "gap-final" state to be (i + ba + 1), that is, the one following the last state reached by an -transition leaving i. In , we have one gap-initial state (3) and one gap-final state (6).

We create a bit mask I that has 1 in the gap-initial states and another mask F that has 1 in the gap-final states. In , the corresponding I and F masks are 00000100 and 00100000, respectively. After performing the normal Shift-And step, we simulate all the -moves with the operation

Image from book

The rationale is as follows. D & I isolates the active gap-initial states. Subtracting this from F has two possible outcomes for each gap-initial state i. First, if i is active, the result will have 1 in all the states from i to (i + ba), successfully propagating the active state i to the desired target states. Second, if i is inactive, the outcome will have 1 only in state (i + ba +1). This undesired 1 is removed by operating on the result with "& ~ F". Once the propagation has been done, we OR the result with the already active states in D. Note that the propagations of different gaps do not interfere with one another, since all the subtractions have a local effect.

shows the complete algorithm. For simplicity we assume that there are no gaps at the beginning or at the end of the pattern and that consecutive gaps have been merged into one. The preprocessing takes O(m∣Σ∣) time, while the scanning needs O(n) time. If max > w, however, we need several machine words for the simulation, and it then takes O(n[max/w) time.

Image from book
Gaps-Shift-And (p = p1p2 ...pm, T = t1t2 ...tn)
1. Preprocessing
2.     L  maximum length of an occurance
3.     For c  Σ Do B[c]  0L
4.     I  0L, F  0L
5.     i  0
6.     For j  1 ... m Do
7.          Ifpj is of the form x(a,b) Then
8.               I  I | 0L -i10i-1
9.               F  F | 0L-(i+b-a)-110i+b-a
10.              For c  Σ Do B[c]  B[c] | 0L-i-b1b0i
11.              i  i+b
12.         Else /* pj is a class of characters */
13.              For c  pj Do B[c]  B[c] | 0L-i-110i
14.              i  i + 1
15.         End of if
16.       End of for
17.   Searching
18.       D  0L
19.       For pos  1 ... n Do
20.              D  ((D << 1) | (0L-11) & B[tpos]
21.              D  D | ((F - (D & I) & ~ F)
22.              If D & 10L-1  0L Then report an occurance ending at pos
23.       End of for
Image from book

Figure 4.3: The extension of Shift-And to handle PROSITE expressions.

Example of Gaps-Shift-And We search for the pattern abcx(1, 3) – de in the text "abcabcffdee".

  1. Image from book

    We now apply the propagation formula on D. The result is ((F – (D & I)) & ~ F) = ((00100000 – 00000000) & 11011111) = 00000000, and hence D does not change. We do not mention again the propagation formula unless it has an effect on D.

  2. Image from book

  3. Image from book

    At this point the -transitions take effect: ((F – (D & I)) & ~ F) yields ((00100000 – 00000100) & 11011111) = 00011100, where states 4 and 5 have been activated. The new D value is D = 0 0 0 1 1 1 0 0.

  4. Image from book

  5. Image from book

  6. Image from book

    The propagation formula takes effect again and produces D = 0 0 1 1 1 1 0 0.

  7. Image from book

  8. Image from book

  9. Image from book

  10. Image from book

    The last bit of D is set, so we mark an occurrence. The gap has matched the text "ff".

  11. Image from book

阅读(1366) | 评论(3) | 转发(0) |
给主人留下些什么吧!~~

yjm05732008-06-04 10:23:41

先把Shift-And算法实现吧,Extending Shift-And只不过是扩展一下了

yjm05732008-06-04 10:21:26

下载本书 这本书比较详细 google一下就能找到了

chinaunix网友2008-05-14 22:04:03

老大,能不能详细解释下算法原理哦~