分类: C/C++
2008-03-11 14:29:21
|
We augment the representation of Shift-And by adding the ∊-transitions. We call "gap-initial" states those states i from which an ∊-transition leaves. For each gap-initial state i corresponding to a gap x(a, b), we define its "gap-final" state to be (i + b – a + 1), that is, the one following the last state reached by an ∊-transition leaving i. In , we have one gap-initial state (3) and one gap-final state (6).
We create a bit mask I that has 1 in the gap-initial states and another mask F that has 1 in the gap-final states. In , the corresponding I and F masks are 00000100 and 00100000, respectively. After performing the normal Shift-And step, we simulate all the ∊-moves with the operation
The rationale is as follows. D & I isolates the active gap-initial states. Subtracting this from F has two possible outcomes for each gap-initial state i. First, if i is active, the result will have 1 in all the states from i to (i + b – a), successfully propagating the active state i to the desired target states. Second, if i is inactive, the outcome will have 1 only in state (i + b – a +1). This undesired 1 is removed by operating on the result with "& ~ F". Once the propagation has been done, we OR the result with the already active states in D. Note that the propagations of different gaps do not interfere with one another, since all the subtractions have a local effect.
shows the complete algorithm. For simplicity we assume that there are no gaps at the beginning or at the end of the pattern and that consecutive gaps have been merged into one. The preprocessing takes O(m∣Σ∣) time, while the scanning needs O(n) time. If ℓmax > w, however, we need several machine words for the simulation, and it then takes O(n[⌈max/w⌉) time.
Gaps-Shift-And (p = p1p2 ...pm, T = t1t2 ...tn) 1. Preprocessing 2. L ← maximum length of an occurance 3. For c ∊ Σ Do B[c] ← 0L 4. I ← 0L, F ← 0L 5. i ← 0 6. For j ∊ 1 ... m Do 7. Ifpj is of the form x(a,b) Then 8. I ← I | 0L -i10i-1 9. F ← F | 0L-(i+b-a)-110i+b-a 10. For c ∊ Σ Do B[c] ← B[c] | 0L-i-b1b0i 11. i ← i+b 12. Else /* pj is a class of characters */ 13. For c ∊ pj Do B[c] ← B[c] | 0L-i-110i 14. i ← i + 1 15. End of if 16. End of for 17. Searching 18. D ← 0L 19. For pos ∊ 1 ... n Do 20. D ← ((D << 1) | (0L-11) & B[tpos] 21. D ← D | ((F - (D & I) & ~ F) 22. If D & 10L-1 ≢ 0L Then report an occurance ending at pos 23. End of for
Example of Gaps-Shift-And We search for the pattern a–b–c–x(1, 3) – d – e in the text "abcabcffdee".
We now apply the propagation formula on D. The result is ((F – (D & I)) & ~ F) = ((00100000 – 00000000) & 11011111) = 00000000, and hence D does not change. We do not mention again the propagation formula unless it has an effect on D.
At this point the ∊-transitions take effect: ((F – (D & I)) & ~ F) yields ((00100000 – 00000100) & 11011111) = 00011100, where states 4 and 5 have been activated. The new D value is D = 0 0 0 1 1 1 0 0.
The propagation formula takes effect again and produces D = 0 0 1 1 1 1 0 0.
The last bit of D is set, so we mark an occurrence. The gap has matched the text "ff".