分类: C/C++
2008-03-03 10:44:25
The use of bit-parallelism to search a set of strings P = {p1,…,pr} is efficient for sets such that r × ℓmin fits in a few computer words [NR00]. For simplicity we assume below that r × ℓmin ≤ w.
To perform longer shifts, we keep only the prefixes of size ℓmin of the patterns. If we match a prefix, we directly verify the entire string against the text.
The prefixes are packed together as in and the search is similar to BNDM (), with the search performed for all the prefixes at the same time. The only difference is that we need to clear some bits after a shift. The mask CL in does that. It prevents the bits used to search for pi from interfering with those used for pi+1. The variable last is still used, but in this case it represents the position where a prefix of one of the strings begins. Pseudo-code of the whole algorithm is shown in .
Multiple BNDM (p = p1p2 … pm, T = t1t2 … tn) 1. Preprocessing 2. For c ∊ σ Do B[c] ← 0|P| 3. ℓ ← 0 4. For k ∊ 1 … r Do 5. ℓ ← ℓ + ℓmin 6. For j ∊ 1 … ℓmin Do B[pkj) ← B[pkj] | 0|p|-ℓ+j-1 10ℓ-j 7. End of for 8. CL ← (1ℓmin - 1 0)r 9. DF ← (10ℓmin - 1)r 10. Searching 11. pos ← 0 12. While pos ≤ n - m Do 13. j ← ℓmin, last ← ℓmin 14. D = 1|P| 15. While D ≠ 0|P| Do 16. D ← D & B[tpos + j] 17. j ← j - 1 18. If D & DF ≠ 0|P| Then 19. If j > 0 Then last ← j 20. Else /* at least one prefix matches */ 21. Check which prefixes of length ℓmin match pi needs to be checked if D & 0|P| - ℓmin × i 10ℓmin - 1 0ℓmin × (i - 1) ≠ 0|P| 22. Verify the corresponding string(s) against the test 23. Report the occurrence(s) at pos + 1 24. End of if 25. End of if 26. D ← (D << 1) & CL /* Shifting and cleaning */ 27. End of while 28. pos ← pos + last 29. End of while
Example using English We search the text "CPM_annual_conference_announce" for the set of strings P = {announce, annual, annually}.
nual_conference_announce
last ← 6.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
CPM_ _conference_announce
last ← 6.
D ← 111111111111111
D & DF ≠ 0∣P∣ and j = 0, so we check the patterns "annual" and "annually" against the text and mark the occurrence of "annual".
CPM_annual rence_announce
last ← 6.
D ← 111111111111111111
CPM_annual_confeannounce
last ← 6.
D ← 111111111111111111
CPM_annual_conference_ ce
last ← 6.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j = 0, so we check the string "announce" against the text and mark its occurrence.
The next shift of the search window gives pos > n – ℓmin and the search stops.
Example using DNA We search for the set of strings P = {ATATATA, TATAT, ACGATAT} in the text AGATACGATATATAC.
DF = 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 CL = 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0
CGATATATAC
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
D & DF ≠ 0∣P∣ and j > 0, then last ← 3
D & DF ≠ 0∣P∣ and j > 0, then last ← 2
AG ATATATAC
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 2
AGAT ATATAC
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
D & DF ≠ 0∣P∣ and j > 0, then last ← 3
D & DF ≠ 0∣P∣ and j = 0, so we check the pattern ACGATAT against the text and mark its occurrence.
AGATACG TAC
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
D & DF ≠ 0∣P∣ and j > 0, then last ← 3
D & DF ≠ 0∣P∣ and j > 0, then last ← 2
D & DF ≠ 0∣P∣ and j > 0, then last ← 1
D & DF ≠ 0∣P∣ and j = 0, so we check the string ATATATA against the text and mark its occurrence.
AGATACGA AC
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
D & DF ≠ 0∣P∣ and j > 0, then last ← 3
D & DF ≠ 0∣P∣ and j > 0, then last ← 2
D & DF ≠ 0∣P∣ and j > 0, then last ← 1
D & DF ≠ 0∣P∣ and j = 0, so we check the string TATAT against the text and mark its occurrence.
AGATACGAT C
last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 4
D & DF ≠ 0∣P∣ and j > 0, then last ← 3
D & DF ≠ 0∣P∣ and j > 0, then last ← 2
D & DF ≠ 0∣P∣ and j > 0, then last ← 1
D & DF ≠ 0∣P∣ and j = 0, so we check the string ATATATA against the text, but we fail to recognize an occurrence.
AGATACGATA last ← 5.
D ← 111111111111111111
D & DF ≠ 0∣P∣ and j > 0, then last ← 3