welcome to xlongwei.com

欢迎大家一起学习、交流、分享


QQ:9167702333 邮箱:admin@xlongwei.com

KMP字符串查找算法


分类 Algorithm   关键字 分享   标签 java   algorithm   发布 hongwei  1427987973813
注意 转载须保留原文链接,译文链接,作者译者等信息。  
KMP算法需要先计算pattern的失败回退数组
public static int[] getKmpFail(String pattern) {
	if (!hasLength(pattern)) throw new IllegalArgumentException("null or empty pattern is not allowed to get kmp fail array.");
	
	int i, j, len = pattern.length();
	int[] fail = new int[len];
	fail[0] = -1;
	for (j = 1; j < len; j++) {
		i = fail[j - 1];
		while ((pattern.charAt(j) != pattern.charAt(i + 1)) && (i >= 0)) {
			i = fail[i];
		}
		if (pattern.charAt(j) == pattern.charAt(i + 1)) {
			fail[j] = i + 1;
		} else {
			fail[j] = -1;
		}
	}
	return fail;
}
然后查找目标字符串出现的位置,int[] fail=getKmpFail(String pattern)
public static int kmpIndexOf(String source, String pattern, int from, int[] fail, int to) {
	int i = from, j = 0, lenp = pattern.length();
	while ((i < to) && (j < lenp)) {
		if (source.charAt(i) == pattern.charAt(j)) {
			i++;
			j++;
		} else {
			if (j == 0) {
				i++;
			} else {
				j = fail[j - 1] + 1;
			}
		}
	}
	return (j == lenp) ? (i - lenp) : -1;
}
扩展一下,查找多个目标字符串,返回最先找到的目标(曾用于查找回复内容中的表情文本:-)等)
public static int[] kmpFirstMatch(String source, String[] patterns, int from, int[][] fails) {
	int[] indices = { -1, -1 }; //[0] is index and [1] is length
	int to = source.length();
	for (int i = 0; i < patterns.length; i++) {
		if(to - from < patterns[i].length()) continue;
		int index = kmpIndexOf(source, patterns[i], from, fails[i], to);
		if (index != -1 && (index < indices[0] || indices[0] == -1)) {
			indices[0] = to = index;
			indices[1] = patterns[i].length();
			to++;
		}
	}
	return indices;
}
stepFirst是另一个高效的查找多个目标字符串的算法,char[]=string.toCharArray(),
public static int[] stepFirstMatch(char[] sources, int from, char[][] chars) {
	int[] indices = { -1, -1 };//[0] is index and [1] is length
	for (int i = from; i < sources.length; i++) {
		for (int j = 0; j < chars.length; j++) {
			if (sources[i] == chars[j][0]) {
				int k = 1;
				while (k < chars[j].length && i + k < sources.length && sources[i + k] == chars[j][k])
					k++;
				if (k == chars[j].length) {
					indices[0] = i;
					indices[1] = k;
					return indices;
				}
			}
		}
	}
	return indices;
}
查找目标字符串出现次数
public static int kmpCountMatch(String source, String pattern) {
	int count = 0, from = 0, to = source.length();
	int[] fail = getKmpFail(pattern);
	while ((from = kmpIndexOf(source, pattern, from, fail, to)) != -1) {
		from += pattern.length();
		count++;
	}
	return count;
}