LeetCode 解题报告 (30)-- 双指针找拼接子字符串

发表于 2016-04-04 标签 python ，双指针

原题如下：
>You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

>For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

题目不难理解，实现也不难，关键是时间复杂度的控制。下面会讲述两种方法，第一种方法时间复杂度 \(O((n-k)*k)\), 其中 n 为 s 的长度，k 为所有的 words 拼接起来后的长度，这种方法提交时在 TLE 和 AC 间徘徊（即偶尔 AC，偶尔 TLE）。第二种方法时间复杂度为 O (n)，提交的时候能够 AC。

方法一

思路

设 wordLen 为给出的 words 列表中每个词的长度，wordNum 为给出的 words 列表的长度，n 为给出的 s 的长度。

那么可以从头开始遍历 s 直到 n-wordLen*wordNum+1，每次遍历用一个字典存储遍历中遇到的属于 words 的词语，并检查两者对应的词语的数量关系。遇到不属于 words 的词语或者字典中的某一词语的数量大于 words 中该词语的数量则跳出执行下一次的遍历

这种方法的时间复杂度 \(O((n-k)*k)\)，n 为 s 的长度。提交时偶尔 TLE，偶尔 AC，AC 时显示的时间是 876ms。
### 实现代码：

class Solution(object):  
    def findSubstring(self, s, words):  
        """  
        :type s: str  
        :type words: List[str]  
        :rtype: List[int]  
        """  
        # 注意words中的词可能会重复  
        wordDict = {}  
        for word in words:  
            if word in wordDict:  
                wordDict[word] += 1  
            else:  
                wordDict[word] = 1

        wordLen = len(words[0])  
        result = []

        for i in range(len(s)-wordLen*wordNum+1):  
            tmp = s[i:i+wordLen]  
            tmpDict = {} # 存储words的临时list  
            if tmp in words:  
                head = i  
                while tmp in words and i < len(s)-wordLen+1:  
                    if tmp in tmpDict:  
                        tmpDict[tmp] += 1  
                    else:  
                        tmpDict[tmp] = 1

                    if tmpDict[tmp] > wordDict[tmp]:  
                        tmpDict[tmp] -= 1  
                        break

                    i += wordLen  
                    tmp = s[i:i+wordLen]  
                if tmpDict == wordDict:  
                    result.append(head)  
            else:  
                continue  
        return result

方法二

思路

这种方法参考了这里, 主要思想是利用了双指针围着当前符合 words（包括数量和种类）中的连续子字符串，同时计算围着的子字符串中 word 的数量。

巧妙的地方在于双指针当前包围的子字符串中如果含有某个 word 的数量大于 words 中这个 word 的数量时，移动左指针直到把这个 word 的数量减少。

这种方法的时间复杂度为 O (n), 提交时 AC 的时间约 100ms

实现代码

class Solution(object):  
    def findSubstring(self, s, words):  
        """  
        :type s: str  
        :type words: List[str]  
        :rtype: List[int]  
        """  
        # 注意words中的词可能会重复  
        wordDict = {}  
        for word in words:  
            if word in wordDict:  
                wordDict[word] += 1  
            else:  
                wordDict[word] = 1

        wordLen = len(words[0])  
        wordNum = len(words)  
        wordSet = set(words)  
        sLen = len(s)  
        result = []

        for i in range(wordLen):  
            left = i  
            tmpDict = {}  
            count = 0  
            j=i  
            while j < sLen-wordLen+1:  
                tmp = s[j:j+wordLen]  
                j += wordLen  
                if tmp in wordSet:  
                    if tmp in tmpDict:  
                        tmpDict[tmp]+=1  
                    else:  
                        tmpDict[tmp]=1

                    if tmpDict[tmp]<=wordDict[tmp]:  
                        count+=1  
                    else:  
                        # 某个词的数量比wordDict中规定的要多了，往右移动左指针  
                        while tmpDict[tmp] > wordDict[tmp]:  
                            t = s[left:left+wordLen]  
                            left += wordLen  
                            tmpDict[t] -= 1  
                            if tmpDict[t] < wordDict[t]:  
                                count -= 1  
                    # 仅去掉最左边的一个word，再往右找  
                    if count == wordNum:  
                        result.append(left)  
                        t = s[left:left+wordLen]  
                        tmpDict[t] -= 1  
                        left += wordLen  
                        count -= 1  
                # 当前的词不在wordDict  
                else:  
                    tmpDict = {}  
                    count = 0  
                    left = j  
        return result