HardRating 1891

1032. Stream of Characters

arraystringdesigntriedata-stream

解題說明

Stream of Characters

題目描述：設計 StreamChecker，以字典 words 初始化，支援 query(letter) 操作：將字母 letter 加入字元串流末端後，判斷最近若干個字元（即串流的某個後綴）是否與字典中的某個單詞完全吻合。

解題思路：直接做需要對每個新字元掃描所有字典詞，時間複雜度極高。關鍵洞察：「某個串流後綴匹配字典詞」等價於「將所有字典詞逆序後，串流逆序形成的串匹配某個逆序詞的前綴直到其終止」。

逆序 Trie + 活躍指針集合：

將所有 words 逆序後插入 Trie（例如 "abc" 存為 "cba"）。
維護一個「活躍指針集合」active，儲存目前在 Trie 中還可繼續匹配的節點集合。
每次 query(letter) 時：
- 嘗試從 Trie 根節點以 letter 開始新的匹配（根的 letter 子節點）。
- 對 active 中的所有現有指針，嘗試往 letter 子節點推進；若子節點不存在則移除該指針。
- 將根節點的 letter 子節點（若存在）也加入 active。
- 若 active 中任一指針對應的節點標記為終止節點（isEnd），則回傳 true，否則回傳 false。

此設計讓每個 query 的時間複雜度為 O(|active| × 26) 均攤。由於字典詞最長為 L，active 中的指針數量上限為 O(L)（每個長度至多一條活躍鏈），整體每次查詢均攤 O(L)。

C++ 解法

複雜度分析

時間複雜度：建構 O(N × L)（N 個詞，每個長度 L，逆序插入 Trie）；query O(|active|) ≤ O(L_max)，其中 L_max 為字典最長詞長度。由於每個活躍指針對應不同深度的路徑，活躍指針數量上限為 O(L_max)。

空間複雜度：O(N × L × 26) — Trie 節點空間；O(L_max) — 活躍指針集合。串流本身不需儲存（滑動窗口效果由 active 集合隱含）。

虛擬碼

Constructor(words):
1. root = new TrieNode()
2. For each word in words:
   curr = root
   For i = len(word)-1 down to 0:  // 逆序插入
     Create child if missing
     Advance curr
   Mark curr.isEnd = true
3. active = []                      // 初始活躍集合為空

query(letter):
1. nextActive = []
2. If root has child letter:
   Push root.children[letter] to nextActive
   If that node is isEnd: found = true
3. For each node in active:
   If node has child letter:
     Push node.children[letter] to nextActive
     If that node is isEnd: found = true
   (nodes without child letter are dropped)
4. active = nextActive
5. Return found

其他解法

Aho-Corasick 自動機：為字典建立 Aho-Corasick 多模式匹配自動機（在 Trie 上增加失配鏈接 failure links），維護單一當前狀態指針。每次 query 以 O(1) 轉移狀態（不需要維護活躍集合）。時間 O(N × L) 建構，O(1) per query，但實現複雜度高，需要 BFS 建立 failure links 和 output links。適合字典詞非常多、查詢頻率極高的場景。

暴力滑動窗口 + HashSet：維護一個長度為 L_max 的循環緩衝區儲存最近字元。每次 query 後將所有可能的後綴（長度 1 到 L_max）提取出來查 HashSet。時間 O(L_max²) per query（每個後綴長度 O(L_max) 用於雜湊），空間 O(N × L)。對短字典詞效果尚可，但無 Trie 的結構共享優勢。

逆序 Trie + 串流緩衝區：維護最近 L_max 個字元的緩衝區，每次 query 後將緩衝區逆序後在逆序 Trie 中做前綴搜尋。時間 O(L_max) per query（掃描逆序 Trie 至多 L_max 深），空間 O(L_max) 緩衝區。此方法等價於本解法但表達形式不同。

延伸思考

若需支援動態新增字典詞（addWord 操作），活躍指針集合方案如何調整？
若字典有 10⁵ 個詞、每詞長達 10⁴，活躍指針集合方案的記憶體用量如何估算？此時 Aho-Corasick 是否更優？
若 query 需要回傳所有匹配的字典詞（而非僅判斷是否存在），如何修改 isEnd 的設計以儲存匹配詞資訊？

解題說明

Stream of Characters

逆序 Trie + 活躍指針集合：

將所有 words 逆序後插入 Trie（例如 "abc" 存為 "cba"）。
維護一個「活躍指針集合」active，儲存目前在 Trie 中還可繼續匹配的節點集合。
每次 query(letter) 時：
- 嘗試從 Trie 根節點以 letter 開始新的匹配（根的 letter 子節點）。
- 對 active 中的所有現有指針，嘗試往 letter 子節點推進；若子節點不存在則移除該指針。
- 將根節點的 letter 子節點（若存在）也加入 active。
- 若 active 中任一指針對應的節點標記為終止節點（isEnd），則回傳 true，否則回傳 false。

複雜度分析

空間複雜度：O(N × L × 26) — Trie 節點空間；O(L_max) — 活躍指針集合。串流本身不需儲存（滑動窗口效果由 active 集合隱含）。

虛擬碼

Constructor(words): 1. root = new TrieNode() 2. For each word in words: curr = root For i = len(word)-1 down to 0: // 逆序插入 Create child if missing Advance curr Mark curr.isEnd = true 3. active = [] // 初始活躍集合為空 query(letter): 1. nextActive = [] 2. If root has child letter: Push root.children[letter] to nextActive If that node is isEnd: found = true 3. For each node in active: If node has child letter: Push node.children[letter] to nextActive If that node is isEnd: found = true (nodes without child letter are dropped) 4. active = nextActive 5. Return found

其他解法

延伸思考

若需支援動態新增字典詞（addWord 操作），活躍指針集合方案如何調整？
若字典有 10⁵ 個詞、每詞長達 10⁴，活躍指針集合方案的記憶體用量如何估算？此時 Aho-Corasick 是否更優？
若 query 需要回傳所有匹配的字典詞（而非僅判斷是否存在），如何修改 isEnd 的設計以儲存匹配詞資訊？