[LeetCode] Repeated DNA Sequences

187. Repeated DNA Sequences

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.

  • For example, “ACGAATTCCG” is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class Solution {
void process(char ch, int& hash) {
hash = (hash<<2) & ((1<<20) - 1);
switch(ch) {
case 'A' : hash += 0; break;
case 'C' : hash += 1; break;
case 'G' : hash += 2; break;
case 'T' : hash += 3; break;
}
}
public:
vector<string> findRepeatedDnaSequences(string s) {
if(s.length() <= 10) return {};
unordered_set<int> us;
unordered_set<string> res;
int hash = 0;
for(int i = 0; i < 10; i++) {
process(s[i], hash);
}
us.insert(hash);
for(int i = 10; i < s.length(); i++) {
process(s[i],hash);
if(!us.insert(hash).second) {
res.insert(s.substr(i - 9, 10));
}
}
return vector<string>(res.begin(), res.end());
}
};
Author: Song Hayoung
Link: https://songhayoung.github.io/2022/02/07/PS/LeetCode/repeated-dna-sequences/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.