Trie implementation
C++CData StructuresTrieC++ Problem Overview
Is there any speed and cacheefficient implementations of trie in C/C++? I know what a trie is, but I don't want reinvent the wheel, implementing it myself.
C++ Solutions
Solution 1  C++
if you are looking for an ANSI C implementation you can "steal" it from FreeBSD. The file you are looking for is called radix.c. It's used for managing routing data in kernel.
Solution 2  C++
I realize the question was about ready implementations, but for reference...
Before you jump on Judy you should have read "http://www.nothings.org/computer/judy/">A Performance Comparison of Judy to Hash Tables". Then googling the title will likely give you a lifetime of discussion and rebutals to read.
The one explicitly cacheconscious trie I know of is the http://crpit.com/confpapers/CRPITV62Askitis.pdf">HATtrie</a>;.
The HATtrie, when implemented correctly, is very cool. However, for prefix search you need a sorting step on the hash buckets, which somewhat clashes with the idea of a prefix structure.
A somewhat simpler trie is the http://www.cs.rmit.edu.au/~jz/fulltext/acmtois02.pdf">bursttrie</a> which essentially gives you an interpolation between a standard tree of some sort (like a BST) and a trie. I like it conceptually and it's much easier to implement.
Solution 3  C++
I've had good luck with libTrie. It may not be specifically cache optimized but the performance has always been decent for my applications.
Solution 4  C++
GCC ships with a handful of data structures as part of its "Policybased data structures". This includes a few trie implementations.
http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html
Solution 5  C++
References,

A DoubleArray Trie implementation article (includes a
C
implementation) 
TRASH  A dynamic LCtrie and hash data structure  (a 2006 PDF reference describing a dynamic LCtrie used in the Linux kernel to implement address lookup in the IP routing table
Solution 6  C++
Judy arrays: Very fast and memory efficient ordered sparse dynamic arrays for bits, integers and strings. Judy arrays are faster and more memory efficient than any binarysearchtree (incl. avl & redblacktrees).
Solution 7  C++
Solution 8  C++
You can also try TommyDS at http://tommyds.sourceforge.net/
See the benchmarks page on the site for a speed comparison with nedtries and judy.
Solution 9  C++
Cache optimizations are something you'll probably are going to have to do, because you'll have to fit the data into a single cacheline which generally is something like 64 bytes (which will probably work if you start combining data, such as pointers). But it's tricky :)
Solution 10  C++
Burst Trie's seem to be a bit more space efficient. I'm not sure how much cacheefficiency you can get out of any index since CPUcaches are so tiny. However, this kind of trie is plenty compact enough to keep large data sets in RAM (where a regular Trie would not).
I wrote a Scala implementation of a burst trie that also incorporates some spacesaving techniques that I found in GWT's typeahead implementation.
Solution 11  C++
I've had very good results (very good balance between performance and size) with marisatrie compared to several TRIE implementations mentioned with my data set.
Solution 12  C++
Sharing my "fast" implementation for Trie, tested just in basic scenario:
#define ENG_LET 26
class Trie {
class TrieNode {
public:
TrieNode* sons[ENG_LET];
int strsInBranch;
bool isEndOfStr;
void print() {
cout << "" << endl;
cout << "sons..";
for(int i=0; i<ENG_LET; ++i) {
if(sons[i] != NULL)
cout << " " << (char)('a'+i);
}
cout << endl;
cout << "strsInBranch = " << strsInBranch << endl;
cout << "isEndOfStr = " << isEndOfStr << endl;
for(int i=0; i<ENG_LET; ++i) {
if(sons[i] != NULL)
sons[i]>print();
}
}
TrieNode(bool _isEndOfStr = false):isEndOfStr(_isEndOfStr), strsInBranch(0) {
for(int i=0; i<ENG_LET; ++i) {
sons[i] = NULL;
}
}
~TrieNode() {
for(int i=0; i<ENG_LET; ++i) {
delete sons[i];
sons[i] = NULL;
}
}
};
TrieNode* head;
public:
Trie() { head = new TrieNode();}
~Trie() { delete head; }
void print() {
cout << "Preorder Print : " << endl;
head>print();
}
bool isExists(const string s) {
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]'a';
if(curr>sons[letIdx] == NULL) {
return false;
}
curr = curr>sons[letIdx];
}
return curr>isEndOfStr;
}
void addString(const string& s) {
if(isExists(s))
return;
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]'a';
if(curr>sons[letIdx] == NULL) {
curr>sons[letIdx] = new TrieNode();
}
++curr>strsInBranch;
curr = curr>sons[letIdx];
}
++curr>strsInBranch;
curr>isEndOfStr = true;
}
void removeString(const string& s) {
if(!isExists(s))
return;
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]'a';
if(curr>sons[letIdx] == NULL) {
assert(false);
return; //string not exists, will not reach here
}
if(curr>strsInBranch==1) { //just 1 str that we want remove, remove the whole branch
delete curr;
return;
}
//more than 1 son
curr>strsInBranch;
curr = curr>sons[letIdx];
}
curr>isEndOfStr = false;
}
void clear() {
for(int i=0; i<ENG_LET; ++i) {
delete head>sons[i];
head>sons[i] = NULL;
}
}
};