Trie implementation

C++ Problem Overview

Is there any speed- and cache-efficient implementations of trie in C/C++? I know what a trie is, but I don't want reinvent the wheel, implementing it myself.

C++ Solutions

Solution 1 - C++

if you are looking for an ANSI C implementation you can "steal" it from FreeBSD. The file you are looking for is called radix.c. It's used for managing routing data in kernel.

Solution 2 - C++

I realize the question was about ready implementations, but for reference...

Before you jump on Judy you should have read "http://www.nothings.org/computer/judy/">A Performance Comparison of Judy to Hash Tables". Then googling the title will likely give you a lifetime of discussion and rebutals to read.

The one explicitly cache-conscious trie I know of is the http://crpit.com/confpapers/CRPITV62Askitis.pdf">HAT-trie</a>;.

The HAT-trie, when implemented correctly, is very cool. However, for prefix search you need a sorting step on the hash buckets, which somewhat clashes with the idea of a prefix structure.

A somewhat simpler trie is the http://www.cs.rmit.edu.au/~jz/fulltext/acmtois02.pdf">burst-trie</a> which essentially gives you an interpolation between a standard tree of some sort (like a BST) and a trie. I like it conceptually and it's much easier to implement.

Solution 3 - C++

I've had good luck with libTrie. It may not be specifically cache optimized but the performance has always been decent for my applications.

Solution 4 - C++

GCC ships with a handful of data structures as part of its "Policy-based data structures". This includes a few trie implementations.

http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

Solution 5 - C++

References,

A Double-Array Trie implementation article (includes a C implementation)
TRASH - A dynamic LC-trie and hash data structure -- (a 2006 PDF reference describing a dynamic LC-trie used in the Linux kernel to implement address lookup in the IP routing table

Solution 6 - C++

Judy arrays: Very fast and memory efficient ordered sparse dynamic arrays for bits, integers and strings. Judy arrays are faster and more memory efficient than any binary-search-tree (incl. avl & red-black-trees).

Solution 7 - C++

Cedar, HAT-Trie, and JudyArray is quite awesome, you can find the benchmark here.

benchmark result

Solution 8 - C++

You can also try TommyDS at http://tommyds.sourceforge.net/

See the benchmarks page on the site for a speed comparison with nedtries and judy.

Solution 9 - C++

Cache optimizations are something you'll probably are going to have to do, because you'll have to fit the data into a single cacheline which generally is something like 64 bytes (which will probably work if you start combining data, such as pointers). But it's tricky :-)

Solution 10 - C++

Burst Trie's seem to be a bit more space efficient. I'm not sure how much cache-efficiency you can get out of any index since CPU-caches are so tiny. However, this kind of trie is plenty compact enough to keep large data sets in RAM (where a regular Trie would not).

I wrote a Scala implementation of a burst trie that also incorporates some space-saving techniques that I found in GWT's type-ahead implementation.

https://github.com/nbauernfeind/scala-burst-trie

Solution 11 - C++

I've had very good results (very good balance between performance and size) with marisa-trie compared to several TRIE implementations mentioned with my data set.

https://github.com/s-yata/marisa-trie/tree/master

Solution 12 - C++

Sharing my "fast" implementation for Trie, tested just in basic scenario:

#define ENG_LET 26

class Trie {
    class TrieNode {
    public:
        TrieNode* sons[ENG_LET];
        int strsInBranch;
        bool isEndOfStr;
        
        void print() {
            cout << "----------------" << endl;
            cout << "sons..";
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    cout << " " << (char)('a'+i);
            }
            cout << endl;
            cout << "strsInBranch = " << strsInBranch << endl;
            cout << "isEndOfStr = " << isEndOfStr << endl;
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    sons[i]->print();
            }
            
        }
        TrieNode(bool _isEndOfStr = false):isEndOfStr(_isEndOfStr), strsInBranch(0) {
            for(int i=0; i<ENG_LET; ++i) {
                sons[i] = NULL;
            }
        }
        ~TrieNode() {
            for(int i=0; i<ENG_LET; ++i) {
                delete sons[i];
                sons[i] = NULL;
            }
        }
    };
    
    TrieNode* head;
public:
    Trie() { head = new TrieNode();}
    ~Trie() { delete head; }
    void print() {
        cout << "Preorder Print : " << endl;
        head->print();
    }
    bool isExists(const string s) {
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                return false;
            }
            curr = curr->sons[letIdx];
        }
            
        return curr->isEndOfStr;
    }
    void addString(const string& s) {
        if(isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                curr->sons[letIdx] = new TrieNode();    
            }
            ++curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        ++curr->strsInBranch;
        curr->isEndOfStr = true;
    }
    void removeString(const string& s) {
        if(!isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            
            if(curr->sons[letIdx] == NULL) {
                assert(false);
                return; //string not exists, will not reach here
            }
            if(curr->strsInBranch==1) {    //just 1 str that we want remove, remove the whole branch
                delete curr;
                return;
            }
            //more than 1 son
            --curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        curr->isEndOfStr = false;
    }

    void clear() {
        for(int i=0; i<ENG_LET; ++i) {
            delete head->sons[i];
            head->sons[i] = NULL;
        }
    }
    
};

Content Type	Original Author	Original Content on Stackoverflow
Question	Anton Kazennikov	View Question on Stackoverflow
Solution 1 - C++	SashaN	View Answer on Stackoverflow
Solution 2 - C++	eloj	View Answer on Stackoverflow
Solution 3 - C++	SPWorley	View Answer on Stackoverflow
Solution 4 - C++	tgoodhart	View Answer on Stackoverflow
Solution 5 - C++	nik	View Answer on Stackoverflow
Solution 6 - C++	bill	View Answer on Stackoverflow
Solution 7 - C++	Kokizzu	View Answer on Stackoverflow
Solution 8 - C++	amadvance	View Answer on Stackoverflow
Solution 9 - C++	Jasper Bekkers	View Answer on Stackoverflow
Solution 10 - C++	Nate	View Answer on Stackoverflow
Solution 11 - C++	benjist	View Answer on Stackoverflow
Solution 12 - C++	Shadi Serhan	View Answer on Stackoverflow