データ構造-多数のノードをすばやく挿入するための最適な自己分散BST

original title: "data structures - Best self-balancing BST for quick insertion of a large number of nodes"


Translate

I've been able to find details on several self-balancing BSTs through several sources, but I haven't found any good descriptions detailing which one is best to use in different situations (or if it really doesn't matter).

I want a BST that is optimal for storing in excess of ten million nodes. The order of insertion of the nodes is basically random, and I will never need to delete nodes, so insertion time is the only thing that would need to be optimized.

I intend to use it to store previously visited game states in a puzzle game, so that I can quickly check if a previous configuration has already been encountered.



いくつかのソースからいくつかの自己分散BSTの詳細を見つけることができましたが、さまざまな状況で使用するのに最適なものを詳細に説明する適切な説明は見つかりませんでした(または...

これは翻訳後の要約です。完全な翻訳を表示する必要がある場合は、「翻訳」アイコンをクリックしてください。


すべての答え
  • Translate

    Red-black is better than AVL for insertion-heavy applications. If you foresee relatively uniform look-up, then Red-black is the way to go. If you foresee a relatively unbalanced look-up where more recently viewed elements are more likely to be viewed again, you want to use splay trees.


  • Translate

    Why use a BST at all? From your description a dictionary will work just as well, if not better.

    The only reason for using a BST would be if you wanted to list out the contents of the container in key order. It certainly doesn't sound like you want to do that, in which case go for the hash table. O(1) insertion and search, no worries about deletion, what could be better?


  • Anastasia Lee
    Translate

    The two self-balancing BSTs I'm most familiar with are red-black and AVL, so I can't say for certain if any other solutions are better, but as I recall, red-black has faster insertion and slower retrieval compared to AVL.

    So if insertion is a higher priority than retrieval, red-black may be a better solution.


  • Translate

    [hash tables have] O(1) insertion and search

    I think this is wrong.

    First of all, if you limit the keyspace to be finite, you could store the elements in an array and do an O(1) linear scan. Or you could shufflesort the array and then do a linear scan in O(1) expected time. When stuff is finite, stuff is easily O(1).

    So let's say your hash table will store any arbitrary bit string; it doesn't much matter, as long as there's an infinite set of keys, each of which are finite. Then you have to read all the bits of any query and insertion input, else I insert y0 in an empty hash and query on y1, where y0 and y1 differ at a single bit position which you don't look at.

    But let's say the key lengths are not a parameter. If your insertion and search take O(1), in particular hashing takes O(1) time, which means that you only look at a finite amount of output from the hash function (from which there's likely to be only a finite output, granted).

    This means that with finitely many buckets, there must be an infinite set of strings which all have the same hash value. Suppose I insert a lot, i.e. ω(1), of those, and start querying. This means that your hash table has to fall back on some other O(1) insertion/search mechanism to answer my queries. Which one, and why not just use that directly?