Theoretical Discovery at MIT May Boost Data Storage
Published: November 24, 2021
A team of researchers has made a breakthrough that could result in better retrieval in computers and data storage.
The team’s discoveries concern “linear-probing hash tables,” which are among the simplest and fastest data structures. In a table of this kind, the positions in which one could store information lie along a linear array.
Here’s an example. Suppose a database has to record the Social Security numbers of 10,000 people. Researchers will then take your SSN, x, and estimate the hash function of x, h(x). That will give them a random number between 1 and 10,000.
After that, they take that number, h(x), and go to that position in the array. The next step is inserting x, the SSN, in that spot. Then, if the spot is occupied, you go on to the next available position.
To retrieve that SSN, x, you go to the designated spot, h(x). If it isn’t there, you continue moving forward until you locate x or come to an open position and determine that x isn’t in your database.
William Kuszmaul and his colleagues, Bradley Kuszmaul and Michael Bender, discovered that linear-probing hash tables could run at high storage capacities without sacrificing speed. That is true for applications where the number of deletions and insertions is relatively constant, and the volume of added data is almost equivalent to the amount removed.
Furthermore, the team has invented a new approach — “graveyard hashing.” It includes the artificial increase of the number of tombstones in an array until they occupy roughly 50% of the available spots. The said tombstones then set aside areas for forthcoming insertions.
This strategy can achieve optimal efficiency in linear-probing hash tables. Now that the researchers have made significant progress from a theoretical standpoint, they’re starting to examine the practical side of things.