I heard at a data structures seminar that we can break a key into groups of digits and then do the addition of groups. This ensures that all the digits contribute the hash code. The number of digits in a group correspond to the size of the array.
For example, I have a machine number say 424-124-9675
, how do I make the hash function using the Folding technique?
Best Answer
There are 2 types of folding methods used Fold shift
and Fold boundary
.
Fold Shift
You divide the key in parts whose size matches the size of required address. The parts are simply added to get the required address.
Key:123456789 and size of required address is 3 digits.
123+456+789 = 1368. To reduce the size to 3, either 1 or 8 is removed and accordingly the key would be 368 or 136 respectively.
Fold Boundary
You again divide the key in parts whose size matches the size of required address.But now you also applying folding, except for the middle part,if its there.
Key:123456789 and size of required address is 3 digits
321 (folding applied)+456+987 (folding applied) = 1764(discard 1 or 4)
Given 424-124-9675, you decide where you want to break it into groups. For example:
every 3 digits from left: hash = 424 + 124 + 967 + 5
every 3 digits from right: hash = 675 + 249 + 241 + 4
where the dashes are: hash = 424 + 124 + 9675
It's a terribly weak way to hash though - very collision prone.
Following the answers from Tony and Sumeet, I did some more research on digit folding and decided to implement the technique explained by Robert Lafore in his Data Structures book.
For example, suppose you want to hash 10-digit machine numbers. If the array size is 1,000
, you would divide the 10-digit number into three groups of three digits and one group of one-digit. In out example in question machine number is 424-124-9675
, so you would calculate a key value of 424 + 124 + 967 + 5 = 1520
. You can use the %
operator to trim such sums so the highest index is 999
. In this case, 1520 % 1000 = 520
.
If the array size is 100
, you would need to break the 10-digit key into five two-digit numbers:42 + 41 + 24 + 96 + 75 = 278
, and 278 % 100 = 78
.
It’s easier to imagine how this works when the array size is a multiple of 10.However, for best results it should be a prime number.
Here's the Java code of digit folding technique I implemented:
public class DigitFolder {public static void main(String[] args) {int hashVal = hashFunc(424124967);System.out.println(hashVal);}public static int hashFunc(int key) {int arraySize = 1021;int keyDigitCount = getDigitCount(key);int groupSize = getDigitCount(arraySize);int groupSum = 0;String keyString = Integer.toString(key);int i;for (i = 0; i < keyString.length(); i += groupSize) {if (i + groupSize <= keyString.length()) {String group = keyString.substring(i, i + groupSize);groupSum += Integer.parseInt(group);}}// There is no remaining part if count is divisible by groupsize.if (keyDigitCount % groupSize != 0) {String remainingPart = keyString.substring(i - groupSize, keyString.length());groupSum += Integer.parseInt(remainingPart);}return groupSum % arraySize;}public static int getDigitCount(int n) {int numDigits = 1;while (n > 9) {n /= 10;numDigits++;}return numDigits;}}
I found the group making method here. But it makes groups right to left. So, I used String#subString()
method to make left to right groups.