哈希桶处理哈希冲突-白红宇

哈希桶处理哈希冲突

阅读量：4028 次

发布时间：2019-05-24

本文共 5420 字，大约阅读时间需要 18 分钟。

哈希桶：哈希桶就是盛放不同key链表的容器（即是哈希表），我们可以把每个key的位置看作是一个指针，该指针所指向的位置里放了一个链表，可以认为是指针数组，故该方法也叫开链式。

相比闭散列，哈希桶提高了空间利用率：在实现哈希表时，常见的方法是线性探测、二次探测，这两个算法的具体实现可以查看我的博客。但是这两个算法有一个共同点就是：空间利用率低。为什么这么说呢？线性探测、二次探测的高效性很大程度上要取决于它的载荷因子，载荷因子即：存放关键字个数 / 空间大小。

通过查阅资料，我发现，使用素数做除数可以减少哈希冲突。见下：

素数表：使用素数做除数可以减少哈希冲突

// 使用素数表对齐做哈希表的容量，降低哈希冲突

const int _PrimeSize = 28;

static const unsigned long _PrimeList [_PrimeSize] =

{

53ul, 97ul, 193ul, 389ul, 769ul,

1543ul, 3079ul, 6151ul, 12289ul, 24593ul,

49157ul, 98317ul, 196613ul, 393241ul, 786433ul,

1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,

50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,

1610612741ul, 3221225473ul, 4294967291ul

};

下图进行哈希桶处理哈希冲突的展示

下面通过库中的vactor进行存放指向链表的指针，每个结点里包含_key,_value和_next。

#pragmatemplate
   
    struct DefaultHashFunc{	size_t operator()(const K& key)	{		return key;	}};static size_t BKDRHash(const char * str)//字符串哈希算法{	unsigned int seed = 131; // 31 131 1313 13131 131313	unsigned int hash = 0;	while (*str)	{		hash = hash * seed + (unsigned int)(*str++);	}	return (hash & 0x7FFFFFFF);}template<>struct DefaultHashFunc
    
     {	size_t operator()(const string& str)	{		return BKDRHash(str.c_str());	}};template
     
      struct HashTableNode//结点{	K _key;	V _value;	HashTableNode* _next;	HashTableNode(const K& key, const V& value)		:_key(key)		, _value(value)		, _next(NULL)	{}};template
      
       
        >class HashTableBucket{	typedef HashTableNode
        
          Node;public: HashTableBucket(); HashTableBucket(const HashTableBucket
         
          & htb); HashTableBucket
          
           & operator=(HashTableBucket
           
             htb); void PrintTables(); bool Insert(const K& key,const V& value);//防冗余，在删除和查找时只需要key Node* Find(const K& key); bool Remove(const K& key);protected: size_t _HashFunc(const K& key); size_t _GetNextPrime(size_t size);//获取下一个素数（利用素数表，使用素数做除数可以减少哈希冲突） void _CheckExpand();private: vector
            
              _tables;//开链式为指针数组，指针指向链表 size_t _size;//有效数据数，vector中的size()为有效空间数};

实现_HashFunc(const K& key)，通过伪函数来判断不同类型的key所在链表的位置。

template
   
    
     >size_t HashTableBucket
     
      ::_HashFunc(const K& key){	HashFunc htb;	return htb(key) % (_tables.size());//htb(key)伪函数}

1. 插入函数的实现（Insert）

（1）检查容量。调用_CheckExpand()函数检查负载因子a，考虑是否扩张，当a为1时进行扩容。

（2）检查插入的key是否已经存在，不存在返回false，存在进行（3）操作。

（3）进行头插。

template
   
    
     >bool HashTableBucket
     
      ::Insert(const K& key, const V& value){//防冗余，在删除和查找时只需要key	_CheckExpand();//检查是否扩容	for (size_t i = 0; i < _tables.size(); ++i)	{		Node* cur = _tables[i];		while (cur)		{//如果插入的元素存在就返回false			if (cur->_key == key)			{				return false;			}			cur = cur->_next;		}	}		//头插	size_t index = _HashFunc(key);	Node* tmp = new Node(key, value);	tmp->_next = _tables[index];	_tables[index] = tmp;	++_size;	return true;

2. 查找函数的实现（Find）

（1）调用_HashFunc()函数找到要寻找的Key所在的链表位置。

（2）通过遍历链表查找key。

template
   
    
     >HashTableNode
     
      * HashTableBucket
      
       ::Find(const K& key)//查找{	size_t index = _HashFunc(key);//链表结点位置	Node* cur = _tables[index];	while (cur)	{		if (cur->_key == key)		{			return cur;		}		cur = cur->_next;	}	return NULL;}

3. 删除函数的实现（Remove）

（1）调用Find（）函数，判断需要删除的key是否存在，不存在就返回false，存在就进行（2）操作。

（2）调用_HashFunc()函数找到key所在链表的位置，先通过遍历链表找到del结点的上一个结点prev，然后使prev的下一个结点指向del的下一个结点。

template
   
    
     >bool HashTableBucket
     
      ::Remove(const K& key)//删除{	if (Find(key) == NULL)	{		return false;	}	size_t index = _HashFunc(key);	//需要找到删除结点的前后结点	Node* del = Find(key);	Node* next = del->_next;	Node* prev = _tables[index];	while (prev)	{		if (prev->_next == del)		{			break;		}		prev = prev->_next;	}	if (next)//如果next存在时，进行链接	{		prev->_next = next;	}	del = NULL;	return true;}

检查是否需要扩容_CheckExpand()的实现。

template
   
    
     >void HashTableBucket
     
      ::_CheckExpand()//检查负载因子，考虑是否扩容{	if (_size >= _tables.size())//负载因子达到了1，进行扩容	{		size_t NewSize = _GetNextPrime(_size);		//进行结点复制		vector
      
        NewTables;		NewTables.resize(NewSize);		for (size_t i = 0; i < _tables.size(); ++i)		{			Node* cur = _tables[i];			while (cur)//头插			{				Node* tmp = cur;				cur = cur->_next;				size_t index = _HashFunc(tmp->_key);//重新确定元素在表中位置				tmp->_next = NewTables[index];				NewTables[index] = tmp;			}		}		_tables.swap(NewTables);//调用vector中的swap接口进行交换	}}template
       
        
         >size_t HashTableBucket
         
          ::_GetNextPrime(size_t size){//获取下一个素数（利用素数表，使用素数做除数可以减少哈希冲突） //使用素数表对齐做哈希表的容量，降低哈希冲突 const int _PrimeSize = 28; static const unsigned long _PrimeList[_PrimeSize] = { 53ul, 97ul, 193ul, 389ul, 769ul, 1543ul, 3079ul, 6151ul, 12289ul, 24593ul, 49157ul, 98317ul, 196613ul, 393241ul, 786433ul, 1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul, 50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul, 1610612741ul, 3221225473ul, 4294967291ul }; for (size_t i = 0; i < _PrimeSize; ++i) { if (_PrimeList[i] > size) { return _PrimeList[i]; } return _PrimeList[i - 1]; } return _PrimeList[_PrimeSize];//如果size大于或等于素数表中数据，就返回表中最大数}

测试用例如下，实现字典（可以一对多）查询。

HashTableBucket
   
    
     > dict;	vector
     
       v;	v.push_back("manager");	dict.Insert("经理", v);	v.clear();	v.push_back("移动");	v.push_back("距离");	dict.Insert("remove",v);	HashTableNode
      
       
        >* ret = dict.Find("remove");	ret->_value.push_back("搬家");	vector
        
         & words = ret->_value; for (size_t i = 0; i < words.size(); ++i)//打印对应的多个字符串 { cout << words[i].c_str() << endl; }

本文出自 “” 博客，请务必保留此出处

转载地址：http://hilbi.baihongyu.com/

你可能感兴趣的文章