HashTable的实现

这边的哈希表是使用桶加链表，然后链表里面是哈希节点（键值对）

下面边实现边讲吧

首先肯定是模版定义

1	template<typename Key, typename Value, typename Hash = hash<Key>>

Key是键的类型
Value是值的类型
Hash是哈希函数的类型，如果用户没有自定义，默认的是标准库的hash<Key>，也就是对Key的哈希

然后实现一下哈希节点

struct HashNode{
    Key key;
    Value value;

    HashNode(const Key& key_, const Value& value_) : key(key_), value(value_) {}
    explicit HashNode(const Key& key_) : key(key_), value() {}
    
    bool operator== (const HashNode& other) const { return key == other.key; }
    bool operator< (const HashNode& other) const { return key < other.key; }
    bool operator> (const HashNode& other) const { return key > other.key; }
    bool operator!= (const HashNode& other) const { return key != other.key; }
    bool operator== (const Key& key_) const { return key == key_; }
};

注意C++规定函数主体必须用{}包裹，哪怕是实现只有一行
两个构造函数，一个只有键进行初始化（value就为默认值），一个是键值都有初始化
然后就是比较了，只对键比较，如果是字符串的话就按照字典序进行比较
explicit是为了防止隐式转换，比如说对于以下两种初始化

// 直接初始化（Direct Initialization）
HashNode node1("hello");        // 调用构造函数，explicit 不影响

// 拷贝初始化（Copy Initialization）  
HashNode node2 = "hello";       // 需要隐式转换，explicit 会阻止

someFunction("hello");             // ❌ 编译错误，如果函数参数是HashNode的话，会隐式转换
someFunction(HashNode("hello"));   // ✅ 正确，必须显式调用

然后再解释一下value()这种使用默认构造函数直接初始化（在构造函数的初始化列表中，也是HashNode的初始化列表中）

等价于value = Value();

会根据Value类型的不同，进行相应的初始化（调用默认构造函数）

// 内置类型
int value();        // 初始化为 0
double value();     // 初始化为 0.0
bool value();       // 初始化为 false
char* value();      // 初始化为 nullptr

// 自定义类型
string value();     // 调用默认构造函数，得到空字符串 ""
vector<int> value(); // 调用默认构造函数，得到空向量// 内置类型

然后是成员变量

using bucket = list<HashNode>;
vector<bucket> buckets;
Hash hashFunction;
size_t TableSize;
size_t ElementsNums;
float MaxLoadFactor = 0.75;

using bucket = list<HashNode>;就是给list<HashNode>设置了一个别名，叫bucket，可以让代码看起来更加简洁
- 和using namespace std;是不一样的，这个是在当前作用域中，使用 std 命名空间里的所有名字时不需要加 std:: 前缀。”，当然在大型工程文件当中一般是不建议的，可能会和第三方库或者自己命名的冲突，所以还有using std::cout;、using std::endl;或者using std::vector;等等
- 有=就是取别名，没有的就是用于命名空间的
buckets就是整个哈希表，里面都是bucket这些桶，也就是链表，用于处理哈希冲突（碰撞）

类似于

buckets[0]: list<HashNode> -> node1 -> node2 ...
buckets[1]: list<HashNode> -> ...
...
buckets[4]: list<HashNode> -> ...

Hash hashFunction;：创建一个哈希函数对象，用vector<int>想一下就懂了，都是创建对象
TableSize：桶的数量
ElementsNums：哈希表中实际元素的数量
MaxLoadFactor：负载因子，定义是ElementsNums / TableSize，当他们的比值大于0.75（0.75 是查找效率 和 空间利用率 的良好平衡点，也是 std::unordered_map 默认使用的负载因子阈值）的时候，就要进行扩容，并且所有哈希节点重新分配到桶中，否则会影响搜索效率（从O(1)到接近O(n)

然后就是根据哈希节点的键值映射到对应的桶当中

1	size_t hash(const Key& key_) const { return hashFunction(key_) % TableSize; }

就是键的哈希值%桶的数量，就可以得到节点应该在哪个桶

构造函数肯定有的，这个比较长，当时也是花了挺多时间来理解，后面发现了对于某些基础确实还没完全弄明白

1 2	HashTable(size_t TableSize_ = 10, const Hash& hashFunction_ = Hash()) : buckets(TableSize_), TableSize(TableSize_), hashFunction(hashFunction_), ElementsNums(0) {}

先举个例子，vector<int> res()，这个<>里面的就是template里面的那些模版参数（例如int、float等等），而()里面的参数是传给构造函数的（都是这些模版参数对应的对象），两者不要搞混

可看如下例子

1	HashTable<std::string, int> table;

10个桶
std::hash<std::string> 作为哈希函数

1	HashTable<std::string, int> table(100);

100个桶
哈希函数仍是std::hash<std::string>

struct MyHash {
    size_t operator()(const std::string &s) const {
        return s.length();  // 只是举例
    }
};

MyHash h;
HashTable<std::string, int, MyHash> table(50, h);

使用 MyHash 作为哈希函数
初始化时分配 50 个桶

构造函数两个参数，另一个参数使用Hash()，这是一个临时对象，必须加上const引用才可以，就是如果不传参的话默认使用hash<Key>

关于临时对象的可以看如下例子

// 规则1：非const引用只能绑定到左值
Hash obj;
Hash& ref1 = obj;        // ✅ 正确：左值
Hash& ref2 = Hash();     // ❌ 错误：临时对象（右值）

// 规则2：const引用可以绑定到左值和右值
const Hash& ref3 = obj;     // ✅ 正确：左值
const Hash& ref4 = Hash();  // ✅ 正确：临时对象（右值）

桶一般情况下，包括标准的STL，默认是10

接下去是扩容函数

void rehash(size_t newTableSize){
    vector<bucket> newBuckets(newTableSize);

    for (bucket& tmp1 : buckets){
        for (HashNode& tmp2 : tmp1){
            size_t index = hash(tmp2.key);
            newBuckets[index].push_back(tmp2);
        }
    }

    buckets = std::move(newBuckets);
    TableSize = newTableSize;
}

就是创建一个新的哈希表然后把元素都移过去
move是直接移动元素，效率更高，同时移动完成之后newBuckets里面就没有元素了，相比于直接=（也就是拷贝），效率会高很多

insert和insertKey的实现

void insert(const Key& key, const Value& value){
    if (ElementsNums + 1 > TableSize * MaxLoadFactor){
        rehash(2 * TableSize);
    }
    size_t index = hash(key);
    list<HashNode>& tmp = buckets[index];

    if (std::find(tmp.begin(), tmp.end(), key) == tmp.end()){
        tmp.push_back(HashNode(key, value));
        ElementsNums++;
    }
}

void insertKey(const Key& key){
    insert(key, Value{});
}

要先检查需不需要扩容
拿桶的时候注意要取引用
Value{}和Value()（在非构造函数初始化列表的情况下）是一样的，都是一个临时对象，使用默认构造函数创建的，上面value()是用默认构造函数直接初始化，不是临时对象的意思，虽然他们最终都是默认值
注意find必须指出是标准库的，因为编译器会先寻找当前类的成员函数

erase的实现

void erase(const Key& key){
    size_t index = hash(key);
    bucket& buc = buckets[index];

    typename bucket::iterator it = std::find(buc.begin(), buc.end(), key);
    // auto it = std::find(buc.begin(), buc.end(), key);
    if (it != buc.end()){
        buc.erase(it);
        ElementsNums--;
    }
}

链表中去除元素有两种常用的，erase和remove
- erase是需要拿到迭代器iterator，在这边是list<HashNode>::iterator，也就是bucket::iterator，不是指针，同时使用到类模版参数的时候，这边使用到了HashNode，里面有Key和Value，所以需要显示说明它是一个类型（使用typename），不过最简单还是auto自动判断
- remove需要拿到值，不合适元素是键值对的

find的实现，返回的是键所对应值的指针

Value* find(const Key& key){
    size_t index = hash(key);
    bucket& buc = buckets[index];

    typename bucket::iterator it = std::find(buc.begin(), buc.end(), key);
    if (it != buc.end()){
        return & it -> value;
    }
    return nullptr;
}

跟上面差不多，找到了返回值的指针即可

size的实现

1
2
3

size_t size(){
    return ElementsNums;
}

注意返回的是实际的键值对

clear的实现

void clear() {
    buckets.clear();
    ElementsNums = 0;
    TableSize = 0;
}

最终的完整代码如下

#include <list>
#include <functional> //for std::hash
#include <vector>
#include <cstddef>
#include <algorithm> //for find move

using namespace std;


template<typename Key, typename Value, typename Hash = hash<Key>>
class HashTable{
    struct HashNode{
        Key key;
        Value value;

        HashNode(const Key& key_, const Value& value_) : key(key_), value(value_) {}
        explicit HashNode(const Key& key_) : key(key_), value() {}
        
        bool operator== (const HashNode& other) const { return key == other.key; }
        bool operator< (const HashNode& other) const { return key < other.key; }
        bool operator> (const HashNode& other) const { return key > other.key; }
        bool operator!= (const HashNode& other) const { return key != other.key; }
        bool operator== (const Key& key_) const { return key == key_; }
    };

private:
    using bucket = list<HashNode>;
    vector<bucket> buckets;
    Hash hashFunction;
    size_t TableSize;
    size_t ElementsNums;
    float MaxLoadFactor = 0.75;

public:
    HashTable(size_t TableSize_ = 10, const Hash& hashFunction_ = Hash()) : buckets(TableSize_), TableSize(TableSize_), hashFunction(hashFunction_), ElementsNums(0) {}

    size_t hash(const Key& key_) const { return hashFunction(key_) % TableSize; }
    
    void rehash(size_t newTableSize){
        vector<bucket> newBuckets(newTableSize);

        for (bucket& tmp1 : buckets){
            for (HashNode& tmp2 : tmp1){
                size_t index = hash(tmp2.key);
                newBuckets[index].push_back(tmp2);
            }
        }

        buckets = std::move(newBuckets);
        TableSize = newTableSize;
    }

    void insert(const Key& key, const Value& value){
        if (ElementsNums + 1 > TableSize * MaxLoadFactor){
            rehash(2 * TableSize);
        }
        size_t index = hash(key);
        list<HashNode>& tmp = buckets[index];

        if (std::find(tmp.begin(), tmp.end(), key) == tmp.end()){
            tmp.push_back(HashNode(key, value));
            ElementsNums++;
        }
    }

    void insertKey(const Key& key){
        insert(key, Value{});
    }

    void erase(const Key& key){
        size_t index = hash(key);
        bucket& buc = buckets[index];

        typename bucket::iterator it = std::find(buc.begin(), buc.end(), key);
        // auto it = std::find(buc.begin(), buc.end(), key);
        if (it != buc.end()){
            buc.erase(it);
            ElementsNums--;
        }
    }

    Value* find(const Key& key){
        size_t index = hash(key);
        bucket& buc = buckets[index];

        typename bucket::iterator it = std::find(buc.begin(), buc.end(), key);
        if (it != buc.end()){
            return & it -> value;
        }
        return nullptr;
    }


    size_t size(){
        return ElementsNums;
    }


    void clear() {
        buckets.clear();
        ElementsNums = 0;
        TableSize = 0;
    }
    

};

测试代码

#include <iostream>
#include <string>


int main() {
    HashTable<std::string, int> table;

    std::cout << "=== 插入元素 ===\n";
    table.insert("apple", 1);
    table.insert("banana", 2);
    table.insert("cherry", 3);

    std::cout << "Size: " << table.size() << "\n";  // 应该是 3

    std::cout << "=== 查找元素 ===\n";
    if (int* val = table.find("banana")) {
        std::cout << "Found banana: " << *val << "\n";  // 应该是 2
    } else {
        std::cout << "banana not found\n";
    }

    if (int* val = table.find("durian")) {
        std::cout << "Found durian: " << *val << "\n";
    } else {
        std::cout << "durian not found\n";  // 应该输出这句
    }

    std::cout << "=== insertKey() 插入默认值 ===\n";
    table.insertKey("durian");
    if (int* val = table.find("durian")) {
        std::cout << "durian: " << *val << " (默认值)\n";  // 应该是 0
    }

    std::cout << "Size after insertKey: " << table.size() << "\n";  // 应该是 4

    std::cout << "=== 删除元素 ===\n";
    table.erase("banana");
    if (table.find("banana")) {
        std::cout << "banana still exists\n";
    } else {
        std::cout << "banana deleted successfully\n";  // 应该是这句
    }

    std::cout << "Size after erase: " << table.size() << "\n";  // 应该是 3

    std::cout << "=== 批量插入触发扩容（rehash） ===\n";
    for (int i = 0; i < 20; ++i) {
        table.insert("key" + std::to_string(i), i);
    }
    std::cout << "Size after rehash insertions: " << table.size() << "\n";  // 应该是 23（3 + 20）

    std::cout << "=== 清空表 ===\n";
    table.clear();
    std::cout << "Size after clear: " << table.size() << "\n";  // 应该是 0

    std::cout << "All tests completed.\n";
    return 0;
}

输出

=== 插入元素 ===
Size: 3
=== 查找元素 ===
Found banana: 2
durian not found
=== insertKey() 插入默认值 ===
durian: 0 (默认值)
Size after insertKey: 4
=== 删除元素 ===
banana deleted successfully
Size after erase: 3
=== 批量插入触发扩容（rehash） ===
Size after rehash insertions: 23
=== 清空表 ===
Size after clear: 0
All tests completed.