With this data structure in mind, if we had a map of project names to GitHub stars, how would we go about inserting a value into the map? Once the number of entries across each bucket passes some percentage of their total size, known as the load factor, then the map will grow by doubling the number of buckets and redistributing the entries across them. Using powers of two allows the use of cheap bit masks and shifts rather than expensive division.Īs entries are added to a map, assuming a good hash function distribution, then the buckets will fill at roughly the same rate. The classical hashmap is an array of buckets each of which contains a pointer to an array of key/value entries. In this case our hashmap has eight buckets (as this is the value that the Go implementation uses) and each bucket can hold up to eight entries each (again drawn from the Go implementation). The second part of a hashmap is the way data is stored. This property is also known as collision resistance. Secondly as the user can control some of the aspects of the input to the hash function, they may be able to control the output of the hash function, leading to poor distribution which has been a DDoS vector for some languages. Firstly, as we’ll see, values in a hashmap should be distributed evenly across buckets, otherwise the access time is not O(1). Given two near identical keys, the result should be wildly different. This is important for two reasons. The second property is good distribution. If it doesn’t you will not be able to find things you put into the map. Given the same key, your hash function must return the same answer. When used with a hashmap, hash functions have two important properties. It’s important to talk about the properties of a good hash function as the quality of the hash function determines how likely the map function is to run near O(1). However in the case of the former, it returns a value derived from the key, not the value associated with the key. This hash value is almost always an integer for reasons that we’ll see in a moment. What is a hash function? A hash function takes a key of an unknown length and returns a value with a fixed length. The size of this constant is part of the hashmap design and the point at which the map moves from O(1) to O(n) access time is determined by its hash function. That is, when things are working well, the time to execute the map function is a near constant. The specific map implementation I’m going to talk about is the hashmap, because this is the implementation that the Go runtime uses. A hashmap is a classic data structure offering O(1) lookups on average and O(n) in the worst case. Instead we’re just going to focus on these properties of a map insertion, deletion and mapping keys to values. There are other interesting properties of map implementations like querying if a key is present in the map, but they’re outside the scope of what we’re going to discuss today. We’ll need a function that adds data to the map insert(map, key, value)Īnd a function that removes data from the map delete(map, key) Now, a map isn’t going to be very useful unless we can put some data in the map. Given one value, called a key, it will return a second, the value. A map function maps one value to another. To understand how a map works, let’s first talk about the idea of the map function. It is based on a presentation I gave at the GoCon Spring 2018 conference in Tokyo, Japan. This post discusses how maps are implemented in Go.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |