[Build status](https://travis-ci.org/chmduquesne/rollinghash)
[Coverage](https://coveralls.io/github/chmduquesne/rollinghash?branch=master)
[GoDoc](https://godoc.org/github.com/chmduquesne/rollinghash)

Rolling Hashes
==============

Philosophy
----------

This package contains several rolling hashes for you to try out your
crazy ideas with. The API sticks as closely as possible to the interface
of the standard `hash` package (the hashes implemented here are
effectively drop-in replacements for their standard library
counterparts), while aiming for both speed and simplicity.

Usage
-----

A [`rollinghash.Hash`](https://godoc.org/github.com/chmduquesne/rollinghash#Hash)
is just a [`hash.Hash`](https://golang.org/pkg/hash/#Hash) that also
implements the
[`Roller`](https://godoc.org/github.com/chmduquesne/rollinghash#Roller)
interface. Here is how it is typically used:

```golang
data := []byte("here is some data to roll on")
h := buzhash64.New()
n := 16

// Initialize the rolling window with the first n bytes
h.Write(data[:n])

for _, c := range data[n:] {

	// Slide the window by one byte and update the hash
	h.Roll(c)

	// Get the updated hash value
	fmt.Println(h.Sum64())
}
```

Gotchas
-------

The rolling window MUST be initialized by calling `Write` first (which
saves a copy of the data). You never pass the byte leaving the window:
it is inferred from the internal copy of the rolling window, which is
updated with every call to `Roll`.

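To make the role of the internal copy concrete, here is a minimal sketch of a rolling hash built on a plain byte sum. It is not how the hashes in this package are implemented: `toyRoller`, `directSum` and `rollMatchesDirect` are hypothetical names made up for this illustration, and the additive hash is deliberately simplistic. It only shows that `Write` copies the bytes it receives, and that `Roll` needs nothing but the entering byte because the leaving byte is read from that copy.

```golang
package main

import "fmt"

// toyRoller is a hypothetical additive rolling hash, for illustration only.
type toyRoller struct {
	window []byte // internal copy of the rolling window
	oldest int    // index in window of the byte that leaves next
	sum    uint32 // sum of the bytes currently in the window
}

// Write appends a copy of p to the window and updates the sum.
func (t *toyRoller) Write(p []byte) {
	w := make([]byte, 0, len(t.window)+len(p))
	w = append(w, t.window[t.oldest:]...) // oldest bytes first
	w = append(w, t.window[:t.oldest]...)
	w = append(w, p...)
	t.window, t.oldest = w, 0
	for _, b := range p {
		t.sum += uint32(b)
	}
}

// Roll slides the window by one byte. The leaving byte is not passed in:
// it is looked up in the internal copy, which is then updated in place.
func (t *toyRoller) Roll(b byte) {
	if len(t.window) == 0 {
		panic("toyRoller: Roll called on an empty window")
	}
	leaving := t.window[t.oldest]
	t.window[t.oldest] = b
	t.oldest = (t.oldest + 1) % len(t.window)
	t.sum += uint32(b) - uint32(leaving) // unsigned arithmetic wraps safely
}

// directSum recomputes the toy hash of p from scratch.
func directSum(p []byte) uint32 {
	var s uint32
	for _, b := range p {
		s += uint32(b)
	}
	return s
}

// rollMatchesDirect reports whether rolling over data with a window of n
// bytes always agrees with hashing each window from scratch.
func rollMatchesDirect(data []byte, n int) bool {
	var t toyRoller
	t.Write(data[:n])
	for i, c := range data[n:] {
		t.Roll(c)
		// The window now covers data[i+1 : i+1+n].
		if t.sum != directSum(data[i+1:i+1+n]) {
			return false
		}
	}
	return true
}

func main() {
	data := []byte("here is some data to roll on")
	fmt.Println(rollMatchesDirect(data, 16))
}
```

Because the window is a private copy, the caller is free to reuse or overwrite the slice passed to `Write` in this sketch without affecting later calls to `Roll`.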
If you want your code to run at the highest speed, do NOT assign the
result of `New()` to a variable of type `rollinghash.Hash`. Instead, keep
the concrete type returned by `New()`. The Go compiler generally cannot
inline calls made through an interface, so when you later call `Roll()`,
the call on the concrete type can be inlined by the compiler, but the
call through the interface cannot.

```golang
var h1 rollinghash.Hash
h1 = buzhash32.New()
h2 := buzhash32.New()

[...]

h1.Roll(b) // Not inlined (slow)
h2.Roll(b) // Inlined (fast)
```

What's new in v4
----------------

In v4:

* `Write` has become fully consistent with `hash.Hash`. In previous
  versions, writing data would reinitialize the window; it now appends
  the data to the existing window. To reset the window, use the `Reset`
  method instead.

* Calling `Roll` on an empty window is considered a bug, and now triggers
  a panic.

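The `hash.Hash` contract that `Write` now follows can be observed on any standard library hash. The sketch below uses `hash/crc32` rather than this package, so it only illustrates the expected semantics (successive writes accumulate, `Reset` starts over); the helper names `appendEqualsSingleWrite` and `resetRestoresInitialState` are made up for the example.

```golang
package main

import (
	"fmt"
	"hash/crc32"
)

// appendEqualsSingleWrite reports whether writing a then b gives the same
// checksum as writing their concatenation in a single call.
func appendEqualsSingleWrite(a, b []byte) bool {
	h1 := crc32.NewIEEE()
	h1.Write(a)
	h1.Write(b) // appends to what was already written

	joined := append(append([]byte{}, a...), b...)
	h2 := crc32.NewIEEE()
	h2.Write(joined)

	return h1.Sum32() == h2.Sum32()
}

// resetRestoresInitialState reports whether Reset discards earlier writes,
// so that only the data written afterwards contributes to the checksum.
func resetRestoresInitialState(a, b []byte) bool {
	h := crc32.NewIEEE()
	h.Write(a)
	h.Reset()
	h.Write(b)
	return h.Sum32() == crc32.ChecksumIEEE(b)
}

func main() {
	fmt.Println(appendEqualsSingleWrite([]byte("roll"), []byte("ing")))
	fmt.Println(resetRestoresInitialState([]byte("old"), []byte("new")))
}
```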
Brief reminder of the behavior in previous versions:

* From v0.x.x to v2.x.x: `Roll` returns an error for an empty window.
  `Write` reinitializes the rolling window.

* v3.x.x: `Roll` does not return anything. `Write` still reinitializes
  the rolling window. The rolling window always has a minimum size of 1,
  which yields wrong results when calling `Roll` before having
  initialized the window.

Go versions
-----------

The `RabinKarp64` rolling hash does not yield consistent results before
go1.7, because it uses `Rand.Read()` from the standard `math/rand`
package. This function was [fixed in go
1.7](https://golang.org/doc/go1.7#math_rand) to produce a consistent
stream of bytes that is independent of the size of the input buffer. If
you depend on this hash, it is strongly recommended to use go 1.7 or
later.

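The property in question can be checked with the standard library alone. The sketch below does not use `RabinKarp64`; it only reads from a seeded `math/rand` generator with two different buffer sizes and compares the resulting byte streams. `readInChunks` and `sameStream` are hypothetical helpers written for this example. On go 1.7 or later the two streams should be identical.

```golang
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// readInChunks reads total bytes from a generator seeded with seed,
// using a buffer of at most chunk bytes per call to Read.
func readInChunks(seed int64, total, chunk int) []byte {
	r := rand.New(rand.NewSource(seed))
	out := make([]byte, 0, total)
	buf := make([]byte, chunk)
	for len(out) < total {
		k := chunk
		if rem := total - len(out); rem < k {
			k = rem
		}
		r.Read(buf[:k]) // Rand.Read always fills the buffer and returns a nil error
		out = append(out, buf[:k]...)
	}
	return out
}

// sameStream reports whether two buffer sizes yield the same byte stream.
func sameStream(seed int64, total, chunkA, chunkB int) bool {
	return bytes.Equal(
		readInChunks(seed, total, chunkA),
		readInChunks(seed, total, chunkB),
	)
}

func main() {
	// One read of 64 bytes versus many reads of 3 bytes.
	fmt.Println(sameStream(1, 64, 64, 3))
}
```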
License
-------

This code is delivered to you under the terms of the MIT public license,
except the `rabinkarp64` subpackage, which has been adapted from
[restic](https://github.com/restic/chunker) (BSD 2-clause "Simplified").

Notable users
-------------

* [syncthing](https://syncthing.net/), a decentralized synchronisation
  solution
* [muscato](https://github.com/kshedden/muscato), a genome analysis tool

If you are using this in production or for research, let me know and I
will happily put a link here!