* Optimized ReverseBytes to:
a) Allocate minimally, cutting the number of allocations per call by 60% (from 10 to 4).
b) Walk only halfway along the string, swapping each byte on the left with its mirror on the right. This also improves performance: the loop runs n/2 iterations instead of n. That is still O(n) asymptotically, but benchmarks show the running time dropping by about 56%, to less than half of the original.
* Added unit tests covering common cases to verify correctness.
* Benchmark comparison results (old vs. new):
```shell
name            old time/op    new time/op    delta
ReverseBytes-4     554ns ± 4%     242ns ± 3%  -56.20%  (p=0.000 n=10+10)

name            old alloc/op   new alloc/op   delta
ReverseBytes-4      208B ± 0%      114B ± 0%  -45.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
ReverseBytes-4      10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.000 n=10+10)
```
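For reviewers, here is a minimal sketch of the halfway-walk technique from item b), together with the kind of common cases the tests cover. It assumes a hypothetical signature `ReverseBytes(s string) string`, since the real signature is not shown here, so treat it as an illustration rather than the code in this PR.

```go
package main

import "fmt"

// ReverseBytes returns s with its bytes in reverse order.
// Hypothetical signature: assumed for illustration only.
func ReverseBytes(s string) string {
	// Copy s into a mutable byte slice; Go strings are immutable.
	b := []byte(s)
	// Two indices move toward the middle, so the loop runs len(b)/2 times.
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

func main() {
	// Common cases: empty, single byte, even and odd lengths.
	cases := []struct{ in, want string }{
		{"", ""},
		{"a", "a"},
		{"ab", "ba"},
		{"abc", "cba"},
		{"hello", "olleh"},
	}
	for _, c := range cases {
		got := ReverseBytes(c.in)
		fmt.Printf("%q -> %q ok=%v\n", c.in, got, got == c.want)
	}
}
```

Because the sketch swaps bytes rather than runes, multi-byte UTF-8 characters come out scrambled; that is consistent with the name ReverseBytes, but callers reversing user-visible text would need a rune-based variant.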