j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation IntelR CoreTM. For messages whose lengths range from 2 KB to 132 KB, we show that the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly available implementation that we are aware of, and between ~2 to ~2.5 times faster than the OpenSSL 1.0.1c implementation. For long messages, there is no significant performance difference between different choices of j. We show that the 4-lanes SHA-256 is faster than the two SHA3 finalists (BLAKE and Keccak) that have a published tree mode implementation. Finally, we explain why j-lanes hashing will be faster on the coming AVX2 architecture that facilitates using 256 bits registers. These results suggest that standardizing a tree mode for hash functions (SHA-256 in particular) could be useful for performance hungry applications.
 National Institute of Standards and Technology, “Cryptographic Hash Algorithm Competition,” 2005. http://csrc.nist.gov/groups/ST/hash/sha-3/index.html
 Virtual Applications and Implementations Research Lab, “SUPERCOP,” 2012. http://bench.cr.yp.to/supercop.html
 G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, “Keccak Sponge Function Family Main Document,” 2009. http://cuda-keccak.googlecode.com/svn/trunk/docs/Keccak-main-2.1.pdf
 G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, “Sufficient Conditions for Sound Tree and Sequential Hashing Modes,” 2009. http://eprint.iacr.org/2009/210
 Y. Dodis, L. Reyzin, R. L. Rivest and E. Shen, “Indifferentiability of Permutation-Based Compression Functions and Tree-Based Modes of Operation, with Applications to MD6,” Proceedings of FSE 2009, Lecture Notes in Computer Science, 22-25 February 2009, Leuven, pp. 104-121.
 P. Sarkar, P. J. Schellenberg, “A Parallelizable Design Principle for Cryptographic Hash Functions,” 2002. http://eprint.iacr.org/2002/031
 S. Gueron, V. Krasnov, “[PATCH] Efficient Implementations of SHA256 and SHA512, Using the Simultaneous Message Scheduling method,” 2012. http://rt.openssl.org/Ticket/Display.html?id=2784&user=guest&pass=guest
 M. Buxton, “Haswell New Instruction Descriptions Now Available!” 2011. http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/