UltraSPARC cryptographic performance

The performance benefits of the hardware cryptographic accelerators can be considered at the microbenchmark level or at the application level.

Microbenchmark performance benefits

The peak cryptographic performance for the supported ciphers, cryptographic hashes, and public-key algorithms is shown in the following tables:

Symmetric cipher    Modes of operation    Chip-wide performance (Gb/s, 8 cores at 1.4 GHz)
RC4                 -                     83
DES                 ECB, CBC, CFB64       83
3DES                ECB, CBC, CFB64       27
AES-128             ECB, CBC, CTR         44
AES-192             ECB, CBC, CTR         36
AES-256             ECB, CBC, CTR         31

Cryptographic hash    Chip-wide performance (Gb/s, 8 cores at 1.4 GHz)
MD5                   41
SHA-1                 32
SHA-256               41

Public-key algorithm    Chip-wide performance (private-key ops/s, 8 cores at 1.4 GHz)
RSA-1024                37,000
RSA-2048                6,000
ECCp-160                52,000
ECCb-163                92,000
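To put the chip-wide figures in per-core terms, the short Python sketch below converts a table entry into cycles per byte for a single core. It assumes, which the source does not state, that the chip-wide throughput splits evenly across the 8 cores at 1.4 GHz; the function name `per_core_cycles_per_byte` is illustrative.

```python
# Chip-wide peak throughput in Gb/s, copied from the tables above
CHIP_GBPS = {
    "RC4": 83, "DES": 83, "3DES": 27,
    "AES-128": 44, "AES-192": 36, "AES-256": 31,
    "MD5": 41, "SHA-1": 32, "SHA-256": 41,
}
CORES = 8
CLOCK_HZ = 1.4e9

def per_core_cycles_per_byte(chip_gbps: float) -> float:
    """Cycles per byte on one core, assuming an even split across all cores."""
    # Gb/s -> bytes/s for the whole chip, then divide by the core count
    bytes_per_sec_per_core = chip_gbps * 1e9 / 8 / CORES
    return CLOCK_HZ / bytes_per_sec_per_core

if __name__ == "__main__":
    for name, gbps in CHIP_GBPS.items():
        print(f"{name:8s} {per_core_cycles_per_byte(gbps):.2f} cycles/byte/core")
```

For AES-128 this works out to roughly 2 cycles per byte per core (44 Gb/s / 8 cores = 5.5 Gb/s, or about 0.69 GB/s, per core).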

Performance depends on the size of the object being processed, although the interface to the hardware is very efficient, as illustrated in the following figure:


Additionally, the hardware accelerator can sustain multiple outstanding read and write requests, so data can be sourced from DRAM without impacting performance.
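Both effects described above, the dependence on object size and the need for multiple outstanding memory requests, can be reasoned about with a simple model. The Python sketch below assumes (our assumption, not a measured T2 characteristic) that each request pays a fixed setup cost before data streams at the peak rate, and uses Little's law to estimate how many memory requests must be in flight to keep the accelerator fed. The `SETUP_NS` value and the 150 ns DRAM latency in the example are hypothetical placeholders, not T2 measurements.

```python
import math

def effective_gbps(object_bytes: int, peak_gbps: float, setup_ns: float) -> float:
    """Throughput for one object when every request pays a fixed setup cost.

    1 Gb/s is 1 bit per ns, so bits / ns gives Gb/s directly.
    """
    bits = object_bytes * 8
    transfer_ns = bits / peak_gbps
    return bits / (setup_ns + transfer_ns)

def requests_in_flight(bandwidth_gbps: float, latency_ns: float, request_bytes: int) -> int:
    """Little's law: data in flight = bandwidth x latency, rounded up to whole requests."""
    bytes_per_ns = bandwidth_gbps / 8
    return math.ceil(bytes_per_ns * latency_ns / request_bytes)

if __name__ == "__main__":
    PEAK = 44.0      # AES-128 chip-wide peak from the table
    SETUP_NS = 1000  # hypothetical per-request overhead
    for size in (64, 512, 8 * 1024, 64 * 1024):
        print(f"{size:6d} B -> {effective_gbps(size, PEAK, SETUP_NS):5.1f} Gb/s")
    # Hypothetical 150 ns DRAM latency with 64-byte requests
    print("requests in flight:", requests_in_flight(PEAK, 150, 64))
```

Under this model, throughput reaches half of peak when the object's size in bits equals `setup_ns * peak_gbps`, so a more efficient interface (a smaller setup cost) moves that point to smaller objects. With the hypothetical numbers above, about 13 requests would need to be outstanding to stream at 44 Gb/s.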

When compared against other processors, the performance delivered by the UltraSPARC T2 processor is significant, as illustrated in the following figure: [In this figure, AES-128-CBC processing is performed on 8 KB objects. On the x86 systems, processing is performed in software using OpenSSL. On the UltraSPARC T2, processing is performed in hardware, with the offload occurring via either the Solaris userland cryptographic framework or the Solaris kernel cryptographic framework.]

The previous figure shows that, for pure AES performance, the UltraSPARC T2 processor significantly outperforms competing processors: 2 T2 cores deliver more performance than 8 x86 cores.

RSA performance


For RSA, the observed performance rapidly approaches the hardware peak, even with a limited number of requesting threads, as illustrated in the following figure:

Accordingly, the T2 can deliver up to 37,000 RSA-1024 sign operations per second while remaining more than 50% idle.
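An RSA private-key (sign) operation is a modular exponentiation with the private exponent d. Software commonly speeds it up with the Chinese Remainder Theorem (CRT), performing two exponentiations modulo the half-size primes p and q and recombining the results; modular arithmetic of this kind is what the T2 hardware accelerates. The toy Python sketch below uses tiny, insecure textbook parameters and omits the hashing and padding a real signature scheme applies. It is not a description of how the Solaris framework or the T2 hardware actually sequences the operation, and `rsa_private_op_crt` is an illustrative name.

```python
# Toy RSA key (classic textbook values; far too small to be secure)
p, q = 61, 53
n = p * q                            # modulus: 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent: 2753 (needs Python 3.8+)

# CRT precomputation, normally stored alongside the private key
dp = d % (p - 1)
dq = d % (q - 1)
q_inv = pow(q, -1, p)

def rsa_private_op_crt(m: int) -> int:
    """Compute m^d mod n via two half-size exponentiations (Garner recombination)."""
    s_p = pow(m, dp, p)
    s_q = pow(m, dq, q)
    h = (q_inv * (s_p - s_q)) % p    # Python's % keeps h non-negative
    return s_q + h * q

if __name__ == "__main__":
    m = 65
    s = rsa_private_op_crt(m)
    # Applying the public exponent undoes the private operation
    print(s, pow(s, e, n) == m)
```

The cost of modular exponentiation grows faster than linearly with key size, which is consistent with the tables above: the RSA-2048 rate (6,000 ops/s) is roughly one sixth of the RSA-1024 rate (37,000 ops/s).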


ECC performance

The T2 MAU provides hardware support for both prime and binary curves. Because the hardware supports Galois field operations, performance on binary curves is especially impressive compared with traditional processors, as illustrated in the following figure (ECDSA sign operations):
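Binary curves are defined over GF(2^m), where field elements are bit-polynomials: addition is XOR and multiplication is a carry-less multiply followed by reduction modulo a fixed polynomial. Ordinary integer multipliers propagate carries, so without a dedicated carry-less multiply, software has to emulate it with shifts and XORs, as in the Python sketch below. We assume the reduction polynomial of the NIST B-163/K-163 field, x^163 + x^7 + x^6 + x^3 + 1, for the ECCb-163 entry in the table; the source does not name the curve, and `gf2m_mul`/`gf2m_add` are illustrative names.

```python
M = 163
# Reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 (assumed NIST B-163/K-163 field)
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf2m_add(a: int, b: int) -> int:
    """Addition in GF(2^m) is bitwise XOR (no carries)."""
    return a ^ b

def gf2m_mul(a: int, b: int) -> int:
    """Multiply two field elements: carry-less shift-and-XOR, then reduce mod F."""
    product = 0
    while b:
        if b & 1:
            product ^= a  # "add" the shifted copy of a with XOR
        a <<= 1
        b >>= 1
    # Clear every bit at degree >= M, highest first, by XORing in a shifted F
    for deg in range(product.bit_length() - 1, M - 1, -1):
        if (product >> deg) & 1:
            product ^= F << (deg - M)
    return product
```

For example, `gf2m_mul(1 << 162, 2)` computes x^162 * x = x^163, which reduces to x^7 + x^6 + x^3 + 1.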

Application-level performance benefits

When looking at application-level benefits, it is important to ensure an apples-to-apples comparison. It is also important to ensure that the setup is not cherry-picked to showcase particular strengths. For these reasons, industry-standard benchmarks are the best basis for such comparisons.

One benchmark with a significant focus on cryptography is the Banking workload from SPECweb2005. In this workload, clients interact with a bank's web server and, as would be expected, all communication is secured with HTTPS. The following table compares the performance of the UltraSPARC T2 processor with that of other systems:

Processor                                     SPECweb2005 Banking score
1 x UltraSPARC T2 [1.4 GHz]                   70,000
2 x quad-core Opteron 2356 [2.3 GHz]          50,856
2 x quad-core Xeon X5460 [3.2 GHz]            51,840
4 x quad-core Xeon X7350 [3.0 GHz]            71,104

In the T2 system, the RSA, MD5, and RC4 operations are offloaded to the on-chip cryptographic accelerators, whereas the Opteron and Xeon systems perform this cryptographic processing in software. A single-socket UltraSPARC T2 system therefore delivers roughly the same performance as a 4-socket x64 system containing quad-core processors. Alternatively, on a per-socket basis, the T2 outperforms the competition by more than 2.7x.
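The per-socket comparison is simple arithmetic on the table above: divide each score by its number of sockets and compare the result with the single-socket T2. A short Python sketch:

```python
# (score, sockets) from the SPECweb2005 Banking table above
RESULTS = {
    "UltraSPARC T2 [1.4GHz]": (70_000, 1),
    "Opteron 2356 [2.3GHz]": (50_856, 2),
    "Xeon X5460 [3.2GHz]": (51_840, 2),
    "Xeon X7350 [3.0GHz]": (71_104, 4),
}

def per_socket(score: int, sockets: int) -> float:
    return score / sockets

# T2 per-socket score divided by each competitor's per-socket score
t2 = per_socket(*RESULTS["UltraSPARC T2 [1.4GHz]"])
ratios = {
    name: t2 / per_socket(score, sockets)
    for name, (score, sockets) in RESULTS.items()
    if not name.startswith("UltraSPARC")
}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"T2 vs {name}: {ratio:.2f}x per socket")
```

This gives about 2.75x against the Opteron 2356, 2.70x against the Xeon X5460, and 3.94x against the Xeon X7350, so every per-socket ratio is at least 2.7x.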

While this performance leadership is not solely attributable to the hardware cryptographic support (the on-chip NICs and the abundance of threads also help), the cryptographic overheads associated with HTTPS are significant: RSA operations for session establishment, followed by RC4 and MD5 operations (the algorithms used in SPECweb2005) to encrypt and authenticate the subsequent traffic, as illustrated in the following figure:









It is therefore not surprising that hardware acceleration of cryptographic processing gives the UltraSPARC T2 processor a significant performance advantage on the SPECweb2005 Banking workload.
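The HTTPS cost structure described above (one RSA private-key operation per full handshake, then RC4 encryption and MD5 authentication of every byte) can be turned into a rough upper bound on how many sessions per second the accelerators could support if cryptography were the only cost. The Python sketch below uses the chip-wide peak rates from the tables and assumes, as a conservative simplification, that the RSA, RC4, and MD5 work does not overlap. The session sizes in the example are hypothetical, and real TLS adds per-record and handshake costs that the sketch ignores.

```python
# Chip-wide peak rates from the tables above (8 cores, 1.4 GHz)
RSA1024_OPS_PER_SEC = 37_000
RC4_GBPS = 83
MD5_GBPS = 41

def crypto_seconds_per_session(session_bytes: int, handshakes: float = 1.0) -> float:
    """Accelerator time for one session, assuming the three workloads run back to back."""
    bits = session_bytes * 8
    rsa = handshakes / RSA1024_OPS_PER_SEC   # session establishment
    rc4 = bits / (RC4_GBPS * 1e9)            # bulk encryption
    md5 = bits / (MD5_GBPS * 1e9)            # message authentication
    return rsa + rc4 + md5

def session_ceiling(session_bytes: int, handshakes: float = 1.0) -> float:
    """Sessions per second if cryptography were the only bottleneck."""
    return 1.0 / crypto_seconds_per_session(session_bytes, handshakes)

if __name__ == "__main__":
    for kb in (0, 16, 100, 1000):   # hypothetical session sizes
        print(f"{kb:5d} KB/session -> {session_ceiling(kb * 1024):9.0f} sessions/s")
```

In this model the RSA handshake dominates for small sessions and the bulk RC4 and MD5 cost takes over as sessions grow; resumed sessions that skip the RSA step can be approximated by passing `handshakes` below 1.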

Copyright 1994-2009 Sun Microsystems, Inc.