Introduction
libgpucrypto is a subset of the SSLShader software that implements a few cryptographic algorithms (AES, SHA1, and RSA) using CUDA. It also includes several data structures that help utilize CUDA streams for better performance. See here for more details.
Installation
- Install required libraries
libgpucrypto requires the CUDA developer driver, the CUDA toolkit, and the CUDA SDK.
You can download them at http://developer.nvidia.com/cuda-toolkit-40
We have tested with the following software configurations:
CUDA 4.0
  CUDA driver: 270.41.19
  CUDA toolkit: 4.0.17
  CUDA SDK: 4.0.17
CUDA 3.2
  CUDA driver: 260.19.26
  CUDA toolkit: 3.2.16
  CUDA SDK: 3.2.16
O/S
  Ubuntu 10.04 LTS 64-bit
- Install OpenSSL libraries and headers
You can download OpenSSL at http://openssl.org/source/
- Configure the following variables in Makefile.in:
OPENSSL_DIR
CUDA_TOOLKIT_DIR
CUDA_SDK_DIR
If you are using the system's default OpenSSL development library, you may leave OPENSSL_DIR blank.
- Build libgpucrypto
#make
- Try running the test code
#./bin/aes_test -m ENC
------------------------------------------
AES-128-CBC ENC, Size: 16KB
------------------------------------------
#msg   latency(usec)   thruput(Mbps)
1      6012            21
2      6305            41
4      7020            74
8      8737            120
16     11834           177
32     16168           259
64     17244           486
128    19256           871
256    24579           1365
512    27067           2479
1024   31605           4246
2048   40924           6559
4096   61402           8743
Correctness check (batch, random): .............OK

#./bin/rsa_test -m MP
-snip-

#./bin/sha_test
-snip-
You can see more detailed usage by running a program without arguments or with incorrect ones :).
How to use?
Here, I'll explain how to use libgpucrypto with an AES example. Below is part of the code from aes_test.cc.
device_context dev_ctx;
pinned_mem_pool *pool;
aes_enc_param_t param;
operation_batch_t ops;
//1. initialize device context
dev_ctx.init(num_flows * flow_len * 3, 0);
//2. create aes_context.
aes_context aes_ctx(&dev_ctx);
//generate random test case
gen_aes_cbc_data(&ops,
                 key_bits,
                 num_flows,
                 flow_len,
                 true);
//3. prepare data to be encrypted
pool = new pinned_mem_pool();
pool->init(num_flows * flow_len * 3);
aes_cbc_encrypt_prepare(&ops, &param, pool);
//4. Launch GPU code
aes_ctx.cbc_encrypt(param.memory_start,
                    param.in_pos,
                    param.key_pos,
                    param.ivs_pos,
                    param.pkt_offset_pos,
                    param.tot_in_len,
                    param.out,
                    param.num_flows,
                    param.tot_out_len,
                    0);
//5. Wait for completion
aes_ctx.sync(0);
- Initialize device_context:
libgpucrypto has several wrappers for CUDA initialization and stream manipulation.
To use libgpucrypto, you first need to create a device_context and initialize it.
- Create aes_context:
The aes_context class provides APIs to launch GPU code using the CUDA library.
Its constructor takes a pointer to an initialized device_context.
- Prepare data to be encrypted:
To use aes_context, you need to organize the data and prepare some metadata.
The GPU needs a large batch size to reach maximum throughput, and
data must be copied into the GPU's memory before it can be processed.
Copying between GPU memory and host memory is relatively expensive
when only a small amount of data is copied at a time.
For this reason, we gather all data into one big buffer before passing it to aes_context.
Please read the sample code aes_test.cc in the test directory for details.
In the above example, we used pinned memory (pinned_mem_pool) to avoid an extra copy in CPU memory. Before CUDA 4.0, unless you allocated the buffer as pinned memory through CUDA, the driver copied the data into a pinned buffer internally before copying it to the GPU. To avoid this, we use pinned pages explicitly.
We know this is not very friendly. We're working on improving the interface.
- Launch GPU code:
aes_context copies the data into the GPU's memory and launches the GPU kernel.
- Wait for completion:
The sync function polls to check whether the GPU execution has finished,
and copies the data back to host memory once kernel execution is done.
You can also use this function in a non-blocking manner to just check the status.
See here for more details.
Please see the files in the test directory for more examples.
Documentation
Source Code
SSLShader is in the process of
being tech-transferred, and we no longer release the source
code.
Sorry for the inconvenience.