In conjunction with the Sixth ACMIEEE International
Conference on Formal Methods and Models for Codesign (MEMOCODE'2008) 

Cryptosorter 
Overview The objective of the 2008 MEMOCODE Hardware Software Codesign Contest is to sort an encrypted database of records as fast as possible. Sorting an encrypted database requires that each record is decrypted, that a proper index for each decrypted record is determined, and that each record is reencrypted and stored at the proper index. The record length is 4 words. A record consists for four fields {f1, f2, f3, f4}. Each field is one word long and encrypted. Encryption and decryption transforms the four fields of a record {f1, f2, f3, f4} into four decrypted fields {g1, g2, g3, g4}. Each of g1, g2, g3, and g4 is one word long. SortingThe sorting key of a decrypted record is determined by weighted numerical ordering of the decrypted words of the record. g1 is the mostsignificant word and g4 is the least significant word. g1 through g4 are unsigned numbers. The database does not contain any duplicate records, so that there is always a unique ordering possible. Some examples of valid sorting key relations: Encryption and decryption is done using four keys {k1, k2, k3, k4}, each one word long. The encryption/decryption is done using XOR operations: {f1, f2, f3, f4} = {g1 xor k1, g2 xor k2, g3 xor k3, g4 xor k4}. Based on this mechanism, encryption and decryption of records are identical operations. The four keys {k1, k2, k3, k4} are determined using AES128 as follows. The record index is encrypted with AES under a fixed, database wide key which is specified below under 'System Testbench and Design Parameters'. The encrypted record index yields 128 bits of key material. These 128 bits are split in four words corresponding to {k1, k2, k3, k4}. The encryption/decryption process is illustrated below in pseudocode. // 1. determine record key using record index and databasewide key unsigned data_in[4]; unsigned data_out[4]; const unsigned key[4]; // databasewide key data_in[0] = index; data_in[1] = 0; data_in[2] = 0; data_in[3] = 0; aes_ecb_encrypt(data_in, key, data_out); // 2. decrypt record using record key unsigned f[4]; unsigned g[4]; unsigned k[4]; k[0] = data_out[0]; k[1] = data_out[1]; k[2] = data_out[2]; k[3] = data_out[3]; g[0] = k[0] ^ f[0]; g[1] = k[1] ^ f[1]; g[2] = k[2] ^ f[2]; g[3] = k[3] ^ f[3];System Testbench and Design Paramters The cryptosorter reference implementation includes a testbench, that must be used by your optimized implementation.
Important parameters for the reference implementation are as follows. A parameter either has the suffix (fixed), in which case it cannot be changed by your optimized implementation, or else it indicates the required supported range.

Reference Implementation 
The reference implementation can be downloaded here This reference implementation was designed on a Xilinx XUP board using ISE/EDK 9.1.2. The reference implementation uses offchip DDR memory which may need to be added on your XUP board. The reference implementation is a softwareonly design using a 300 MHz PPC processor. The PPC has a 16KByte cache enabled for code and data. The reference implementation takes approximately 20,692,099 microseconds (+ 5 microseconds) to execute on a XUP board (with O2level optimization in gcc). If you use a different platform, you will need to port the reference implementation to your chosen platform and demonstrate that the reference implementation operates correctly. Also, make sure you follow the Calibration rules provided in the Design Contest Rules. 
Submission Instructions 
You have time until midnight Eastern Time, March 9, 2008, to send your solution. Note that the countdown timer at the top of this page relies on the host clock, and may be inaccurate if your host clock is set wrong. Submit your design in a tar or zip file via http://www.ece.cmu.edu/tools/upload/. Use jhoe@ece.cmu.edu as the recipient address. After your submission, you will receive a confirmation email within 24 hours. If you do not receive this confirmation email, please contact the organizers immediately. Your submission should contain the following items in a single zip or tar file.
Please note that we will treat your design sources as confidential, and that they will only be shared with the judging panel for the purpose of judging. We will not post your source code unless you give us explicit permission to do so. 
Test Cases 
We would like to obtain a uniform performance metric used by all participants. Therefore, we require you to evaluate the performance of the following testbench for your design and include the evaluation in your submission . The overall performance metric for your design is obtained as the geometric mean of the performance of 8 individual test metrics. The 8 testcases are partioned in two groups of 4 testcases each. The 4 testcases in each group differ only in the number of records. The four cases include 64, 1026, 16384, and 262144 records respectively. As a power of two, these cases correspond to 2^6, 2^10, 2^14 and 2^18 records respectively. Random testbench group The first group uses the random testbench, specified in the cryptosorter assignment. The random generator must use a seed of 0x64646464. The random testbench uses encryption with the key defined in the assignment specification. Rotated testbench group The second group uses a pathological testbench, defined by the elements of a matrix M as follows. Given a square matrix T with inorder elements in rowmajor order: 0 1 2 3 T= 4 5 6 7 8 9 10 11 12 13 14 15 then M is a 90degree clockwiserotated version of T: 12 8 4 0 M= 13 9 5 1 14 10 6 2 15 11 7 3 The contents of the plaintext database is the rowmajor order scan of M, and thus consists 12, 8, 4, 0, 13, 9, 5, 1, 14, 10, 6, 2, 15, 11, 7, 3. For a database of size 2^Q, M becomes a square matrix with 2^(Q1) elements on each side. The rotated testbench uses encryption with the key defined in the assignment specification. Measurements We require the contestants to follow the testbench code. When you integrate the testbench code in your design, follow the directives below to obained performance numbers on your design.
The example output of the testbench code on the reference code is shown below. 
Getting Started 
Start by studying the reference implementation in combination with the documentation above. On a standard XUP board, the reference implementation takes around 20 seconds to execute, as illustrated by the following screen shot. The testbench also prints '1' if the result is sorted correctly. Note that the Hyperterm terminal does not render newlines correctly. The complexity of this design problem stems from two interacting problems: a sorting problem and a crypto problem. Solving only one problem will yield a suboptimal design  considering the system level perspective is essential for optimal performance. You may make changes to the algorithms, such as changing the sorting strategy, as long the overall functionality of the system is preserved. See above (under 'System Testbench') to find what 'preserving system functionality' means. In practice, this means that you should not modify anything besides the function sortrecord() in main.c of the reference implementation, as well as the underlying call tree of sortrecord(). You can find some examples of AES cores on OpenCores. 