



# Implementation a Sieving Algorithm on a Dynamic Reconfigurable Processor

Takeshi Shimoyama (FUJITSU)

Tetsuya Izu (FUJITSU)

Jun Kogure (FUJITSU)

\*This work is financially supported by a consignment research from the NICT, Japan

## Outline

- 1. Estimation of each GNFS steps from software
- Factorization hardware
- 3. Dynamic reconfigurable processor DAPDNA2
- 4. Sieving algorithm on DAPDNA2
- Demo & Evaluation
- Conclusions & Future works



## Estimation of the Sieving Ratio from Software implementation

- GNFS steps
  - 1. Polynomial selection
  - 2. Relation finding (Sieving)
  - 3. Filtering + Linear algebra
  - 4. Square root
- Evironments
  - Test data: 164 digit composit number from cuningham number
  - Algorithm: Lattice sieve with 2+2 large prime variation, etc...
  - Machine: Pentium4 2.53GHz Northwood



## Ratio of each steps of GNFS

(164digit factoring timing data by Lattice Sieve in Software)





## Ratio of the parts in Relation Finding

(164digit factoring timing data by Lattice Sieve in Software)



## Outline

- 1. Estimation of each GNFS steps from software
- 2. Factorization hardware
- 3. Dynamic reconfigurable processor DAPDNA2
- 4. Sieving algorithm on DAPDNA2
- 5. Demo & Evaluation
- 6. Conclusions & Future works



#### **Factorization Hardware**



## Produce and Run a Factorization Hardware!



- Motivation
  - There are many previous works in virtual world
  - But, these Hardware device of factorizations have not seen (in the real world), yet.
- Target Device
  - ■ASIC ⇒ too reckless!
  - FPGA ⇒ not easy (for me)!
  - DAPDNA2 ⇒ Current Target!



## Outline

- 1. Estimation of each GNFS steps from software
- 2. Factorization hardware
- 3. What is DAPDNA2?
- 4. Sieving algorithm on DAPDNA2
- 5. Demo & Evaluation
- 6. Conclusions & Future works



## What is DAPDNA2(DD2)?



- A Dynamic Reconfigurable Processor introduced and manufactured by IPFlex Inc.
- Multi processor structure
  - DAP(RISC CPU) for System Control
  - DNA (Reconfigurable processor core) for high speed data processing

#### →1 chip solution

- DAP (Digital Application Processor)
  - 32bit RISC CPU
  - for control of configurations of DNA
- DNA (Distributed Network Architecture)
  - Two-dimensional array of 376 Processing Elements(PE)
  - Allows arbitrary configuration of the degree of parallelism and pipeline depth
  - Dynamic reconfiguration switch to another configuration in 1 clock cycle

6



8

## Concept of DAPDNA2







## DAPDNA-2 Processing Elements (PEs)

|                   | Processing<br>Elements | #   | Function                                                                               |  |
|-------------------|------------------------|-----|----------------------------------------------------------------------------------------|--|
| Data<br>Operation | EXE                    | 168 | 32bit, 2 input 1 output execution Include 56 multipliers (16-bit input, 32 bit output) |  |
|                   | DLE                    | 136 | 32bit, 2 input 2 output deley Configurable delay length                                |  |
|                   | RAM                    | 32  | Internal memory, (16KB × 32=512KB)                                                     |  |
| Data IO           | C16E                   | 12  | Address generation for LDB/STB access Generic 16bit counter                            |  |
|                   | C32E                   | 12  | Address generation for external memory access Generic 32bit counter                    |  |
|                   | LDB                    | 4   | Data input to DNA through the load buffer                                              |  |
|                   | STB                    | 4   | Data output from DNA through the store buffer                                          |  |
|                   | LDX                    | 4   | Data input to DNA through the Direct I/O                                               |  |
|                   | STX                    | 4   | Data output from DNA through the Direct I/O                                            |  |
| total             |                        | 376 |                                                                                        |  |





#### Comparison between Software and DAPDNA2



#### **Evaluation Environment**



## **Outline**

- 1. Estimation of each GNFS steps from software
- Factorization hardware
- 3. Dynamic reconfigurable processor DAPDNA2
- 4. Sieving algorithm on DAPDNA2
- 5. Demo & Evaluation
- 6. Conclusions & Future works

## Line Sieving



```
For b \leftarrow 1 to H_b
       set S[a] to log(F(a,b)) for all a
       For prime p \leftarrow 2 to B
               compute the sieving points a \ge -H_a
               While a < H_a
                       S[a] \leftarrow S[a] - \log p
                       a \leftarrow a + p
               end while
       end for
end for
```



## Our Approach



#### Combine two methodologies

- Pipeline Method
  - ⇒ Sieving by Smallish Prime (P < one sieving size)
- Bucket Sort Method
  - $\Rightarrow$  Sieving by Largish Prime (P  $\ge$  one sieving size)



## Pipeline Sieving



※ Suitable for smallish primes





## Bucket Sort Sieving (Phase 1)



#### ※ Suitable for largish primes





## Bucket Sort Sieving (Phase 2)



#### **Bucket0**







## Current status of our Implemenataion into DAPDNA2



- Phases in sieving
  - 1. memory setup phase
  - pipeline sieving phase
  - 3. bucket sort sieving phase
  - 4. extract relation phase

status

completed (05/2/14)

completed (05/2/21)

not completed \*

completed (05/2/14)

X Since "the arbitration" of memory interface becomes too complex.







## Demonstration

Simulator: DAPDNA2 FrameWork2

### **Evaluation on DAPDNA2**

(sample data. RSA100)



|           |                  | clock    | timing   |
|-----------|------------------|----------|----------|
| algebraic | memory setup     | 65818    | 0.396ms  |
|           | sieving          | 7477661  | 45.046ms |
|           | extract relation | 65611    | 0.395ms  |
| rational  | memory setup     | 65818    | 0.396ms  |
|           | sieving          | 3633901  | 21.890ms |
|           | extract relation | 65611    | 0.395ms  |
| total     |                  | 11374420 | 68.520ms |

clock cycle : 166MHz

Estimation of whole sieving time without trial division: 682.16 hours
 It's more than 40 times slower than software...

Software: 16.7h on one Pentium4(2.8GHz) by using lattice sieve



## Outline

- 1. Estimation of each GNFS steps from software
- 2. Factorization hardware
- 3. Dynamic reconfigurable processor DAPDNA2
- 4. Sieving algorithm on DAPDNA2
- 5. Demo & Evaluation
- Conclusions & Future works



## Conclusion



- Showed a ratio of the sieving time in GNFS by software
- Introduced the dynamic reconfigurable processor DAPDNA2
- Proposed the sieving algorithm for DAPDNA2
- Evaluated our implementation



### **Future Works**

- Continue Implementations of
  - Bucket Sort Sieving
  - Trial Division
  - Lattice Sieve
  - etc...
- Try it on another target devices
  - FPGA
  - ASIC





THE POSSIBILITIES ARE INFINITE