In cryptography, the best known RC4 hardware design to date produces 2 bytes in 1 clock, while the best known implementations produce 1 byte in 1 clock and 1 byte in 3 clocks. One earlier design considers two consecutive bytes together and processes them in 2 clocks; another is a pipelined architecture of a previous design. The 1-byte-in-3-clocks design is overly modular and clock hungry. In this paper, taking the RC4 algorithm as it is, simpler RC4 hardware designs with higher throughput are proposed in the form of 6 different architectures. Design 1 processes 1 byte in 1 clock, and Design 2 is a dynamic KSA-PRGA architecture of Design 1. Design 3 processes 2 bytes in a single clock, whereas Design 4 is a dynamic KSA-PRGA architecture of Design 3. Design 5 and Design 6 are parallelized architectures of Design 2 and Design 4, respectively, and can compute 4 bytes in a single clock. Throughput, power consumption, and resource usage mature progressively from Design 1 to Design 6. The RC4 encryption and decryption designs are embedded on two FPGA boards as co-processor hardware, and the two boards communicate over Ethernet.
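For orientation, the two RC4 stages named above, the key-scheduling algorithm (KSA) and the pseudo-random generation algorithm (PRGA), can be sketched in software as follows. This is a minimal reference model of standard RC4, not any of the six hardware designs; the function names `rc4_ksa`, `rc4_prga`, and `rc4_crypt` are illustrative assumptions. Each PRGA loop iteration yields one keystream byte, which is the unit of work that the hardware designs aim to complete in one clock (or several per clock).

```python
def rc4_ksa(key):
    """Key-scheduling algorithm: build the 256-byte S-box from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]  # swap S[i] and S[j]
    return s


def rc4_prga(s_box, n):
    """Pseudo-random generation algorithm: produce n keystream bytes."""
    s = list(s_box)  # work on a copy so the caller's S-box is untouched
    i = j = 0
    keystream = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]  # swap S[i] and S[j]
        keystream.append(s[(s[i] + s[j]) % 256])
    return bytes(keystream)


def rc4_crypt(key, data):
    """Encrypt or decrypt: XOR the data with the keystream (same operation)."""
    keystream = rc4_prga(rc4_ksa(key), len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    ciphertext = rc4_crypt(b"Key", b"Plaintext")
    print(ciphertext.hex())                # bbf316e8d940af0ad3
    print(rc4_crypt(b"Key", ciphertext))   # b'Plaintext'
```

Because encryption and decryption are the same XOR against the keystream, the same model describes both sides of a two-board setup like the one mentioned above.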