Hardware support for floating-point (FP) arithmetic is a mandatory feature of modern microprocessor design. There are many alternatives in floating-point unit (FPU) design, and overall performance can be greatly affected by the organization of a floating-point unit. In this paper, design considerations and trade-off factors are evaluated for two types of floating-point unit architecture and implementation optimized under different design goals. The implementation results of the proposed FPUs based on standard cell methodology in TSMC 0.18 /spl mu/m technology exhibit that both designs are well optimized for their target applications. A single-instruction issue design is implemented in very small area; however, a design capable of concurrently executing FP add and multiply instructions is achievable with only a modest 24% area increase.