Finite element methods require the composition of the global stiffness matrix from local finite element contributions. The composition process combines the computation of element stiffness matrices and their assembly into the global stiffness matrix, which is commonly sparse. In this paper we focus on the assembly process of the global stiffness matrix and explore different algorithms and their efficiency on shared memory systems using C++. A key aspect of our investigation is the use of atomic synchronization primitives for the derivation of data-race free algorithms and data structures. Furthermore, we propose a new flexible storage format for sparse matrices and compare its performance with the compressed row storage format using abstract benchmarks based on common characteristics of finite element problems.