This paper addresses joint downlink channel estimation and user grouping in massive multiple-input multiple-output (MIMO) systems. The motivation is that channel estimation performance can be improved by exploiting the additional common sparsity among nearby users. In the literature, a commonly used group sparsity model assumes that all users in a group share a uniform sparsity pattern. In practice, however, this oversimplified assumption usually fails to hold, even for physically close users.