
With the growth of acoustic data driven by multimedia tools, mobile phones, and the Internet of Multimedia Things (IoMT), recent studies have explored different models of machine hearing capable of capturing sounds, classifying them, and separating them into speech, music, and environmental sounds. The separation of speech, music, and environmental sounds plays an important role in automatic machine hearing and in developing future applications for big acoustic data (BAD) processing. This paper critically reviews the various approaches and methods adopted in speech and music separation and highlights how these algorithms and techniques can help machine hearing applications. First, we describe the main sound characteristics and features, which we use to discuss the approaches for separating sounds into speech and music and to categorize the related literature. Next, we present the processing of voice, speech, and music separately, and we explain machine hearing in order to analyze existing approaches. Subsequently, we explain a new BAD model, the set of challenges that music and speech processing research should focus on, and the novel elements required for big data processing in the future. Finally, we discuss all existing metrics and data sets, present the future metrics and data sets that BAD requires in order to experiment with and evaluate new multimedia applications, and conclude with future directions.

Separating the singing voice from a monaural song recording is a highly difficult task. Still, it is important because it has many applications, such as singer identification, lyrics recognition, and melody extraction.
