J. Valk, T. Alumäe, in IEEE Spoken Language Technology Workshop (SLT). VoxLingua107: a dataset for spoken language recognition (IEEE, 2021), pp. 652–658.
A. S. Ba Wazir, H. A. Karim, M. H. L. Abdullah, N. AlDahoul, S. Mansor, M. F. A. Fauzi, J. See, A. S. Naim, Design and implementation of fast spoken foul language recognition with different end-to-end deep neural network architectures. Sensors. 21(3), 710 (2021).
H. Nguyen, Y. Estève, L. Besacier, in Proc. Interspeech 2021. Impact of encoding and segmentation strategies on end-to-end simultaneous speech translation, (2021), pp. 2371–2375. https://doi.org/10.21437/Interspeech.2021-608.
F. Albu, D. Hagiescu, M. Puica, L. Vladutu, in Proceedings of the International Technology, Education and Development Conference. Intelligent tutor for first grade children’s handwriting application (IATED, Valencia, 2015), pp. 3708–3717.
T. Theodorou, I. Mporas, N. Fakotakis, Int. J. Inf. Technol. Comput. Sci. (IJITCS). 6(11), 1 (2014).
B. Meléndez-Catalán, E. Molina, E. Gómez, Open broadcast media audio from TV: a dataset of TV broadcast audio with relative music loudness annotations. Trans. Int. Soc. Music Inf. Retr. 2(1), 43–51 (2019).
J. Schlüter, R. Sonnleitner, in Proceedings of International Conference on Digital Audio Effects. Unsupervised feature learning for speech and music detection in radio broadcasts (DAFx, York, 2012).
S. Venkatesh, D. Moffat, E. R. Miranda, Investigating the effects of training set synthesis for audio segmentation of radio broadcast. Electronics. 10(7), 827 (2021).
Q. Lemaire, A. Holzapfel, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Temporal convolutional networks for speech and music detection in radio broadcast (ISMIR, Delft, 2019), pp. 229–236.
S. Chaudhuri, J. Roth, D. P. Ellis, A. Gallagher, L. Kaver, R. Marvin, C. Pantofaru, N. Reale, L. G. Reid, K. Wilson, et al., in Proceedings of ISCA Interspeech. AVA-Speech: a densely labeled dataset of speech activity in movies (ISCA, Hyderabad, 2018).
G. Tzanetakis, P. Cook, Marsyas: a framework for audio analysis. Organised Sound. 4(3), 169–175 (2000).
E. Scheirer, M. Slaney, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 2. Construction and evaluation of a robust multifeature speech/music discriminator, (1997), pp. 1331–1334.
D. Snyder, G. Chen, D. Povey, MUSAN: a music, speech, and noise corpus. arXiv preprint arXiv:1510.08484 (2015).
D. Wolff, T. Weyde, E. Benetos, D. Tidhar, MIREX muspeak sample dataset (2015). http://mirg.city.ac.uk/datasets/muspeak/. Accessed 30 Sept 2020.
R. Huang, J. H. Hansen, Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora. IEEE Trans. Audio Speech Lang. Process. 14(3), 907–919 (2006).
D. Wang, R. Vogt, M. Mason, S. Sridharan, in International Conference on Signal Processing and Communication Systems. Automatic audio segmentation using the generalized likelihood ratio (IEEE, Gold Coast, 2008), pp. 1–5.
N. Tsipas, L. Vrysis, C. Dimoulas, G. Papanikolaou, Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination. Multimedia Tools Appl. 76(24), 25603–25621 (2017).
D. Doukhan, E. Lechapt, M. Evrard, J. Carrive, in Music Information Retrieval Evaluation eXchange. INA’s MIREX 2018 music and speech detection system (ISMIR, Paris, 2018).
M. Papakostas, T. Giannakopoulos, Speech-music discrimination using deep visual feature extractors. Expert Syst. Appl. 114, 334–344 (2018).
P. Gimeno, I. Viñals, A. Ortega, A. Miguel, E. Lleida, Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. J. Audio Speech Music Process. 2020(1), 1–19 (2020).
E. Tarr, Hack audio: an introduction to computer programming and digital signal processing in MATLAB® (Routledge, USA, 2018).
M. Torcoli, A. Freke-Morin, J. Paulus, C. Simon, B. Shirley, Preferred levels for background ducking to produce esthetically pleasing audio for TV with clear speech. J. Audio Eng. Soc. 67(12), 1003–1011 (2019).
S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, et al., in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). CNN architectures for large-scale audio classification (IEEE, New Orleans, 2017), pp. 131–135.
G. Kour, N. Mehan, Music genre classification using MFCC, SVM and BPNN. Int. J. Comput. Appl. 112(6), 43–47 (2015).
K. Koutini, H. Eghbal-zadeh, G. Widmer, Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks. IEEE/ACM Trans. Audio Speech Lang. Proc. 29, 1987–2000 (2021). https://doi.org/10.1109/TASLP.2021.3082307.
M. J. Carey, E. S. Parris, H. Lloyd-Thomas, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1. A comparison of features for speech, music discrimination (IEEE, Phoenix, 1999), pp. 149–152.
K. El-Maleh, M. Klein, G. Petrucci, P. Kabal, in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4. Speech/music discrimination for multimedia applications (IEEE, Istanbul, 2000), pp. 2445–2448.
E. D. Scheirer, M. Slaney, Multi-feature speech/music discrimination system (Google Patents, USA, 2003).
C. Panagiotakis, G. Tziritas, A speech/music discriminator based on RMS and zero-crossings. IEEE Trans. Multimed. 7(1), 155–166 (2005).
E. Wieser, M. Husinsky, M. Seidl, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speech/music discrimination in a large database of radio broadcasts from the wild (IEEE, Florence, 2014), pp. 2134–2138.
J. Han, B. Coover, in IEEE International Conference on Multimedia and Expo Workshops (ICMEW). Leveraging structural information in music-speech dectection (IEEE, San Jose, 2013), pp. 1–6.
M. K. S. Khan, W. G. Al-Khatib, Machine-learning based classification of speech and music. Multimed. Syst. 12(1), 55–67 (2006).
J. Pinquier, J. -L. Rouas, R. André-Obrecht, Robust speech/music classification in audio documents. Entropy. 1(2), 3 (2002).
J. Ajmera, I. McCowan, H. Bourlard, Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Comm. 40(3), 351–363 (2003).
B. -Y. Jang, W. -H. Heo, J. -H. Kim, O. -W. Kwon, Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel. J. Audio Speech Music Process. 2019(1), 1–12 (2019).
D. de Benito-Gorron, A. Lozano-Diez, D. T. Toledano, J. Gonzalez-Rodriguez, Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset. J. Audio Speech Music Process. 2019(1), 1–18 (2019).
K. Choi, G. Fazekas, M. B. Sandler, K. Cho, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Transfer learning for music classification and regression tasks, (ISMIR, Suzhou, 2017), pp. 141–149.
ATSC Standard, A/52: digital audio compression standard (AC-3, E-AC-3), Revision B. Adv. TV Syst. Comm., 78–79 (2005).
Y. Wang, P. Getreuer, T. Hughes, R. F. Lyon, R. A. Saurous, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Trainable frontend for robust and far-field keyword spotting (IEEE, New Orleans, 2017), pp. 5670–5674.
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. P. Bello, Robust sound event detection in bioacoustic sensor networks. PLoS ONE. 14(10), e0214168 (2019).
V. Lostanlen, J. Salamon, M. Cartwright, B. McFee, A. Farnsworth, S. Kelling, J. P. Bello, Per-channel energy normalization: why and how. IEEE Signal Process. Lett. 26(1), 39–43 (2019).
D. P. Kingma, J. Ba, in 3rd International Conference on Learning Representations. Adam: a method for stochastic optimization, (2015).
A. Mesaros, T. Heittola, T. Virtanen, Metrics for polyphonic sound event detection. Appl. Sci.6(6), 162 (2016).
K. Seyerlehner, T. Pohle, M. Schedl, G. Widmer, in Proceedings of the 10th International Conference on Digital Audio Effects (DAFx). Automatic music detection in television productions (DAFx, Bordeaux, 2007).
M. Won, K. Choi, X. Serra, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Semi-supervised music tagging transformer (ISMIR, 2021).
S. Kum, J. -H. Lin, L. Su, J. Nam, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Semi-supervised learning using teacher-student models for vocal melody extraction (ISMIR, 2020), pp. 93–100.
Y. Gong, J. Yu, J. Glass, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). VocalSound: a dataset for improving human vocal sounds recognition (IEEE, 2022), pp. 151–155.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.