• J. Valk, T. Alumäe, in IEEE Spoken Language Technology Workshop (SLT). VoxLingua107: a dataset for spoken language recognition (IEEE, 2021), pp. 652–658.

  • A. S. Ba Wazir, H. A. Karim, M. H. L. Abdullah, N. AlDahoul, S. Mansor, M. F. A. Fauzi, J. See, A. S. Naim, Design and implementation of fast spoken foul language recognition with different end-to-end deep neural network architectures. Sensors. 21(3), 710 (2021).


  • H. Nguyen, Y. Estève, L. Besacier, in Proc. Interspeech 2021. Impact of encoding and segmentation strategies on end-to-end simultaneous speech translation, (2021), pp. 2371–2375. https://doi.org/10.21437/Interspeech.2021-608.

  • F. Albu, D. Hagiescu, M. Puica, L. Vladutu, in Proceedings of the International Technology, Education and Development Conference. Intelligent tutor for first grade children’s handwriting application (IATED, Valencia, 2015), pp. 3708–3717.



  • T. Theodorou, I. Mporas, N. Fakotakis, Int. J. Inf. Technol. Comput. Sci. (IJITCS). 6(11), 1 (2014).

  • B. Meléndez-Catalán, E. Molina, E. Gómez, Open broadcast media audio from TV: a dataset of TV broadcast audio with relative music loudness annotations. Trans. Int. Soc. Music Inf. Retr. 2(1), 43–51 (2019).



  • J. Schlüter, R. Sonnleitner, in Proceedings of International Conference on Digital Audio Effects. Unsupervised feature learning for speech and music detection in radio broadcasts (DAFx, York, 2012).



  • S. Venkatesh, D. Moffat, E. R. Miranda, Investigating the effects of training set synthesis for audio segmentation of radio broadcast. Electronics. 10(7), 827 (2021).


  • Q. Lemaire, A. Holzapfel, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Temporal convolutional networks for speech and music detection in radio broadcast (ISMIR, Delft, 2019), pp. 229–236.



  • S. Chaudhuri, J. Roth, D. P. Ellis, A. Gallagher, L. Kaver, R. Marvin, C. Pantofaru, N. Reale, L. G. Reid, K. Wilson, et al., in Proceedings of ISCA Interspeech. AVA-Speech: a densely labeled dataset of speech activity in movies (ISCA, Hyderabad, 2018).



  • G. Tzanetakis, P. Cook, MARSYAS: a framework for audio analysis. Organised Sound 4(3), 169–175 (2000).


  • E. Scheirer, M. Slaney, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2. Construction and evaluation of a robust multifeature speech/music discriminator, (1997), pp. 1331–1334.

  • D. Snyder, G. Chen, D. Povey, MUSAN: a music, speech, and noise corpus. arXiv preprint arXiv:1510.08484 (2015).

  • D. Wolff, T. Weyde, E. Benetos, D. Tidhar, MIREX muspeak sample dataset (2015). http://mirg.city.ac.uk/datasets/muspeak/. Accessed 30 Sept 2020.

  • R. Huang, J. H. Hansen, Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora. IEEE Trans. Audio Speech Lang. Process. 14(3), 907–919 (2006).


  • D. Wang, R. Vogt, M. Mason, S. Sridharan, in International Conference on Signal Processing and Communication Systems. Automatic audio segmentation using the generalized likelihood ratio (IEEE, Gold Coast, 2008), pp. 1–5.



  • N. Tsipas, L. Vrysis, C. Dimoulas, G. Papanikolaou, Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination. Multimedia Tools Appl. 76(24), 25603–25621 (2017).


  • D. Doukhan, E. Lechapt, M. Evrard, J. Carrive, in Music Information Retrieval Evaluation eXchange. INA’s MIREX 2018 music and speech detection system (ISMIR, Paris, 2018).



  • M. Papakostas, T. Giannakopoulos, Speech-music discrimination using deep visual feature extractors. Expert Syst. Appl. 114, 334–344 (2018).


  • P. Gimeno, I. Viñals, A. Ortega, A. Miguel, E. Lleida, Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. J. Audio Speech Music Process. 2020(1), 1–19 (2020).


  • E. Tarr, Hack audio: an introduction to computer programming and digital signal processing in MATLAB® (Routledge, USA, 2018).


  • M. Torcoli, A. Freke-Morin, J. Paulus, C. Simon, B. Shirley, Preferred levels for background ducking to produce esthetically pleasing audio for TV with clear speech. J. Audio Eng. Soc. 67(12), 1003–1011 (2019).


  • S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, et al., in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). CNN architectures for large-scale audio classification (IEEE, New Orleans, 2017), pp. 131–135.



  • G. Kour, N. Mehan, Music genre classification using MFCC, SVM and BPNN. Int. J. Comput. Appl. 112(6), 43–47 (2015).



  • K. Koutini, H. Eghbal-zadeh, G. Widmer, Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks. IEEE/ACM Trans. Audio Speech Lang. Proc. 29, 1987–2000 (2021). https://doi.org/10.1109/TASLP.2021.3082307.


  • M. J. Carey, E. S. Parris, H. Lloyd-Thomas, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1. A comparison of features for speech, music discrimination (IEEE, Phoenix, 1999), pp. 149–152.



  • K. El-Maleh, M. Klein, G. Petrucci, P. Kabal, in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4. Speech/music discrimination for multimedia applications (IEEE, Istanbul, 2000), pp. 2445–2448.



  • E. D. Scheirer, M. Slaney, Multi-feature speech/music discrimination system (Google Patents, USA, 2003).



  • C. Panagiotakis, G. Tziritas, A speech/music discriminator based on RMS and zero-crossings. IEEE Trans. Multimed. 7(1), 155–166 (2005).


  • E. Wieser, M. Husinsky, M. Seidl, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speech/music discrimination in a large database of radio broadcasts from the wild (IEEE, Florence, 2014), pp. 2134–2138.



  • J. Han, B. Coover, in IEEE International Conference on Multimedia and Expo Workshops (ICMEW). Leveraging structural information in music-speech detection (IEEE, San Jose, 2013), pp. 1–6.



  • M. K. S. Khan, W. G. Al-Khatib, Machine-learning based classification of speech and music. Multimed. Syst. 12(1), 55–67 (2006).


  • J. Pinquier, J. -L. Rouas, R. André-Obrecht, Robust speech/music classification in audio documents. Entropy. 1(2), 3 (2002).



  • J. Ajmera, I. McCowan, H. Bourlard, Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Comm. 40(3), 351–363 (2003).


  • B. -Y. Jang, W. -H. Heo, J. -H. Kim, O. -W. Kwon, Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel. J. Audio Speech Music Process. 2019(1), 1–12 (2019).


  • D. de Benito-Gorron, A. Lozano-Diez, D. T. Toledano, J. Gonzalez-Rodriguez, Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset. J. Audio Speech Music Process. 2019(1), 1–18 (2019).


  • K. Choi, G. Fazekas, M. B. Sandler, K. Cho, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Transfer learning for music classification and regression tasks (ISMIR, Suzhou, 2017), pp. 141–149.



  • ATSC Standard A/52A: Digital Audio Compression Standard (AC-3, E-AC-3), Revision B. Adv. TV Syst. Comm., 78–79 (2005).

  • Y. Wang, P. Getreuer, T. Hughes, R. F. Lyon, R. A. Saurous, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Trainable frontend for robust and far-field keyword spotting (IEEE, New Orleans, 2017), pp. 5670–5674.



  • V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. P. Bello, Robust sound event detection in bioacoustic sensor networks. PLoS ONE 14(10), e0214168 (2019).


  • V. Lostanlen, J. Salamon, M. Cartwright, B. McFee, A. Farnsworth, S. Kelling, J. P. Bello, Per-channel energy normalization: why and how. IEEE Signal Process. Lett. 26(1), 39–43 (2019).


  • D. P. Kingma, J. Ba, in 3rd International Conference on Learning Representations. Adam: a method for stochastic optimization (2015).

  • A. Mesaros, T. Heittola, T. Virtanen, Metrics for polyphonic sound event detection. Appl. Sci. 6(6), 162 (2016).


  • K. Seyerlehner, T. Pohle, M. Schedl, G. Widmer, in Proceedings of the 10th International Conference on Digital Audio Effects (DAFx). Automatic music detection in television productions (DAFx, Bordeaux, 2007).



  • M. Won, K. Choi, X. Serra, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Semi-supervised music tagging transformer (ISMIR, 2021).

  • S. Kum, J. -H. Lin, L. Su, J. Nam, in Proceedings of International Society for Music Information Retrieval Conference (ISMIR). Semi-supervised learning using teacher-student models for vocal melody extraction (ISMIR, 2020), pp. 93–100.

  • Y. Gong, J. Yu, J. Glass, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). VocalSound: a dataset for improving human vocal sounds recognition (IEEE, 2022), pp. 151–155.



  • Rights and permissions

    Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
