Content and Software
Editors: Björn Schuller (Imperial College London, UK), Florian Eyben (audEERING GmbH)
openBliSSART. Authors: Felix Weninger, Alexander Lehmann, Björn Schuller

openBliSSART is a C++ framework and toolbox that provides "Blind Source Separation for Audio Recognition Tasks". Its areas of application include, but are not limited to, instrument separation (e.g. extraction of drum tracks from popular music), speech enhancement, and feature extraction. It features various source separation algorithms, with a strong focus on variants of Non-Negative Matrix Factorization (NMF). Besides basic blind (unsupervised) source separation, it supports component classification by Support Vector Machines (SVM) using common acoustic features from speech and music processing. A Qt-based GUI is available for component playback and data set creation. Furthermore, supervised NMF can be performed for source separation as well as audio feature extraction. openBliSSART is fast: typical real-time factors are on the order of 0.1 (Euclidean NMF) on a state-of-the-art desktop PC. It is written in C++, enforces strict coding standards, and adheres to modular design principles for seamless integration into multimedia applications. Interfaces are provided to Weka and HTK (the Hidden Markov Model Toolkit). openBliSSART is free software licensed under the GNU General Public License.

We provide a demonstrator that uses various features of openBliSSART to separate drum tracks from popular music. This demonstrator, along with extensive documentation including a tutorial, a reference manual, and a description of the framework API, can be found in the openBliSSART source distribution. If you use openBliSSART for your research, please cite the following paper:
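The core technique behind openBliSSART is NMF. Purely as an illustration of the underlying idea (this is not openBliSSART's API), the following minimal C++ sketch shows Lee and Seung's multiplicative updates for the Euclidean cost ||V - WH||^2, where V would typically be a non-negative magnitude spectrogram, W holds spectral bases, and H holds their activations:

    // Illustration only: NMF with multiplicative updates for the Euclidean
    // cost ||V - W*H||^2 (Lee & Seung). This is not the openBliSSART API.
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    typedef std::vector<std::vector<double> > Matrix;  // dense row-major matrix

    static Matrix randomMatrix(size_t rows, size_t cols) {
        Matrix m(rows, std::vector<double>(cols));
        for (size_t i = 0; i < rows; ++i)
            for (size_t j = 0; j < cols; ++j)
                m[i][j] = (double) std::rand() / RAND_MAX + 1e-9;  // strictly positive init
        return m;
    }

    static Matrix multiply(const Matrix& a, const Matrix& b) {
        Matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
        for (size_t i = 0; i < a.size(); ++i)
            for (size_t k = 0; k < b.size(); ++k)
                for (size_t j = 0; j < b[0].size(); ++j)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static Matrix transpose(const Matrix& a) {
        Matrix t(a[0].size(), std::vector<double>(a.size()));
        for (size_t i = 0; i < a.size(); ++i)
            for (size_t j = 0; j < a[0].size(); ++j)
                t[j][i] = a[i][j];
        return t;
    }

    // V: non-negative input (e.g. a magnitude spectrogram, rows = frequency bins,
    // columns = time frames); on return, W holds spectral bases, H their activations.
    void nmf(const Matrix& V, Matrix& W, Matrix& H, size_t rank, size_t iterations) {
        const size_t rows = V.size(), cols = V[0].size();
        W = randomMatrix(rows, rank);
        H = randomMatrix(rank, cols);
        for (size_t it = 0; it < iterations; ++it) {
            Matrix Wt = transpose(W);                       // H <- H .* (W'V) ./ (W'WH)
            Matrix WtV = multiply(Wt, V);
            Matrix WtWH = multiply(multiply(Wt, W), H);
            for (size_t i = 0; i < rank; ++i)
                for (size_t j = 0; j < cols; ++j)
                    H[i][j] *= WtV[i][j] / (WtWH[i][j] + 1e-9);
            Matrix Ht = transpose(H);                       // W <- W .* (VH') ./ (WHH')
            Matrix VHt = multiply(V, Ht);
            Matrix WHHt = multiply(W, multiply(H, Ht));
            for (size_t i = 0; i < rows; ++i)
                for (size_t j = 0; j < rank; ++j)
                    W[i][j] *= VHt[i][j] / (WHHt[i][j] + 1e-9);
        }
    }

In a source separation setting, individual components are reconstructed from single columns of W and rows of H and then classified (e.g. drum vs. harmonic) before resynthesis; openBliSSART's actual implementation and interfaces differ from this sketch.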
openSMILE. Authors: Florian Eyben, Martin Wöllmer, Björn Schuller

The openSMILE tool enables you to extract large audio feature spaces in real time. SMILE is an acronym for Speech & Music Interpretation by Large Space Extraction. It is written in C++ and is available both as a standalone command-line executable and as a dynamic library (a GUI version is to come soon). The main features of openSMILE are its capability for on-line incremental processing and its modularity: feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file (see the sketch below). New components can be added to openSMILE via an easy plugin interface and a comprehensive API. openSMILE is free software licensed under the GPL and is currently available via Subversion (http://subversion.tigris.org/) in a pre-release state here. Commercial licensing options are available upon request.
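To give an impression of the modular configuration mentioned above, here is a rough sketch of a configuration chaining a wave source, a framer, an energy extractor, and a CSV sink. The component and option names follow the usual openSMILE naming, but they should be checked against the example configuration files shipped with the distribution rather than taken as authoritative:

    [componentInstances:cComponentManager]
    instance[dataMemory].type = cDataMemory
    instance[waveIn].type = cWaveSource
    instance[framer].type = cFramer
    instance[energy].type = cEnergy
    instance[csvSink].type = cCsvSink

    [waveIn:cWaveSource]
    writer.dmLevel = wave
    filename = input.wav

    [framer:cFramer]
    reader.dmLevel = wave
    writer.dmLevel = frames
    frameSize = 0.025
    frameStep = 0.010

    [energy:cEnergy]
    reader.dmLevel = frames
    writer.dmLevel = energy
    rms = 1
    log = 1

    [csvSink:cCsvSink]
    reader.dmLevel = energy
    filename = energy.csv

A configuration of this kind is typically run with the SMILExtract command-line tool, e.g. SMILExtract -C energy.conf.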
If you use openSMILE for your research, please cite: Florian Eyben, Martin Wöllmer, Björn Schuller: "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", Proc. ACM Multimedia (MM), ACM, Firenze, Italy, 25.-29.10.2010. A brief summary of openSMILE's features is given here.

To directly check out the Subversion repository, type the following command in a command-line prompt on a system where SVN is installed:
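The repository location was given as a link on the original page and is not reproduced here; purely as a generic sketch, a Subversion checkout takes the form below, where the URL is a placeholder and not the actual openSMILE repository address:

    svn checkout http://example.org/svn/opensmile/trunk opensmile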
openEAR. Authors: Florian Eyben, Martin Wöllmer, Björn Schuller

The Munich openEAR toolkit is a complete package for automatic speech emotion recognition. Its acronym stands for open Emotion and Affect Recognition toolkit. It is based on the openSMILE feature extractor and is thus capable of real-time, on-line emotion recognition. Pre-trained models for various standard corpora are included, as well as scripts and tools to quickly build and evaluate custom model sets. The classifier currently included is Support Vector Machines via the LibSVM library (see the sketch below); Bidirectional Long Short-Term Memory recurrent neural networks, discriminative multinomial Bayesian networks, and lazy learners are soon to come. openEAR is free software licensed under the GPL. The first release (including model sets and a pre-compiled openSMILE) will be available soon on SourceForge: openEAR. Meanwhile, please refer to the openSMILE project, where we provide the feature extraction engine.
If you use openEAR for your research, please cite the following paper:

DOWNLOAD: The first release of openEAR can be downloaded at: http://www.mmk.ei.tum.de/~eyb/openEAR-0.1.0.tar.gz . A short tutorial is included with the release. Furthermore, the release contains pre-compiled binaries of the openSMILE engine for Windows and Linux, including PortAudio support. The live emotion recognition GUI is not yet included in the release; it will be made available within the next few weeks.
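The classifier back-end currently shipped with openEAR is LibSVM. Purely as an illustration of how a pre-trained LibSVM model can be applied to a single feature vector from C++ (this is not openEAR's own interface; the model file name and feature values are hypothetical placeholders):

    // Sketch: classify one feature vector with a pre-trained LibSVM model.
    // Illustration only -- not openEAR's API; "emotion.model" and the feature
    // values are hypothetical placeholders.
    #include <cstdio>
    #include <vector>
    #include "svm.h"   // LibSVM header

    int main() {
        svm_model* model = svm_load_model("emotion.model");
        if (!model) { std::fprintf(stderr, "could not load model\n"); return 1; }

        // Example feature vector (e.g. produced by openSMILE); LibSVM expects
        // 1-based indices and a terminating entry with index -1.
        std::vector<double> features;
        features.push_back(0.42); features.push_back(-1.3); features.push_back(0.07);
        std::vector<svm_node> nodes(features.size() + 1);
        for (size_t i = 0; i < features.size(); ++i) {
            nodes[i].index = (int)(i + 1);
            nodes[i].value = features[i];
        }
        nodes[features.size()].index = -1;  // terminator

        double label = svm_predict(model, &nodes[0]);
        std::printf("predicted class label: %g\n", label);

        svm_free_and_destroy_model(&model);
        return 0;
    }

In openEAR itself, model training and application are wrapped by the toolkit's scripts and tools; the sketch only shows the underlying LibSVM calls.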
iHEARu-EAT Database. Authors: Simone Hantke, Björn Schuller, and others (cf. below)
The iHEARu-EAT database contains audio and video recordings of subjects speaking while eating different types of food. The audio track was featured as a Sub-Challenge of the Interspeech 2015 Computational Paralinguistics Challenge (Interspeech ComParE 2015). Here, we provide a richer version including additional annotations and mappings as well as the video track of the subjects.
If you use iHEARu-EAT for your research, please cite the following paper, where you will find an extensive description and baseline results:

DOWNLOAD: Please obtain the License Agreement to get a password and further instructions for downloading the data set: fill it out, print, sign, scan, and email it accordingly (simone.hantke@uni-passau.de). The agreement has to be signed by a permanent staff member. After downloading the data, you can directly start your experiments with the data set.
Annotations. Authors: Björn Schuller

The annotation of the MTV music data set for automatic mood classification is accessible as a PDF or as Comma-Separated Values (CSV) text files. For details, please refer to and cite the following in case of usage:
The annotation of the UltraStar Singer Traits Database is subdivided into an ARFF file containing the singer metadata and a ZIP file containing the beat-level alignments of the singers in the songs, in the UltraStar format. The subdivision of the songs into training, development, and test sets is defined by the folder structure in the ZIP file. For copyright reasons, the lyrics have been blinded in the alignments; each singer change is annotated by the "word" _SINGERid=nnnn (a parsing sketch is given below). In case you use this data set for your own research, please cite:
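As a rough illustration of how the blinded alignments could be scanned for singer changes, the following C++ sketch assumes the usual UltraStar note-line layout "type startBeat length pitch text"; verify it against the files in the ZIP archive before relying on it:

    // Rough sketch: list singer changes in a (blinded) UltraStar alignment file.
    // Assumes note lines of the form ": startBeat length pitch text"; header
    // lines start with '#' and phrase breaks with '-'.
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: singerchanges <ultrastar.txt>\n"; return 1; }
        std::ifstream in(argv[1]);
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty() || line[0] == '#' || line[0] == '-') continue;
            std::istringstream fields(line);
            std::string type, text;
            long startBeat = 0, length = 0, pitch = 0;
            if (!(fields >> type >> startBeat >> length >> pitch >> text)) continue;
            if (text.compare(0, 10, "_SINGERid=") == 0)
                std::cout << "beat " << startBeat << ": singer " << text.substr(10) << "\n";
        }
        return 0;
    }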
The annotation of the Emotional Sound Database is available as a plain-text, human-readable CSV file containing the sound category, the sound file names, and the four individual labeler ratings each for arousal and valence. For copyright reasons, the sound files need to be retrieved via the FINDSOUNDS page. In case you use this data set for your own research, please cite:
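As a small, heavily hedged sketch of working with such a file: assuming one row per sound with the category and file name followed by four arousal ratings and four valence ratings (the actual column order and any header row must be checked against the CSV itself), the mean ratings per sound could be computed as follows:

    // Hedged sketch: average the four labeler ratings per sound.
    // Assumed (hypothetical) column layout: category, filename,
    // arousal1..arousal4, valence1..valence4 -- check the real CSV first;
    // a header row, if present, would need to be skipped as well.
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: meanratings <annotations.csv>\n"; return 1; }
        std::ifstream in(argv[1]);
        std::string line;
        while (std::getline(in, line)) {
            std::vector<std::string> cols;
            std::stringstream ss(line);
            std::string cell;
            while (std::getline(ss, cell, ',')) cols.push_back(cell);
            if (cols.size() < 10) continue;  // skip malformed rows
            double arousal = 0.0, valence = 0.0;
            for (int i = 0; i < 4; ++i) {
                arousal += std::atof(cols[2 + i].c_str());
                valence += std::atof(cols[6 + i].c_str());
            }
            std::cout << cols[1] << " mean arousal " << arousal / 4.0
                      << " mean valence " << valence / 4.0 << "\n";
        }
        return 0;
    }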
Coding Schemes. Authors: Björn Schuller and others (cf. below)

The coding scheme of acoustic features for inter-site comparison as used in CEICES is accessible as a PDF file. For details, please refer to and cite the following in case of usage:
Demo Sounds. Authors: Björn Schuller and others (cf. below)
Examples of music written by our Deep Neural Network:
Clip 1,
Clip 2,
Clip 3,
Clip 4,
Clip 5.
In any case, do not hesitate to contact us. More information will follow shortly.
Looking forward to hearing from you,
Björn Schuller and Florian Eyben