Research

A brief summary of my scientific output, covering journal and conference publications, follows.

MIDDLE-Net: Middle-Output Deep Learning for Hyperspectral and Multispectral Image Fusion
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING

Under Review
Cascaded Convolutional Generator Networks For Solving Imaging Inverse Problems
2021 XXIII Symposium on Image, Signal Processing and Artificial Vision (STSIVA), Popayán, Colombia

Abstract: Image restoration or generation covers a range of important image inverse problems that aim to enhance a degraded image to obtain a restored image dataset. Several techniques based on deep learning have been developed for solving inverse problems. However, these methods require large training data sets to obtain a model with a good recovery performance. Recently, deep image prior (DIP) has emerged as a network-based approach that exploits the image representation power of convolutional neural networks (CNN) without resorting to the training stage but requiring the knowledge of the degradation models that describe the degraded observations. In this work, we propose a cascaded convolutional generator network that estimates the target image from degraded observations at an intermediate network stage. Furthermore, the proposed network architecture learns the degradation model by downscaling the recovered image adding relevant information in the restoration process. This approach was implemented to solve three imaging inverse problems: inpainting, deblurring, and super-resolution. The experimental results demonstrate the remarkable performance of the proposed approach, improving the DIP recovery under different scenarios.
Ideal Neighbourhood Mask for Speech Enhancement Using Deep Neural Networks
The 2019 International Joint Conference on Neural Networks (IJCNN), Special Session on Deep Neural Audio Processing, Budapest, Hungary

Abstract: Degradation of the speech signal due to adverse conditions is a major challenge for automatic speech recognition (ASR) systems. This paper introduces a novel approach to estimate an Ideal Neighbourhood Mask (INM) for speech segregation based on a deep neural network estimator. The method described here is based on the local binary patterns (LBP) technique often used in digital image processing. The Ideal Neighbourhood Mask indicates which time-frequency (T-F) units of the noisy speech are canceled. The performance of the proposed approach is assessed against the traditional mask techniques, i.e., Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM), under various environments using objective speech quality measures. The recognition experiments, including results in the AURORA IV framework, indicate that the proposed scheme, when applied in adverse environments, yields significantly better performance than the conventional techniques.
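For readers unfamiliar with the two baseline masks named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper and does not implement the INM itself; the function names, the 0 dB local criterion, and the exponent 0.5 are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """IBM: 1 where the local SNR (in dB) exceeds the criterion lc_db, else 0.

    lc_db = 0 dB is a common choice, assumed here for illustration.
    """
    snr_db = 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))
    return (snr_db > lc_db).astype(np.float64)


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM: soft mask in [0, 1] given by the speech-to-mixture power ratio.

    beta = 0.5 is a common choice, assumed here for illustration.
    """
    return (speech_power / (speech_power + noise_power + EPS)) ** beta


# Toy 2x2 grid of time-frequency power values (hypothetical data).
speech = np.array([[4.0, 1.0], [0.0, 9.0]])
noise = np.array([[1.0, 4.0], [1.0, 1.0]])

ibm = ideal_binary_mask(speech, noise)  # hard decision per T-F unit
irm = ideal_ratio_mask(speech, noise)   # soft weight per T-F unit
```

Both masks are "ideal" in the sense that they require the clean speech and the noise separately, so in practice they serve as training targets for an estimator.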
Segregação de Voz Usando Mascaramento INM sobre o Banco de Filtros Gammatone
XXXVI Simpósio Brasileiro de Telecomunicações e processamento de sinais (SBrT-2018), Campina Grande, PB

Abstract: This paper presents an innovative approach that employs an ideal neighbourhood mask (INM) that has the ability to efficiently use Local Binary Pattern (LBP) to indicate which Time-Frequency units of the corrupted voice are dominated by noise. Experimental results obtained with a DNN based voice recogniser in noisy environments demonstrate that the proposed technique achieves significant improvements in terms of word error rate, corroborating the superiority of the proposed scheme in comparison with the traditional masking algorithms IBM and IRM.
Ideal neighbourhood mask for speech enhancement
Electronics Letters, January 2018, the Institution of Engineering and Technology (the IET)

Abstract: A novel approach for speech enhancement applications by applying spectral mask estimation is introduced. The new approach uses local binary patterns to estimate an ideal neighbourhood mask, which indicates which time-frequency units of the noisy speech are dominated by the noise. The performance of the proposed approach is assessed against the traditional mask techniques, i.e. ideal binary mask and ideal ratio mask, under various environments in terms of objective speech quality measures, as well as word error rate performance in speech recognition systems using deep neural networks. Results indicated that the proposed mask yielded significantly better performance than the conventional techniques.
Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns
International Journal of Computer, Electrical, Automation, Control and Information Engineering, 11(12), 1216 - 1221

Abstract: In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.
Median filtering the temporal probability distribution in histogram mapping for robust continuous speech recognition
IEEE 24th European Signal Processing Conference (EUSIPCO), 2016 (pp. 1198-1201).

Abstract: The nonlinear distortion introduced in the cepstral coefficient domain by additive noise in the speech signal severely degrades the performance of Automatic Speech Recognition (ASR) systems. For this reason, we propose a median filter which smooths the probability distribution functions of degraded features, thus reducing the mismatch between training and test data. The new proposal uses a histogram mapping to obtain the PDFs (probability distribution functions) of each feature vector and applies a nonlinear median filtering before mapping to the reference PDF. The algorithm efficiency is analyzed and compared to a recently proposed linear mean filtering technique on the PDFs. From the experimental results it can be concluded that the histogram smoothing through the median nonlinear filtering reduces the mismatch between training and test data, improving the system performance under adverse conditions.
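The idea of smoothing a feature histogram with a median filter before mapping it onto a reference distribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, bin count, window width, and the use of empirical quantiles of a clean reference set are all hypothetical choices.

```python
import numpy as np


def median_smooth(hist, width=5):
    """Sliding-window median over histogram bins (edges padded by repetition)."""
    half = width // 2
    padded = np.pad(hist, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(hist))])


def histogram_map(test, reference, bins=64, width=5):
    """Map test features onto the reference distribution via smoothed CDF matching."""
    # Histogram estimate of the test-feature PDF, then nonlinear smoothing.
    hist, edges = np.histogram(test, bins=bins)
    pdf = median_smooth(hist.astype(np.float64), width)
    if pdf.sum() == 0.0:  # fall back to the raw histogram if smoothing removed all mass
        pdf = hist.astype(np.float64)
    cdf = np.cumsum(pdf) / pdf.sum()
    # CDF value of every test sample (cdf[k] belongs to the right edge of bin k).
    u = np.interp(test, edges[1:], cdf)
    # Inverse reference CDF, taken from empirical quantiles of the reference data.
    return np.quantile(reference, u)


# Hypothetical data: one clean reference feature and a shifted, scaled test feature.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
test = 2.0 * rng.normal(0.0, 1.0, 5000) + 3.0
mapped = histogram_map(test, reference)
```

Because the mapping is monotonic, the ordering of the test samples is preserved while their distribution is pulled toward that of the reference.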
Avaliação de um Novo Reconhecedor de Voz Robusto Baseado em Filtragem por Mediana da Função Distribuição de Probabilidade Usando o Corpus AURORA-4
XXXIV Simpósio Brasileiro de Telecomunicações SBrT 2016, Santarém, PA.

Abstract: This article examines and presents contributions to the robustness of the PNCC cepstral attributes used in automatic speech recognition (ASR), through non-linear filtering of the probability distribution functions (PDFs) of the corrupted attributes. Previous work has demonstrated that histogram mapping (HMAP) reduces the word error rate of the recognition system. However, HMAP introduces a small mismatch when there is no distortion of the test data, causing a loss of information in the noise-free features. In this letter we propose a novel approach to this problem, known as MED-MAP, which consists of smoothing the PDFs of each attribute vector using median filtering. Experimental results on the AURORA-4 database are reported, and the effectiveness of the algorithm is analyzed in comparison with a linear filtering technique on the PDFs. We conclude that smoothing the probability distributions through non-linear filtering improves system performance in adverse conditions.
PNCC Features and FNN - MAP Compensation Techniques for Continuous Speech Recognition
International Telecommunications Symposium (ITS 2014), São Paulo.

Abstract: One of the biggest problems of a speech recognition system is the signal degradation due to adverse conditions. Such situations usually lead to mismatch between the test conditions and the training data, caused by non-linear distortion. The authors propose a histogram mapping followed by neural-network filtering (a feature compensation technique) in order to minimize the mismatch caused by noise in the speech signal. The proposed method has been evaluated using the TIMIT and Noisex-92 databases. Recognition results show that histogram mapping combined with neural-network filtering in the cepstral coefficient domain improves the recognition rates.
Speech Enhancement and Features Compensation Algorithms for Continuous Speech Recognition
2nd IEEE China Summit and International Conference on Signal and Information Processing 2014, Xi'an.

Abstract: The degradation of the speech signal due to adverse conditions leads to low accuracy rates in speech recognition systems. The authors propose combining two methods: speech enhancement before feature extraction and feature compensation after feature extraction. Both methods aim to minimize the mismatch caused by noise in the speech signal. Applied before and after the extraction of features, respectively, they allow the best possible estimation of the clean signal from its degraded version.
Reconhecimento de Voz Contínua com Atributos PNCC e Métodos de Robustez WD e MAP
XXXI Simpósio Brasileiro de Telecomunicações, 2013, Fortaleza.

Abstract: The degradation of the speech signal due to adverse conditions leads to low accuracy rates in speech recognition systems. The authors propose combining two methods: speech enhancement before feature extraction and feature compensation after feature extraction. Both methods aim to minimize the mismatch caused by noise in the speech signal. Applied before and after the extraction of features, respectively, they allow the clean signal to be estimated as closely as possible from its degraded version.

CHRISTIAN ARCOS

Research Statement: Ph.D. Christian Dayan Arcos Gordillo
Artificial intelligence and signal processing research
