Abstract
Automatic speech recognition can now be considered a viable and mature technology, and many applications that facilitate communication between people, and between humans and machines, are built on it. The automatic recognition of sign languages, however, is not nearly as advanced. If it were, Deaf people could communicate without an interpreter with someone who does not know sign language, gaining privacy and independence, and they could use automated video-controlled systems that recognize their instructions in an inclusive, quick and mobile way, much as hearing people use voice-controlled systems. The GRADES and GTM research groups at the University of Vigo aim to advance the development of an automatic recognizer for Spanish Sign Language (LSE) based on image recognition. A review of the state of the art shows the need for an LSE database specifically designed for this purpose. The complexity of this task makes an incremental approach advisable, so we propose developing a recording methodology that allows the database to grow in size and complexity over time. This methodology covers the selection of the lexicon, the design of the recording station, the data storage structure, the computer programs for managing the video database and its associated metadata, and the protection of the signers' personal data. An initial version of the database, LSE_Lex40_UVIGO, consists of multiple repetitions of 40 isolated signs in LSE performed by different signers. This first version will allow us to develop a signer-independent recognizer of isolated signs in diverse environments, and to demonstrate the usefulness of the acquisition methodology described in this contribution.