July 18, 2019

Sign Language Translation – How AI can change the way we communicate.

Languages are everywhere, and communication is an integral part of advancement in any field. For a long time, people with hearing and speech difficulties have had trouble interacting with society. Sign languages are how they communicate a thought or an idea. The main obstacle is that most hearing people do not learn these languages, so communicating with a stranger without an interpreter present becomes a challenge.


Artificial Intelligence can address this problem and help sign language users communicate through gesture recognition. Using computer vision techniques, gestures are translated into both text and speech. In the other direction, sentences spoken by a person are converted to text using STT (Speech to Text) for sign language users to read.


Sign language translation comes with its own challenges. To translate accurately, one must account for facial expressions, head tilts, shoulder raises, mouthing, and other signals that create meaning alongside hand signs.

For a complete sign language translator, we would need three sub-domains of computer vision:

∙ Detecting body movement and position

∙ Analyzing facial expressions

∙ Detecting hand and finger shapes

Fig 1: Subdomains of computer vision required for a sign language translator


Steps involved in Sign Language Translation:

For a sign language translator to work, the AI system goes through several steps:

  • Gesture Recognition System
  • Dataset Creation
  • Words Recognition
  • Sentence Formation

    Fig 2: Steps involved in sign language translation
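As a rough illustration of how the last two steps could connect, the sketch below turns a stream of per-frame word predictions into a sentence. It is only a minimal sketch under stated assumptions: the recognizer output format (a list of word and confidence pairs), the function name `form_sentence`, and the thresholds `min_confidence` and `min_run` are all hypothetical and not taken from any particular system.

```python
from itertools import groupby


def form_sentence(frame_predictions, min_confidence=0.6, min_run=3):
    """Collapse per-frame word predictions into a sentence.

    frame_predictions: list of (word, confidence) tuples, one per video
    frame, as produced by a hypothetical word recognizer.
    """
    # Discard frames where the recognizer is not confident enough.
    confident = [word for word, conf in frame_predictions if conf >= min_confidence]

    words = []
    # groupby merges each run of identical consecutive labels into one group.
    for word, run in groupby(confident):
        # Ignore flickers: a word must persist for at least min_run frames.
        if len(list(run)) >= min_run:
            words.append(word)

    return (" ".join(words).capitalize() + ".") if words else ""


# Hypothetical recognizer output: "world" is low confidence and is dropped.
predictions = (
    [("hello", 0.9)] * 4
    + [("world", 0.4)] * 2
    + [("thank", 0.8)] * 3
    + [("you", 0.9)] * 4
)
print(form_sentence(predictions))  # Hello thank you.
```

A real system would also need grammar handling, since sign languages do not follow the word order of spoken languages; this sketch only shows the collapsing of repeated frame labels.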


Design of Gesture Recognition System

An important stage in the gesture recognition process is to model the hand in a way that the Human-Computer Interface (HCI) can understand. To model the hand, its kinematic structure is taken into consideration.


Fig 3: Kinematic structure of hand
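To make the idea of a kinematic structure concrete, here is a minimal sketch of one finger treated as a chain of rigid segments joined by hinge joints, simplified to a 2D plane. The segment lengths and the function name `finger_joint_positions` are illustrative assumptions; a full hand model would have more joints and would work in 3D.

```python
import math


def finger_joint_positions(segment_lengths, joint_angles_deg, base=(0.0, 0.0)):
    """Planar forward kinematics for a single finger.

    Each joint angle is measured relative to the previous segment, so the
    whole finger pose is described by one angle per joint.
    Returns the positions of the base, each joint, and the fingertip.
    """
    x, y = base
    heading = 0.0  # accumulated direction of the current segment, in radians
    positions = [(x, y)]
    for length, angle in zip(segment_lengths, joint_angles_deg):
        heading += math.radians(angle)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Hypothetical segment lengths in centimetres (not measured values).
lengths = [4.0, 2.5, 2.0]

straight = finger_joint_positions(lengths, [0, 0, 0])
curled = finger_joint_positions(lengths, [30, 45, 45])
print("straight fingertip:", straight[-1])
print("curled fingertip:  ", curled[-1])
```

The point of the sketch is that three angles are enough to describe the pose of the whole finger, which is why kinematic models keep the parameter count low.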


Gesture modelling is classified into two types:

  •  Spatial modelling  
  •  Temporal modelling.

Spatial modelling considers the shape of the posture or gesture, whereas temporal modelling considers the hand gesture dynamically, i.e., the gesture’s motion in real time.
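The difference between the two can be sketched with a few lines of code. The example below assumes each video frame has already been reduced to a list of 2D hand landmark points (a hypothetical input format); the function names and the `fps` parameter are illustrative only.

```python
def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def spatial_descriptor(frame):
    """Shape of the hand in ONE frame: landmarks relative to the centroid.

    Subtracting the centroid removes the hand's location, leaving only
    its shape.
    """
    cx, cy = centroid(frame)
    return [(x - cx, y - cy) for x, y in frame]


def temporal_descriptor(frames, fps=30.0):
    """Motion ACROSS frames: centroid velocity between consecutive frames."""
    centres = [centroid(f) for f in frames]
    return [
        ((x1 - x0) * fps, (y1 - y0) * fps)
        for (x0, y0), (x1, y1) in zip(centres, centres[1:])
    ]


# A fixed hand shape sliding to the right by 0.5 units per frame.
hand = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
frames = [[(x + 0.5 * i, y) for x, y in hand] for i in range(3)]

print(spatial_descriptor(frames[0]))          # same shape in every frame
print(temporal_descriptor(frames, fps=10.0))  # constant rightward velocity
```

In this toy input the spatial descriptor is identical for every frame, while the temporal descriptor captures the movement, which is the distinction the two modelling types draw.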


Hand modelling in the spatial domain can be implemented in both 2D and 3D space. A 2D hand model is represented by its shape, motion, and deformable templates. Shape models can be further categorized into geometric and non-geometric models: geometric models comprise the location and position of the fingertips and palm features, while non-geometric models consider features such as colour, texture, outline, and edges. Deformable templates capture the basic outline of the hand. Some of these features are used for extraction and analysis.
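As an example of what a geometric 2D feature might look like, the sketch below computes the distance and direction of each fingertip relative to the palm centre. The input format (a palm centre and a non-empty list of fingertip coordinates from some upstream detector) and the function name `geometric_features` are assumptions made for illustration.

```python
import math


def geometric_features(palm_center, fingertips):
    """Fingertip distances and directions relative to the palm centre.

    Distances are divided by the largest one so the features do not
    depend on how close the hand is to the camera. Angles are in degrees,
    measured counter-clockwise from the positive x-axis.
    Assumes `fingertips` is not empty.
    """
    px, py = palm_center
    distances = [math.hypot(x - px, y - py) for x, y in fingertips]
    scale = max(distances) or 1.0  # avoid dividing by zero
    angles = [math.degrees(math.atan2(y - py, x - px)) for x, y in fingertips]
    return [d / scale for d in distances], angles


# Hypothetical detector output for three fingertips.
norm_dist, angles = geometric_features((0.0, 0.0), [(3.0, 4.0), (0.0, 10.0), (-5.0, 0.0)])
print(norm_dist)
print(angles)
```

A closed fist would give small, similar distances, while an open hand would give large ones, so even these few numbers can separate some hand shapes.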


The hand shape in 3D can be categorized into volumetric, skeletal, and geometrical models. Volumetric models are complex and use many parameters to represent the hand shape. Skeletal models, which capture only the basic structure of the hand, require far fewer parameters. Geometrical models are widely used in real applications. They simulate the visual image of the hand efficiently but also require many parameters. An alternative is to approximate the visual shape with geometric forms such as cylinders and ellipsoids. Polygon meshes and cardboard models are examples of geometrical models. The figure below shows examples of all these modelling methods.


Sign language translators would bridge the gap between the hearing and non-hearing worlds. They point to a more inclusive society and are a step towards advancement. With over 200 sign languages in the world, a sign language translator can take us a step closer to a universal mode of communication.



