Sign Language Detection (A to Z)

Live Detection

ABSTRACT:

Sign language is one of the oldest and most natural forms of communication, but most people do not know sign language and interpreters are difficult to come by. I have therefore developed a real-time method, based on neural networks, for fingerspelling-based American Sign Language.

In this method, the hand image is first passed through a filter, and the filtered image is then passed through a classifier that predicts the class of the hand gesture. The method achieves 90% accuracy across the 26 letters of the alphabet.

Description

American Sign Language is a predominant sign language: since the only disability Deaf and Dumb (hereby referred to as D&M) people have is communication-related and they cannot use spoken languages, the only way for them to communicate is through sign language. Communication is the process of exchanging thoughts and messages in various ways, such as speech, signals, behaviour and visuals. D&M people use hand gestures to express their ideas to other people. Gestures are non-verbally exchanged messages, and they are understood through vision. This non-verbal communication of D&M people is called sign language. A sign language uses gestures instead of sound to convey meaning, combining hand shapes, the orientation and movement of the hands, arms or body, facial expressions and lip patterns. Contrary to popular belief, sign language is not international; it varies from region to region. Sign language is a visual language and consists of three major components: fingerspelling, word-level sign vocabulary, and non-manual features such as facial expressions and body posture.

In this project I focus on building a model that can recognize fingerspelling-based hand gestures and combine the individual gestures to form complete words.

The gestures I trained on are shown in the image below.

ASL

Steps of Building the Project

1. Creating the Data Collection Folder

Create a folder named Data; inside it, create one sub-folder named after each letter from A to Z.

import os

# Specify dataset path dynamically
dataset_path = "Data"
class_name = input("Enter the sign label (e.g., A, B, C): ").upper()
folder = os.path.join(dataset_path, class_name)

# Create the class folder if it doesn't exist
if not os.path.exists(folder):
    os.makedirs(folder)

2. Creating the DataCollection file for collecting data

Crop the detected hand region with some padding (offset)

    imgCrop = img[max(0, y - offset):min(y + h + offset, img.shape[0]),
                  max(0, x - offset):min(x + w + offset, img.shape[1])]

Calculate aspect ratio of the cropped image

        aspectRatio = h / w

        # If height is greater than width (portrait orientation)
        if aspectRatio > 1:
            k = imgSize / h  
            wCal = math.ceil(k * w)  
            imgResize = cv2.resize(imgCrop, (wCal, imgSize)) 
            wGap = (imgSize - wCal) // 2  
            imgWhite[:, wGap:wGap + wCal] = imgResize  
        else:  # If width is greater than height (landscape orientation)
            k = imgSize / w 
            hCal = math.ceil(k * h)  
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))  
            hGap = (imgSize - hCal) // 2  
            imgWhite[hGap:hGap + hCal, :] = imgResize  

Display the original webcam image and capture keyboard input

    # Show the original webcam image
    cv2.imshow("Image", img)

    # Capture keyboard input
    key = cv2.waitKey(1)
    
    # If 's' is pressed, save the processed image
    if key == ord("s"):
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)

    # If 'q' is pressed, exit the loop
    elif key == ord("q"):
        break
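For orientation, here is a minimal sketch of how these fragments could fit together into a single data collection script. It assumes cvzone's HandDetector (whose findHands call returns the detected hands, each carrying a 'bbox' of x, y, w, h) and the variable names used above; treat it as an illustration, not the exact script in this repository.

import math
import os
import time

import cv2
import numpy as np
from cvzone.HandTrackingModule import HandDetector

# Illustrative end-to-end collection script (assumed structure, not the repo's exact file)
dataset_path = "Data"
class_name = input("Enter the sign label (e.g., A, B, C): ").upper()
folder = os.path.join(dataset_path, class_name)
os.makedirs(folder, exist_ok=True)

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
offset = 20
imgSize = 300
counter = 0

while True:
    success, img = cap.read()
    if not success:
        break

    hands, img = detector.findHands(img)
    if hands:
        x, y, w, h = hands[0]['bbox']

        # White square canvas that the resized hand crop is pasted onto
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255

        # Crop the hand with padding, clamped to the frame boundaries
        imgCrop = img[max(0, y - offset):min(y + h + offset, img.shape[0]),
                      max(0, x - offset):min(x + w + offset, img.shape[1])]

        aspectRatio = h / w
        if aspectRatio > 1:   # portrait: fit the height, centre horizontally
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            wGap = (imgSize - wCal) // 2
            imgWhite[:, wGap:wGap + wCal] = imgResize
        else:                 # landscape: fit the width, centre vertically
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            hGap = (imgSize - hCal) // 2
            imgWhite[hGap:hGap + hCal, :] = imgResize

        cv2.imshow("ImageWhite", imgWhite)

    cv2.imshow("Image", img)
    key = cv2.waitKey(1)
    if key == ord("s") and hands:   # save one processed sample
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)
    elif key == ord("q"):           # quit
        break

cap.release()
cv2.destroyAllWindows()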

3. After creating the folders and the DataCollection file, the next step is to create the training and testing dataset.

I captured frames from the machine's webcam.

In each frame I defined a region of interest (ROI), denoted by a blue bounding square, as shown in the image below.

DC

After capturing the image from the ROI, the processed image looks like the example below.

DC1

4. After creating the training and testing data, the next step is to build a model for training. Here, I have used a Convolutional Neural Network (CNN) to build this model. The model summary is as follows.

Convolutional Neural Network (CNN)

Unlike regular Neural Networks, in the layers of CNN, the neurons are arranged in 3 dimensions: width, height, depth.

The neurons in a layer will only be connected to a small region of the layer (window size) before it, instead of all of the neurons in a fully-connected manner.

Moreover, the final output layer has dimension equal to the number of classes, because by the end of the CNN architecture the full image is reduced to a single vector of class scores.

cnn

1. Convolutional Layer:

In the convolution layer I take a small window [typically of size 5×5] that extends through the depth of the input matrix.

The layer consists of learnable filters of this window size. At every step the window slides by the stride [typically 1], and the dot product of the filter entries and the input values at that position is computed.

Continuing this process, we create a 2-dimensional activation map that gives the response of that filter at every spatial position.

That is, the network will learn filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch of some colour.
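As a toy illustration of the sliding-window dot product (not code from this repository), assuming a 5×5 single-channel input and a 3×3 filter with stride 1:

import numpy as np

# Toy example: a 3x3 filter slid with stride 1 over a 5x5 single-channel input
x = np.random.rand(5, 5)          # input "image"
f = np.random.rand(3, 3)          # learnable filter (here just random values)
out = np.zeros((3, 3))            # output activation map: (5 - 3 + 1) = 3

for i in range(3):
    for j in range(3):
        # Dot product of the filter with the input patch under the window
        out[i, j] = np.sum(x[i:i + 3, j:j + 3] * f)

print(out.shape)  # (3, 3)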

2. Pooling Layer:

We use a pooling layer to decrease the size of the activation map and ultimately reduce the number of learnable parameters in the following layers.

There are two types of pooling:

a. Max Pooling:

In max pooling we take a window [for example, of size 2×2] and keep only the maximum of its 4 values.

We slide this window and continue the process, so we finally get an activation map half its original size.

b. Average Pooling:

In average pooling we take the average of all values in the window.

pooling
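A minimal NumPy sketch of 2×2 max pooling and average pooling on a small activation map (illustrative only, not taken from the repository):

import numpy as np

# 4x4 activation map pooled with a 2x2 window and stride 2
a = np.arange(16, dtype=float).reshape(4, 4)

# Reshape into 2x2 blocks, then reduce each block
blocks = a.reshape(2, 2, 2, 2).swapaxes(1, 2)   # shape (2, 2, 2, 2)
max_pooled = blocks.max(axis=(2, 3))            # max pooling  -> 2x2 output
avg_pooled = blocks.mean(axis=(2, 3))           # average pooling

print(max_pooled)   # [[ 5.  7.] [13. 15.]]
print(avg_pooled)   # [[ 2.5  4.5] [10.5 12.5]]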

3. Fully Connected Layer:

In a convolution layer neurons are connected only to a local region, while in a fully connected layer every input is connected to every neuron.

fcl
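Since the actual model summary is shown as an image in the repository, the snippet below is only an indicative Keras sketch of a CNN of this kind, assuming 300×300 RGB inputs and 26 output classes; the real Model/keras_model.h5 may differ in its exact layers and sizes.

# Illustrative only: a small CNN in the spirit described above (assumed
# 300x300 RGB inputs and 26 output classes); not the exact saved model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(300, 300, 3)),
    layers.Conv2D(32, (5, 5), activation="relu"),   # convolutional layer
    layers.MaxPooling2D((2, 2)),                    # pooling layer
    layers.Conv2D(64, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),           # fully connected layer
    layers.Dense(26, activation="softmax"),         # one score per letter A-Z
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()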

5. Testing the Dataset

import cv2
from keras.models import load_model
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier

# Load the model separately
model = load_model("Model/keras_model.h5", compile=False)

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
classifier = Classifier("Model/keras_model.h5", "Model/labels.txt")

offset = 20
imgSize = 300

labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
          "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]  # labels for A to Z

Prediction code that shows the results

            prediction, index = classifier.getPrediction(imgWhite, draw=False)
            print(prediction,index)

The bounding box around the hand and the predicted label are updated in real-time as the user performs different hand gestures.

        cv2.rectangle(imgOutput,(x-offset,y-offset-50),(x-offset+100,y-offset-50+50),(255,0,255),4,cv2.FILLED)
        cv2.putText(imgOutput,labels[index],(x,y-26),cv2.FONT_HERSHEY_COMPLEX,2,(255,255,255),2)
        cv2.rectangle(imgOutput,(x-offset,y-offset),(x+w+offset,y+h+offset),(255,0,255),4)

        # cv2.imshow("ImageCrop",imgCrop)
        # cv2.imshow("ImageWhite",imgWhite)    # if you want img crop and img white window then un comment this

outcome

Flowchart


System Flowchart


DFD Diagram

dfd

Library Requirements (requires the latest pip version to install all the packages)

Note: Python 3.8 or above is required to build this project, as some of the required libraries can't be installed on the latest version of Python.

1. Latest pip -> pip install --upgrade pip

2. numpy -> pip install numpy

3. opencv -> pip install opencv-python

4. tensorflow -> pip install tensorflow==2.12.0

5. keras -> pip install keras==2.12.0

6. cvzone -> pip install cvzone

7. mediapipe -> pip install mediapipe==0.10.18

Running the Project

python /path/to/the/resultPredict.py

About

This project is a Sign Language Detection System that recognizes the letters A to Z using deep learning.
