This study uses a Visual Language Model (LLaVA) and focuses on testing different techniques for parsing its generated responses. It investigates three parsing techniques: rule-based parsing, dependency parsing, and sequence-to-sequence parsing, to determine how effectively each converts LLM-generated responses into executable UI actions.
- Dataset Link for Images
- Dataset Link for Custom Generated Prompts, VLM Responses, and Handwritten JSON Formatted Structure
- Dataset Link for Generated Rule-Based Parsing Output
- Dataset Link for Generated Dependency Parser Output
- Dataset Link for Training Seq2Seq Parser
- Dataset Link for Generated Seq2Seq Parser Output
LLaVA Model
- Model Used: LLaVA 1.6 multi-modal language model
- Challenge: Resource limitations
- Solution: Kaggle's T4 x 2 GPU accelerator, loading the model in quantized form via bitsandbytes (see the loading sketch after this list)
- Prompt Engineering:
- Key Challenge: The VLM hallucinated key UI elements in its output
- Final Effective Prompt: Fed the dataset prompt and the UI image together, and explicitly instructed the model not to make assumptions about elements it could not see
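A minimal sketch of how LLaVA 1.6 can be loaded in 4-bit quantized form with bitsandbytes on limited GPU memory; the Hugging Face checkpoint ID, prompt wording, and file name here are assumptions, not the project's verbatim code.

```python
import torch
from PIL import Image
from transformers import (LlavaNextProcessor, LlavaNextForConditionalGeneration,
                          BitsAndBytesConfig)

# 4-bit quantization lets the 7B model fit on Kaggle's two 16 GB T4 GPUs
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # shards layers across both T4s
)

# Prompt pairs the dataset instruction with the UI screenshot and tells the
# model not to invent elements it cannot see (hypothetical wording).
image = Image.open("ui_screenshot.png")
prompt = ("[INST] <image>\nWhat UI action completes this task: 'Open the settings menu'? "
          "Describe only elements visible in the image; do not make assumptions. [/INST]")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```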
Rule-Based Parser
- Libraries Used: Python's built-in `re` (regex) module
- Challenge: Identifying all of the patterns needed to structure the parser (a minimal sketch follows this list)
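A minimal sketch of the rule-based approach, assuming a simple action/target JSON schema; the two patterns shown are illustrative, not the project's full rule set.

```python
import json
import re

# Illustrative patterns: a verb keyword for the action, and a quoted phrase
# preceding a UI-element noun for the target.
ACTION_PATTERN = re.compile(r"\b(click|tap|type|scroll|select|press)\b", re.IGNORECASE)
TARGET_PATTERN = re.compile(
    r"(?:on|the)\s+[\"']?([\w\s]+?)[\"']?\s+(?:button|icon|field|menu|tab)",
    re.IGNORECASE,
)

def parse_response(response: str) -> dict:
    action = ACTION_PATTERN.search(response)
    target = TARGET_PATTERN.search(response)
    return {
        "action": action.group(1).lower() if action else None,
        "target": target.group(1).strip() if target else None,
    }

print(json.dumps(parse_response('Tap on the "Settings" icon to open preferences.')))
# {"action": "tap", "target": "Settings"}
```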
Dependency Parser
- Libraries Used: spaCy, NLTK
- Strategy: Analyze dependency trees and apply commonly observed grammar rules (see the sketch after this list)
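A sketch of the dependency-tree strategy with spaCy, under the assumption that the root verb maps to the action and the first object noun chunk maps to the target; the project's actual grammar rules may differ.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def parse_with_dependencies(response: str) -> dict:
    doc = nlp(response)
    action, target = None, None
    for token in doc:
        # Rule: the root verb of the sentence is the UI action
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            action = token.lemma_.lower()
    for chunk in doc.noun_chunks:
        # Rule: the first direct/prepositional object noun phrase is the target
        if chunk.root.dep_ in ("dobj", "pobj"):
            target = chunk.text
            break
    return {"action": action, "target": target}

print(parse_with_dependencies("Click the blue Submit button at the bottom."))
# e.g. {'action': 'click', 'target': 'the blue Submit button'}
```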
Seq2Seq Parser
- Model Used: BART
- Challenge: Poor out-of-the-box performance converting VLM responses into the desired JSON objects
- Solution: Expanded the training dataset and fine-tuned BART on it (see the training sketch after this list)
- Optimizer: AdamW
- Hyperparameters:
- Learning Rate: 2e-5
- Epochs: 10
- Early Stopping Patience: 3
- Gradient Accumulation Steps (for small batch sizes): 2
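A sketch of the fine-tuning setup with the hyperparameters listed above, using the Hugging Face `Seq2SeqTrainer` (which optimizes with AdamW by default); the checkpoint name, dataset fields, sequence lengths, and the toy examples are assumptions.

```python
from datasets import Dataset
from transformers import (BartForConditionalGeneration, BartTokenizerFast,
                          DataCollatorForSeq2Seq, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "facebook/bart-base"  # assumed checkpoint
tokenizer = BartTokenizerFast.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

def preprocess(example):
    # (VLM response -> JSON string) pairs; field names are assumptions
    model_inputs = tokenizer(example["response"], max_length=512, truncation=True)
    labels = tokenizer(text_target=example["json"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Toy stand-ins for the expanded training data (the real data comes from the
# Seq2Seq training dataset linked above); two examples keep the sketch self-contained.
raw_train = Dataset.from_dict({
    "response": ["Tap on the Settings icon.", "Click the Submit button."],
    "json": ['{"action": "tap", "target": "Settings"}',
             '{"action": "click", "target": "Submit"}'],
})
train_ds = raw_train.map(preprocess, remove_columns=raw_train.column_names)
val_ds = train_ds

args = Seq2SeqTrainingArguments(
    output_dir="bart-json-parser",
    learning_rate=2e-5,              # Trainer uses AdamW by default
    num_train_epochs=10,
    gradient_accumulation_steps=2,   # compensates for the small batch size
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by the early-stopping callback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```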
TF-IDF Cosine Similarity:
- Used the TF-IDF vectorizer from the scikit-learn Python library.
- Weights each word by its keyword importance.
- The resulting vectors were fed into the cosine similarity function to produce a similarity (accuracy) score.
- Offers a baseline lexical similarity score between the ground truth and the parsed outputs (see the sketch after this list).
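A minimal sketch of the TF-IDF scoring, assuming the score is computed per ground-truth/parsed-output pair; the example strings are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_similarity(ground_truth: str, parsed_output: str) -> float:
    # Fit on just the two texts so their shared vocabulary defines the space
    vectors = TfidfVectorizer().fit_transform([ground_truth, parsed_output])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

print(tfidf_similarity('{"action": "click", "target": "Settings"}',
                       '{"action": "tap", "target": "Settings"}'))
```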
BERT Embedding Cosine Similarity:
- BERT-style embeddings (all-distilroberta-v1 model) were created for each ground truth and parsed output.
- The embeddings were fed into the cosine similarity function to produce a similarity (accuracy) score.
- Offers a more context-aware result than TF-IDF (see the sketch after this list).
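A sketch using the sentence-transformers library, which distributes the all-distilroberta-v1 model named above; the per-pair scoring function is an assumption.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-distilroberta-v1")

def embedding_similarity(ground_truth: str, parsed_output: str) -> float:
    # Encode both texts and compare them in embedding space
    embeddings = model.encode([ground_truth, parsed_output], convert_to_tensor=True)
    return float(util.cos_sim(embeddings[0], embeddings[1]))
```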
Recall Curve & AUC:
- We developed a custom recall-like curve showing, for each parsing technique, the percentage of outputs whose similarity surpasses a given threshold, to visualize parser consistency (see the sketch after this list).
- An AUC value was computed to summarize overall performance; an AUC closer to 1 reflects a better parser.
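A sketch of how such a curve and its AUC can be computed, where `scores` holds the per-example similarity scores for one parser; the threshold grid is an assumption.

```python
import numpy as np

def recall_curve_auc(scores, num_thresholds=101):
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    scores = np.asarray(scores)
    # Fraction of outputs whose similarity meets or exceeds each threshold
    recall = np.array([(scores >= t).mean() for t in thresholds])
    # Trapezoidal-rule area under the curve; closer to 1 is better
    auc = float(np.sum((recall[:-1] + recall[1:]) / 2 * np.diff(thresholds)))
    return thresholds, recall, auc
```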
The final overall accuracy score:
Recall Curves and AUCs
The AUC scores for each parser are shown below:
- Rule-based Parsing AUC: 0.7553
- Sequence-to-Sequence Parsing AUC: 0.7225
- Dependency Parsing AUC: 0.6937
Conclusion:
- Highest AUC --> Rule-Based Parsing: in our implementation, this parser achieved the best performance.
- The Seq2Seq model shows real potential, but the limited training data held back its overall performance.
- The dependency parser relies on syntactic structure alone; ignoring the variability in how responses are phrased limits its performance.