Skip to content

Why can Zero Shot be achieved? #26

@Coder-Liuu

Description

@Coder-Liuu

Hi, I'm also part of the research on Zero Shot Tempoarl Localization Action, and I found that if I use Transformer to model CLIP video frame features, it leads to high mAP in the training set and low mAP in the test set. My guess is that the video frame information from CLIP, after Transformer leads to difficulty in matching with text information. What is the core of solving this problem?

Can you help me? 😭

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions