
Conversation

@abhishekg999
Contributor

No description provided.

@gemini-code-assist

Summary of Changes

Hello @abhishekg999, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks a significant architectural shift by introducing a new v1 API format for the SDK. The primary goal is to enhance the clarity, consistency, and long-term maintainability of the SDK's core components for tracing, scoring, and evaluation. This involves deprecating older packages, updating examples to reflect the new usage patterns, and establishing a robust, automatically generated API client. The changes aim to provide a more intuitive and future-proof developer experience.

Highlights

  • New v1 API Format Introduction: This pull request introduces a completely new v1 API format, providing a more structured and maintainable interface for the SDK's core functionalities, including tracing, scoring, and evaluation. All new API components are housed under the v1 directory.
  • Refactoring and Deprecation of Legacy Packages: Existing pkg/data, pkg/scorers, and pkg/tracer packages have been marked as deprecated, with comments guiding users to transition to their respective v1 counterparts. This signifies a clear migration path for users to adopt the new API.
  • Updated Example Applications: The example applications (simple-chat and manual-otel) have been updated to utilize the new v1 API, demonstrating its usage and ensuring compatibility with the new structure. This includes changes to imports, client initialization, and attribute setting.
  • Enhanced Build and Scripting Infrastructure: The project's build and scripting mechanisms have been modernized. The Makefile has been removed, and package.json has been introduced to manage scripts for formatting, building, testing, and running examples, offering a more standardized development workflow.
  • Automated API Client Generation: A new Python script (scripts/generate-client-v1.py) has been added to automate the generation of the v1 API client and its models from an OpenAPI specification, streamlining API updates and ensuring consistency.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new v1 API format for the Judgeval Go SDK, deprecating the previous structure. The changes are extensive, touching upon API models, client generation, tracer implementation, and examples. The new v1 API is well-structured, using idiomatic Go patterns like functional options and factories. The client generation from an OpenAPI spec is a solid approach.

I've found one critical issue in the example code related to resource management (defer in a loop) and one medium-severity issue in the new tracer implementation regarding incomplete attribute handling. My feedback includes suggestions to fix these issues. Overall, this is a great step forward for the SDK's architecture.
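The "defer in a loop" issue flagged above is a common Go pitfall: deferred calls run at function exit, not at the end of each loop iteration, so resources accumulate until the enclosing function returns. A minimal illustration of the usual fix (wrapping the loop body in a closure); the names here are illustrative, not taken from the PR:

```go
package main

import "fmt"

// closer records the order in which "resources" are released.
type closer struct {
	id     int
	closed *[]int
}

func (c closer) Close() { *c.closed = append(*c.closed, c.id) }

// processAll wraps each loop body in a closure so that defer fires
// at the end of every iteration instead of at function exit.
func processAll(n int) []int {
	var closed []int
	for i := 0; i < n; i++ {
		func() {
			r := closer{id: i, closed: &closed}
			defer r.Close() // released when this closure returns
			// ... use r here ...
		}()
	}
	return closed
}

func main() {
	fmt.Println(processAll(3)) // each resource closes before the next opens
}
```

Without the closure, all three Close calls would fire only after the loop finished, which is exactly the resource-management problem the review points out.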


@justinsheu justinsheu left a comment


  1. is the plan to eventually add llm client wrapping? or is it different since it seems like openai calls are made directly to the http endpoint instead of some go library
  2. also i'm assuming this is because of the limitations of go, but i noticed in the example that instrumentation is a lot more manual than python, such as having to manually indicate the beginning and end of a span and having to manually set input/output

return &Client{
apiClient: apiClient,
Tracer: &TracerFactory{client: apiClient},
Scorers: newScorersFactory(apiClient),


change to same pattern as tracer and evaluation factories

Contributor Author


Actually I think this is correct since scorerFactory creates a further interface.

Initialize *bool
}

func (f *TracerFactory) Create(ctx context.Context, params TracerCreateParams) (*Tracer, error) {


would it make sense to use options here?

Contributor Author


I think there's definitely an argument both ways, though with the tracer it might be good to have more explicit specification.

Also, following Stainless / OpenAI SDK patterns, only the top-level client uses options, so I think this is fair.
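For readers unfamiliar with the trade-off being discussed, here is a self-contained sketch of the two patterns side by side: functional options on the top-level client, and an explicit params struct on a factory method (the openai-go convention the author mentions). All names here are illustrative, not the SDK's real API.

```go
package main

import "fmt"

// Top-level client configured via functional options.
type Client struct{ baseURL string }

type Option func(*Client)

func WithBaseURL(u string) Option { return func(c *Client) { c.baseURL = u } }

func NewClient(opts ...Option) *Client {
	c := &Client{baseURL: "https://api.example.com"} // default
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// Factory method configured via an explicit params struct: every field
// is visible at the call site, and pointer fields mark optionals.
type TracerCreateParams struct {
	ProjectName string
	Initialize  *bool // nil means "use the server default"
}

type Tracer struct{ project string }

func (c *Client) CreateTracer(p TracerCreateParams) *Tracer {
	return &Tracer{project: p.ProjectName}
}

func main() {
	c := NewClient(WithBaseURL("https://staging.example.com"))
	t := c.CreateTracer(TracerCreateParams{ProjectName: "demo"})
	fmt.Println(c.baseURL, t.project)
}
```

The options form keeps the top-level constructor open-ended, while the params struct makes every factory call site explicit about what is being created.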

return scorer, nil
}

func (f *PromptScorerFactory) Create(params PromptScorerCreateParams) (*PromptScorer, error) {


similar to tracer, is there a reason this uses params struct instead of options?

Contributor Author


yes, trying to follow the same kind of design pattern as: https://github.com/openai/openai-go

client *api.Client
}

func (f *EvaluationFactory) Create(params EvaluationCreateParams) *Evaluation {


is this wip?

Contributor Author


this is where running an evaluation will go, but yes, nothing for now; it follows the same spec as the rest of the clients

@abhishekg999
Contributor Author

@justinsheu

is the plan to eventually add llm client wrapping? or is it different since it seems like openai calls are made directly to the http endpoint instead of some go library

Regarding LLM clients, we definitely can.
I also noticed that https://docs.langwatch.ai/integration/go/integrations/open-ai does work with us, so we can also have an integrations page in our docs for this.

also i'm assuming this is because of the limitations of go, but i noticed in the example that instrumentation is a lot more manual than python, such as having to manually indicate the beginning and end of a span and having to manually set input/output

This is correct. Manual spans are the recommended way; we could maybe add callable utils, but there is no concept of decorators in Go, so the API will always be more manual for arbitrary things (similar to Java). For lower-level languages I want to limit the amount of "magic" that we do.
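To make the "more manual than Python" point concrete, here is a toy tracer with explicit span start/end and explicitly set input/output. This illustrates the style of instrumentation being described; the types and method names are made up for the example, not the actual SDK API.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a toy span whose input and output are set by hand.
type Span struct {
	Name     string
	Input    any
	Output   any
	start    time.Time
	Duration time.Duration
}

// Tracer collects finished spans.
type Tracer struct{ finished []*Span }

func (t *Tracer) StartSpan(name string) *Span {
	return &Span{Name: name, start: time.Now()}
}

func (t *Tracer) EndSpan(s *Span) {
	s.Duration = time.Since(s.start)
	t.finished = append(t.finished, s)
}

func main() {
	tr := &Tracer{}

	// Without decorators, every step is explicit: start the span,
	// set the input, do the work, set the output, end the span.
	span := tr.StartSpan("llm_call")
	span.Input = "What is 2+2?"
	// ... call the model over HTTP here ...
	span.Output = "4"
	tr.EndSpan(span)

	fmt.Println(len(tr.finished), tr.finished[0].Name)
}
```

In Python a decorator can wrap a function and capture its arguments and return value automatically; in Go each of those steps is a visible line of code, which is the trade-off discussed above.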


@justinsheu justinsheu left a comment


just minor questions:

  1. for scorers, i noticed that only PromptScorer and TracePromptScorer return a (value, error) tuple, while others like the built-in and custom scorers just return the value. should this be standardized?

v1/tracer.go Outdated
ProjectName: b.projectName,
EvalName: runID,
Model: modelName,
Id: uuid.New().String(),


does this remove the ability to specify evaluation model on sdk? if so does this mean the backend will always use the default model now?

Contributor Author


the Model lives on the Scorer object and can be configured there; this moves that configuration out of async evaluate


@justinsheu justinsheu left a comment


lgtm

@abhishekg999
Contributor Author

for scorers, i noticed that only PromptScorer and TracePromptScorer return a (value, error) tuple, while others like the built-in and custom scorers just return the value. should this be standardized?

actually a good point. for now the custom scorer doesn't actually make an API call, but I'll make it match the same behavior, though it never actually fails since it just returns a struct with the provided values
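The standardization agreed to here amounts to giving every scorer constructor the same (value, error) shape, even where failure is currently impossible. A hedged sketch with made-up names (not the SDK's real types):

```go
package main

import "fmt"

// CustomScorer is built purely from provided values: no API call,
// so construction cannot currently fail.
type CustomScorer struct {
	Name      string
	Threshold float64
}

// NewCustomScorer returns (value, error) anyway, so callers handle all
// scorer constructors identically, and the signature can later grow an
// API call without breaking existing code.
func NewCustomScorer(name string, threshold float64) (*CustomScorer, error) {
	return &CustomScorer{Name: name, Threshold: threshold}, nil
}

func main() {
	s, err := NewCustomScorer("faithfulness", 0.8)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Name, s.Threshold)
}
```

Always returning the error keeps call sites uniform across PromptScorer, TracePromptScorer, built-in, and custom scorers, which is the consistency the reviewer asked about.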

@abhishekg999 abhishekg999 merged commit 203027e into main Nov 11, 2025
@abhishekg999 abhishekg999 deleted the ahh/v1 branch November 11, 2025 22:53