@MUDASSIR-75

Description

This PR adds a new SupadataLoader to @langchain/community’s document loaders.

The loader is a thin wrapper around the official @supadata/js SDK and supports two Supadata operations:

  • transcript – fetches text transcripts for a URL (e.g. YouTube).
  • metadata – fetches structured metadata for a URL (YouTube via youtube.video, other URLs via web.scrape).

Implementation details:

  • Loader lives at: libs/langchain-community/src/document_loaders/web/supadata.ts.
  • Exposes a typed SupadataLoaderParams interface with:
    • urls (required array of URLs),
    • operation ("transcript" | "metadata", default "transcript"),
    • lang, text, mode,
    • params for arbitrary extra Supadata options.
  • API key resolution:
    • Uses the apiKey constructor param when provided.
    • Falls back to the SUPADATA_API_KEY environment variable via getEnvironmentVariable.
  • Returns LangChain Document instances with:
    • metadata.source set to the URL,
    • metadata.supadataOperation set to "transcript", "transcript_job", or "metadata",
    • for transcripts, pageContent is the transcript text (or a message indicating an async job with jobId).
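
The parameter shape, key fallback, and transcript-to-Document mapping described above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual source. The types chosen for lang, text, and mode, the helper names resolveApiKey and toTranscriptDocument, the SimpleDocument stand-in, the assumed response shape ({ content } or { jobId }), and the wording of the job message are all assumptions; only the field names and the supadataOperation values come from the description.

```typescript
// Sketch only: helper names, field types, and response shape are assumptions.
type SupadataOperation = "transcript" | "metadata";

interface SupadataLoaderParams {
  urls: string[]; // required
  apiKey?: string; // optional; the environment variable is the fallback
  operation?: SupadataOperation; // defaults to "transcript"
  lang?: string; // assumed type
  text?: boolean; // assumed type
  mode?: string; // assumed type; the usage example passes "auto"
  params?: Record<string, unknown>; // arbitrary extra Supadata options
}

// Stand-in for LangChain's Document so the sketch has no dependencies.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// The environment is passed in explicitly to keep the function testable.
type Env = Record<string, string | undefined>;

function resolveApiKey(params: SupadataLoaderParams, env: Env): string {
  // An explicit constructor param wins; otherwise use SUPADATA_API_KEY.
  const key = params.apiKey || env.SUPADATA_API_KEY;
  if (!key) {
    throw new Error("Supadata API key not found: pass apiKey or set SUPADATA_API_KEY.");
  }
  return key;
}

// Assumed response: either transcript text or an async job identifier.
type TranscriptResult = { content: string } | { jobId: string };

function toTranscriptDocument(url: string, result: TranscriptResult): SimpleDocument {
  if ("jobId" in result) {
    // Async job: the page content is only a placeholder message.
    return {
      pageContent: `Transcript job started with jobId: ${result.jobId}`,
      metadata: { source: url, supadataOperation: "transcript_job", jobId: result.jobId },
    };
  }
  return {
    pageContent: result.content,
    metadata: { source: url, supadataOperation: "transcript" },
  };
}
```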

Usage

import { SupadataLoader } from "@langchain/community/document_loaders/web/supadata";

const loader = new SupadataLoader({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  operation: "transcript",
  apiKey: process.env.SUPADATA_API_KEY,
  lang: "en",
  text: true,
  mode: "auto",
});

const docs = await loader.load();
console.log(docs[0].pageContent);

Metadata example:

const loader = new SupadataLoader({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  operation: "metadata",
  apiKey: process.env.SUPADATA_API_KEY,
});

const [doc] = await loader.load();
console.log(doc.metadata.supadataOperation); // "metadata"
console.log(doc.pageContent); // JSON string with Supadata metadata

Tests

From libs/langchain-community:

pnpm test -- supadata

This runs src/document_loaders/tests/supadata.test.ts, which:

  • Mocks @supadata/js and verifies the Supadata client is constructed with the correct apiKey.
  • Asserts that transcript responses are converted into Document objects with the expected pageContent.
  • Asserts that metadata responses are converted into Document objects with the expected pageContent and metadata.supadataOperation.

All tests are currently passing.

Issue

N/A – new integration.

Dependencies

  • Adds @supadata/js as a dev dependency for @langchain/community to support unit tests.
  • The loader dynamically imports @supadata/js at runtime (await import("@supadata/js")), so users must install @supadata/js in their own project to use this loader.
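
Because the SDK is loaded at runtime, a missing install only surfaces when the loader is first used. One way to make that failure readable is sketched below, under stated assumptions: importOptional is a hypothetical helper, not part of this PR or of LangChain, and the importer is injected so the sketch runs without @supadata/js being present.

```typescript
// Hypothetical helper: wraps a dynamic import and rewrites the failure
// into an actionable install hint.
async function importOptional<T>(
  packageName: string,
  importer: () => Promise<T>
): Promise<T> {
  try {
    return await importer();
  } catch (cause) {
    const reason = cause instanceof Error ? cause.message : String(cause);
    throw new Error(
      `Failed to load "${packageName}". Install it in your project, ` +
        `for example with: npm install ${packageName} (original error: ${reason})`
    );
  }
}

// Intended call site inside a loader (commented out so the sketch stays
// runnable without the SDK installed):
// const sdk = await importOptional("@supadata/js", () => import("@supadata/js"));
```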

@changeset-bot

changeset-bot bot commented Nov 29, 2025

⚠️ No Changeset found

Latest commit: 363cc9c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added community Issues related to `@langchain/community` pkg:@langchain/community labels Nov 29, 2025
@MUDASSIR-75
Author

Hi @hntrl, could you please help approve the workflows for this PR? Thank you!

}

private async loadMetadata(client: any, url: string): Promise<Document> {
const isYoutube = url.includes("youtube.com") || url.includes("youtu.be");

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization

'youtube.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.
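
A common way to address this class of alert is to parse the URL and compare its hostname against an explicit allowlist, rather than searching the whole string. The following is a minimal sketch, not the fix applied in this PR: the isYoutubeUrl name and the exact set of hosts are assumptions for illustration.

```typescript
// Assumed allowlist of hostnames treated as YouTube; adjust as needed.
const YOUTUBE_HOSTS = new Set([
  "youtube.com",
  "www.youtube.com",
  "m.youtube.com",
  "youtu.be",
]);

function isYoutubeUrl(url: string): boolean {
  let hostname: string;
  try {
    // The URL constructor throws on strings that are not absolute URLs.
    hostname = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return false;
  }
  // Exact hostname match: the path, the query string, and lookalike
  // hosts such as "youtube.com.evil.example" no longer count.
  return YOUTUBE_HOSTS.has(hostname);
}
```

With this check, a URL such as https://evil.example/?next=youtube.com is routed to the generic scrape path instead of being treated as a YouTube video.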
@MUDASSIR-75 force-pushed the feat/supadata-loader-js branch from 007960f to 363cc9c on December 4, 2025 05:17
@MUDASSIR-75
Author

Hi @jacoblee93 @bracesproul,

I submitted this PR, which adds a SupadataLoader integration to @langchain/community. All tests are passing and the implementation follows the integration guidelines.

This PR adds:

  • A new document loader for the Supadata API
  • Support for both transcript and metadata operations
  • Full unit test coverage

Would appreciate a review when you have a chance.

Thank you!

@christian-bromann
Member

Thank you @MUDASSIR-75 for the contribution 🙏 Unfortunately, we are no longer accepting new additions to the community package. Given that the package is already very crowded and has a large number of dependencies, I suggest the best approach is to:

  • create your own repository to distribute LangChain integrations, e.g. https://github.com/MUDASSIR-75/langchain-supadata
  • publish the package to NPM as e.g. langchain-supadata
  • comment here so we can add it to the list of recommended integrations

Our team is still working on finding the ideal way to recommend integration packages like this to our community; if you have any feedback here, let me know. Please reach out if I can support you with the above steps, and let me know when you have the package published.

Thank you!
