From dc0021704315c2099128527f1652c01612641339 Mon Sep 17 00:00:00 2001
From: Changqing Wang
Date: Fri, 29 Aug 2025 15:16:56 +1000
Subject: [PATCH 1/2] add LongBench

---
 datasets/longbench.yaml | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 datasets/longbench.yaml

diff --git a/datasets/longbench.yaml b/datasets/longbench.yaml
new file mode 100644
index 000000000..a45cf817d
--- /dev/null
+++ b/datasets/longbench.yaml
@@ -0,0 +1,35 @@
+Name: LongBench - cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approaches
+Description: >
+  LongBench is a comprehensive benchmark dataset of the latest long-read transcriptomics technologies from Oxford Nanopore (ON) and Pacific Biosciences, alongside a comparison with next-generation sequencing from Illumina. We generated bulk and single-cell libraries from lung cancer cell lines which include different cancer subtypes to capture real biological variation. To further compare and assess sequencing platform performance, Sequins and SIRVs (Set 4) synthetic spike-ins have been included.
+Documentation: https://github.com/mritchielab/LongBench.io
+Contact: mritchie@wehi.edu.au
+ManagedBy: Richie Lab, Walter and Eliza Hall Institute of Medical Research
+UpdateFrequency: New data is added as soon as it is available.
+Tags:
+  - benchmark
+  - long read sequencing
+  - single-cell transcriptomics
+  - short read sequencing
+  - bioinformatics
+  - fastq
+  - pod5
+  - bam
+  - vcf
+  - cancer
+  - life sciences
+
+License: "[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)"
+Resources:
+  - Description: Bulk, single-cell, and single-nucleus RNA-seq data from the LongBench project, covering eight human lung cancer cell lines. Bulk sequencing (FASTQ) was performed on ONT PCR-cDNA, ONT direct RNA (including pod5 files for RNA modification analysis), PacBio Kinnex, and Illumina platforms. Single-cell and single-nucleus sequencing (FASTQ) was performed on ONT PCR-cDNA, PacBio Kinnex, and Illumina platforms. Aligned reads (BAM), variant calls (VCF), and processed gene expression data are also provided, along with reference genome annotations (GTF and FASTA).
+    ARN: arn:aws:s3:::longbench-data
+    Region: ap-southeast-2
+    Type: S3 Bucket
+
+DataAtWork:
+  Tutorials:
+    - Title: Benchmarking long-read DE gene and transcript analysis with edgeR
+      URL: https://mritchielab.github.io/LongBench.io/bulk-de-benchmarking/
+      AuthorName: Yupei You
+
+ADXCategories:
+  - Healthcare & Life Sciences Data

From 7a14f020ba05ff85ca1565da09fb719e4d953013 Mon Sep 17 00:00:00 2001
From: Yupei You
Date: Wed, 3 Sep 2025 09:49:25 +1000
Subject: [PATCH 2/2] Add longbench, update license

Corrected the phrasing for data availability and updated the license format.
---
 datasets/longbench.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datasets/longbench.yaml b/datasets/longbench.yaml
index a45cf817d..278c68206 100644
--- a/datasets/longbench.yaml
+++ b/datasets/longbench.yaml
@@ -4,7 +4,7 @@ Description: >
 Documentation: https://github.com/mritchielab/LongBench.io
 Contact: mritchie@wehi.edu.au
 ManagedBy: Richie Lab, Walter and Eliza Hall Institute of Medical Research
-UpdateFrequency: New data is added as soon as it is available.
+UpdateFrequency: New data will be added as soon as they are available.
 Tags:
   - benchmark
   - long read sequencing
@@ -18,7 +18,7 @@ Tags:
   - cancer
   - life sciences
 
-License: "[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)"
+License: CC BY-4.0
 Resources:
   - Description: Bulk, single-cell, and single-nucleus RNA-seq data from the LongBench project, covering eight human lung cancer cell lines. Bulk sequencing (FASTQ) was performed on ONT PCR-cDNA, ONT direct RNA (including pod5 files for RNA modification analysis), PacBio Kinnex, and Illumina platforms. Single-cell and single-nucleus sequencing (FASTQ) was performed on ONT PCR-cDNA, PacBio Kinnex, and Illumina platforms. Aligned reads (BAM), variant calls (VCF), and processed gene expression data are also provided, along with reference genome annotations (GTF and FASTA).
     ARN: arn:aws:s3:::longbench-data