
CellAgent

An LLM-driven agent for single-cell data analysis, ensuring high-quality results with minimal effort.

By coordinating several LLM-driven biological experts, CellAgent automatically executes each task step by step and iteratively optimizes the results. This substantially reduces the workload of scientific data analysis and brings us into the “Agent for Science” era.

For complex scRNA-seq data analysis tasks, CellAgent can emulate the process of a human expert by first breaking down the task into sub-steps and then executing them sequentially.
Input
This is a classic single-cell data analysis dataset, consisting of 3k human peripheral blood mononuclear cells from a healthy donor. Please perform the cell type annotation task.
Output
After an initial analysis, 6 steps are required to complete your request:
1. Quality Control
2. Normalization
3. Identification of Highly Variable Genes
4. Dimensionality Reduction
5. Clustering
6. Cell Type Annotation
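
The decompose-then-execute pattern described above can be sketched in a few lines of Python. This is only an illustration: the step names come from the plan shown here, while `StepResult`, `make_placeholder_handler`, and `run_plan` are hypothetical names invented for this sketch and say nothing about how CellAgent is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Step names taken from the plan above; everything else is a placeholder.
PLAN: List[str] = [
    "Quality Control",
    "Normalization",
    "Identification of Highly Variable Genes",
    "Dimensionality Reduction",
    "Clustering",
    "Cell Type Annotation",
]


@dataclass
class StepResult:
    name: str
    summary: str


def make_placeholder_handler(name: str) -> Callable[[Dict], StepResult]:
    """Return a stand-in executor that only records that the step ran."""
    def handler(context: Dict) -> StepResult:
        context.setdefault("completed", []).append(name)
        return StepResult(name=name, summary=f"{name}: done")
    return handler


def run_plan(plan: List[str]) -> List[StepResult]:
    """Execute sub-steps in order, sharing one context between them."""
    context: Dict = {}
    results = []
    for name in plan:
        handler = make_placeholder_handler(name)
        results.append(handler(context))
    return results


results = run_plan(PLAN)
for i, r in enumerate(results, start=1):
    print(f"{i}. {r.summary}")
```

In a real system each handler would run analysis code and write its output into the shared context so that later steps can build on earlier ones.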

Compared with the traditional analysis process, CellAgent offers far more advanced automation for single-cell data.

Traditional scRNA-seq analysis
● Requires programming skills.
● Requires biological expertise.
● Requires manual hyperparameter tuning.
python
import scanpy as sc

# Assumes `adata` is an AnnData object that has already been loaded.
# Filter based on the properties of the current dataset and empirical thresholds
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# Flag mitochondrial genes (human gene symbols start with 'MT-')
adata.var['mt'] = adata.var_names.str.startswith('MT-')
# Calculate quality control metrics
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Keep cells with fewer than 2,500 detected genes and less than 5% mitochondrial counts
adata = adata[adata.obs.n_genes_by_counts < 2500, :]
adata = adata[adata.obs.pct_counts_mt < 5, :]

# Other steps omitted

sc.tl.pca(adata, svd_solver='arpack')
# For this dataset, n_neighbors=10 and n_pcs=40 show clear results
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
sc.tl.leiden(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'])
"Based on the current data analysis results, and in biomedical research, these genes can typically serve as markers to help us identify different cell types..."
python
# Find differentially expressed genes for each cluster
sc.tl.rank_genes_groups(adata, 'leiden', method='t-test')
sc.pl.rank_genes_groups(adata, n_genes=20, sharey=False)

# After comparing the differentially expressed genes in each cluster, and under the guidance of experts, cell type labels were assigned to each cluster.
cluster_annotations = {
    '0': 'T cells', '1': 'B cells', '2': 'NK cells', '3': 'Monocytes', '4': 'Dendritic cells',
    # ... remaining clusters omitted
}
adata.obs['cell_type'] = adata.obs['leiden'].map(cluster_annotations)
📄 "Finally, we get the annotation result."
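
To make the manual annotation step more concrete, the sketch below assigns a label to each cluster by counting how many commonly cited PBMC marker genes appear among its top-ranked genes. It is a simplified illustration, not the procedure used on this page: the marker sets are deliberately small, the top-gene lists are made up, and `annotate_cluster` is a hypothetical helper. In practice this comparison is done by an expert against a curated reference.

```python
# Commonly cited PBMC marker genes (illustrative and incomplete; a real
# analysis would use a curated reference and expert review).
MARKERS = {
    "T cells": {"CD3D", "CD3E", "IL7R"},
    "B cells": {"MS4A1", "CD79A"},
    "NK cells": {"NKG7", "GNLY"},
    "Monocytes": {"CD14", "LYZ"},
    "Dendritic cells": {"FCER1A", "CST3"},
}


def annotate_cluster(top_genes, markers=MARKERS):
    """Return the cell type whose marker set overlaps most with top_genes.

    Returns "Unknown" when none of the markers is among the top genes.
    On a tie, the first cell type listed in `markers` wins.
    """
    top = set(top_genes)
    best_label, best_hits = "Unknown", 0
    for label, genes in markers.items():
        hits = len(top & genes)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


# Hypothetical top-ranked genes per cluster, made up for this example.
top_genes_by_cluster = {
    "0": ["IL7R", "CD3D", "LTB"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["GNLY", "NKG7", "GZMB"],
}

cluster_annotations = {
    cluster: annotate_cluster(genes)
    for cluster, genes in top_genes_by_cluster.items()
}
print(cluster_annotations)
```

A simple overlap count ignores expression level and shared markers, which is one reason this step traditionally needs biological expertise.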
CellAgent
● Code less, dialogue only.
● Collaborate with LLM biological experts.
● Optimize results automatically.
"This is a raw PBMC dataset. Please help me complete cell type annotation."
After an initial analysis, 6 steps are required to complete your request:

1. Quality Control
2. Normalization
3. Identification of Highly Variable Genes
4. Dimensionality Reduction
5. Clustering

...
"Please use the Leiden algorithm for clustering."
...

"Great, now help me complete the cell type annotation."
The cell labels for these clusters were confirmed by combining differential gene expression across clusters with results from cell type annotation tools such as CellTypist. After evaluation, they were saved as `.obs['final_type']`.
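
One simple way to combine labels from several sources is a per-cluster majority vote, sketched below. This is an assumption made for illustration: the page does not say how CellAgent evaluates candidate labels, and the `consensus_label` helper, the three sources, and all label values are hypothetical.

```python
from collections import Counter


def consensus_label(candidates, fallback="Unresolved"):
    """Pick the most common label among sources; use fallback on a tie."""
    counts = Counter(candidates).most_common()
    if not counts:
        return fallback
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


# Hypothetical per-cluster labels from three sources (values are made up):
# marker-gene review, an automated annotation tool, and an LLM judgment.
labels_by_source = {
    "0": ["T cells", "T cells", "T cells"],
    "1": ["B cells", "B cells", "Plasma cells"],
    "2": ["NK cells", "T cells", "Monocytes"],
}

final_type = {
    cluster: consensus_label(labels)
    for cluster, labels in labels_by_source.items()
}
print(final_type)
```

Clusters that come back as unresolved are the ones that would need a closer look, for example by re-examining their differentially expressed genes.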


CellAgent consistently selects appropriate tools and hyperparameters to achieve superior outcomes.

Task Completion Rate
GPT-4: 47%
CellAgent: 92%

Task Performance**
CellAgent*: 107.23%

* In typical scRNA-seq data analysis tasks, CellAgent's performance can reach 107.23% of the performance of widely used, effective existing algorithms.
** The tasks referred to here mainly include batch effect correction, cell type annotation, and trajectory inference, corresponding to the existing algorithms Scanorama, GPT-4 annotation, and Slingshot, respectively.

CellAgent can streamline your single-cell data analysis workflow, allowing you to obtain high-quality results without the need for complex coding. Whether you are a domain expert or a novice, our online platform enables effortless data analysis and interpretation. With CellAgent, you can:

  • Automate Analysis: From data preprocessing to conclusions, CellAgent automates the entire analysis process, significantly reducing manual intervention and allowing you to focus more on scientific discoveries.
  • Interactively Query: Through continuous dialogue, you can submit new requests seamlessly. CellAgent will strive to meet your needs and provide real-time analysis and response.
  • Obtain Reliable Conclusions: CellAgent employs a unique self-iterative optimization mechanism that ensures the reliability and high quality of results throughout the analysis.
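
The idea of self-iterative optimization can be illustrated with a generic run, evaluate, and retry loop. The sketch below is not CellAgent's mechanism, which this page does not describe in detail. The `optimize` function, the toy `run_step` and `evaluate` stand-ins, the `good_enough` threshold, and the candidate `resolution` values are all assumptions chosen to keep the example self-contained.

```python
def optimize(candidates, run_step, evaluate, good_enough=0.9):
    """Try candidate settings in order and keep the best-scoring one.

    Stops early once the best score reaches `good_enough`.
    Returns (best_params, best_score) over the candidates that were tried.
    """
    best_params, best_score = None, float("-inf")
    for params in candidates:
        result = run_step(params)
        score = evaluate(result)
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= good_enough:
            break
    return best_params, best_score


# Toy stand-ins: "running" a step just echoes the resolution, and the
# evaluator prefers resolutions close to 1.0. Both are hypothetical.
def run_step(params):
    return params["resolution"]


def evaluate(result):
    return 1.0 - abs(result - 1.0)


candidates = [{"resolution": r} for r in (0.2, 0.5, 1.0, 2.0)]
best_params, best_score = optimize(candidates, run_step, evaluate)
print(best_params, best_score)
```

In a real workflow, `run_step` would execute an analysis step such as clustering, and `evaluate` would be a quality measure or an expert-style assessment of the result.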

More on CellAgent ​

Considering the double-blind review principle, all external links on this page are hidden during the paper submission period, and this part of the description is temporarily invisible.

We are excited to see the potential of CellAgent to greatly enhance productivity, foster new discoveries, and deepen our understanding of biological systems.