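LangChain Tutorial 34 | Hands-on Project: AI Research Assistant

📖 About this article: This is article 34 in the LangChain tutorial series. We build an AI research assistant that supports multi-source literature search, automatic summary extraction, viewpoint comparison, and research report writing. Estimated reading time: 25 minutes.

Project Overview

Difficulty: ⭐⭐⭐⭐⭐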

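Core features:

Multi-source literature search: academic databases, paper repositories, and web resources
Automatic summary extraction: pull the key information out of long documents
Viewpoint comparison: compare where different papers agree and disagree
Research report writing: generate a structured research report

Tech stack:

AI framework: LangChain 1.x + LangGraph
Agent architecture: Agentic RAG + subagents + MCP + context management
Frontend: React 18 + TypeScript + Ant Design + Zustand
Backend: Express + Prisma + MySQL + Redis
Vector database: Chroma
Document processing: PDF.js + Mammoth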

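Part 1: Architecture Design

1.1 What Is Agentic RAG?

Traditional RAG vs. Agentic RAG

Traditional RAG is a fairly passive retrieve-then-generate pipeline:

Traditional RAG:
User question → vector search → concatenate context → generate answer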

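Agentic RAG, by contrast, is an active research process that plans, checks, and iterates:

Agentic RAG:
User question → plan retrieval strategy → multiple rounds of active retrieval → verify information quality
         → detect information gaps → supplementary retrieval → integrate and analyze → generate answer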
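The control flow of that pipeline can be sketched as a loop. This is a minimal synchronous sketch under stated assumptions:

- `agenticRag` and its injected steps (`plan`, `retrieve`, `findGaps`, `answer`) are hypothetical names, not LangChain or LangGraph APIs.
- In the real project each step would be an async LLM or tool call.
- The round budget `maxRounds` is an addition of this sketch.

What separates it from traditional RAG is the `while` loop. Retrieval repeats until the gap check returns no follow-up queries or the budget runs out.

```typescript
// Hypothetical document shape used only for this sketch
interface Doc {
  id: string;
  text: string;
}

// Each step is injected, so the loop itself is independent of any LLM or vector store
interface AgenticSteps {
  plan: (question: string) => string[];                  // initial retrieval queries
  retrieve: (query: string) => Doc[];                    // one retrieval call
  findGaps: (question: string, docs: Doc[]) => string[]; // follow-up queries; [] means "enough"
  answer: (question: string, docs: Doc[]) => string;     // final generation step
}

function agenticRag(question: string, steps: AgenticSteps, maxRounds = 3) {
  const collected = new Map<string, Doc>(); // dedupe documents across rounds by id
  let queries = steps.plan(question);
  let rounds = 0;

  // Keep retrieving while the gap check asks for more and the round budget allows it
  while (queries.length > 0 && rounds < maxRounds) {
    rounds++;
    for (const query of queries) {
      for (const doc of steps.retrieve(query)) {
        if (!collected.has(doc.id)) collected.set(doc.id, doc);
      }
    }
    queries = steps.findGaps(question, Array.from(collected.values()));
  }

  const docs = Array.from(collected.values());
  return { answer: steps.answer(question, docs), rounds, docs };
}

// Toy corpus: the first paper mentions error correction, which triggers a follow-up search
const corpus: Record<string, Doc[]> = {
  'quantum computing': [{ id: 'p1', text: 'Survey of qubits; briefly mentions error correction' }],
  'quantum error correction': [{ id: 'p2', text: 'Surface codes in detail' }],
};

const result = agenticRag('Recent progress in quantum computing', {
  plan: () => ['quantum computing'],
  retrieve: (query) => corpus[query] ?? [],
  findGaps: (_question, docs) =>
    docs.some((d) => d.text.includes('error correction')) && !docs.some((d) => d.id === 'p2')
      ? ['quantum error correction']
      : [],
  answer: (_question, docs) => `Answer based on ${docs.length} documents`,
});

console.log(result.rounds, result.answer); // prints: 2 Answer based on 2 documents
```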

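Key differences:

Feature	Traditional RAG	Agentic RAG
Retrieval	Single pass	Multiple iterative rounds
Strategy	Fixed pipeline	Planned dynamically
Verification	None	Actively verified
Gap detection	None	Gaps identified and filled automatically
Source integration	Simple concatenation	Merged and deduplicated

Comparison: traditional RAG vs. Agentic RAG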
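One possible reading of "merged and deduplicated" is sketched below. Hits from several sources are treated as the same paper when their DOIs match case-insensitively or their normalized titles match. The `SourcedPaper` shape and `mergeResults` are assumptions for illustration, and the title normalization is ASCII-only for brevity.

```typescript
// Hypothetical search-hit shape; field names loosely follow the Paper model in section 1.4
interface SourcedPaper {
  title: string;
  doi?: string;
  source: string; // e.g. 'arxiv', 'semantic_scholar', 'web'
  citations: number;
}

interface MergedPaper extends SourcedPaper {
  sources: string[]; // every source that returned this paper
}

// Lower-case and collapse punctuation/whitespace (ASCII-only simplification)
function titleKey(title: string): string {
  return 'title:' + title.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

function mergeResults(papers: SourcedPaper[]): MergedPaper[] {
  const merged: MergedPaper[] = [];
  const index = new Map<string, MergedPaper>(); // title key or DOI key -> merged entry

  for (const paper of papers) {
    const keys = [titleKey(paper.title)];
    if (paper.doi) keys.push('doi:' + paper.doi.trim().toLowerCase());

    // Reuse an existing entry if either key has been seen before
    let target = keys.map((k) => index.get(k)).find((m) => m !== undefined);
    if (target) {
      if (!target.sources.includes(paper.source)) target.sources.push(paper.source);
      target.citations = Math.max(target.citations, paper.citations);
      target.doi = target.doi ?? paper.doi;
    } else {
      target = { ...paper, sources: [paper.source] };
      merged.push(target);
    }
    for (const k of keys) index.set(k, target);
  }

  // Most-cited first, as a simple default ranking
  return merged.sort((a, b) => b.citations - a.citations);
}

// Toy data with placeholder DOIs
const merged = mergeResults([
  { title: 'Attention Is All You Need', source: 'arxiv', citations: 10 },
  { title: 'Attention is all you need.', doi: '10.0000/EXAMPLE', source: 'semantic_scholar', citations: 500 },
  { title: 'Attention Is All You Need (extended)', doi: '10.0000/example', source: 'web', citations: 3 },
  { title: 'An Unrelated Paper', source: 'web', citations: 1 },
]);
console.log(merged.map((p) => `${p.title} <- ${p.sources.join(', ')}`));
```

Keying on both the DOI and the title matters. An arXiv hit often has no DOI, while the same paper returned by Semantic Scholar may carry one.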

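An everyday analogy:

Traditional RAG is like a library's automated book-lending machine: you name a book, and it fetches that book from the shelf.

Agentic RAG is like a graduate research assistant:

You say, "I want to research recent advances in quantum computing."
They think first: "Quantum computing? I should check computer science and physics journals, and look at the latest papers from Google and IBM."
They search in several places and find some papers.
After reading, they notice: "This paper mentions error correction but doesn't go into detail. I should look for more material on that."
They keep searching to fill in the missing information.
Finally, they organize everything and hand you a complete research report.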

1.2 Overall System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    User Interface Layer                    │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Research │  │  Paper   │  │ Progress │  │  Report  │    │
│  │  Input   │  │ Library  │  │ Display  │  │  Export  │    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│               Research Assistant Main Agent                │
│  ┌─────────────────────────────────────────────────────────┐│
│  │                    Research Planner                    ││
│  │  • Analyze the research question                       ││
│  │  • Plan the retrieval strategy                         ││
│  │  • Coordinate the subagents                            ││
│  │  • Assess research progress                            ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  Search Agent   │ │  Summary Agent  │ │ Analysis Agent  │
│                 │ │                 │ │                 │
│ • Academic DBs  │ │ • Key facts     │ │ • Viewpoints    │
│ • Web search    │ │ • Core claims   │ │ • Compare views │
│ • Local library │ │ • Methodology   │ │ • Rate evidence │
│ • Rank & filter │ │ • Conclusions   │ │ • Relation map  │
└─────────────────┘ └─────────────────┘ └─────────────────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
                    ┌─────────────────┐
                    │  Report Agent   │
                    │                 │
                    │ • Structure     │
                    │ • Integration   │
                    │ • Citations     │
                    │ • Formatting    │
                    └─────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  Context Management Layer                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Session  │  │  Paper   │  │  Search  │  │  Report  │    │
│  │ History  │  │  Cache   │  │  State   │  │  Drafts  │    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                       MCP Tool Layer                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Academic │  │Web Search│  │   PDF    │  │Filesystem│    │
│  │  Server  │  │  Server  │  │  Server  │  │  Server  │    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │
└─────────────────────────────────────────────────────────────┘

Four-layer system architecture of the AI research assistant
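The data flow through the agent layers in the diagram can be sketched as a plain function. `SubAgents` and `runResearch` are hypothetical. In the project, each method would wrap an LLM call plus MCP tools and would be async, and the main agent would also decide which queries to run. Only the order search → summarize → compare → report is taken from the diagram.

```typescript
interface PaperRef {
  id: string;
  title: string;
}

interface PaperSummary {
  paperId: string;
  keyPoints: string[];
}

// Hypothetical contracts for the four subagents in the diagram
interface SubAgents {
  search(query: string): PaperRef[];            // literature search agent
  summarize(paper: PaperRef): PaperSummary;     // summary extraction agent
  compare(summaries: PaperSummary[]): string[]; // analysis & comparison agent
  writeReport(topic: string, summaries: PaperSummary[], comparison: string[]): string; // report agent
}

// Main agent: runs the planned queries, deduplicates papers by id, then hands results downstream
function runResearch(topic: string, plannedQueries: string[], agents: SubAgents): string {
  const papers = new Map<string, PaperRef>();
  for (const query of plannedQueries) {
    for (const paper of agents.search(query)) papers.set(paper.id, paper);
  }
  const summaries = Array.from(papers.values(), (paper) => agents.summarize(paper));
  const comparison = agents.compare(summaries);
  return agents.writeReport(topic, summaries, comparison);
}

// Usage with stub agents standing in for LLM-backed ones
const report = runResearch('RAG evaluation', ['rag benchmark', 'rag metrics'], {
  search: (q) =>
    q === 'rag benchmark'
      ? [{ id: 'a', title: 'Paper A' }, { id: 'b', title: 'Paper B' }]
      : [{ id: 'b', title: 'Paper B' }],
  summarize: (p) => ({ paperId: p.id, keyPoints: [`${p.title}: main claim`] }),
  compare: (s) => [`${s.length} papers compared`],
  writeReport: (topic, s, c) => `# ${topic}\nSources: ${s.map((x) => x.paperId).join(', ')}\n${c.join('\n')}`,
});
console.log(report);
```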

1.3 The Agentic RAG Workflow in Detail

┌───────────────────────────────────────────────────────────────┐
│                     Agentic RAG Workflow                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────┐                                                  │
│  │  START  │                                                  │
│  └────┬────┘                                                  │
│       │                                                       │
│       ▼                                                       │
│  ┌─────────────────┐                                          │
│  │  Analyze topic  │ ← Identify topic and key concepts        │
│  └────────┬────────┘                                          │
│           │                                                   │
│           ▼                                                   │
│  ┌─────────────────┐                                          │
│  │  Plan strategy  │ ← Choose sources, keywords, priority     │
│  └────────┬────────┘                                          │
│           │                                                   │
│           ▼                                                   │
│  ┌─────────────────┐        ┌─────────────────┐               │
│  │  Run retrieval  │───────→│ Assess quality  │               │
│  └────────┬────────┘        └────────┬────────┘               │
│           │                          │                        │
│           │                          ▼                        │
│           │                 ┌─────────────────┐               │
│           │                 │  Enough info?   │               │
│           │                 └────────┬────────┘               │
│           │                    ┌─────┴─────┐                  │
│           │                    │           │                  │
│           │                   No          Yes                 │
│           │                    │           │                  │
│           │                    ▼           │                  │
│           │            ┌───────────────┐   │                  │
│           │            │ Identify gaps │   │                  │
│           │            └───────┬───────┘   │                  │
│           │                    │           │                  │
│           │                    ▼           │                  │
│           │            ┌───────────────┐   │                  │
│           └────────────│ Refine search │   │                  │
│                        └───────────────┘   │                  │
│                                            │                  │
│                                            ▼                  │
│                                   ┌─────────────────┐         │
│                                   │ Integrate info  │         │
│                                   └────────┬────────┘         │
│                                            │                  │
│                                            ▼                  │
│                                   ┌─────────────────┐         │
│                                   │  Write report   │         │
│                                   └────────┬────────┘         │
│                                            │                  │
│                                            ▼                  │
│                                       ┌─────────┐             │
│                                       │   END   │             │
│                                       └─────────┘             │
│                                                               │
└───────────────────────────────────────────────────────────────┘
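The flowchart maps naturally onto a state graph. Each box is a node that updates shared state, and the "Enough info?" diamond is a conditional edge. The sketch below is framework-agnostic and hypothetical: node names, `ResearchState`, `buildWorkflow`, and the fake search function are illustrative. In LangGraph the same shape would be expressed with a `StateGraph`, `addNode`, and `addConditionalEdges`. The round limit `maxRounds` is an assumption added so the gap-filling loop cannot run forever, which the diagram leaves implicit.

```typescript
type NodeName =
  | 'analyze' | 'plan' | 'retrieve' | 'assess'
  | 'findGaps' | 'refine' | 'integrate' | 'report' | 'END';

interface ResearchState {
  topic: string;
  queries: string[];
  docs: string[];
  gaps: string[];
  rounds: number;
  trace: NodeName[];
  report?: string;
}

// A node mutates the state and returns the name of the next node
type WorkflowNode = (state: ResearchState) => NodeName;
type Workflow = Record<Exclude<NodeName, 'END'>, WorkflowNode>;

function buildWorkflow(
  search: (query: string) => string[],
  isEnough: (state: ResearchState) => boolean,
  maxRounds = 3,
): Workflow {
  return {
    analyze: () => 'plan',
    plan: (s) => { s.queries = [s.topic]; return 'retrieve'; },
    retrieve: (s) => {
      s.rounds++;
      for (const q of s.queries) s.docs.push(...search(q));
      return 'assess';
    },
    // Conditional edge: "Enough info?" plus a round budget
    assess: (s) => (isEnough(s) || s.rounds >= maxRounds ? 'integrate' : 'findGaps'),
    // Placeholder gap detection; a real node would ask an LLM what is missing
    findGaps: (s) => { s.gaps = [`${s.topic} details`]; return 'refine'; },
    refine: (s) => { s.queries = s.gaps; return 'retrieve'; },
    integrate: (s) => { s.docs = Array.from(new Set(s.docs)); return 'report'; },
    report: (s) => { s.report = `${s.topic}: ${s.docs.length} sources`; return 'END'; },
  };
}

function runWorkflow(topic: string, workflow: Workflow): ResearchState {
  const state: ResearchState = { topic, queries: [], docs: [], gaps: [], rounds: 0, trace: [] };
  let current = 'analyze' as NodeName;
  while (current !== 'END') {
    state.trace.push(current);
    current = workflow[current](state);
  }
  return state;
}

const finalState = runWorkflow(
  'quantum computing',
  buildWorkflow(
    (q) => (q.endsWith('details') ? ['paperB', 'paperA'] : ['paperA']),
    (s) => new Set(s.docs).size >= 2,
  ),
);
console.log(finalState.trace.join(' -> '));
console.log(finalState.report); // quantum computing: 2 sources
```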

1.4 Database Design

prisma

// prisma/schema.prisma

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "mysql"
  url      = env("DATABASE_URL")
}

// Users
model User {
  id           Int       @id @default(autoincrement())
  email        String    @unique
  name         String?
  passwordHash String
  
  projects     ResearchProject[]
  papers       Paper[]
  
  createdAt    DateTime  @default(now())
  updatedAt    DateTime  @updatedAt
}

// Research projects
model ResearchProject {
  id          Int      @id @default(autoincrement())
  userId      Int
  user        User     @relation(fields: [userId], references: [id])
  
  title       String
  description String?  @db.Text
  topic       String
  keywords    String   @db.Text  // JSON array
  status      String   @default("active")  // active, completed, archived
  
  papers      ProjectPaper[]
  reports     ResearchReport[]
  sessions    ResearchSession[]
  
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt
  
  @@index([userId])
  @@index([status])
}

// Papers / literature
model Paper {
  id           Int      @id @default(autoincrement())
  userId       Int
  user         User     @relation(fields: [userId], references: [id])
  
  title        String   @db.Text
  authors      String   @db.Text  // JSON array
  abstract     String?  @db.LongText
  content      String?  @db.LongText
  
  source       String   // arxiv, semantic_scholar, google_scholar, web, local
  sourceUrl    String?  @db.Text
  sourceId     String?  // ID in the external source
  doi          String?
  
  publishedAt  DateTime?
  venue        String?  // Journal or conference name
  citations    Int      @default(0)
  
  // Processing status
  isProcessed  Boolean  @default(false)
  summary      String?  @db.LongText  // AI-generated summary
  keyPoints    String?  @db.LongText  // JSON array of key points
  methodology  String?  @db.Text      // Methodology
  conclusions  String?  @db.Text      // Conclusions
  
  // Vector embedding
  embeddingId  String?  // ID of the corresponding record in Chroma
  
  projects     ProjectPaper[]
  
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt
  
  @@unique([source, sourceId])
  @@index([userId])
  @@index([source])
  @@index([isProcessed])
}

// Project-paper join table
model ProjectPaper {
  id        Int             @id @default(autoincrement())
  projectId Int
  project   ResearchProject @relation(fields: [projectId], references: [id])
  paperId   Int
  paper     Paper           @relation(fields: [paperId], references: [id])
  
  relevance Float           @default(0)  // Relevance score
  notes     String?         @db.Text     // User notes
  tags      String?         @db.Text     // JSON array
  
  createdAt DateTime        @default(now())
  
  @@unique([projectId, paperId])
  @@index([projectId])
  @@index([paperId])
}

// Research reports
model ResearchReport {
  id        Int             @id @default(autoincrement())
  projectId Int
  project   ResearchProject @relation(fields: [projectId], references: [id])
  
  title     String
  content   String          @db.LongText
  format    String          @default("markdown")  // markdown, html, pdf
  version   Int             @default(1)
  
  // Report structure
  outline   String?         @db.LongText  // JSON: report outline
  sections  String?         @db.LongText  // JSON: content of each section
  references String?        @db.LongText  // JSON: reference list
  
  status    String          @default("draft")  // draft, final
  
  createdAt DateTime        @default(now())
  updatedAt DateTime        @updatedAt
  
  @@index([projectId])
  @@index([status])
}

// Research sessions
model ResearchSession {
  id        Int             @id @default(autoincrement())
  projectId Int
  project   ResearchProject @relation(fields: [projectId], references: [id])
  
  messages  Message[]
  
  // Session context
  context   String?         @db.LongText  // JSON: current research context
  
  createdAt DateTime        @default(now())
  updatedAt DateTime        @updatedAt
  
  @@index([projectId])
}

// Chat messages
model Message {
  id        Int             @id @default(autoincrement())
  sessionId Int
  session   ResearchSession @relation(fields: [sessionId], references: [id])
  
  role      String          // user, assistant, system
  content   String          @db.LongText
  
  // Metadata
  metadata  String?         @db.LongText  // JSON: search results, citations, etc.
  
  createdAt DateTime        @default(now())
  
  @@index([sessionId])
}

// Search logs
model SearchLog {
  id        Int      @id @default(autoincrement())
  projectId Int?
  
  query     String   @db.Text
  source    String   // Search source
  results   Int      // Number of results
  duration  Int      // Duration in milliseconds
  
  createdAt DateTime @default(now())
  
  @@index([projectId])
  @@index([source])
}
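Several columns above (`keywords`, `authors`, `keyPoints`, `tags`) store JSON arrays in text columns, so every read and write has to serialize or parse them. A small pair of helpers keeps that logic in one place and tolerates null or malformed values.

The helper names are hypothetical. As an alternative to text columns, Prisma also supports a native `Json` field type on MySQL, which the tutorial's schema does not use.

```typescript
// Serialize a string array for a Text column such as ResearchProject.keywords
function toJsonArray(values: string[]): string {
  return JSON.stringify(values);
}

// Parse a Text column back into string[]; null, invalid JSON, or non-arrays yield []
function parseJsonArray(raw: string | null | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    // Drop any non-string entries that may have slipped in
    return parsed.filter((item): item is string => typeof item === 'string');
  } catch {
    return [];
  }
}

// Example: the value you would write to / read from Paper.authors
const stored = toJsonArray(['Ada Lovelace', 'Alan Turing']);
console.log(stored);                        // ["Ada Lovelace","Alan Turing"]
console.log(parseJsonArray(stored).length); // 2
```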

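Part 2: MCP Integration for Multi-source Literature Search

2.1 MCP Service Architecture

MCP (Model Context Protocol) gives us a standardized interface for tools. The research assistant needs to connect to several data sources: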

┌─────────────────────────────────────────────────────────────┐
│                      MCP Client Manager                     │
│                                                             │
│   ┌────────────────────────────────────────────────────┐   │
│   │              Tool Router                            │   │
│   │  Routes calls to the right MCP Server by tool name  │   │
│   └────────────────────────────────────────────────────┘   │
│                            │                                │
│       ┌────────────────────┼────────────────────┐          │
│       │                    │                    │          │
│       ▼                    ▼                    ▼          │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐      │
│  │ Arxiv   │         │ Scholar │         │ Web     │      │
│  │ Server  │         │ Server  │         │ Server  │      │
│  ├─────────┤         ├─────────┤         ├─────────┤      │
│  │• search │         │• search │         │• search │      │
│  │• fetch  │         │• cite   │         │• fetch  │      │
│  │• pdf    │         │• author │         │• extract│      │
│  └─────────┘         └─────────┘         └─────────┘      │
│                                                             │
│       ┌────────────────────┼────────────────────┐          │
│       │                    │                    │          │
│       ▼                    ▼                    ▼          │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐      │
│  │ PDF     │         │ File    │         │ Vector  │      │
│  │ Server  │         │ Server  │         │ Server  │      │
│  ├─────────┤         ├─────────┤         ├─────────┤      │
│  │• parse  │         │• read   │         │• search │      │
│  │• extract│         │• write  │         │• add    │      │
│  │• chunk  │         │• list   │         │• delete │      │
│  └─────────┘         └─────────┘         └─────────┘      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

MCP multi-source tool integration architecture
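The Tool Router box can be sketched as an exact-name lookup table built from each server's tool list. That design also catches two servers exposing the same tool name. `ConnectedServer` and `ToolRouter` are hypothetical.

With the official TypeScript SDK, the tool names would come from a `Client`'s `listTools()` result and calls would go through `callTool()`, both of which are async. The sketch is synchronous to stay short.

```typescript
// Hypothetical minimal view of a connected MCP server
interface ConnectedServer {
  name: string;
  toolNames: string[];
  call(tool: string, args: Record<string, unknown>): string;
}

class ToolRouter {
  private readonly routes = new Map<string, ConnectedServer>();

  // Index every tool a server exposes; reject the whole server if any name is already taken
  register(server: ConnectedServer): void {
    const clash = server.toolNames.find((tool) => this.routes.has(tool));
    if (clash) {
      throw new Error(`Tool "${clash}" is already provided by ${this.routes.get(clash)!.name}`);
    }
    for (const tool of server.toolNames) this.routes.set(tool, server);
  }

  // Forward a call to whichever server owns the tool
  call(tool: string, args: Record<string, unknown> = {}): string {
    const server = this.routes.get(tool);
    if (!server) throw new Error(`Unknown tool: ${tool}`);
    return server.call(tool, args);
  }
}

const router = new ToolRouter();
router.register({
  name: 'arxiv-server',
  toolNames: ['arxiv_search', 'arxiv_fetch', 'arxiv_download_pdf'],
  call: (tool, args) => `arxiv-server handled ${tool}(${JSON.stringify(args)})`,
});
router.register({
  name: 'semantic-scholar-server',
  toolNames: ['semantic_scholar_search'],
  call: (tool) => `semantic-scholar-server handled ${tool}`,
});

console.log(router.call('arxiv_search', { query: 'agentic rag' }));
// arxiv-server handled arxiv_search({"query":"agentic rag"})
```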

2.2 Academic Search MCP Server (arXiv)

typescript

// src/mcp/servers/arxiv-server.ts

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
  Tool,
} from '@modelcontextprotocol/sdk/types.js';
import axios from 'axios';
import { XMLParser } from 'fast-xml-parser';

interface ArxivPaper {
  id: string;
  title: string;
  authors: string[];
  abstract: string;
  published: string;
  updated: string;
  categories: string[];
  pdfUrl: string;
  doi?: string;
}

interface SearchResult {
  papers: ArxivPaper[];
  totalResults: number;
  query: string;
}

const ARXIV_API_URL = 'http://export.arxiv.org/api/query';

async function searchArxiv(
  query: string,
  maxResults: number = 10,
  start: number = 0,
  sortBy: 'relevance' | 'lastUpdatedDate' | 'submittedDate' = 'relevance'
): Promise<SearchResult> {
  const params = new URLSearchParams({
    search_query: `all:${query}`,
    start: start.toString(),
    max_results: maxResults.toString(),
    sortBy,
    sortOrder: 'descending',
  });

  const response = await axios.get(`${ARXIV_API_URL}?${params}`);
  const parser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: '@_',
  });

  const result = parser.parse(response.data);
  const feed = result.feed;

  if (!feed.entry) {
    return { papers: [], totalResults: 0, query };
  }

  const entries = Array.isArray(feed.entry) ? feed.entry : [feed.entry];

  const papers: ArxivPaper[] = entries.map((entry: any) => {
    const authors = Array.isArray(entry.author)
      ? entry.author.map((a: any) => a.name)
      : [entry.author?.name || 'Unknown'];

    const categories = Array.isArray(entry.category)
      ? entry.category.map((c: any) => c['@_term'])
      : [entry.category?.['@_term'] || 'Unknown'];

    const links = Array.isArray(entry.link) ? entry.link : [entry.link];
    const pdfLink = links.find((l: any) => l['@_title'] === 'pdf');

    return {
      id: entry.id.split('/abs/').pop() || entry.id,
      title: entry.title.replace(/\n/g, ' ').trim(),
      authors,
      abstract: entry.summary.replace(/\n/g, ' ').trim(),
      published: entry.published,
      updated: entry.updated,
      categories,
      pdfUrl: pdfLink?.['@_href'] || `https://arxiv.org/pdf/${entry.id.split('/abs/').pop()}`,
      doi: entry['arxiv:doi']?.['#text'],
    };
  });

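  // opensearch:totalResults carries an xmlns attribute, so with ignoreAttributes: false
  // fast-xml-parser returns an object ({ '#text': ..., '@_xmlns:opensearch': ... }) instead
  // of a plain value; parseInt() on that object would yield NaN.
  const rawTotal = feed['opensearch:totalResults'];
  const totalResults = Number(typeof rawTotal === 'object' ? rawTotal?.['#text'] : rawTotal) || 0;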

  return { papers, totalResults, query };
}

async function fetchPaperDetails(arxivId: string): Promise<ArxivPaper | null> {
  const params = new URLSearchParams({
    id_list: arxivId,
  });

  const response = await axios.get(`${ARXIV_API_URL}?${params}`);
  const parser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: '@_',
  });

  const result = parser.parse(response.data);
  const entry = result.feed.entry;

  if (!entry) {
    return null;
  }

  const authors = Array.isArray(entry.author)
    ? entry.author.map((a: any) => a.name)
    : [entry.author?.name || 'Unknown'];

  const categories = Array.isArray(entry.category)
    ? entry.category.map((c: any) => c['@_term'])
    : [entry.category?.['@_term'] || 'Unknown'];

  return {
    id: arxivId,
    title: entry.title.replace(/\n/g, ' ').trim(),
    authors,
    abstract: entry.summary.replace(/\n/g, ' ').trim(),
    published: entry.published,
    updated: entry.updated,
    categories,
    pdfUrl: `https://arxiv.org/pdf/${arxivId}`,
    doi: entry['arxiv:doi']?.['#text'],
  };
}

async function downloadPdf(arxivId: string): Promise<Buffer> {
  const pdfUrl = `https://arxiv.org/pdf/${arxivId}.pdf`;
  const response = await axios.get(pdfUrl, {
    responseType: 'arraybuffer',
  });
  return Buffer.from(response.data);
}

const tools: Tool[] = [
  {
    name: 'arxiv_search',
    description: 'Search for academic papers on arXiv. Returns paper titles, authors, abstracts, and links.',
    inputSchema: {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query for finding papers',
        },
        maxResults: {
          type: 'number',
          description: 'Maximum number of results to return (default: 10, max: 50)',
          default: 10,
        },
        sortBy: {
          type: 'string',
          enum: ['relevance', 'lastUpdatedDate', 'submittedDate'],
          description: 'Sort order for results',
          default: 'relevance',
        },
      },
      required: ['query'],
    },
  },
  {
    name: 'arxiv_fetch',
    description: 'Fetch detailed information about a specific arXiv paper by its ID',
    inputSchema: {
      type: 'object',
      properties: {
        arxivId: {
          type: 'string',
          description: 'arXiv paper ID (e.g., "2301.00234" or "cs.AI/0601001")',
        },
      },
      required: ['arxivId'],
    },
  },
  {
    name: 'arxiv_download_pdf',
    description: 'Download the PDF of an arXiv paper. Returns the PDF as base64.',
    inputSchema: {
      type: 'object',
      properties: {
        arxivId: {
          type: 'string',
          description: 'arXiv paper ID',
        },
      },
      required: ['arxivId'],
    },
  },
];

const server = new Server(
  {
    name: 'arxiv-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools,
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    switch (name) {
      case 'arxiv_search': {
        const { query, maxResults = 10, sortBy = 'relevance' } = args as {
          query: string;
          maxResults?: number;
          sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate';
        };
        const results = await searchArxiv(query, Math.min(maxResults, 50), 0, sortBy);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(results, null, 2),
            },
          ],
        };
      }

      case 'arxiv_fetch': {
        const { arxivId } = args as { arxivId: string };
        const paper = await fetchPaperDetails(arxivId);
        if (!paper) {
          return {
            content: [{ type: 'text', text: `Paper not found: ${arxivId}` }],
            isError: true,
          };
        }
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(paper, null, 2),
            },
          ],
        };
      }

      case 'arxiv_download_pdf': {
        const { arxivId } = args as { arxivId: string };
        const pdfBuffer = await downloadPdf(arxivId);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                arxivId,
                size: pdfBuffer.length,
                base64: pdfBuffer.toString('base64'),
              }),
            },
          ],
        };
      }

      default:
        return {
          content: [{ type: 'text', text: `Unknown tool: ${name}` }],
          isError: true,
        };
    }
  } catch (error) {
    return {
      content: [
        {
          type: 'text',
          text: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`,
        },
      ],
      isError: true,
    };
  }
});

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
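`searchArxiv` always wraps the user query as `all:${query}`. The arXiv API also accepts field prefixes such as `ti:` (title), `au:` (author), `abs:` (abstract) and `cat:` (category). These can be combined with `AND`, `OR`, or `ANDNOT`, and multi-word phrases can be double-quoted.

A hypothetical helper for building such queries might look like the sketch below. To use it, `searchArxiv` would need to accept a prebuilt `search_query` instead of adding the `all:` prefix itself.

```typescript
// A subset of the field prefixes accepted by the arXiv API
type ArxivField = 'ti' | 'au' | 'abs' | 'cat' | 'all';

// Hypothetical helper: combine field-scoped terms into one search_query string
function buildArxivQuery(
  terms: Partial<Record<ArxivField, string>>,
  operator: 'AND' | 'OR' = 'AND',
): string {
  const parts: string[] = [];
  for (const [field, value] of Object.entries(terms)) {
    const v = value?.trim();
    if (!v) continue;
    // Quote multi-word values so they are matched as a phrase
    parts.push(/\s/.test(v) ? `${field}:"${v}"` : `${field}:${v}`);
  }
  if (parts.length === 0) throw new Error('At least one search term is required');
  return parts.join(` ${operator} `);
}

const q = buildArxivQuery({ ti: 'error correction', cat: 'quant-ph' });
console.log(q); // ti:"error correction" AND cat:quant-ph
// URLSearchParams percent-encodes the colons and quotes and turns spaces into '+'
console.log(new URLSearchParams({ search_query: q, max_results: '5' }).toString());
```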

2.3 Semantic Scholar MCP Server

typescript

// src/mcp/servers/semantic-scholar-server.ts

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
  Tool,
} from '@modelcontextprotocol/sdk/types.js';
import axios from 'axios';

interface SemanticScholarPaper {
  paperId: string;
  title: string;
  abstract: string | null;
  authors: Array<{ authorId: string; name: string }>;
  year: number | null;
  venue: string | null;
  citationCount: number;
  referenceCount: number;
  fieldsOfStudy: string[] | null;
  url: string;
  openAccessPdf: { url: string } | null;
  tldr: { text: string } | null;
}

interface SearchResponse {
  total: number;
  offset: number;
  data: SemanticScholarPaper[];
}

interface AuthorInfo {
  authorId: string;
  name: string;
  affiliations: string[];
  paperCount: number;
  citationCount: number;
  hIndex: number;
}

const API_BASE = 'https://api.semanticscholar.org/graph/v1';
const API_KEY = process.env.SEMANTIC_SCHOLAR_API_KEY;

const headers = API_KEY ? { 'x-api-key': API_KEY } : {};

async function searchPapers(
  query: string,
  limit: number = 10,
  fields: string[] = ['paperId', 'title', 'abstract', 'authors', 'year', 'venue', 'citationCount', 'openAccessPdf', 'tldr'],
  yearFilter?: { min?: number; max?: number }
): Promise<SearchResponse> {
  const params: Record<string, string> = {
    query,
    limit: limit.toString(),
    fields: fields.join(','),
  };

  if (yearFilter?.min) params.year = `${yearFilter.min}-`;
  if (yearFilter?.max) params.year = yearFilter.min ? `${yearFilter.min}-${yearFilter.max}` : `-${yearFilter.max}`;

  const response = await axios.get<SearchResponse>(`${API_BASE}/paper/search`, {
    params,
    headers,
  });

  return response.data;
}

async function getPaperDetails(paperId: string): Promise<SemanticScholarPaper> {
  const fields = [
    'paperId', 'title', 'abstract', 'authors', 'year', 'venue',
    'citationCount', 'referenceCount', 'fieldsOfStudy', 'url',
    'openAccessPdf', 'tldr'
  ].join(',');

  const response = await axios.get<SemanticScholarPaper>(
    `${API_BASE}/paper/${paperId}`,
    {
      params: { fields },
      headers,
    }
  );

  return response.data;
}

async function getPaperCitations(
  paperId: string,
  limit: number = 10
): Promise<{ citingPaper: SemanticScholarPaper }[]> {
  const response = await axios.get(
    `${API_BASE}/paper/${paperId}/citations`,
    {
      params: {
        fields: 'paperId,title,authors,year,citationCount',
        limit: limit.toString(),
      },
      headers,
    }
  );

  return response.data.data;
}

async function getPaperReferences(
  paperId: string,
  limit: number = 10
): Promise<{ citedPaper: SemanticScholarPaper }[]> {
  const response = await axios.get(
    `${API_BASE}/paper/${paperId}/references`,
    {
      params: {
        fields: 'paperId,title,authors,year,citationCount',
        limit: limit.toString(),
      },
      headers,
    }
  );

  return response.data.data;
}

async function getAuthorInfo(authorId: string): Promise<AuthorInfo> {
  const response = await axios.get<AuthorInfo>(
    `${API_BASE}/author/${authorId}`,
    {
      params: {
        fields: 'authorId,name,affiliations,paperCount,citationCount,hIndex',
      },
      headers,
    }
  );

  return response.data;
}

async function getAuthorPapers(
  authorId: string,
  limit: number = 10
): Promise<SemanticScholarPaper[]> {
  const response = await axios.get(
    `${API_BASE}/author/${authorId}/papers`,
    {
      params: {
        fields: 'paperId,title,year,citationCount,venue',
        limit: limit.toString(),
      },
      headers,
    }
  );

  return response.data.data;
}

const tools: Tool[] = [
  {
    name: 'semantic_scholar_search',
    description: 'Search for academic papers in Semantic Scholar database',
    inputSchema: {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query for finding papers',
        },
        limit: {
          type: 'number',
          description: 'Maximum number of results (default: 10, max: 100)',
          default: 10,
        },
        yearMin: {
          type: 'number',
          description: 'Minimum publication year',
        },
        yearMax: {
          type: 'number',
          description: 'Maximum publication year',
        },
      },
      required: ['query'],
    },
  },
  {
    name: 'semantic_scholar_paper_details',
    description: 'Get detailed information about a specific paper',
    inputSchema: {
      type: 'object',
      properties: {
        paperId: {
          type: 'string',
          description: 'Semantic Scholar paper ID, or an external ID with a prefix, e.g. "DOI:<doi>" or "ARXIV:<arxiv id>"',
        },
      },
      required: ['paperId'],
    },
  },
  {
    name: 'semantic_scholar_citations',
    description: 'Get papers that cite a specific paper',
    inputSchema: {
      type: 'object',
      properties: {
        paperId: {
          type: 'string',
          description: 'Semantic Scholar paper ID',
        },
        limit: {
          type: 'number',
          description: 'Maximum number of citations to return',
          default: 10,
        },
      },
      required: ['paperId'],
    },
  },
  {
    name: 'semantic_scholar_references',
    description: 'Get papers referenced by a specific paper',
    inputSchema: {
      type: 'object',
      properties: {
        paperId: {
          type: 'string',
          description: 'Semantic Scholar paper ID',
        },
        limit: {
          type: 'number',
          description: 'Maximum number of references to return',
          default: 10,
        },
      },
      required: ['paperId'],
    },
  },
  {
    name: 'semantic_scholar_author',
    description: 'Get information about an author and their papers',
    inputSchema: {
      type: 'object',
      properties: {
        authorId: {
          type: 'string',
          description: 'Semantic Scholar author ID',
        },
        includePapers: {
          type: 'boolean',
          description: 'Whether to include author papers',
          default: false,
        },
      },
      required: ['authorId'],
    },
  },
];

const server = new Server(
  {
    name: 'semantic-scholar-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools,
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    switch (name) {
      case 'semantic_scholar_search': {
        const { query, limit = 10, yearMin, yearMax } = args as {
          query: string;
          limit?: number;
          yearMin?: number;
          yearMax?: number;
        };
        const yearFilter = yearMin || yearMax ? { min: yearMin, max: yearMax } : undefined;
        const results = await searchPapers(query, Math.min(limit, 100), undefined, yearFilter);
        return {
          content: [{ type: 'text', text: JSON.stringify(results, null, 2) }],
        };
      }

      case 'semantic_scholar_paper_details': {
        const { paperId } = args as { paperId: string };
        const paper = await getPaperDetails(paperId);
        return {
          content: [{ type: 'text', text: JSON.stringify(paper, null, 2) }],
        };
      }

      case 'semantic_scholar_citations': {
        const { paperId, limit = 10 } = args as { paperId: string; limit?: number };
        const citations = await getPaperCitations(paperId, limit);
        return {
          content: [{ type: 'text', text: JSON.stringify(citations, null, 2) }],
        };
      }

      case 'semantic_scholar_references': {
        const { paperId, limit = 10 } = args as { paperId: string; limit?: number };
        const references = await getPaperReferences(paperId, limit);
        return {
          content: [{ type: 'text', text: JSON.stringify(references, null, 2) }],
        };
      }

      case 'semantic_scholar_author': {
        const { authorId, includePapers = false } = args as {
          authorId: string;
          includePapers?: boolean;
        };
        const authorInfo = await getAuthorInfo(authorId);
        let result: any = authorInfo;
        
        if (includePapers) {
          const papers = await getAuthorPapers(authorId);
          result = { ...authorInfo, papers };
        }
        
        return {
          content: [{ type: 'text', text: JSON.stringify(result, null, 2) }],
        };
      }

      default:
        return {
          content: [{ type: 'text', text: `Unknown tool: ${name}` }],
          isError: true,
        };
    }
  } catch (error) {
    return {
      content: [
        {
          type: 'text',
          text: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`,
        },
      ],
      isError: true,
    };
  }
});

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
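Semantic Scholar's public API is rate-limited, more tightly without an API key, and responds with HTTP 429 when the limit is exceeded. The bare `axios.get` calls above can therefore fail under load. One option is to wrap them in a retry with exponential backoff.

`withRetry` is a hypothetical helper. It relies on axios errors exposing the status as `error.response.status`, and it retries only on 429 and 5xx responses.

```typescript
// Read an HTTP status from an axios-style error ({ response: { status } }), if present
function httpStatus(err: unknown): number | undefined {
  return (err as { response?: { status?: number } } | null)?.response?.status;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry fn on 429 / 5xx with exponential backoff: baseDelayMs, then 2x, 4x, ...
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseDelayMs = 1000): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = httpStatus(err);
      const retryable = status === 429 || (status !== undefined && status >= 500);
      if (!retryable || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage inside the server (sketch):
// const results = await withRetry(() => searchPapers(query, limit, undefined, yearFilter));
```

Retrying only on 429 and 5xx keeps client errors, such as a malformed paper ID, from being retried pointlessly.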

2.4 PDF Processing MCP Server

typescript

// src/mcp/servers/pdf-server.ts

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
  Tool,
} from '@modelcontextprotocol/sdk/types.js';
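// pdf-parse exports a single CommonJS function. A namespace import (`import * as`) is not
// callable under esModuleInterop or native ESM, so use a default import instead.
// Under ESM, pdf-parse's index file may switch into a debug mode and try to read a bundled
// test PDF; importing 'pdf-parse/lib/pdf-parse.js' directly is a common workaround.
import pdfParse from 'pdf-parse';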
import * as fs from 'fs/promises';
import * as path from 'path';

interface PDFMetadata {
  title: string | null;
  author: string | null;
  subject: string | null;
  keywords: string | null;
  creator: string | null;
  producer: string | null;
  creationDate: Date | null;
  modificationDate: Date | null;
}

interface PDFContent {
  text: string;
  pages: number;
  metadata: PDFMetadata;
}

interface TextChunk {
  content: string;
  pageNumber: number;
  chunkIndex: number;
  startChar: number;
  endChar: number;
}

// PDF dates look like "D:20230115093000+08'00'", which new Date() cannot parse directly
function parsePdfDate(raw?: string): Date | null {
  if (!raw) {
    return null;
  }
  const m = /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([Zz]|[+-]\d{2}(?:'?\d{2}'?)?)?/.exec(raw);
  if (!m) {
    return null;
  }
  // Missing components default to the start of the period; no offset is treated as UTC
  const [, year, month = '01', day = '01', hour = '00', minute = '00', second = '00', tz] = m;
  let offset = 'Z';
  if (tz && tz.toUpperCase() !== 'Z') {
    const digits = tz.replace(/'/g, ''); // e.g. "+0800" or "+08"
    offset = `${digits.slice(0, 3)}:${digits.slice(3, 5) || '00'}`;
  }
  const date = new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}${offset}`);
  return isNaN(date.getTime()) ? null : date;
}

async function parsePDFFromBuffer(buffer: Buffer): Promise<PDFContent> {
  const data = await pdfParse(buffer);
  const info = data.info ?? {};

  return {
    text: data.text,
    pages: data.numpages,
    metadata: {
      title: info.Title || null,
      author: info.Author || null,
      subject: info.Subject || null,
      keywords: info.Keywords || null,
      creator: info.Creator || null,
      producer: info.Producer || null,
      creationDate: parsePdfDate(info.CreationDate),
      modificationDate: parsePdfDate(info.ModDate),
    },
  };
}

// Read the file and reuse the buffer-based parser so both tools share one code path
async function parsePDF(filePath: string): Promise<PDFContent> {
  const dataBuffer = await fs.readFile(path.resolve(filePath));
  return parsePDFFromBuffer(dataBuffer);
}

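// Pack blank-line-separated paragraphs into chunks of roughly chunkSize characters.
// Each new chunk starts with the last `overlap` characters of the previous one.
// Offsets assume paragraphs are separated by exactly "\n\n", so they are approximate.
// pdf-parse returns the full text as one string, so pages are not tracked (pageNumber stays 0).
function chunkText(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): TextChunk[] {
  const chunks: TextChunk[] = [];
  const paragraphs = text.split(/\n\n+/);

  let currentChunk = '';
  let currentStartChar = 0;
  let offset = 0;   // start offset of the paragraph being processed
  let chunkEnd = 0; // end offset of the last paragraph added to currentChunk
  let chunkIndex = 0;

  for (const paragraph of paragraphs) {
    if (currentChunk.length + paragraph.length > chunkSize && currentChunk.length > 0) {
      chunks.push({
        content: currentChunk.trim(),
        pageNumber: 0,
        chunkIndex,
        startChar: currentStartChar,
        endChar: chunkEnd,
      });

      chunkIndex++;
      // Carry the tail of the previous chunk forward as overlap
      const overlapStart = Math.max(0, currentChunk.length - overlap);
      currentChunk = currentChunk.slice(overlapStart);
      currentStartChar = chunkEnd - currentChunk.length;
    }

    if (!currentChunk) {
      currentStartChar = offset;
    }
    currentChunk += (currentChunk ? '\n\n' : '') + paragraph;
    chunkEnd = offset + paragraph.length;
    offset = chunkEnd + 2; // skip the "\n\n" separator
  }

  if (currentChunk.trim()) {
    chunks.push({
      content: currentChunk.trim(),
      pageNumber: 0,
      chunkIndex,
      startChar: currentStartChar,
      endChar: chunkEnd,
    });
  }

  return chunks;
}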

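function extractSections(text: string): Record<string, string> {
  const sections: Record<string, string> = {};

  // Canonical section key -> keywords that can open a heading line
  const sectionPatterns: Array<[string, RegExp]> = [
    ['abstract', /^(abstract|摘要)/i],
    ['introduction', /^(introduction|引言)/i],
    ['background', /^(background|related work|背景|相关工作)/i],
    ['method', /^(methodology|methods?|方法)/i],
    ['experiment', /^(experiments?|实验)/i],
    ['result', /^(results?|结果)/i],
    ['discussion', /^(discussion|讨论)/i],
    ['conclusion', /^(conclusions?|结论)/i],
    ['reference', /^(references?|参考文献)/i],
  ];

  const lines = text.split('\n');
  let currentSection = 'preamble';
  let currentContent: string[] = [];

  const flush = () => {
    const body = currentContent.join('\n').trim();
    if (body) {
      // Append rather than overwrite if the same heading appears twice
      sections[currentSection] = sections[currentSection]
        ? `${sections[currentSection]}\n${body}`
        : body;
    }
  };

  for (const rawLine of lines) {
    // Drop numbering prefixes such as "1.", "2.3 " or "IV." before matching
    const line = rawLine.trim().replace(/^(?:\d+(?:\.\d+)*\.?|[IVX]+\.)\s*/, '');
    let heading: { key: string; rest: string } | null = null;

    for (const [key, pattern] of sectionPatterns) {
      const match = line.match(pattern);
      if (!match) continue;
      const rest = line.slice(match[0].length);
      // Only a bare keyword or "Keyword: ..." counts as a heading, so body text such as
      // "Results show that ..." is not mistaken for the start of a new section
      if (rest.trim() === '' || /^\s*[:：]/.test(rest)) {
        heading = { key, rest: rest.replace(/^\s*[:：]?\s*/, '') };
        break;
      }
    }

    if (heading) {
      flush();
      currentSection = heading.key;
      currentContent = heading.rest ? [heading.rest] : [];
    } else {
      currentContent.push(rawLine);
    }
  }

  flush();
  return sections;
}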

const tools: Tool[] = [
  {
    name: 'pdf_parse',
    description: 'Parse a PDF file and extract its text content and metadata',
    inputSchema: {
      type: 'object',
      properties: {
        filePath: {
          type: 'string',
          description: 'Path to the PDF file',
        },
      },
      required: ['filePath'],
    },
  },
  {
    name: 'pdf_parse_base64',
    description: 'Parse a PDF from base64 encoded content',
    inputSchema: {
      type: 'object',
      properties: {
        base64Content: {
          type: 'string',
          description: 'Base64 encoded PDF content',
        },
      },
      required: ['base64Content'],
    },
  },
  {
    name: 'pdf_chunk',
    description: 'Split PDF text into overlapping chunks for processing',
    inputSchema: {
      type: 'object',
      properties: {
        filePath: {
          type: 'string',
          description: 'Path to the PDF file',
        },
        chunkSize: {
          type: 'number',
          description: 'Target size for each chunk in characters (default: 1000)',
          default: 1000,
        },
        overlap: {
          type: 'number',
          description: 'Overlap between chunks in characters (default: 200)',
          default: 200,
        },
      },
      required: ['filePath'],
    },
  },
  {
    name: 'pdf_extract_sections',
    description: 'Extract common academic paper sections (abstract, introduction, methods, etc.)',
    inputSchema: {
      type: 'object',
      properties: {
        filePath: {
          type: 'string',
          description: 'Path to the PDF file',
        },
      },
      required: ['filePath'],
    },
  },
];

const server = new Server(
  {
    name: 'pdf-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools,
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    switch (name) {
      case 'pdf_parse': {
        const { filePath } = args as { filePath: string };
        const content = await parsePDF(filePath);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                pages: content.pages,
                metadata: content.metadata,
                textLength: content.text.length,
                textPreview: content.text.slice(0, 2000) + (content.text.length > 2000 ? '...' : ''),
              }, null, 2),
            },
          ],
        };
      }

      case 'pdf_parse_base64': {
        const { base64Content } = args as { base64Content: string };
        const buffer = Buffer.from(base64Content, 'base64');
        const content = await parsePDFFromBuffer(buffer);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                pages: content.pages,
                metadata: content.metadata,
                textLength: content.text.length,
                textPreview: content.text.slice(0, 2000) + (content.text.length > 2000 ? '...' : ''),
              }, null, 2),
            },
          ],
        };
      }

      case 'pdf_chunk': {
        const { filePath, chunkSize = 1000, overlap = 200 } = args as {
          filePath: string;
          chunkSize?: number;
          overlap?: number;
        };
        const content = await parsePDF(filePath);
        const chunks = chunkText(content.text, chunkSize, overlap);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                totalChunks: chunks.length,
                chunks: chunks.map(c => ({
                  ...c,
                  contentPreview: c.content.slice(0, 200) + (c.content.length > 200 ? '...' : ''),
                })),
              }, null, 2),
            },
          ],
        };
      }

      case 'pdf_extract_sections': {
        const { filePath } = args as { filePath: string };
        const content = await parsePDF(filePath);
        const sections = extractSections(content.text);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                sectionsFound: Object.keys(sections),
                sections: Object.fromEntries(
                  Object.entries(sections).map(([key, value]) => [
                    key,
                    {
                      length: value.length,
                      preview: value.slice(0, 500) + (value.length > 500 ? '...' : ''),
                    },
                  ])
                ),
              }, null, 2),
            },
          ],
        };
      }

      default:
        return {
          content: [{ type: 'text', text: `Unknown tool: ${name}` }],
          isError: true,
        };
    }
  } catch (error) {
    return {
      content: [
        {
          type: 'text',
          text: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`,
        },
      ],
      isError: true,
    };
  }
});

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
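
`pdf_chunk` packs whole paragraphs into chunks, so chunk sizes vary and the overlap is taken from the tail of the previous chunk. A fixed-size sliding window shows what `chunkSize`, `overlap`, `startChar` and `endChar` mean in the simplest possible setting. This is a simplified alternative for illustration, not the server's algorithm. It can cut words in half, which is exactly what the paragraph-based version tries to avoid. `slidingWindowChunks` and `WindowChunk` are hypothetical names.

```typescript
interface WindowChunk {
  content: string;
  startChar: number; // inclusive offset in the source text
  endChar: number;   // exclusive offset in the source text
}

// Fixed-size sliding window: each chunk starts `chunkSize - overlap` characters
// after the previous one, so neighbouring chunks share `overlap` characters.
function slidingWindowChunks(text: string, chunkSize = 1000, overlap = 200): WindowChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks: WindowChunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ content: text.slice(start, end), startChar: start, endChar: end });
    if (end === text.length) break; // this window already reaches the end of the text
  }
  return chunks;
}
```

For example, `slidingWindowChunks('abcdefghij', 4, 2)` yields `abcd`, `cdef`, `efgh` and `ghij`. Each chunk begins with the last two characters of the one before it.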

2.5 MCP Client Manager

typescript

// src/mcp/client-manager.ts

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import { DynamicStructuredTool, StructuredToolInterface } from '@langchain/core/tools';
import * as path from 'path';

interface MCPServerConfig {
  name: string;
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface MCPConnection {
  client: Client;
  transport: StdioClientTransport;
  tools: Map<string, any>;
}

export class MCPClientManager {
  private connections: Map<string, MCPConnection> = new Map();
  private configs: Map<string, MCPServerConfig> = new Map();

  constructor() {
    this.initializeConfigs();
  }

  private initializeConfigs() {
    const serverDir = path.join(__dirname, 'servers');

    this.configs.set('arxiv', {
      name: 'arxiv',
      command: 'npx',
      args: ['ts-node', path.join(serverDir, 'arxiv-server.ts')],
    });

    this.configs.set('semantic-scholar', {
      name: 'semantic-scholar',
      command: 'npx',
      args: ['ts-node', path.join(serverDir, 'semantic-scholar-server.ts')],
      env: {
        SEMANTIC_SCHOLAR_API_KEY: process.env.SEMANTIC_SCHOLAR_API_KEY || '',
      },
    });

    this.configs.set('pdf', {
      name: 'pdf',
      command: 'npx',
      args: ['ts-node', path.join(serverDir, 'pdf-server.ts')],
    });

    this.configs.set('web-search', {
      name: 'web-search',
      command: 'npx',
      args: ['ts-node', path.join(serverDir, 'web-search-server.ts')],
      env: {
        TAVILY_API_KEY: process.env.TAVILY_API_KEY || '',
      },
    });

    this.configs.set('vector-store', {
      name: 'vector-store',
      command: 'npx',
      args: ['ts-node', path.join(serverDir, 'vector-store-server.ts')],
    });
  }

  async connect(serverName: string): Promise<void> {
    if (this.connections.has(serverName)) {
      return;
    }

    const config = this.configs.get(serverName);
    if (!config) {
      throw new Error(`Unknown MCP server: ${serverName}`);
    }

    // Depending on the SDK version, `env` may replace the child's environment instead of
    // extending it, so pass the inherited variables (npx needs PATH) explicitly
    const inheritedEnv: Record<string, string> = {};
    for (const [key, value] of Object.entries(process.env)) {
      if (value !== undefined) {
        inheritedEnv[key] = value;
      }
    }

    // StdioClientTransport spawns and owns the server process itself:
    // it takes command/args/env, not pre-made stdin/stdout streams
    const transport = new StdioClientTransport({
      command: config.command,
      args: config.args ?? [],
      env: { ...inheritedEnv, ...config.env },
    });

    const client = new Client(
      {
        name: 'research-assistant-client',
        version: '1.0.0',
      },
      {
        capabilities: {},
      }
    );

    await client.connect(transport);

    const toolsResponse = await client.listTools();
    const tools = new Map<string, any>();

    for (const tool of toolsResponse.tools) {
      tools.set(tool.name, tool);
    }

    this.connections.set(serverName, {
      client,
      transport,
      tools,
    });

    console.log(`Connected to MCP server: ${serverName} (${tools.size} tools)`);
  }

  async connectAll(): Promise<void> {
    const connectPromises = Array.from(this.configs.keys()).map(name =>
      this.connect(name).catch(err => {
        console.error(`Failed to connect to ${name}:`, err.message);
      })
    );

    await Promise.all(connectPromises);
  }

  async disconnect(serverName: string): Promise<void> {
    const connection = this.connections.get(serverName);
    if (!connection) {
      return;
    }

    // Closing the client closes the stdio transport, which also terminates the server process
    await connection.client.close();
    this.connections.delete(serverName);
  }

  async disconnectAll(): Promise<void> {
    const disconnectPromises = Array.from(this.connections.keys()).map(name =>
      this.disconnect(name)
    );

    await Promise.all(disconnectPromises);
  }

  async callTool(toolName: string, args: Record<string, any>): Promise<any> {
    for (const connection of this.connections.values()) {
      if (connection.tools.has(toolName)) {
        return connection.client.callTool({
          name: toolName,
          arguments: args,
        });
      }
    }

    throw new Error(`Tool not found: ${toolName}`);
  }

  getAllTools(): StructuredToolInterface[] {
    const langchainTools: StructuredToolInterface[] = [];

    for (const connection of this.connections.values()) {
      for (const [toolName, toolDef] of connection.tools) {
        langchainTools.push(this.createLangChainTool(toolName, toolDef, connection.client));
      }
    }

    return langchainTools;
  }

  getToolsByServer(serverName: string): StructuredToolInterface[] {
    const connection = this.connections.get(serverName);
    if (!connection) {
      return [];
    }

    const langchainTools: StructuredToolInterface[] = [];

    for (const [toolName, toolDef] of connection.tools) {
      langchainTools.push(this.createLangChainTool(toolName, toolDef, connection.client));
    }

    return langchainTools;
  }

  // Wrap an MCP tool as a LangChain tool. Recent @langchain/core versions accept a
  // JSON Schema (the MCP tool's inputSchema) as the schema of a DynamicStructuredTool.
  private createLangChainTool(
    toolName: string,
    toolDef: any,
    client: Client
  ): StructuredToolInterface {
    return new DynamicStructuredTool({
      name: toolName,
      description: toolDef.description ?? '',
      schema: toolDef.inputSchema,
      func: async (input: Record<string, any>) => {
        const result = await client.callTool({
          name: toolName,
          arguments: input,
        });

        // Prefer the first text item; fall back to the raw JSON result
        if (result.content && Array.isArray(result.content)) {
          const textContent = result.content.find((c: any) => c.type === 'text');
          if (textContent) {
            return textContent.text;
          }
        }

        return JSON.stringify(result);
      },
    });
  }

  listConnectedServers(): string[] {
    return Array.from(this.connections.keys());
  }

  listAvailableTools(): Array<{ server: string; tool: string; description: string }> {
    const toolList: Array<{ server: string; tool: string; description: string }> = [];

    for (const [serverName, connection] of this.connections) {
      for (const [toolName, toolDef] of connection.tools) {
        toolList.push({
          server: serverName,
          tool: toolName,
          description: toolDef.description ?? '',
        });
      }
    }

    return toolList;
  }
}

export const mcpManager = new MCPClientManager();
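
`callTool` scans the connections and uses the first server that exposes a tool with the requested name. If two servers ever register the same name, one of them is silently shadowed. One possible refinement is to build a name-to-server index once, right after `listTools()`, and reject collisions up front. The sketch below assumes the tool lists are available at that point. `buildToolIndex` and `resolveServer` are hypothetical helpers; the tool names in the test come from the servers above.

```typescript
type ToolIndex = Map<string, string>; // tool name -> server name

// Build the routing table once, failing loudly if two servers expose the same tool name
function buildToolIndex(serverTools: Record<string, string[]>): ToolIndex {
  const index: ToolIndex = new Map();
  for (const [server, tools] of Object.entries(serverTools)) {
    for (const tool of tools) {
      const existing = index.get(tool);
      if (existing !== undefined && existing !== server) {
        throw new Error(`Tool "${tool}" is exposed by both "${existing}" and "${server}"`);
      }
      index.set(tool, server);
    }
  }
  return index;
}

// O(1) lookup instead of scanning every connection on each call
function resolveServer(index: ToolIndex, tool: string): string {
  const server = index.get(tool);
  if (server === undefined) {
    throw new Error(`Tool not found: ${tool}`);
  }
  return server;
}
```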

Part 3: Context Management

3.1 Why Context Management Matters

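A research assistant has to keep track of a large amount of information, including:

The user's research question and requirements
The papers retrieved so far
Abstracts and key points already extracted
The report draft in progress
The conversation history

The core challenge: an LLM's context window is limited, so all of this cannot be sent with every request. For each turn, we have to decide which information enters the window and in what form: verbatim, summarized, or left out.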
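
A minimal sketch of priority-based allocation makes the budget concrete. It assumes every piece of context already has an estimated token cost. The budget figures mirror `maxTokens = 128000` and `reservedTokens = 4000` from the `ContextManager` in 3.2. The part names, costs and priorities are made up for illustration, and `allocateContext` is a hypothetical helper.

```typescript
interface ContextPart {
  name: string;
  tokens: number;     // estimated token cost
  priority: number;   // higher = more important to keep
  required?: boolean; // always included (e.g. system prompt, current question)
}

// Greedy selection: keep required parts, then add optional parts from highest
// to lowest priority as long as they still fit into the remaining budget.
function allocateContext(parts: ContextPart[], maxTokens: number, reservedForOutput: number) {
  const budget = maxTokens - reservedForOutput;
  const required = parts.filter(p => p.required);
  let used = required.reduce((sum, p) => sum + p.tokens, 0);
  if (used > budget) {
    throw new Error(`Required parts need ${used} tokens, budget is ${budget}`);
  }
  const kept = [...required];
  const optional = parts.filter(p => !p.required).sort((a, b) => b.priority - a.priority);
  for (const part of optional) {
    if (used + part.tokens <= budget) {
      kept.push(part);
      used += part.tokens;
    }
  }
  return { kept: kept.map(p => p.name), used, budget };
}
```

Try it with a 500-token system prompt, a 200-token question, 6,000 tokens of recent history, 90,000 tokens of papers and 40,000 tokens of older history. The budget is 124,000 tokens. After the papers are in, 96,700 tokens are used, so the older history no longer fits. That is exactly the case where it should be summarized rather than dropped.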

┌─────────────────────────────────────────────────────────────┐
│                Context Management Strategies                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────┐      ┌──────────────────────┐     │
│  │ Short-term memory    │      │ Long-term memory     │     │
│  │ (current session)    │      │ (across sessions)    │     │
│  ├──────────────────────┤      ├──────────────────────┤     │
│  │ • Current question   │      │ • Processed papers   │     │
│  │ • Recent messages    │      │ • Extracted summaries│     │
│  │ • Working state      │      │ • Knowledge graph    │     │
│  │ • Interim results    │      │ • User preferences   │     │
│  └──────────────────────┘      └──────────────────────┘     │
│             │                             │                 │
│             └──────────────┬──────────────┘                 │
│                            ▼                                │
│                 ┌──────────────────────┐                    │
│                 │ Context window       │                    │
│                 │ (token limit)        │                    │
│                 └──────────────────────┘                    │
│                            │                                │
│             ┌──────────────┼──────────────┐                 │
│             ▼              ▼              ▼                 │
│       ┌───────────┐  ┌───────────┐  ┌───────────┐           │
│       │ Compress  │  │ Select    │  │ Summarize │           │
│       │ history   │  │ relevant  │  │ long docs │           │
│       └───────────┘  └───────────┘  └───────────┘           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Three-layer context management strategy
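
The "select relevant" branch is the easiest to illustrate in isolation. The `ContextManager` in 3.2 delegates it to Chroma. The sketch here performs the same kind of ranking by brute force in memory. It assumes each item already carries an embedding vector: the test uses toy 2-D vectors, while real ones would come from `OpenAIEmbeddings`. `cosineSimilarity` and `selectTopK` are hypothetical helpers. Brute force is adequate for the few dozen papers in one project; a vector database pays off as the collection grows.

```typescript
// Cosine similarity of two equal-length vectors (0 if either vector is all zeros)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vector length mismatch');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank items by similarity to the query embedding and keep the k best
function selectTopK<T extends { embedding: number[] }>(query: number[], items: T[], k: number): T[] {
  return items
    .map(item => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ item }) => item);
}
```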

3.2 Implementing the Context Manager

typescript

// src/context/context-manager.ts

import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { HumanMessage, AIMessage, SystemMessage, BaseMessage } from '@langchain/core/messages';
import { ChromaClient, Collection } from 'chromadb';
import { PrismaClient } from '@prisma/client';

interface Paper {
  id: number;
  title: string;
  abstract: string;
  summary?: string;
  keyPoints?: string[];
}

interface ContextWindow {
  systemPrompt: string;
  researchContext: string;
  relevantPapers: Paper[];
  conversationHistory: BaseMessage[];
  currentTask: string;
  tokenCount: number;
}

interface ResearchState {
  projectId: number;
  topic: string;
  keywords: string[];
  processedPapers: number[];
  currentPhase: 'search' | 'analyze' | 'compare' | 'write';
  findings: string[];
  reportDraft: string;
}

export class ContextManager {
  private llm: ChatOpenAI;
  private embeddings: OpenAIEmbeddings;
  private chromaClient: ChromaClient;
  private prisma: PrismaClient;
  private collection: Collection | null = null;
  
  private maxTokens: number = 128000;
  private reservedTokens: number = 4000;
  private maxHistoryMessages: number = 20;
  
  private researchState: Map<number, ResearchState> = new Map();
  private conversationCache: Map<number, BaseMessage[]> = new Map();

  constructor() {
    this.llm = new ChatOpenAI({
      model: 'gpt-4o-mini',
      temperature: 0,
    });
    
    this.embeddings = new OpenAIEmbeddings();
    this.chromaClient = new ChromaClient();
    this.prisma = new PrismaClient();
  }

  async initialize(): Promise<void> {
    this.collection = await this.chromaClient.getOrCreateCollection({
      name: 'research_papers',
      metadata: { description: 'Research paper embeddings' },
    });
  }

  async buildContext(
    projectId: number,
    currentQuery: string,
    sessionId: number
  ): Promise<ContextWindow> {
    const project = await this.prisma.researchProject.findUnique({
      where: { id: projectId },
      include: {
        papers: {
          include: { paper: true },
          orderBy: { relevance: 'desc' },
        },
      },
    });

    if (!project) {
      throw new Error(`Project not found: ${projectId}`);
    }

    const relevantPapers = await this.selectRelevantPapers(
      currentQuery,
      project.papers.map(p => p.paper),
      5
    );

    const conversationHistory = await this.getCompressedHistory(sessionId);

    const researchState = this.researchState.get(projectId);
    const researchContext = this.buildResearchContext(project, researchState);

    const systemPrompt = this.buildSystemPrompt(project.topic);
    const currentTask = this.inferCurrentTask(currentQuery, researchState);

    const tokenCount = await this.estimateTokenCount({
      systemPrompt,
      researchContext,
      relevantPapers,
      conversationHistory,
      currentTask,
    });

    if (tokenCount > this.maxTokens - this.reservedTokens) {
      return await this.compressContext({
        systemPrompt,
        researchContext,
        relevantPapers,
        conversationHistory,
        currentTask,
        tokenCount,
      });
    }

    return {
      systemPrompt,
      researchContext,
      relevantPapers,
      conversationHistory,
      currentTask,
      tokenCount,
    };
  }

  private async selectRelevantPapers(
    query: string,
    papers: Paper[],
    limit: number
  ): Promise<Paper[]> {
    if (!this.collection || papers.length === 0) {
      return papers.slice(0, limit);
    }

    const queryEmbedding = await this.embeddings.embedQuery(query);

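    const results = await this.collection.query({
      queryEmbeddings: [queryEmbedding],
      // Fetch extra hits: one paper may be stored as several chunks
      nResults: limit * 3,
      where: {
        paperId: { $in: papers.map(p => p.id.toString()) },
      },
    });

    // Chroma returns document/chunk ids, not paper ids, so map each hit back to its
    // paper through the paperId metadata (the same field the where filter uses)
    const metadatas = results.metadatas?.[0] ?? [];
    if (metadatas.length === 0) {
      return papers.slice(0, limit);
    }

    const paperMap = new Map(papers.map(p => [p.id.toString(), p]));
    const seen = new Set<string>();
    const relevantPapers: Paper[] = [];

    for (const metadata of metadatas) {
      const paperId = metadata?.paperId?.toString();
      if (!paperId || seen.has(paperId)) {
        continue;
      }
      const paper = paperMap.get(paperId);
      if (paper) {
        seen.add(paperId);
        relevantPapers.push(paper);
      }
      if (relevantPapers.length >= limit) {
        break;
      }
    }

    return relevantPapers.length > 0 ? relevantPapers : papers.slice(0, limit);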
  }

  private async getCompressedHistory(sessionId: number): Promise<BaseMessage[]> {
    const cached = this.conversationCache.get(sessionId);
    if (cached) {
      return cached.slice(-this.maxHistoryMessages);
    }

    const messages = await this.prisma.message.findMany({
      where: { sessionId },
      orderBy: { createdAt: 'desc' },
      take: 50,
    });

    const history: BaseMessage[] = messages.reverse().map(msg => {
      if (msg.role === 'user') {
        return new HumanMessage(msg.content);
      } else if (msg.role === 'assistant') {
        return new AIMessage(msg.content);
      } else {
        return new SystemMessage(msg.content);
      }
    });

    if (history.length > this.maxHistoryMessages) {
      const compressedHistory = await this.compressConversationHistory(history);
      this.conversationCache.set(sessionId, compressedHistory);
      return compressedHistory;
    }

    this.conversationCache.set(sessionId, history);
    return history;
  }

  private async compressConversationHistory(
    history: BaseMessage[]
  ): Promise<BaseMessage[]> {
    const oldMessages = history.slice(0, -10);
    const recentMessages = history.slice(-10);

    if (oldMessages.length === 0) {
      return recentMessages;
    }

    const historyText = oldMessages
      .map(m => `${m._getType()}: ${m.content}`)
      .join('\n');

    const summaryResponse = await this.llm.invoke([
      new SystemMessage('Summarize the following conversation history concisely, preserving key information and context:'),
      new HumanMessage(historyText),
    ]);

    const summaryMessage = new SystemMessage(
      `[Previous conversation summary]: ${summaryResponse.content}`
    );

    return [summaryMessage, ...recentMessages];
  }

  private buildResearchContext(
    project: any,
    state?: ResearchState
  ): string {
    let context = `研究项目: ${project.title}\n`;
    context += `研究主题: ${project.topic}\n`;
    
    const keywords = JSON.parse(project.keywords || '[]');
    context += `关键词: ${keywords.join(', ')}\n`;

    if (state) {
      context += `\n当前阶段: ${this.getPhaseDescription(state.currentPhase)}\n`;
      context += `已处理论文数: ${state.processedPapers.length}\n`;
      
      if (state.findings.length > 0) {
        context += `\n主要发现:\n`;
        state.findings.slice(-5).forEach((finding, i) => {
          context += `${i + 1}. ${finding}\n`;
        });
      }

      if (state.reportDraft) {
        context += `\n报告进度: 已完成草稿\n`;
      }
    }

    return context;
  }

  private getPhaseDescription(phase: string): string {
    const descriptions: Record<string, string> = {
      search: '文献检索阶段 - 正在搜索和收集相关文献',
      analyze: '分析阶段 - 正在分析和提取论文关键信息',
      compare: '对比阶段 - 正在比较不同论文的观点',
      write: '撰写阶段 - 正在生成研究报告',
    };
    return descriptions[phase] || '未知阶段';
  }

  private buildSystemPrompt(topic: string): string {
    return `你是一个专业的 AI 研究助手，正在帮助用户研究「${topic}」这个主题。

你的职责包括：
1. 搜索和分析学术文献
2. 提取关键信息和观点
3. 比较不同来源的异同
4. 撰写结构化的研究报告

在回答时，请：
- 引用具体的论文和来源
- 提供客观、准确的信息
- 指出不同观点之间的关系
- 如果信息不足，主动建议补充搜索`;
  }

  private inferCurrentTask(query: string, state?: ResearchState): string {
    const searchKeywords = ['搜索', '查找', '检索', 'search', 'find'];
    const analyzeKeywords = ['分析', '解读', '理解', 'analyze', 'explain'];
    const compareKeywords = ['比较', '对比', '异同', 'compare', 'difference'];
    const writeKeywords = ['报告', '总结', '撰写', 'report', 'summarize', 'write'];

    const queryLower = query.toLowerCase();

    if (searchKeywords.some(k => queryLower.includes(k))) {
      return 'search';
    }
    if (analyzeKeywords.some(k => queryLower.includes(k))) {
      return 'analyze';
    }
    if (compareKeywords.some(k => queryLower.includes(k))) {
      return 'compare';
    }
    if (writeKeywords.some(k => queryLower.includes(k))) {
      return 'write';
    }

    return state?.currentPhase || 'search';
  }

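  private async estimateTokenCount(context: Partial<ContextWindow>): Promise<number> {
    const text = [
      context.systemPrompt || '',
      context.researchContext || '',
      context.currentTask || '',
      ...(context.relevantPapers || []).map(p => `${p.title}\n${p.abstract}\n${p.summary || ''}`),
      // Message content can be an array of parts (multimodal), not only a string
      ...(context.conversationHistory || []).map(m =>
        typeof m.content === 'string' ? m.content : JSON.stringify(m.content)
      ),
    ].join('\n');

    // Rough heuristic: about 4 characters per token for English text, while CJK
    // characters are closer to one token each, so count them separately.
    // Use a real tokenizer (such as tiktoken) when exact numbers matter.
    const cjkChars = (text.match(/[\u3400-\u9fff]/g) || []).length;
    return Math.ceil(cjkChars + (text.length - cjkChars) / 4);
  }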

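  private async compressContext(context: ContextWindow): Promise<ContextWindow> {
    const budget = this.maxTokens - this.reservedTokens;
    const compressed: ContextWindow = { ...context };
    const recount = async () => {
      compressed.tokenCount = await this.estimateTokenCount(compressed);
    };

    // Every step runs a bounded number of times, so this always terminates,
    // even if the context is still over budget at the end.

    // 1. Drop the least relevant papers (the list is ordered by relevance), keeping at least 2
    while (compressed.tokenCount > budget && compressed.relevantPapers.length > 2) {
      compressed.relevantPapers = compressed.relevantPapers.slice(0, -1);
      await recount();
    }

    // 2. Summarize older turns once; compressConversationHistory keeps the last 10
    //    messages verbatim, so it only helps when there are more than 10
    if (compressed.tokenCount > budget && compressed.conversationHistory.length > 10) {
      compressed.conversationHistory = await this.compressConversationHistory(
        compressed.conversationHistory
      );
      await recount();
    }

    // 3. Truncate long abstracts of the remaining papers
    if (compressed.tokenCount > budget) {
      compressed.relevantPapers = compressed.relevantPapers.map(p => ({
        ...p,
        abstract: p.abstract.length > 500 ? p.abstract.slice(0, 500) + '...' : p.abstract,
      }));
      await recount();
    }

    // 4. Last resort: keep only the 5 most recent messages
    if (compressed.tokenCount > budget) {
      compressed.conversationHistory = compressed.conversationHistory.slice(-5);
      await recount();
    }

    return compressed;
  }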

  updateResearchState(projectId: number, updates: Partial<ResearchState>): void {
    const current = this.researchState.get(projectId) || {
      projectId,
      topic: '',
      keywords: [],
      processedPapers: [],
      currentPhase: 'search' as const,
      findings: [],
      reportDraft: '',
    };

    this.researchState.set(projectId, { ...current, ...updates });
  }

  addFinding(projectId: number, finding: string): void {
    const state = this.researchState.get(projectId);
    if (state) {
      state.findings.push(finding);
      if (state.findings.length > 20) {
        state.findings = state.findings.slice(-20);
      }
    }
  }

  addProcessedPaper(projectId: number, paperId: number): void {
    const state = this.researchState.get(projectId);
    if (state && !state.processedPapers.includes(paperId)) {
      state.processedPapers.push(paperId);
    }
  }

  clearCache(sessionId?: number): void {
    if (sessionId !== undefined) {
      this.conversationCache.delete(sessionId);
    } else {
      this.conversationCache.clear();
    }
  }
}

export const contextManager = new ContextManager();
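
`compressConversationHistory` keeps the last 10 messages verbatim and folds everything older into a single system message. The sketch below isolates that windowing logic. It uses a plain message type and a synchronous `summarize` callback in place of the `this.llm.invoke` call, so it can be tested without a model. `ChatMessage`, `Role` and `compressHistory` are hypothetical names. The split is computed from `history.length`, so it also behaves sensibly when `keepRecent` is 0. By contrast, `slice(0, -0)` returns an empty array.

```typescript
type Role = 'system' | 'user' | 'assistant';
interface ChatMessage {
  role: Role;
  content: string;
}

// Keep the last `keepRecent` messages and fold older ones into one summary message.
// `summarize` stands in for the (asynchronous) LLM call used by the real ContextManager.
function compressHistory(
  history: ChatMessage[],
  keepRecent: number,
  summarize: (transcript: string) => string
): ChatMessage[] {
  if (history.length <= keepRecent) {
    return history;
  }
  const splitAt = history.length - keepRecent;
  const old = history.slice(0, splitAt);
  const recent = history.slice(splitAt);
  const transcript = old.map(m => `${m.role}: ${m.content}`).join('\n');
  const summary: ChatMessage = {
    role: 'system',
    content: `[Previous conversation summary]: ${summarize(transcript)}`,
  };
  return [summary, ...recent];
}
```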

3.3 Long-Document Processing

typescript

// src/context/document-processor.ts

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
// Text splitters ship in the separate @langchain/textsplitters package
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

interface ProcessedDocument {
  title: string;
  summary: string;
  keyPoints: string[];
  methodology: string;
  conclusions: string;
  chunks: DocumentChunk[];
}

interface DocumentChunk {
  content: string;
  index: number;
  metadata: {
    section?: string;
    pageNumber?: number;
  };
}

interface SectionSummary {
  section: string;
  summary: string;
  keyPoints: string[];
}

export class DocumentProcessor {
  private llm: ChatOpenAI;
  private splitter: RecursiveCharacterTextSplitter;

  constructor() {
    this.llm = new ChatOpenAI({
      model: 'gpt-4o-mini',
      temperature: 0,
    });

    this.splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 2000,
      chunkOverlap: 200,
      separators: ['\n\n', '\n', '. ', ' ', ''],
    });
  }

  async processDocument(
    content: string,
    title: string,
    extractSections: boolean = true
  ): Promise<ProcessedDocument> {
    const chunks = await this.splitDocument(content);

    let sectionSummaries: SectionSummary[] = [];
    
    if (extractSections) {
      sectionSummaries = await this.extractAndSummarizeSections(content);
    }

    const summary = await this.generateOverallSummary(
      title,
      content.slice(0, 10000),
      sectionSummaries
    );

    const keyPoints = await this.extractKeyPoints(content.slice(0, 15000));

    const methodology = await this.extractMethodology(content);

    const conclusions = await this.extractConclusions(content);

    return {
      title,
      summary,
      keyPoints,
      methodology,
      conclusions,
      chunks,
    };
  }

  private async splitDocument(content: string): Promise<DocumentChunk[]> {
    const docs = await this.splitter.createDocuments([content]);
    
    return docs.map((doc, index) => ({
      content: doc.pageContent,
      index,
      metadata: doc.metadata,
    }));
  }

  private async extractAndSummarizeSections(
    content: string
  ): Promise<SectionSummary[]> {
    const sectionPatterns: Record<string, RegExp> = {
      abstract: /(?:^|\n)(abstract|摘要)[:\s]*\n?([\s\S]*?)(?=\n(?:1\.|introduction|引言|keywords|关键词)|\n\n\n)/i,
      introduction: /(?:^|\n)((?:1\.?\s*)?introduction|引言)[:\s]*\n?([\s\S]*?)(?=\n(?:2\.|related|background|方法|method)|\n\n\n)/i,
      methods: /(?:^|\n)((?:\d\.?\s*)?method(?:ology|s)?|方法)[:\s]*\n?([\s\S]*?)(?=\n(?:\d\.|experiment|result|实验|结果)|\n\n\n)/i,
      results: /(?:^|\n)((?:\d\.?\s*)?results?|结果)[:\s]*\n?([\s\S]*?)(?=\n(?:\d\.|discussion|conclusion|讨论|结论)|\n\n\n)/i,
      conclusion: /(?:^|\n)((?:\d\.?\s*)?conclusions?|结论)[:\s]*\n?([\s\S]*?)(?=\n(?:reference|acknowledge|参考|致谢)|\n\n\n|$)/i,
    };

    const summaries: SectionSummary[] = [];

    for (const [section, pattern] of Object.entries(sectionPatterns)) {
      const match = content.match(pattern);
      if (match && match[2]) {
        const sectionContent = match[2].trim().slice(0, 5000);
        
        if (sectionContent.length > 100) {
          const summary = await this.summarizeSection(section, sectionContent);
          summaries.push(summary);
        }
      }
    }

    return summaries;
  }

  private async summarizeSection(
    section: string,
    content: string
  ): Promise<SectionSummary> {
    const response = await this.llm.invoke([
      new SystemMessage(`你是一个学术论文分析助手。请分析以下论文的「${section}」部分，提供：
1. 简洁摘要（2-3句话）
2. 关键要点（3-5个）

以 JSON 格式输出：{"summary": "...", "keyPoints": ["...", "..."]}`),
      new HumanMessage(content),
    ]);

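    try {
      // Models sometimes wrap JSON in a Markdown code fence; strip it before parsing
      const raw = String(response.content).trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, '');
      const result = JSON.parse(raw);
      return {
        section,
        summary: result.summary,
        keyPoints: Array.isArray(result.keyPoints) ? result.keyPoints : [],
      };
    } catch {
      return {
        section,
        summary: content.slice(0, 200) + '...',
        keyPoints: [],
      };
    }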
  }

  private async generateOverallSummary(
    title: string,
    content: string,
    sectionSummaries: SectionSummary[]
  ): Promise<string> {
    const sectionsContext = sectionSummaries
      .map(s => `${s.section}: ${s.summary}`)
      .join('\n');

    const response = await this.llm.invoke([
      new SystemMessage(`你是一个学术论文分析助手。根据论文内容和各部分摘要，生成一个全面的论文摘要（200-300字）。

论文标题: ${title}

各部分摘要:
${sectionsContext}`),
      new HumanMessage(`请根据以上信息和以下论文内容生成摘要：\n\n${content.slice(0, 5000)}`),
    ]);

    return response.content as string;
  }

  private async extractKeyPoints(content: string): Promise<string[]> {
    const response = await this.llm.invoke([
      new SystemMessage(`从以下学术论文中提取 5-8 个关键要点。每个要点应该简洁明了，包含具体信息。

以 JSON 数组格式输出：["要点1", "要点2", ...]`),
      new HumanMessage(content),
    ]);

    try {
      return JSON.parse(response.content as string);
    } catch {
      return [];
    }
  }

  private async extractMethodology(content: string): Promise<string> {
    const methodPattern = /(?:method(?:ology|s)?|approach|技术方法|研究方法)[:\s]*\n?([\s\S]{500,3000})/i;
    const match = content.match(methodPattern);
    
    if (!match) {
      return '';
    }

    const response = await this.llm.invoke([
      new SystemMessage('请简洁描述这篇论文使用的研究方法或技术路线（100-200字）：'),
      new HumanMessage(match[1]),
    ]);

    return response.content as string;
  }

  private async extractConclusions(content: string): Promise<string> {
    const conclusionPattern = /(?:conclusion|结论)[:\s]*\n?([\s\S]{200,2000})/i;
    const match = content.match(conclusionPattern);
    
    if (!match) {
      const lastPart = content.slice(-3000);
      const response = await this.llm.invoke([
        new SystemMessage('请从以下文本中提取论文的主要结论（100-150字）：'),
        new HumanMessage(lastPart),
      ]);
      return response.content as string;
    }

    const response = await this.llm.invoke([
      new SystemMessage('请简洁总结这篇论文的主要结论（100-150字）：'),
      new HumanMessage(match[1]),
    ]);

    return response.content as string;
  }

  async processMultipleDocuments(
    documents: Array<{ content: string; title: string }>
  ): Promise<ProcessedDocument[]> {
    const results: ProcessedDocument[] = [];

    for (const doc of documents) {
      try {
        const processed = await this.processDocument(doc.content, doc.title);
        results.push(processed);
      } catch (error) {
        console.error(`Error processing document ${doc.title}:`, error);
      }
    }

    return results;
  }

  async createHierarchicalSummary(
    documents: ProcessedDocument[]
  ): Promise<string> {
    const docSummaries = documents
      .map((d, i) => `[${i + 1}] ${d.title}\n${d.summary}`)
      .join('\n\n');

    const response = await this.llm.invoke([
      new SystemMessage(`你是一个研究综述助手。根据以下多篇论文的摘要，生成一个综合性的研究领域概述：

要求：
1. 识别主要研究方向和趋势
2. 指出论文之间的关联
3. 总结该领域的主要进展
4. 指出尚待解决的问题`),
      new HumanMessage(docSummaries),
    ]);

    return response.content as string;
  }
}

export const documentProcessor = new DocumentProcessor();
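
The processor above passes the model reply straight to `JSON.parse(response.content as string)` in `summarizeSection` and `extractKeyPoints`. Chat models often wrap JSON in a Markdown code fence or put a sentence in front of it. When that happens, parsing fails and the code quietly falls back to a truncated excerpt or an empty list. The sketch below is a more tolerant parser. It assumes the reply is plain text, and `parseLlmJson` is a hypothetical helper, not a LangChain API.

```typescript
// Hypothetical helper (not a LangChain API): tolerant JSON parsing for raw model replies.
// Tries, in order: the whole reply, the body of a Markdown code fence,
// and the span from the first '{' or '[' to the last '}' or ']'.
export function parseLlmJson<T = unknown>(raw: string): T | null {
  const candidates: string[] = [raw.trim()];

  // A fenced reply such as `{3}json ... `{3} (backticks written as a quantifier here)
  const fence = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  if (fence) candidates.push(fence[1].trim());

  // Last resort: the outermost object/array-looking span in the text
  const start = raw.search(/[\[{]/);
  const end = Math.max(raw.lastIndexOf('}'), raw.lastIndexOf(']'));
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const text of candidates) {
    try {
      return JSON.parse(text) as T;
    } catch {
      // Not valid JSON; try the next candidate
    }
  }
  return null;
}
```

In `summarizeSection`, for example, `parseLlmJson<{ summary: string; keyPoints: string[] }>(response.content as string)` could replace the bare `JSON.parse`, with the existing fallback kept for a `null` result. The agents in Part 4 avoid the problem differently: they use `withStructuredOutput` with a Zod schema, which is usually the more robust choice when the model supports it.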

Part 4: Agent Implementation

4.1 Research Planner (Main Agent)

typescript

// src/agents/research-planner.ts

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { z } from 'zod';

const ResearchPlanSchema = z.object({
  understanding: z.string().describe('对研究问题的理解'),
  searchStrategy: z.object({
    primaryKeywords: z.array(z.string()).describe('主要搜索关键词'),
    alternativeKeywords: z.array(z.string()).describe('备选/相关关键词'),
    targetSources: z.array(z.enum(['arxiv', 'semantic_scholar', 'web', 'local'])).describe('目标搜索源'),
    timeRange: z.object({
      from: z.number().optional(),
      to: z.number().optional(),
    }).describe('时间范围筛选'),
    expectedPapers: z.number().describe('预期需要的论文数量'),
  }),
  analysisApproach: z.object({
    focusAreas: z.array(z.string()).describe('重点分析领域'),
    comparisonDimensions: z.array(z.string()).describe('对比分析维度'),
    expectedFindings: z.array(z.string()).describe('预期发现的内容类型'),
  }),
  outputFormat: z.object({
    reportStructure: z.array(z.string()).describe('报告结构大纲'),
    includeSections: z.array(z.string()).describe('需要包含的章节'),
  }),
  estimatedSteps: z.number().describe('预计步骤数'),
});

type ResearchPlan = z.infer<typeof ResearchPlanSchema>;

const GapAnalysisSchema = z.object({
  hasGaps: z.boolean().describe('是否存在信息缺口'),
  gaps: z.array(z.object({
    description: z.string().describe('缺口描述'),
    suggestedAction: z.string().describe('建议的补充行动'),
    priority: z.enum(['high', 'medium', 'low']).describe('优先级'),
  })),
  completeness: z.number().min(0).max(100).describe('当前研究完成度百分比'),
  recommendation: z.enum(['continue_search', 'proceed_analysis', 'generate_report']).describe('建议的下一步'),
});

type GapAnalysis = z.infer<typeof GapAnalysisSchema>;

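export class ResearchPlanner {
  // Shared chat model; every structured-output runnable in this class is derived from it
  private llm = new ChatOpenAI({
    modelName: 'gpt-4o',
    temperature: 0.3,
  });

  // withStructuredOutput() returns a Runnable that yields a ResearchPlan, not a
  // ChatOpenAI instance, so the field type is inferred instead of declared.
  // Field initializers run in declaration order, so this.llm is already set here.
  private plannerLLM = this.llm.withStructuredOutput(ResearchPlanSchema);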

  async createResearchPlan(
    topic: string,
    requirements?: string,
    constraints?: {
      maxPapers?: number;
      timeRange?: { from?: number; to?: number };
      focusAreas?: string[];
    }
  ): Promise<ResearchPlan> {
    const systemPrompt = `你是一个专业的研究规划助手。根据用户的研究主题和需求，制定详细的研究计划。

考虑因素：
1. 研究主题的范围和深度
2. 可用的文献来源
3. 时间和资源限制
4. 输出格式要求

请制定一个全面、可执行的研究计划。`;

    const userPrompt = `研究主题：${topic}

${requirements ? `具体要求：${requirements}` : ''}

${constraints ? `限制条件：
- 最大论文数：${constraints.maxPapers || '不限'}
- 时间范围：${constraints.timeRange ? `${constraints.timeRange.from || '不限'} - ${constraints.timeRange.to || '至今'}` : '不限'}
- 重点领域：${constraints.focusAreas?.join(', ') || '不限'}` : ''}

请制定研究计划。`;

    const plan = await this.plannerLLM.invoke([
      new SystemMessage(systemPrompt),
      new HumanMessage(userPrompt),
    ]);

    return plan;
  }

  async analyzeInformationGaps(
    originalPlan: ResearchPlan,
    collectedPapers: Array<{
      title: string;
      abstract: string;
      summary?: string;
    }>,
    extractedFindings: string[]
  ): Promise<GapAnalysis> {
    const gapAnalyzerLLM = this.llm.withStructuredOutput(GapAnalysisSchema);

    const papersContext = collectedPapers
      .map((p, i) => `[${i + 1}] ${p.title}\n${p.summary || p.abstract.slice(0, 300)}`)
      .join('\n\n');

    const findingsContext = extractedFindings.join('\n- ');

    const systemPrompt = `你是一个研究质量评估助手。分析当前收集的信息是否足够完成研究目标。

评估维度：
1. 覆盖度：是否覆盖了计划中的所有重点领域
2. 深度：每个领域的信息是否足够深入
3. 多样性：是否包含不同的观点和方法
4. 时效性：信息是否足够新
5. 可靠性：来源是否可靠`;

    const userPrompt = `研究计划：
- 主题理解：${originalPlan.understanding}
- 重点领域：${originalPlan.analysisApproach.focusAreas.join(', ')}
- 对比维度：${originalPlan.analysisApproach.comparisonDimensions.join(', ')}
- 预期发现：${originalPlan.analysisApproach.expectedFindings.join(', ')}

已收集论文（${collectedPapers.length}篇）：
${papersContext}

已提取发现：
- ${findingsContext}

请评估信息完整性并识别缺口。`;

    const analysis = await gapAnalyzerLLM.invoke([
      new SystemMessage(systemPrompt),
      new HumanMessage(userPrompt),
    ]);

    return analysis;
  }

  async generateSupplementarySearches(
    gaps: GapAnalysis['gaps'],
    previousSearches: string[]
  ): Promise<Array<{
    query: string;
    source: string;
    rationale: string;
  }>> {
    const SearchSuggestionsSchema = z.object({
      searches: z.array(z.object({
        query: z.string(),
        source: z.enum(['arxiv', 'semantic_scholar', 'web']),
        rationale: z.string(),
      })),
    });

    const suggesterLLM = this.llm.withStructuredOutput(SearchSuggestionsSchema);

    const gapsDescription = gaps
      .map(g => `- ${g.description} (${g.priority}): ${g.suggestedAction}`)
      .join('\n');

    const response = await suggesterLLM.invoke([
      new SystemMessage(`你是一个搜索策略优化助手。根据识别的信息缺口，生成补充搜索建议。

之前的搜索：
${previousSearches.join(', ')}

要求：
1. 避免重复之前的搜索
2. 针对性填补信息缺口
3. 考虑不同搜索源的特点`),
      new HumanMessage(`需要填补的信息缺口：\n${gapsDescription}`),
    ]);

    return response.searches;
  }

  async decideNextStep(
    currentState: {
      phase: string;
      papersCollected: number;
      findingsExtracted: number;
      gapAnalysis?: GapAnalysis;
    }
  ): Promise<{
    action: 'search' | 'analyze' | 'compare' | 'write' | 'complete';
    reason: string;
    details: Record<string, any>;
  }> {
    const DecisionSchema = z.object({
      action: z.enum(['search', 'analyze', 'compare', 'write', 'complete']),
      reason: z.string(),
      // Explicit key schema: accepted by Zod 3 and required by Zod 4
      details: z.record(z.string(), z.any()),
    });

    const deciderLLM = this.llm.withStructuredOutput(DecisionSchema);

    const response = await deciderLLM.invoke([
      new SystemMessage(`你是一个研究流程控制助手。根据当前研究状态，决定下一步行动。

可选行动：
- search: 需要继续搜索更多文献
- analyze: 需要分析已收集的文献
- compare: 需要进行观点对比
- write: 可以开始撰写报告
- complete: 研究已完成`),
      new HumanMessage(`当前状态：
- 阶段：${currentState.phase}
- 已收集论文：${currentState.papersCollected}篇
- 已提取发现：${currentState.findingsExtracted}条
- 完成度：${currentState.gapAnalysis?.completeness || 0}%
- 建议：${currentState.gapAnalysis?.recommendation || '未评估'}

请决定下一步行动。`),
    ]);

    return response;
  }
}

export const researchPlanner = new ResearchPlanner();
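
`decideNextStep` hands the loop-control decision entirely to the model, and `analyzeInformationGaps` can keep recommending `continue_search` with no upper bound. A common safeguard is to wrap the model's recommendation in a small deterministic rule with a search budget. The sketch below assumes a cap on supplementary search rounds and a completeness threshold. `guardNextAction`, `maxSearchRounds`, and `completenessThreshold` are hypothetical names, and the defaults (3 rounds, 80%) are arbitrary illustrations, not tuned values.

```typescript
// Mirrors the recommendation values in GapAnalysisSchema
type Recommendation = 'continue_search' | 'proceed_analysis' | 'generate_report';
type GuardedAction = 'search' | 'analyze' | 'write';

// Hypothetical loop state tracked by the caller, not by ResearchPlanner
interface LoopState {
  searchRounds: number;           // supplementary search rounds already run
  completeness: number;           // 0-100, from analyzeInformationGaps()
  recommendation: Recommendation; // the model's suggested next step
}

// Deterministic guard around the model's recommendation:
// search again only while the model asks for it, the budget is not spent,
// and coverage is still below the threshold; otherwise move on to analysis.
export function guardNextAction(
  state: LoopState,
  maxSearchRounds = 3,
  completenessThreshold = 80
): GuardedAction {
  if (state.recommendation === 'generate_report') return 'write';

  const wantsMore = state.recommendation === 'continue_search';
  const budgetLeft = state.searchRounds < maxSearchRounds;
  const belowThreshold = state.completeness < completenessThreshold;

  return wantsMore && budgetLeft && belowThreshold ? 'search' : 'analyze';
}
```

A caller would run `analyzeInformationGaps`, increment its own round counter each time it acts on `generateSupplementarySearches`, and pass `completeness` and `recommendation` from the gap analysis into `guardNextAction` before it trusts `decideNextStep`.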

4.2 Literature Search Agent

typescript

// src/agents/search-agent.ts

import { ChatOpenAI } from '@langchain/openai';
import { createReactAgent } from '@langchain/langgraph/prebuilt';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';
import { mcpManager } from '../mcp/client-manager';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

interface SearchResult {
  source: string;
  papers: Array<{
    id: string;
    title: string;
    authors: string[];
    abstract: string;
    url: string;
    publishedAt?: string;
    citations?: number;
  }>;
  totalResults: number;
}

const searchArxivTool = tool(
  async ({ query, maxResults, sortBy }) => {
    const result = await mcpManager.callTool('arxiv_search', {
      query,
      maxResults: maxResults || 10,
      sortBy: sortBy || 'relevance',
    });
    
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'search_arxiv',
    description: 'Search for papers on arXiv',
    schema: z.object({
      query: z.string().describe('Search query'),
      maxResults: z.number().optional().describe('Maximum results (default: 10)'),
      sortBy: z.enum(['relevance', 'lastUpdatedDate', 'submittedDate']).optional(),
    }),
  }
);

const searchSemanticScholarTool = tool(
  async ({ query, limit, yearMin, yearMax }) => {
    const result = await mcpManager.callTool('semantic_scholar_search', {
      query,
      limit: limit || 10,
      yearMin,
      yearMax,
    });
    
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'search_semantic_scholar',
    description: 'Search for papers in Semantic Scholar',
    schema: z.object({
      query: z.string().describe('Search query'),
      limit: z.number().optional().describe('Maximum results'),
      yearMin: z.number().optional().describe('Minimum publication year'),
      yearMax: z.number().optional().describe('Maximum publication year'),
    }),
  }
);

const getCitationsTool = tool(
  async ({ paperId, limit }) => {
    const result = await mcpManager.callTool('semantic_scholar_citations', {
      paperId,
      limit: limit || 10,
    });
    
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'get_citations',
    description: 'Get papers that cite a specific paper',
    schema: z.object({
      paperId: z.string().describe('Semantic Scholar paper ID'),
      limit: z.number().optional().describe('Maximum citations to return'),
    }),
  }
);

const getReferencesTool = tool(
  async ({ paperId, limit }) => {
    const result = await mcpManager.callTool('semantic_scholar_references', {
      paperId,
      limit: limit || 10,
    });
    
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'get_references',
    description: 'Get papers referenced by a specific paper',
    schema: z.object({
      paperId: z.string().describe('Semantic Scholar paper ID'),
      limit: z.number().optional().describe('Maximum references to return'),
    }),
  }
);

const savePaperTool = tool(
  async ({ title, authors, abstract, source, sourceUrl, sourceId, doi, publishedAt, venue, citations, userId, projectId }) => {
    const paper = await prisma.paper.upsert({
      where: {
        source_sourceId: {
          source,
          sourceId,
        },
      },
      update: {
        title,
        authors: JSON.stringify(authors),
        abstract,
        sourceUrl,
        doi,
        publishedAt: publishedAt ? new Date(publishedAt) : null,
        venue,
        citations: citations || 0,
      },
      create: {
        userId,
        title,
        authors: JSON.stringify(authors),
        abstract,
        source,
        sourceUrl,
        sourceId,
        doi,
        publishedAt: publishedAt ? new Date(publishedAt) : null,
        venue,
        citations: citations || 0,
      },
    });

    if (projectId) {
      await prisma.projectPaper.upsert({
        where: {
          projectId_paperId: {
            projectId,
            paperId: paper.id,
          },
        },
        update: {},
        create: {
          projectId,
          paperId: paper.id,
        },
      });
    }

    return JSON.stringify({ success: true, paperId: paper.id });
  },
  {
    name: 'save_paper',
    description: 'Save a paper to the database',
    schema: z.object({
      title: z.string(),
      authors: z.array(z.string()),
      abstract: z.string(),
      source: z.string(),
      sourceUrl: z.string().optional(),
      sourceId: z.string(),
      doi: z.string().optional(),
      publishedAt: z.string().optional(),
      venue: z.string().optional(),
      citations: z.number().optional(),
      userId: z.number(),
      projectId: z.number().optional(),
    }),
  }
);

export function createSearchAgent() {
  const llm = new ChatOpenAI({
    modelName: 'gpt-4o',
    temperature: 0,
  });

  const tools = [
    searchArxivTool,
    searchSemanticScholarTool,
    getCitationsTool,
    getReferencesTool,
    savePaperTool,
  ];

  const systemPrompt = `你是一个专业的文献检索助手，负责帮助研究人员查找相关学术论文。

你可以使用以下工具：
1. search_arxiv - 在 arXiv 上搜索论文
2. search_semantic_scholar - 在 Semantic Scholar 上搜索论文
3. get_citations - 获取引用某篇论文的后续研究
4. get_references - 获取某篇论文引用的参考文献
5. save_paper - 将找到的论文保存到数据库

搜索策略：
1. 根据研究主题生成多个搜索关键词
2. 在不同数据源进行搜索以获得更全面的结果
3. 使用引用追踪找到相关的重要论文
4. 过滤和排序结果，优先选择高质量论文

质量判断标准：
- 引用数（越高越好）
- 发表时间（优先最新研究）
- 发表期刊/会议（知名度）
- 作者影响力
- 与研究主题的相关度`;

  return createReactAgent({
    llm,
    tools,
    // `prompt` replaces the deprecated `messageModifier` option in recent LangGraph.js releases
    prompt: new SystemMessage(systemPrompt),
  });
}

export class SearchAgent {
  private agent: ReturnType<typeof createSearchAgent>;

  constructor() {
    this.agent = createSearchAgent();
  }

  async search(
    query: string,
    options: {
      userId: number;
      projectId?: number;
      maxPapers?: number;
      yearRange?: { from?: number; to?: number };
      sources?: string[];
    }
  ): Promise<SearchResult[]> {
    const prompt = `请搜索关于「${query}」的学术论文。

要求：
- 最多找 ${options.maxPapers || 10} 篇相关论文
${options.yearRange ? `- 时间范围：${options.yearRange.from || '不限'} - ${options.yearRange.to || '至今'}` : ''}
${options.sources ? `- 搜索源：${options.sources.join(', ')}` : '- 在所有可用数据源搜索'}

找到论文后，请将相关论文保存到数据库（userId: ${options.userId}${options.projectId ? `, projectId: ${options.projectId}` : ''}）。`;

    const result = await this.agent.invoke({
      messages: [new HumanMessage(prompt)],
    });

    const lastMessage = result.messages[result.messages.length - 1];
    return this.parseSearchResults(lastMessage.content as string);
  }

  async exploreCitations(
    paperId: string,
    direction: 'citations' | 'references' | 'both'
  ): Promise<any[]> {
    let prompt = '';
    
    if (direction === 'citations') {
      prompt = `请获取论文 ${paperId} 的引用列表（被哪些论文引用了）。`;
    } else if (direction === 'references') {
      prompt = `请获取论文 ${paperId} 的参考文献列表。`;
    } else {
      prompt = `请获取论文 ${paperId} 的引用和参考文献列表。`;
    }

    const result = await this.agent.invoke({
      messages: [new HumanMessage(prompt)],
    });

    return result.messages;
  }

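  // Placeholder: the agent's final reply is free-form text, so this currently
  // returns an empty list. The papers themselves are persisted by the save_paper
  // tool; read them back from the database (or ask the agent to reply in JSON
  // and parse that) if search() needs to return them.
  private parseSearchResults(_content: string): SearchResult[] {
    return [];
  }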
}

export const searchAgent = new SearchAgent();
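
`parseSearchResults` is still a stub. Because the agent searches several sources, the same paper can come back from both arXiv and Semantic Scholar with different IDs and slightly different title formatting. The sketch below merges the per-source lists and removes duplicates. It assumes papers are matched on a normalized title (lowercased, with whitespace and common punctuation collapsed) and that the copy with more citations wins. `FoundPaper`, `titleKey`, and `dedupePapers` are hypothetical names. Matching on DOI would be stricter, but the `SearchResult` paper shape above has no DOI field.

```typescript
// Paper fields copied from the SearchResult interface above
interface FoundPaper {
  id: string;
  title: string;
  authors: string[];
  abstract: string;
  url: string;
  publishedAt?: string;
  citations?: number;
}

// Normalize a title: lowercase, collapse whitespace and common punctuation.
// CJK characters are left untouched.
function titleKey(title: string): string {
  return title
    .toLowerCase()
    .replace(/[\s.,:;!?'"()\[\]{}-]+/g, ' ')
    .trim();
}

// Merge per-source result lists into one list with a single entry per title.
// On a collision the copy with more citations replaces the earlier one,
// while keeping the earlier one's position in the output.
export function dedupePapers(resultLists: FoundPaper[][]): FoundPaper[] {
  const byTitle = new Map<string, FoundPaper>();
  for (const list of resultLists) {
    for (const paper of list) {
      const key = titleKey(paper.title);
      const existing = byTitle.get(key);
      if (!existing || (paper.citations ?? 0) > (existing.citations ?? 0)) {
        byTitle.set(key, paper);
      }
    }
  }
  return Array.from(byTitle.values());
}
```

Title matching can still merge two distinct papers that share a generic title. Comparing first authors or publication years as well would reduce that risk.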

4.3 Summary Extraction Agent

typescript

// src/agents/summary-agent.ts

import { ChatOpenAI } from '@langchain/openai';
import { createReactAgent } from '@langchain/langgraph/prebuilt';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';
import { mcpManager } from '../mcp/client-manager';
import { documentProcessor } from '../context/document-processor';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

const PaperAnalysisSchema = z.object({
  title: z.string(),
  summary: z.string().describe('论文核心内容摘要（200-300字）'),
  keyPoints: z.array(z.string()).describe('关键要点（5-8个）'),
  methodology: z.string().describe('研究方法概述'),
  mainFindings: z.array(z.string()).describe('主要发现（3-5个）'),
  limitations: z.array(z.string()).describe('研究局限性'),
  futureWork: z.array(z.string()).describe('未来研究方向'),
  relevance: z.object({
    score: z.number().min(0).max(10),
    reason: z.string(),
  }).describe('与研究主题的相关性评估'),
});

type PaperAnalysis = z.infer<typeof PaperAnalysisSchema>;

const fetchPdfTool = tool(
  async ({ arxivId }) => {
    const result = await mcpManager.callTool('arxiv_download_pdf', { arxivId });
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'fetch_pdf',
    description: 'Download PDF from arXiv',
    schema: z.object({
      arxivId: z.string().describe('arXiv paper ID'),
    }),
  }
);

const parsePdfTool = tool(
  async ({ base64Content }) => {
    const result = await mcpManager.callTool('pdf_parse_base64', { base64Content });
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'parse_pdf',
    description: 'Parse PDF content and extract text',
    schema: z.object({
      base64Content: z.string().describe('Base64 encoded PDF content'),
    }),
  }
);

const extractSectionsTool = tool(
  async ({ filePath }) => {
    const result = await mcpManager.callTool('pdf_extract_sections', { filePath });
    const content = result.content?.find((c: any) => c.type === 'text');
    return content?.text || JSON.stringify(result);
  },
  {
    name: 'extract_sections',
    description: 'Extract academic paper sections',
    schema: z.object({
      filePath: z.string().describe('Path to PDF file'),
    }),
  }
);

const saveAnalysisTool = tool(
  async ({ paperId, summary, keyPoints, methodology, conclusions }) => {
    await prisma.paper.update({
      where: { id: paperId },
      data: {
        isProcessed: true,
        summary,
        keyPoints: JSON.stringify(keyPoints),
        methodology,
        conclusions,
      },
    });
    return JSON.stringify({ success: true });
  },
  {
    name: 'save_analysis',
    description: 'Save paper analysis to database',
    schema: z.object({
      paperId: z.number(),
      summary: z.string(),
      keyPoints: z.array(z.string()),
      methodology: z.string(),
      conclusions: z.string(),
    }),
  }
);

export function createSummaryAgent() {
  const llm = new ChatOpenAI({
    modelName: 'gpt-4o',
    temperature: 0,
  });

  const tools = [
    fetchPdfTool,
    parsePdfTool,
    extractSectionsTool,
    saveAnalysisTool,
  ];

  const systemPrompt = `你是一个专业的学术论文分析助手，负责阅读和理解论文内容，提取关键信息。

分析任务：
1. 理解论文的核心贡献
2. 识别研究方法和技术路线
3. 提取主要发现和结论
4. 评估论文的优缺点
5. 判断与研究主题的相关性

分析要求：
- 准确：忠实于原文内容
- 简洁：抓住核心，避免冗余
- 结构化：按照标准格式组织
- 客观：避免主观评价`;

  return createReactAgent({
    llm,
    tools,
    // `prompt` replaces the deprecated `messageModifier` option in recent LangGraph.js releases
    prompt: new SystemMessage(systemPrompt),
  });
}

export class SummaryAgent {
  private agent: ReturnType<typeof createSummaryAgent>;
  private llm: ChatOpenAI;

  constructor() {
    this.agent = createSummaryAgent();
    this.llm = new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0,
    });
  }

  async analyzePaper(
    paperId: number,
    researchTopic: string
  ): Promise<PaperAnalysis> {
    const paper = await prisma.paper.findUnique({
      where: { id: paperId },
    });

    if (!paper) {
      throw new Error(`Paper not found: ${paperId}`);
    }

    const analyzerLLM = this.llm.withStructuredOutput(PaperAnalysisSchema);

    const prompt = `请分析以下学术论文，评估其与研究主题「${researchTopic}」的相关性。

论文标题：${paper.title}
作者：${paper.authors}
摘要：${paper.abstract}
${paper.content ? `\n全文内容：\n${paper.content.slice(0, 15000)}` : ''}`;

    const analysis = await analyzerLLM.invoke([
      new SystemMessage(`你是专业的学术论文分析助手。请按照指定格式分析论文。`),
      new HumanMessage(prompt),
    ]);

    await prisma.paper.update({
      where: { id: paperId },
      data: {
        isProcessed: true,
        summary: analysis.summary,
        keyPoints: JSON.stringify(analysis.keyPoints),
        methodology: analysis.methodology,
        conclusions: analysis.mainFindings.join('\n'),
      },
    });

    return analysis;
  }

  async analyzeMultiplePapers(
    paperIds: number[],
    researchTopic: string
  ): Promise<PaperAnalysis[]> {
    const analyses: PaperAnalysis[] = [];

    for (const paperId of paperIds) {
      try {
        const analysis = await this.analyzePaper(paperId, researchTopic);
        analyses.push(analysis);
      } catch (error) {
        console.error(`Error analyzing paper ${paperId}:`, error);
      }
    }

    return analyses;
  }

  async extractKeyInsights(
    analyses: PaperAnalysis[],
    focusAreas: string[]
  ): Promise<{
    insights: Array<{
      area: string;
      findings: string[];
      sources: string[];
    }>;
    consensus: string[];
    controversies: string[];
  }> {
    const InsightsSchema = z.object({
      insights: z.array(z.object({
        area: z.string(),
        findings: z.array(z.string()),
        sources: z.array(z.string()),
      })),
      consensus: z.array(z.string()),
      controversies: z.array(z.string()),
    });

    const insightsLLM = this.llm.withStructuredOutput(InsightsSchema);

    const analysesContext = analyses
      .map(a => `论文：${a.title}\n关键发现：${a.mainFindings.join('; ')}`)
      .join('\n\n');

    const response = await insightsLLM.invoke([
      new SystemMessage(`你是研究综合分析助手。根据多篇论文的分析结果，提取综合洞察。

重点关注领域：${focusAreas.join(', ')}

任务：
1. 按领域整理发现
2. 识别学界共识
3. 发现争议观点`),
      new HumanMessage(analysesContext),
    ]);

    return response;
  }
}

export const summaryAgent = new SummaryAgent();
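
`analyzeMultiplePapers`, like `processMultipleDocuments` in the document processor, awaits each paper in turn, so the total time grows linearly with the number of papers. Running a few analyses in parallel usually helps, provided the number in flight stays under the model provider's rate limits. The following is a minimal sketch of bounded concurrency. `mapWithConcurrency` is a hypothetical helper, and the limit is a value to tune, not a recommendation.

```typescript
// Hypothetical helper: run an async task over items with at most `limit` in flight.
// Results keep input order; failures are collected instead of aborting the batch.
export async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<{ results: (R | undefined)[]; errors: Array<{ item: T; error: unknown }> }> {
  const results: (R | undefined)[] = new Array(items.length);
  const errors: Array<{ item: T; error: unknown }> = [];
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // `next++` runs synchronously, so no two workers take the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = await task(items[index]);
      } catch (error) {
        errors.push({ item: items[index], error });
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, errors };
}
```

For example, `const { results, errors } = await mapWithConcurrency(paperIds, 3, id => summaryAgent.analyzePaper(id, topic));` would keep at most three `analyzePaper` calls running at once, preserve input order in `results`, and collect failures in `errors` instead of only logging them.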

4.4 Comparative Analysis Agent

typescript

// src/agents/comparison-agent.ts

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { z } from 'zod';

interface PaperViewpoint {
  paperId: number;
  title: string;
  viewpoints: Array<{
    topic: string;
    stance: string;
    evidence: string;
  }>;
}

const ComparisonResultSchema = z.object({
  dimensions: z.array(z.object({
    name: z.string().describe('对比维度'),
    description: z.string().describe('维度说明'),
    papers: z.array(z.object({
      title: z.string(),
      position: z.string().describe('该论文在此维度上的立场/观点'),
      evidence: z.string().describe('支持证据'),
    })),
    consensus: z.string().optional().describe('共识点'),
    divergence: z.string().optional().describe('分歧点'),
  })),
  overallSummary: z.string().describe('整体对比总结'),
  strengthsWeaknesses: z.array(z.object({
    paperTitle: z.string(),
    strengths: z.array(z.string()),
    weaknesses: z.array(z.string()),
  })),
  recommendations: z.array(z.string()).describe('基于对比的建议'),
});

type ComparisonResult = z.infer<typeof ComparisonResultSchema>;

const RelationshipSchema = z.object({
  relationships: z.array(z.object({
    paper1: z.string(),
    paper2: z.string(),
    type: z.enum(['supports', 'contradicts', 'extends', 'applies', 'unrelated']),
    description: z.string(),
    confidence: z.number().min(0).max(1),
  })),
  clusters: z.array(z.object({
    theme: z.string(),
    papers: z.array(z.string()),
    summary: z.string(),
  })),
  evolutionPath: z.array(z.object({
    stage: z.string(),
    papers: z.array(z.string()),
    contribution: z.string(),
  })).optional(),
});

type RelationshipAnalysis = z.infer<typeof RelationshipSchema>;

export class ComparisonAgent {
  private llm: ChatOpenAI;

  constructor() {
    this.llm = new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0.2,
    });
  }

  async compareViewpoints(
    papers: PaperViewpoint[],
    comparisonDimensions: string[]
  ): Promise<ComparisonResult> {
    const comparisonLLM = this.llm.withStructuredOutput(ComparisonResultSchema);

    const papersContext = papers.map(p => {
      const viewpointsStr = p.viewpoints
        .map(v => `- ${v.topic}: ${v.stance} (证据: ${v.evidence})`)
        .join('\n');
      return `【${p.title}】\n${viewpointsStr}`;
    }).join('\n\n');

    const response = await comparisonLLM.invoke([
      new SystemMessage(`你是学术论文对比分析专家。请从以下维度对比分析多篇论文：

对比维度：${comparisonDimensions.join(', ')}

分析要求：
1. 在每个维度上比较各论文的立场和方法
2. 识别共识和分歧
3. 评估各论文的优缺点
4. 提供综合建议`),
      new HumanMessage(`请对比分析以下论文：\n\n${papersContext}`),
    ]);

    return response;
  }

  async analyzeRelationships(
    papers: Array<{
      title: string;
      summary: string;
      keyPoints: string[];
      publishedAt?: Date;
    }>
  ): Promise<RelationshipAnalysis> {
    const relationshipLLM = this.llm.withStructuredOutput(RelationshipSchema);

    const papersContext = papers.map(p => {
      return `【${p.title}】${p.publishedAt ? ` (${p.publishedAt.getFullYear()})` : ''}
摘要：${p.summary}
关键点：${p.keyPoints.join('; ')}`;
    }).join('\n\n');

    const response = await relationshipLLM.invoke([
      new SystemMessage(`你是学术文献关系分析专家。请分析以下论文之间的关系：

关系类型：
- supports: 支持/验证另一论文的观点
- contradicts: 与另一论文观点矛盾
- extends: 扩展/深化另一论文的研究
- applies: 应用另一论文的方法/理论
- unrelated: 无直接关系

任务：
1. 分析论文两两之间的关系
2. 识别主题聚类
3. 如果可能，描述研究演进路径`),
      new HumanMessage(`请分析以下论文的关系：\n\n${papersContext}`),
    ]);

    return response;
  }

  async generateComparisonTable(
    papers: Array<{
      title: string;
      methodology: string;
      mainFindings: string[];
      limitations: string[];
    }>,
    criteria: string[]
  ): Promise<{
    headers: string[];
    rows: Array<{
      paper: string;
      values: Record<string, string>;
    }>;
    notes: string[];
  }> {
    const TableSchema = z.object({
      headers: z.array(z.string()),
      rows: z.array(z.object({
        paper: z.string(),
        // Explicit key schema: accepted by Zod 3 and required by Zod 4
        values: z.record(z.string(), z.string()),
      })),
      notes: z.array(z.string()),
    });

    const tableLLM = this.llm.withStructuredOutput(TableSchema);

    const papersContext = papers.map(p => {
      return `【${p.title}】
方法：${p.methodology}
发现：${p.mainFindings.join('; ')}
局限：${p.limitations.join('; ')}`;
    }).join('\n\n');

    const response = await tableLLM.invoke([
      new SystemMessage(`请生成论文对比表格。

对比标准：${criteria.join(', ')}

要求：
1. 每个论文一行
2. 每个标准一列
3. 简洁填写每个单元格
4. 添加必要的注释`),
      new HumanMessage(papersContext),
    ]);

    return response;
  }

  async identifyResearchGaps(
    papers: Array<{
      title: string;
      mainFindings: string[];
      futureWork: string[];
      limitations: string[];
    }>
  ): Promise<{
    gaps: Array<{
      area: string;
      description: string;
      mentionedBy: string[];
      priority: 'high' | 'medium' | 'low';
    }>;
    opportunities: string[];
  }> {
    const GapsSchema = z.object({
      gaps: z.array(z.object({
        area: z.string(),
        description: z.string(),
        mentionedBy: z.array(z.string()),
        priority: z.enum(['high', 'medium', 'low']),
      })),
      opportunities: z.array(z.string()),
    });

    const gapsLLM = this.llm.withStructuredOutput(GapsSchema);

    const papersContext = papers.map(p => {
      return `【${p.title}】
发现：${p.mainFindings.join('; ')}
局限：${p.limitations.join('; ')}
未来方向：${p.futureWork.join('; ')}`;
    }).join('\n\n');

    const response = await gapsLLM.invoke([
      new SystemMessage(`分析以下论文，识别研究领域的空白和机会：

任务：
1. 识别多篇论文共同指出的研究空白
2. 评估各空白的研究价值和优先级
3. 提出潜在的研究机会`),
      new HumanMessage(papersContext),
    ]);

    return response;
  }
}

export const comparisonAgent = new ComparisonAgent();
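
`generateComparisonTable` returns structured data (`headers`, `rows`, `notes`), not text, so it has to be converted to Markdown before it can go into a report section. A minimal renderer is sketched below. It assumes `headers` lists only the criteria columns, adds the paper title as the first column, and targets GitHub-flavored Markdown table syntax. `renderComparisonTable` is a hypothetical helper. The model chooses the keys in `values`, so any cell whose key does not exactly match a header is rendered as `-`.

```typescript
// Shape returned by ComparisonAgent.generateComparisonTable()
interface ComparisonTable {
  headers: string[];
  rows: Array<{ paper: string; values: Record<string, string> }>;
  notes: string[];
}

// Escape pipes and flatten newlines so a value cannot break the table layout
function cell(text: string): string {
  return text.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ').trim();
}

// Hypothetical renderer: structured comparison table -> Markdown table plus notes
export function renderComparisonTable(table: ComparisonTable): string {
  const columns = ['Paper', ...table.headers];
  const lines = [
    `| ${columns.map(cell).join(' | ')} |`,
    `| ${columns.map(() => '---').join(' | ')} |`,
  ];
  for (const row of table.rows) {
    // Missing cells are rendered as "-" instead of "undefined"
    const values = table.headers.map(h => cell(row.values[h] ?? '-'));
    lines.push(`| ${[cell(row.paper), ...values].join(' | ')} |`);
  }
  const notes = table.notes.map(n => `> ${cell(n)}`);
  return [...lines, ...(notes.length ? ['', ...notes] : [])].join('\n');
}
```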

4.5 Report Writing Agent

typescript

// src/agents/report-agent.ts

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import { z } from 'zod';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

interface ReportSection {
  title: string;
  content: string;
  references: string[];
}

interface ResearchData {
  topic: string;
  papers: Array<{
    id: number;
    title: string;
    authors: string;
    summary: string;
    keyPoints: string[];
    year?: number;
  }>;
  insights: Array<{
    area: string;
    findings: string[];
  }>;
  comparisons: any;
  gaps: Array<{
    area: string;
    description: string;
  }>;
}

const ReportOutlineSchema = z.object({
  title: z.string(),
  sections: z.array(z.object({
    id: z.string(),
    title: z.string(),
    description: z.string(),
    subsections: z.array(z.object({
      id: z.string(),
      title: z.string(),
    })).optional(),
  })),
  estimatedLength: z.number().describe('预计字数'),
});

type ReportOutline = z.infer<typeof ReportOutlineSchema>;

const SectionContentSchema = z.object({
  content: z.string().describe('章节内容（Markdown 格式）'),
  citations: z.array(z.object({
    paperId: z.number(),
    context: z.string(),
  })).describe('引用的论文'),
  keyTakeaways: z.array(z.string()).describe('关键结论'),
});

export class ReportAgent {
  private llm: ChatOpenAI;

  constructor() {
    this.llm = new ChatOpenAI({
      modelName: 'gpt-4o',
      temperature: 0.3,
    });
  }

  async generateOutline(
    researchData: ResearchData,
    reportType: 'literature_review' | 'research_summary' | 'technical_report' = 'literature_review'
  ): Promise<ReportOutline> {
    const outlineLLM = this.llm.withStructuredOutput(ReportOutlineSchema);

    const templates: Record<string, string> = {
      literature_review: `文献综述报告结构：
1. 引言（研究背景、目的、范围）
2. 研究方法（检索策略、筛选标准）
3. 文献分析（按主题/时间/方法分类）
4. 讨论（主要发现、趋势、争议）
5. 结论（总结、建议、未来方向）`,
      research_summary: `研究摘要报告结构：
1. 执行摘要
2. 背景介绍
3. 主要发现
4. 方法对比
5. 结论和建议`,
      technical_report: `技术报告结构：
1. 概述
2. 技术背景
3. 方法分析
4. 实验结果对比
5. 最佳实践建议
6. 总结`,
    };

    const dataContext = `研究主题：${researchData.topic}
收集论文数：${researchData.papers.length}
主要研究领域：${researchData.insights.map(i => i.area).join(', ')}
识别的研究空白：${researchData.gaps.map(g => g.area).join(', ')}`;

    const response = await outlineLLM.invoke([
      new SystemMessage(`你是学术报告撰写专家。请根据研究数据生成报告大纲。

报告类型：${reportType}
参考结构：
${templates[reportType]}`),
      new HumanMessage(dataContext),
    ]);

    return response;
  }

  async writeSection(
    sectionId: string,
    sectionTitle: string,
    researchData: ResearchData,
    previousSections: ReportSection[],
    guidelines?: string
  ): Promise<ReportSection> {
    const sectionLLM = this.llm.withStructuredOutput(SectionContentSchema);

    const relevantPapers = researchData.papers.slice(0, 10);
    const papersContext = relevantPapers.map(p => 
      `[${p.id}] ${p.title} (${p.year || '年份未知'})
作者：${p.authors}
摘要：${p.summary}
关键点：${p.keyPoints.join('; ')}`
    ).join('\n\n');

    const previousContext = previousSections.length > 0
      ? `已完成章节：\n${previousSections.map(s => `- ${s.title}`).join('\n')}`
      : '';

    const response = await sectionLLM.invoke([
      new SystemMessage(`你是学术报告撰写专家。请撰写报告的「${sectionTitle}」章节。

要求：
1. 使用 Markdown 格式
2. 引用相关论文（使用 [论文ID] 格式）
3. 保持学术性和客观性
4. 逻辑清晰，过渡自然
${guidelines ? `\n特殊要求：${guidelines}` : ''}`),
      new HumanMessage(`研究主题：${researchData.topic}

${previousContext}

可用文献：
${papersContext}

研究洞察：
${researchData.insights.map(i => `- ${i.area}: ${i.findings.slice(0, 3).join('; ')}`).join('\n')}

请撰写「${sectionTitle}」章节。`),
    ]);

    const references = response.citations.map(c => {
      const paper = researchData.papers.find(p => p.id === c.paperId);
      return paper ? `[${c.paperId}] ${paper.title}` : '';
    }).filter(Boolean);

    return {
      title: sectionTitle,
      content: response.content,
      references,
    };
  }

  async generateFullReport(
    researchData: ResearchData,
    outline: ReportOutline
  ): Promise<{
    title: string;
    content: string;
    references: Array<{
      id: number;
      citation: string;
    }>;
    metadata: {
      generatedAt: Date;
      wordCount: number;
      paperCount: number;
    };
  }> {
    const sections: ReportSection[] = [];

    for (const section of outline.sections) {
      const sectionContent = await this.writeSection(
        section.id,
        section.title,
        researchData,
        sections
      );
      sections.push(sectionContent);

      if (section.subsections) {
        for (const subsection of section.subsections) {
          const subContent = await this.writeSection(
            subsection.id,
            subsection.title,
            researchData,
            sections,
            `这是「${section.title}」的子章节`
          );
          sections.push(subContent);
        }
      }
    }

    const fullContent = this.assembleReport(outline.title, sections);

    const allReferences = new Set<string>();
    sections.forEach(s => s.references.forEach(r => allReferences.add(r)));

    const references = researchData.papers
      .filter(p => Array.from(allReferences).some(r => r.includes(`[${p.id}]`)))
      .map(p => ({
        id: p.id,
        citation: `${JSON.parse(p.authors || '[]').join(', ')}. "${p.title}". ${p.year || ''}`,
      }));

    return {
      title: outline.title,
      content: fullContent,
      references,
      metadata: {
        generatedAt: new Date(),
        wordCount: fullContent.length,
        paperCount: researchData.papers.length,
      },
    };
  }

  private assembleReport(title: string, sections: ReportSection[]): string {
    let content = `# ${title}\n\n`;
    content += `*生成时间：${new Date().toLocaleDateString('zh-CN')}*\n\n`;
    content += `---\n\n`;

    content += `## 目录\n\n`;
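    sections.forEach((section, index) => {
      // The anchor must match the numbered heading written below ("## 1. <title>")
      const heading = `${index + 1}. ${section.title}`;
      content += `${index + 1}. [${section.title}](#${this.slugify(heading)})\n`;
    });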
    content += `\n---\n\n`;

    sections.forEach((section, index) => {
      content += `## ${index + 1}. ${section.title}\n\n`;
      content += `${section.content}\n\n`;
    });

    const allReferences = new Set<string>();
    sections.forEach(s => s.references.forEach(r => allReferences.add(r)));

    if (allReferences.size > 0) {
      content += `## 参考文献\n\n`;
      Array.from(allReferences).forEach(ref => {
        content += `${ref}\n\n`;
      });
    }

    return content;
  }

  private slugify(text: string): string {
    return text
      .toLowerCase()
      .replace(/[^\w\u4e00-\u9fa5]+/g, '-')
      .replace(/^-+|-+$/g, '');
  }

  async saveReport(
    projectId: number,
    title: string,
    content: string,
    outline: ReportOutline,
    references: Array<{ id: number; citation: string }>
  ): Promise<number> {
    const report = await prisma.researchReport.create({
      data: {
        projectId,
        title,
        content,
        outline: JSON.stringify(outline),
        references: JSON.stringify(references),
        status: 'draft',
      },
    });

    return report.id;
  }

  async updateReport(
    reportId: number,
    updates: {
      content?: string;
      status?: string;
      version?: number;
    }
  ): Promise<void> {
    await prisma.researchReport.update({
      where: { id: reportId },
      data: updates,
    });
  }
}

export const reportAgent = new ReportAgent();
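`writeSection` builds its reference list from the `citations` array the model returns, and that array can drift from the `[论文ID]` markers actually written into `content`. A minimal sketch, assuming citations appear in the `[id]` form the prompt asks for, derives the references from the generated text instead. `extractCitedIds` and `buildReferences` are hypothetical helpers, not part of the project above, and grouped citations such as `[1, 2]` are not matched.

```typescript
// Hypothetical helper: collect paper IDs cited as "[id]" in generated Markdown,
// keeping only IDs that belong to the collected papers, in first-citation order.
export function extractCitedIds(markdown: string, knownIds: number[]): number[] {
  const known = new Set(knownIds);
  const cited = new Set<number>(); // a Set keeps insertion order and drops repeats
  // Matches "[12]" but skips Markdown links such as "[12](https://...)"
  const pattern = /\[(\d+)\](?!\()/g;
  for (const match of markdown.matchAll(pattern)) {
    const id = Number(match[1]);
    if (known.has(id)) cited.add(id);
  }
  return [...cited];
}

// Hypothetical helper: turn the cited IDs into "[id] title" reference lines
export function buildReferences(
  markdown: string,
  papers: Array<{ id: number; title: string }>
): string[] {
  const titleById = new Map(papers.map(p => [p.id, p.title] as const));
  return extractCitedIds(markdown, papers.map(p => p.id)).map(
    id => `[${id}] ${titleById.get(id)}`
  );
}
```

In `writeSection`, calling `buildReferences(response.content, researchData.papers)` would replace the lookup over `response.citations`, so the reference list can only contain papers the text really cites.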

Part 5: Assembling the LangGraph Workflow

5.1 State Definition

typescript

// src/graph/state.ts

import { Annotation, messagesStateReducer } from '@langchain/langgraph';
import { BaseMessage } from '@langchain/core/messages';

interface Paper {
  id: number;
  title: string;
  authors: string;
  abstract: string;
  summary?: string;
  keyPoints?: string[];
}

interface ResearchPlan {
  understanding: string;
  searchStrategy: {
    primaryKeywords: string[];
    alternativeKeywords: string[];
    targetSources: string[];
    expectedPapers: number;
  };
  analysisApproach: {
    focusAreas: string[];
    comparisonDimensions: string[];
  };
  outputFormat: {
    reportStructure: string[];
  };
}

interface GapAnalysis {
  hasGaps: boolean;
  gaps: Array<{
    description: string;
    suggestedAction: string;
    priority: string;
  }>;
  completeness: number;
  recommendation: string;
}

export const ResearchAssistantState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: messagesStateReducer,
    default: () => [],
  }),

  projectId: Annotation<number>({
    reducer: (_, y) => y,
    default: () => 0,
  }),

  userId: Annotation<number>({
    reducer: (_, y) => y,
    default: () => 0,
  }),

  topic: Annotation<string>({
    reducer: (_, y) => y,
    default: () => '',
  }),

  currentPhase: Annotation<'planning' | 'searching' | 'analyzing' | 'comparing' | 'writing' | 'complete'>({
    reducer: (_, y) => y,
    default: () => 'planning',
  }),

  researchPlan: Annotation<ResearchPlan | null>({
    reducer: (_, y) => y,
    default: () => null,
  }),

  collectedPapers: Annotation<Paper[]>({
    reducer: (existing, newPapers) => {
      const existingIds = new Set(existing.map(p => p.id));
      const uniqueNew = newPapers.filter(p => !existingIds.has(p.id));
      return [...existing, ...uniqueNew];
    },
    default: () => [],
  }),

  analyzedPapers: Annotation<number[]>({
    reducer: (existing, newIds) => [...new Set([...existing, ...newIds])],
    default: () => [],
  }),

  extractedInsights: Annotation<Array<{
    area: string;
    findings: string[];
    sources: string[];
  }>>({
    reducer: (existing, newInsights) => [...existing, ...newInsights],
    default: () => [],
  }),

  comparisonResults: Annotation<any>({
    reducer: (_, y) => y,
    default: () => null,
  }),

  gapAnalysis: Annotation<GapAnalysis | null>({
    reducer: (_, y) => y,
    default: () => null,
  }),

  reportOutline: Annotation<any>({
    reducer: (_, y) => y,
    default: () => null,
  }),

  reportContent: Annotation<string>({
    reducer: (_, y) => y,
    default: () => '',
  }),

  searchHistory: Annotation<string[]>({
    reducer: (existing, newSearches) => [...existing, ...newSearches],
    default: () => [],
  }),

  iterationCount: Annotation<number>({
    reducer: (_, y) => y,
    default: () => 0,
  }),

  maxIterations: Annotation<number>({
    reducer: (_, y) => y,
    default: () => 5,
  }),

  error: Annotation<string | null>({
    reducer: (_, y) => y,
    default: () => null,
  }),
});

export type ResearchAssistantStateType = typeof ResearchAssistantState.State;
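Each reducer decides how the value a node returns is merged into the existing channel. The sketch below restates the `collectedPapers` and `searchHistory` reducers as plain functions (the names `mergePapers` and `appendHistory` are hypothetical) so their behaviour can be checked without LangGraph. Two consequences follow: a re-fetched paper whose ID already exists is ignored, so later changes to its `summary` or `keyPoints` never reach the state; and a node that writes to an append reducer should return only the new items.

```typescript
type Paper = { id: number; title: string };

// Same logic as the collectedPapers reducer: keep existing entries,
// add only papers whose ID has not been seen yet
function mergePapers(existing: Paper[], incoming: Paper[]): Paper[] {
  const seen = new Set(existing.map(p => p.id));
  return [...existing, ...incoming.filter(p => !seen.has(p.id))];
}

// Same logic as the searchHistory reducer: append whatever the node returns
function appendHistory(existing: string[], incoming: string[]): string[] {
  return [...existing, ...incoming];
}

const history = appendHistory([], ['quantum error correction']);

// Correct: the node returns only the new query
const good = appendHistory(history, ['surface code']);
// good: ['quantum error correction', 'surface code']

// Pitfall: the node returns [...state.searchHistory, ...newQueries]
const bad = appendHistory(history, [...history, 'surface code']);
// bad: ['quantum error correction', 'quantum error correction', 'surface code']
```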

5.2 Implementing the Workflow Nodes

typescript

// src/graph/nodes.ts

import { ResearchAssistantStateType } from './state';
import { researchPlanner } from '../agents/research-planner';
import { searchAgent } from '../agents/search-agent';
import { summaryAgent } from '../agents/summary-agent';
import { comparisonAgent } from '../agents/comparison-agent';
import { reportAgent } from '../agents/report-agent';
import { contextManager } from '../context/context-manager';
import { AIMessage } from '@langchain/core/messages';

export async function planningNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Planning] Creating research plan...');

  try {
    const plan = await researchPlanner.createResearchPlan(
      state.topic,
      undefined,
      {
        maxPapers: 20,
      }
    );

    contextManager.updateResearchState(state.projectId, {
      topic: state.topic,
      keywords: plan.searchStrategy.primaryKeywords,
      currentPhase: 'search',
    });

    return {
      researchPlan: plan,
      currentPhase: 'searching',
      messages: [
        new AIMessage({
          content: `研究计划制定完成！\n\n**主题理解**：${plan.understanding}\n\n**检索策略**：\n- 主要关键词：${plan.searchStrategy.primaryKeywords.join(', ')}\n- 目标数据源：${plan.searchStrategy.targetSources.join(', ')}\n- 预计论文数：${plan.searchStrategy.expectedPapers}`,
        }),
      ],
    };
  } catch (error) {
    return {
      error: `Planning failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
    };
  }
}

export async function searchingNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Searching] Executing search strategy...');

  if (!state.researchPlan) {
    return { error: 'No research plan available' };
  }

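  try {
    const strategy = state.researchPlan.searchStrategy;
    const sources = strategy.targetSources;

    // Round 1 searches the plan's primary keywords. Later rounds (re-entered
    // from gap_check) search the supplementary queries that gap_check
    // appended to the end of searchHistory.
    const isFirstRound = state.iterationCount === 0;
    const queries = isFirstRound
      ? strategy.primaryKeywords.slice(0, 3)
      : state.searchHistory.slice(-3);

    await Promise.all(
      queries.map((query) =>
        searchAgent.search(query, {
          userId: state.userId,
          projectId: state.projectId,
          maxPapers: 5,
          sources,
        })
      )
    );

    // Read back every paper linked to the project, including earlier rounds
    const papers = await fetchProjectPapers(state.projectId);

    if (papers.length === 0) {
      return { error: 'Search returned no papers' };
    }

    return {
      collectedPapers: papers,
      // searchHistory's reducer appends; later-round queries were already
      // recorded by gap_check, so only log the first-round keywords here
      searchHistory: isFirstRound ? queries : [],
      currentPhase: 'analyzing',
      messages: [
        new AIMessage({
          content: `文献检索完成！共找到 ${papers.length} 篇相关论文。\n\n正在进行论文分析...`,
        }),
      ],
    };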
  } catch (error) {
    return {
      error: `Search failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
    };
  }
}

export async function analyzingNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Analyzing] Analyzing collected papers...');

  try {
    const unanalyzedPapers = state.collectedPapers.filter(
      p => !state.analyzedPapers.includes(p.id)
    );

    if (unanalyzedPapers.length === 0) {
      return { currentPhase: 'comparing' };
    }

    const papersToAnalyze = unanalyzedPapers.slice(0, 5);
    const analyses = await summaryAgent.analyzeMultiplePapers(
      papersToAnalyze.map(p => p.id),
      state.topic
    );

    const focusAreas = state.researchPlan?.analysisApproach.focusAreas || [];
    const insights = await summaryAgent.extractKeyInsights(analyses, focusAreas);

    const analyzedIds = papersToAnalyze.map(p => p.id);

    contextManager.updateResearchState(state.projectId, {
      currentPhase: 'analyze',
    });
    
    analyzedIds.forEach(id => {
      contextManager.addProcessedPaper(state.projectId, id);
    });

    return {
      analyzedPapers: analyzedIds,
      extractedInsights: [
        {
          area: 'general',
          findings: insights.consensus,
          sources: papersToAnalyze.map(p => p.title),
        },
        ...insights.insights,
      ],
      currentPhase: 'comparing',
      messages: [
        new AIMessage({
          content: `论文分析完成！已分析 ${analyzedIds.length} 篇论文。\n\n**主要共识**：\n${insights.consensus.map(c => `- ${c}`).join('\n')}\n\n正在进行观点对比...`,
        }),
      ],
    };
  } catch (error) {
    return {
      error: `Analysis failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
    };
  }
}

export async function comparingNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Comparing] Comparing viewpoints...');

  try {
    const dimensions = state.researchPlan?.analysisApproach.comparisonDimensions || [
      'methodology',
      'findings',
      'limitations',
    ];

    const paperViewpoints = state.collectedPapers
      .filter(p => state.analyzedPapers.includes(p.id))
      .map(p => ({
        paperId: p.id,
        title: p.title,
        viewpoints: (p.keyPoints || []).map(kp => ({
          topic: 'general',
          stance: kp,
          evidence: p.abstract?.slice(0, 200) || '',
        })),
      }));

    const comparison = await comparisonAgent.compareViewpoints(
      paperViewpoints,
      dimensions
    );

    const relationships = await comparisonAgent.analyzeRelationships(
      state.collectedPapers
        .filter(p => state.analyzedPapers.includes(p.id))
        .map(p => ({
          title: p.title,
          summary: p.summary || p.abstract || '',
          keyPoints: p.keyPoints || [],
        }))
    );

    contextManager.updateResearchState(state.projectId, {
      currentPhase: 'compare',
    });

    return {
      comparisonResults: {
        comparison,
        relationships,
      },
      currentPhase: 'writing',
      messages: [
        new AIMessage({
          content: `观点对比完成！\n\n**主要发现**：\n${comparison.overallSummary}\n\n正在生成研究报告...`,
        }),
      ],
    };
  } catch (error) {
    return {
      error: `Comparison failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
    };
  }
}

export async function gapCheckNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Gap Check] Checking information completeness...');

  if (!state.researchPlan) {
    return { currentPhase: 'writing' };
  }

  try {
    const gapAnalysis = await researchPlanner.analyzeInformationGaps(
      state.researchPlan,
      state.collectedPapers.map(p => ({
        title: p.title,
        abstract: p.abstract || '',
        summary: p.summary,
      })),
      state.extractedInsights.flatMap(i => i.findings)
    );

    if (gapAnalysis.hasGaps && 
        gapAnalysis.completeness < 70 && 
        state.iterationCount < state.maxIterations) {
      const supplementarySearches = await researchPlanner.generateSupplementarySearches(
        gapAnalysis.gaps,
        state.searchHistory
      );

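      return {
        gapAnalysis,
        iterationCount: state.iterationCount + 1,
        // searchHistory's reducer appends, so return only the new queries;
        // returning the whole history again would duplicate every entry
        searchHistory: supplementarySearches.map(s => s.query),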
        currentPhase: 'searching',
        messages: [
          new AIMessage({
            content: `信息完整度：${gapAnalysis.completeness}%\n\n发现信息缺口，正在补充检索...\n\n**需要补充的领域**：\n${gapAnalysis.gaps.map(g => `- ${g.description}`).join('\n')}`,
          }),
        ],
      };
    }

    return {
      gapAnalysis,
      currentPhase: 'writing',
    };
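  } catch (error) {
    // Gap analysis is best-effort: log the failure and continue to writing
    console.warn('[Gap Check] Skipped:', error);
    return { currentPhase: 'writing' };
  }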
}

export async function writingNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.log('[Writing] Generating research report...');

  try {
    const researchData = {
      topic: state.topic,
      papers: state.collectedPapers
        .filter(p => state.analyzedPapers.includes(p.id))
        .map(p => ({
          id: p.id,
          title: p.title,
          authors: p.authors,
          summary: p.summary || p.abstract || '',
          keyPoints: p.keyPoints || [],
        })),
      insights: state.extractedInsights,
      comparisons: state.comparisonResults,
      gaps: state.gapAnalysis?.gaps.map(g => ({
        area: g.suggestedAction,
        description: g.description,
      })) || [],
    };

    const outline = await reportAgent.generateOutline(researchData, 'literature_review');

    const report = await reportAgent.generateFullReport(researchData, outline);

    await reportAgent.saveReport(
      state.projectId,
      report.title,
      report.content,
      outline,
      report.references
    );

    contextManager.updateResearchState(state.projectId, {
      currentPhase: 'write',
      reportDraft: report.content,
    });

    return {
      reportOutline: outline,
      reportContent: report.content,
      currentPhase: 'complete',
      messages: [
        new AIMessage({
          content: `研究报告生成完成！\n\n**报告标题**：${report.title}\n**字数**：${report.metadata.wordCount}\n**引用论文**：${report.references.length} 篇\n\n报告已保存，您可以在文献管理中查看和导出。`,
        }),
      ],
    };
  } catch (error) {
    return {
      error: `Report generation failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
    };
  }
}

// Lazily create one shared PrismaClient. Creating a new client on every call
// opens a fresh connection pool each time and never disconnects it.
let prismaClient: import('@prisma/client').PrismaClient | undefined;

async function fetchProjectPapers(projectId: number): Promise<any[]> {
  if (!prismaClient) {
    const { PrismaClient } = await import('@prisma/client');
    prismaClient = new PrismaClient();
  }

  const projectPapers = await prismaClient.projectPaper.findMany({
    where: { projectId },
    include: { paper: true },
    orderBy: { relevance: 'desc' },
  });

  return projectPapers.map(pp => ({
    id: pp.paper.id,
    title: pp.paper.title,
    authors: pp.paper.authors,
    abstract: pp.paper.abstract,
    summary: pp.paper.summary,
    keyPoints: pp.paper.keyPoints ? JSON.parse(pp.paper.keyPoints) : [],
  }));
}
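The loop condition in `gapCheckNode` can be pulled out into pure functions, which makes the stopping rule easy to unit-test. This is a sketch under assumptions: `shouldSupplementSearch` and `newQueries` are hypothetical names, the threshold of 70 is taken from `gapCheckNode`, and the normalization used to detect repeated queries (trim, lowercase, collapse whitespace) is an illustrative choice rather than part of the original design.

```typescript
interface GapCheckInput {
  hasGaps: boolean;
  completeness: number; // 0-100, as reported by analyzeInformationGaps
  iterationCount: number;
  maxIterations: number;
}

// Loop back to searching only when gaps remain, completeness is below the
// threshold, and the iteration budget is not yet exhausted
function shouldSupplementSearch(g: GapCheckInput, threshold = 70): boolean {
  return g.hasGaps && g.completeness < threshold && g.iterationCount < g.maxIterations;
}

// Drop suggested queries that were already run (or repeated in the batch),
// so a supplementary round does not repeat earlier searches
function newQueries(suggested: string[], history: string[]): string[] {
  const normalize = (q: string) => q.trim().toLowerCase().replace(/\s+/g, ' ');
  const seen = new Set(history.map(normalize));
  const result: string[] = [];
  for (const q of suggested) {
    const key = normalize(q);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(q);
    }
  }
  return result;
}
```

Inside `gapCheckNode`, filtering `supplementarySearches.map(s => s.query)` through `newQueries(..., state.searchHistory)` would keep the supplementary round focused on queries that have not been tried.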

5.3 Assembling the Complete Workflow

typescript

// src/graph/research-assistant-graph.ts

import { StateGraph, END } from '@langchain/langgraph';
import { AIMessage } from '@langchain/core/messages';
import { ResearchAssistantState, ResearchAssistantStateType } from './state';
import {
  planningNode,
  searchingNode,
  analyzingNode,
  comparingNode,
  gapCheckNode,
  writingNode,
} from './nodes';

function routeAfterPlanning(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';
  return 'searching';
}

function routeAfterSearching(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';
  if (state.collectedPapers.length === 0) return 'error';
  return 'analyzing';
}

function routeAfterAnalyzing(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';
  return 'comparing';
}

function routeAfterComparing(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';
  return 'gap_check';
}

function routeAfterGapCheck(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';

  if (state.currentPhase === 'searching') {
    return 'searching';
  }

  return 'writing';
}

function routeAfterWriting(state: ResearchAssistantStateType): string {
  if (state.error) return 'error';
  return END;
}

async function errorNode(
  state: ResearchAssistantStateType
): Promise<Partial<ResearchAssistantStateType>> {
  console.error('[Error]', state.error);
  return {
    messages: [
      // A real AIMessage instead of a hand-built object cast to `any`,
      // so messagesStateReducer receives a proper BaseMessage
      new AIMessage({
        content: `研究过程中发生错误：${state.error}\n\n请检查配置或重试。`,
      }),
    ],
  };
}

export function createResearchAssistantGraph() {
  const workflow = new StateGraph(ResearchAssistantState)
    .addNode('planning', planningNode)
    .addNode('searching', searchingNode)
    .addNode('analyzing', analyzingNode)
    .addNode('comparing', comparingNode)
    .addNode('gap_check', gapCheckNode)
    .addNode('writing', writingNode)
    .addNode('error', errorNode)

    .addEdge('__start__', 'planning')
    .addConditionalEdges('planning', routeAfterPlanning, {
      searching: 'searching',
      error: 'error',
    })
    .addConditionalEdges('searching', routeAfterSearching, {
      analyzing: 'analyzing',
      error: 'error',
    })
    .addConditionalEdges('analyzing', routeAfterAnalyzing, {
      comparing: 'comparing',
      error: 'error',
    })
    .addConditionalEdges('comparing', routeAfterComparing, {
      gap_check: 'gap_check',
      error: 'error',
    })
    .addConditionalEdges('gap_check', routeAfterGapCheck, {
      searching: 'searching',
      writing: 'writing',
      error: 'error',
    })
    .addConditionalEdges('writing', routeAfterWriting, {
      [END]: END,
      error: 'error',
    })
    .addEdge('error', END);

  return workflow.compile();
}

export async function runResearchAssistant(
  topic: string,
  userId: number,
  projectId: number
) {
  const graph = createResearchAssistantGraph();

  const initialState = {
    topic,
    userId,
    projectId,
    maxIterations: 3,
  };

  console.log(`\n${'='.repeat(60)}`);
  console.log(`Starting research on: ${topic}`);
  console.log(`${'='.repeat(60)}\n`);

  const stream = await graph.stream(initialState, {
    recursionLimit: 50,
  });

  // In the default "updates" stream mode each event is { [nodeName]: update },
  // so this holds the last node's partial update, not the full final state.
  let lastUpdate: Partial<ResearchAssistantStateType> | null = null;

  for await (const event of stream) {
    const nodeName = Object.keys(event)[0];
    const nodeState = event[nodeName];

    console.log(`\n[${nodeName.toUpperCase()}] Phase completed`);

    if (nodeState.messages && nodeState.messages.length > 0) {
      const lastMessage = nodeState.messages[nodeState.messages.length - 1];
      console.log(`Message: ${lastMessage.content?.slice(0, 200)}...`);
    }

    lastUpdate = nodeState;
  }

  console.log(`\n${'='.repeat(60)}`);
  console.log('Research completed!');
  console.log(`${'='.repeat(60)}\n`);

  return lastUpdate;
}
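To see how conditional edges and `recursionLimit` interact, here is a toy executor. It is not LangGraph's implementation: it runs one node per step, merges the node's partial update with a shallow spread (LangGraph would apply each channel's reducer instead), then calls that node's router to pick the next node. LangGraph counts supersteps toward the recursion limit; counting node executions here is a simplification. `runMiniGraph` and `END_NODE` are hypothetical names.

```typescript
type State = { phase: string; iteration: number; completeness: number };
type NodeFn = (s: State) => Partial<State>;
type Router = (s: State) => string;

const END_NODE = '__end__';

// Run nodes until a router returns END_NODE; fail if the step budget runs out
function runMiniGraph(
  nodes: Record<string, NodeFn>,
  routers: Record<string, Router>,
  start: string,
  initial: State,
  recursionLimit = 25
): { state: State; path: string[] } {
  let state = { ...initial };
  let current = start;
  const path: string[] = [];
  while (current !== END_NODE) {
    if (path.length >= recursionLimit) {
      throw new Error(`Recursion limit ${recursionLimit} reached`);
    }
    path.push(current);
    state = { ...state, ...nodes[current](state) }; // merge the partial update
    current = routers[current](state); // conditional edge
  }
  return { state, path };
}

// A searching <-> gap_check loop: each check raises completeness by 30
const nodes: Record<string, NodeFn> = {
  searching: () => ({ phase: 'searching' }),
  gap_check: s => ({ iteration: s.iteration + 1, completeness: s.completeness + 30 }),
  writing: () => ({ phase: 'complete' }),
};
const routers: Record<string, Router> = {
  searching: () => 'gap_check',
  gap_check: s => (s.completeness < 70 && s.iteration < 3 ? 'searching' : 'writing'),
  writing: () => END_NODE,
};

const result = runMiniGraph(nodes, routers, 'searching', {
  phase: 'planning',
  iteration: 0,
  completeness: 20,
});
// result.path: searching, gap_check, searching, gap_check, writing
```

Both guards matter: the `iteration < 3` check (like `maxIterations`) ends the loop on the graph's own terms, while the recursion limit is only a last-resort safety net that aborts the run with an error.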

Part 6: API Endpoints and Frontend Integration

6.1 Express Routes

typescript

// src/routes/research.ts

import { Router, Request, Response } from 'express';
import { PrismaClient } from '@prisma/client';
import { createResearchAssistantGraph, runResearchAssistant } from '../graph/research-assistant-graph';
import { contextManager } from '../context/context-manager';
import { mcpManager } from '../mcp/client-manager';

const router = Router();
const prisma = new PrismaClient();

router.post('/projects', async (req: Request, res: Response) => {
  try {
    const { userId, title, topic, keywords, description } = req.body;

    const project = await prisma.researchProject.create({
      data: {
        userId,
        title,
        topic,
        keywords: JSON.stringify(keywords || []),
        description,
      },
    });

    res.json({ success: true, project });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.get('/projects/:id', async (req: Request, res: Response) => {
  try {
    const projectId = parseInt(req.params.id);

    const project = await prisma.researchProject.findUnique({
      where: { id: projectId },
      include: {
        papers: {
          include: { paper: true },
          orderBy: { relevance: 'desc' },
        },
        reports: {
          orderBy: { createdAt: 'desc' },
        },
        sessions: {
          orderBy: { createdAt: 'desc' },
          take: 1,
        },
      },
    });

    if (!project) {
      return res.status(404).json({ success: false, error: 'Project not found' });
    }

    res.json({ success: true, project });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.post('/projects/:id/start', async (req: Request, res: Response) => {
  try {
    const projectId = parseInt(req.params.id);
    const { userId } = req.body;

    const project = await prisma.researchProject.findUnique({
      where: { id: projectId },
    });

    if (!project) {
      return res.status(404).json({ success: false, error: 'Project not found' });
    }

    res.json({ success: true, message: 'Research started' });

    runResearchAssistant(project.topic, userId, projectId)
      .then((result) => {
        console.log('Research completed:', result?.currentPhase);
      })
      .catch((error) => {
        console.error('Research failed:', error);
      });

  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.get('/projects/:id/stream', async (req: Request, res: Response) => {
  const projectId = parseInt(req.params.id);
  const userId = parseInt(req.query.userId as string);

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const project = await prisma.researchProject.findUnique({
    where: { id: projectId },
  });

  if (!project) {
    res.write(`data: ${JSON.stringify({ type: 'error', message: 'Project not found' })}\n\n`);
    res.end();
    return;
  }

  const graph = createResearchAssistantGraph();

  const initialState = {
    topic: project.topic,
    userId,
    projectId,
    maxIterations: 3,
  };

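  // Stop consuming the graph stream if the client disconnects. ServerResponse
  // emits 'close' when the connection is terminated before res.end().
  let clientClosed = false;
  res.on('close', () => {
    clientClosed = true;
  });

  // Each "updates" event carries only one node's partial update, so keep
  // running totals instead of reading counts from the latest event alone.
  let papersCount = 0;
  let analyzedCount = 0;
  let completeness = 0;

  try {
    const stream = await graph.stream(initialState, {
      recursionLimit: 50,
    });

    for await (const event of stream) {
      if (clientClosed) break;

      const nodeName = Object.keys(event)[0];
      const nodeState = event[nodeName];

      if (nodeState.collectedPapers) papersCount = nodeState.collectedPapers.length;
      if (nodeState.analyzedPapers) analyzedCount += nodeState.analyzedPapers.length;
      if (nodeState.gapAnalysis) completeness = nodeState.gapAnalysis.completeness;

      const update = {
        type: 'progress',
        phase: nodeName,
        state: {
          currentPhase: nodeState.currentPhase,
          papersCount,
          analyzedCount,
          completeness,
          error: nodeState.error,
        },
        message: nodeState.messages?.[nodeState.messages.length - 1]?.content || null,
      };

      res.write(`data: ${JSON.stringify(update)}\n\n`);
    }

    if (!clientClosed) {
      res.write(`data: ${JSON.stringify({ type: 'complete' })}\n\n`);
    }
  } catch (error) {
    res.write(`data: ${JSON.stringify({
      type: 'error',
      message: error instanceof Error ? error.message : 'Unknown error',
    })}\n\n`);
  }

  res.end();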
});

router.get('/projects/:id/papers', async (req: Request, res: Response) => {
  try {
    const projectId = parseInt(req.params.id);
    const { page = 1, limit = 20 } = req.query;

    const papers = await prisma.projectPaper.findMany({
      where: { projectId },
      include: { paper: true },
      orderBy: { relevance: 'desc' },
      skip: (Number(page) - 1) * Number(limit),
      take: Number(limit),
    });

    const total = await prisma.projectPaper.count({
      where: { projectId },
    });

    res.json({
      success: true,
      papers: papers.map(pp => ({
        ...pp.paper,
        relevance: pp.relevance,
        notes: pp.notes,
        tags: pp.tags ? JSON.parse(pp.tags) : [],
      })),
      pagination: {
        page: Number(page),
        limit: Number(limit),
        total,
        totalPages: Math.ceil(total / Number(limit)),
      },
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.get('/projects/:id/reports', async (req: Request, res: Response) => {
  try {
    const projectId = parseInt(req.params.id);

    const reports = await prisma.researchReport.findMany({
      where: { projectId },
      orderBy: { createdAt: 'desc' },
    });

    res.json({ success: true, reports });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.get('/reports/:id', async (req: Request, res: Response) => {
  try {
    const reportId = parseInt(req.params.id);

    const report = await prisma.researchReport.findUnique({
      where: { id: reportId },
    });

    if (!report) {
      return res.status(404).json({ success: false, error: 'Report not found' });
    }

    res.json({ success: true, report });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.get('/reports/:id/export', async (req: Request, res: Response) => {
  try {
    const reportId = parseInt(req.params.id);
    const format = req.query.format as string || 'markdown';

    const report = await prisma.researchReport.findUnique({
      where: { id: reportId },
    });

    if (!report) {
      return res.status(404).json({ success: false, error: 'Report not found' });
    }

    if (format === 'markdown') {
      // res.attachment() builds a Content-Disposition header that encodes
      // non-ASCII (e.g. Chinese) file names; writing the raw title into the
      // header with setHeader() makes Node.js throw ERR_INVALID_CHAR
      res.attachment(`${report.title}.md`);
      res.setHeader('Content-Type', 'text/markdown; charset=utf-8');
      res.send(report.content);
    } else if (format === 'html') {
      const { marked } = await import('marked');
      // marked.parse() is typed as string | Promise<string>; await handles both
      const body = await marked.parse(report.content);
      const html = `
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${report.title}</title>
  <style>
    body { max-width: 800px; margin: 0 auto; padding: 20px; font-family: system-ui; }
    h1 { color: #333; }
    h2 { color: #555; border-bottom: 1px solid #eee; padding-bottom: 10px; }
    code { background: #f5f5f5; padding: 2px 6px; border-radius: 4px; }
    pre { background: #f5f5f5; padding: 15px; border-radius: 8px; overflow-x: auto; }
  </style>
</head>
<body>
  ${body}
</body>
</html>`;
      res.attachment(`${report.title}.html`);
      res.setHeader('Content-Type', 'text/html; charset=utf-8');
      res.send(html);
    } else {
      res.status(400).json({ success: false, error: 'Unsupported format' });
    }
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

router.post('/chat', async (req: Request, res: Response) => {
  try {
    const { projectId, userId, message } = req.body;

    let session = await prisma.researchSession.findFirst({
      where: { projectId },
      orderBy: { createdAt: 'desc' },
    });

    if (!session) {
      session = await prisma.researchSession.create({
        data: { projectId },
      });
    }

    await prisma.message.create({
      data: {
        sessionId: session.id,
        role: 'user',
        content: message,
      },
    });

    const context = await contextManager.buildContext(projectId, message, session.id);

    res.json({
      success: true,
      sessionId: session.id,
      context: {
        tokenCount: context.tokenCount,
        relevantPapersCount: context.relevantPapers.length,
      },
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    });
  }
});

export default router;
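On the frontend, the `/projects/:id/stream` route can be consumed with `fetch` plus a small parser for its `data: <json>` frames. This matters because a browser `EventSource` reconnects automatically when the server ends the response, and every reconnect to this GET route would start a new research run; with `EventSource` you would have to call `close()` as soon as the `complete` event arrives. The sketch assumes frames are separated by `\n\n` exactly as the route writes them (no `\r\n` line endings); `parseSseFrames` and `ProgressEvent` are hypothetical names.

```typescript
export interface ProgressEvent {
  type: 'progress' | 'complete' | 'error';
  [key: string]: unknown;
}

// Split buffered text into complete SSE frames; return parsed events plus the
// trailing partial frame, which the caller keeps until more text arrives
export function parseSseFrames(buffer: string): { events: ProgressEvent[]; rest: string } {
  const events: ProgressEvent[] = [];
  const frames = buffer.split('\n\n');
  const rest = frames.pop() ?? ''; // last piece may be an incomplete frame
  for (const frame of frames) {
    const data = frame
      .split('\n')
      .filter(line => line.startsWith('data:'))
      .map(line => line.slice(5).replace(/^ /, '')) // SSE strips one leading space
      .join('\n');
    if (data) events.push(JSON.parse(data) as ProgressEvent);
  }
  return { events, rest };
}
```

In the client, read chunks from `response.body.getReader()`, append `decoder.decode(chunk, { stream: true })` (a `TextDecoder`) to a buffer, dispatch the returned `events` to the progress store, and keep `rest` as the new buffer.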

6.2 Application Entry Point

typescript

// src/index.ts

import express from 'express';
import cors from 'cors';
import { config } from 'dotenv';
import researchRoutes from './routes/research';
import { mcpManager } from './mcp/client-manager';
import { contextManager } from './context/context-manager';

config();

const app = express();
const PORT = process.env.PORT || 3000;

app.use(cors());
app.use(express.json());

app.use('/api/research', researchRoutes);

app.get('/health', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

async function startServer() {
  try {
    console.log('Initializing MCP connections...');
    await mcpManager.connectAll();
    console.log('MCP servers connected:', mcpManager.listConnectedServers());

    console.log('Initializing context manager...');
    await contextManager.initialize();

    app.listen(PORT, () => {
      console.log(`Server running on http://localhost:${PORT}`);
    });
  } catch (error) {
    console.error('Failed to start server:', error);
    process.exit(1);
  }
}

process.on('SIGTERM', async () => {
  console.log('Shutting down...');
  await mcpManager.disconnectAll();
  process.exit(0);
});

startServer();
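The `SIGTERM` handler awaits `mcpManager.disconnectAll()` before exiting, so one MCP server that never closes would keep the process alive until the orchestrator kills it. Below is a sketch of a timeout wrapper, assuming a fixed time budget is acceptable during shutdown; `withTimeout` is a hypothetical helper. The timeout only stops the wait: the underlying task keeps running until the process exits.

```typescript
// Resolve with the task's result, or reject once `ms` milliseconds pass,
// whichever comes first; the timer is cleared either way
export function withTimeout<T>(task: Promise<T>, ms: number, label = 'task'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}
```

In the handler, `await withTimeout(mcpManager.disconnectAll(), 5000, 'MCP disconnect')` placed inside `try { ... } finally { process.exit(0); }` guarantees that the exit call is always reached.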

Part 7: Project Structure

research-assistant/
├── src/
│   ├── index.ts                    # Application entry point
│   ├── agents/
│   │   ├── research-planner.ts     # Research planner
│   │   ├── search-agent.ts         # Literature search agent
│   │   ├── summary-agent.ts        # Summary extraction agent
│   │   ├── comparison-agent.ts     # Comparative analysis agent
│   │   └── report-agent.ts         # Report writing agent
│   ├── context/
│   │   ├── context-manager.ts      # Context manager
│   │   └── document-processor.ts   # Long-document processing
│   ├── graph/
│   │   ├── state.ts               # State definition
│   │   ├── nodes.ts               # Workflow nodes
│   │   └── research-assistant-graph.ts  # Workflow assembly
│   ├── mcp/
│   │   ├── client-manager.ts      # MCP client management
│   │   └── servers/
│   │       ├── arxiv-server.ts    # arXiv MCP server
│   │       ├── semantic-scholar-server.ts  # Semantic Scholar server
│   │       ├── pdf-server.ts      # PDF processing server
│   │       └── web-search-server.ts  # Web search server
│   ├── routes/
│   │   └── research.ts            # API routes
│   └── utils/
│       └── helpers.ts             # Utility functions
├── prisma/
│   └── schema.prisma              # Database models
├── tests/
│   ├── agents/
│   ├── graph/
│   └── mcp/
├── package.json
├── tsconfig.json
└── .env.example

Part 8: Summary of Core Concepts

8.1 The Core Value of Agentic RAG

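┌─────────────────────────────────────────────────────────────┐
│                  Core Value of Agentic RAG                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Proactivity                                             │
│     • Proactively plans retrieval strategies                │
│     • Proactively identifies information gaps               │
│     • Proactively supplements and verifies information      │
│                                                             │
│  2. Iteration                                               │
│     • Multiple retrieval rounds, going deeper each time     │
│     • Adjusts strategy based on feedback                    │
│     • Stops only when information is sufficient             │
│                                                             │
│  3. Intelligence                                            │
│     • Understands the essence of the research question      │
│     • Evaluates information quality and relevance           │
│     • Synthesizes multi-source information into insights    │
│                                                             │
│  4. Structure                                               │
│     • Executes according to a plan, step by step            │
│     • Produces structured research output                   │
│     • Traceable reasoning process                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘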
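The proactivity and iteration points reduce to a small control loop. The agent plans queries, retrieves, checks whether gaps remain, and uses the gap queries for the next round. The sketch below is a minimal, synchronous illustration under several assumptions:

- `planQueries`, `search`, and `findGaps` are hypothetical injected functions. In the real project they would be asynchronous LLM or MCP tool calls.
- `maxRounds` is an assumed safety cap. It guarantees the loop terminates even if the gap detector never reports that the information is sufficient.

```typescript
// All names below are hypothetical; they illustrate the loop, not the project's API.
interface Doc {
  id: string;
  text: string;
}

interface RetrievalSteps {
  // Strategy planning: turn the research question into initial search queries
  planQueries: (question: string) => string[];
  // One retrieval call against some source (e.g. arXiv, Semantic Scholar, the web)
  search: (query: string) => Doc[];
  // Gap detection: return follow-up queries for missing information,
  // or an empty array when the collected material is judged sufficient
  findGaps: (question: string, docs: Doc[]) => string[];
}

function agenticRetrieve(
  question: string,
  steps: RetrievalSteps,
  maxRounds = 3, // assumed safety cap
): { docs: Doc[]; rounds: number } {
  const collected = new Map<string, Doc>(); // deduplicate across sources and rounds
  let queries = steps.planQueries(question);
  let rounds = 0;

  while (queries.length > 0 && rounds < maxRounds) {
    rounds++;
    for (const query of queries) {
      for (const doc of steps.search(query)) {
        if (!collected.has(doc.id)) collected.set(doc.id, doc);
      }
    }
    // The gaps found in this round become the next round's queries
    queries = steps.findGaps(question, Array.from(collected.values()));
  }
  return { docs: Array.from(collected.values()), rounds };
}
```

Deduplicating by document id keeps a paper returned by several sources or rounds from being counted twice. This is the "integrate and deduplicate" behavior that distinguishes Agentic RAG from simply concatenating retrieved text.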

8.2 Context Management Strategies

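┌─────────────────────────────────────────────────────────────┐
│                 Context Management Strategy                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Challenge: LLM context windows are limited, but research   │
│             needs to handle a large volume of information   │
│                                                             │
│  Solutions:                                                 │
│                                                             │
│  1. Layered storage                                         │
│     ┌──────────────┐                                        │
│     │ Short-term   │ ← current session, working state       │
│     ├──────────────┤                                        │
│     │ Long-term    │ ← processed papers and summaries       │
│     ├──────────────┤                                        │
│     │ Vector store │ ← semantic search, fast recall         │
│     └──────────────┘                                        │
│                                                             │
│  2. Intelligent selection                                   │
│     • Select content relevant to the current question       │
│     • Rank by vector similarity                             │
│     • Load only what is necessary                           │
│                                                             │
│  3. Dynamic compression                                     │
│     • Automatically summarize conversation history          │
│     • Process long documents in segments                    │
│     • Trim intelligently when the token limit is exceeded   │
│                                                             │
└─────────────────────────────────────────────────────────────┘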
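Points 2 and 3 can be illustrated with two small functions. The first ranks stored chunks by cosine similarity to the current question and greedily packs them into a token budget. The second folds older conversation turns into a single summary. This is a sketch under stated assumptions, not the project's `contextManager`:

- Chunks are assumed to already carry embedding vectors.
- Tokens are estimated with a rough 4-characters-per-token heuristic rather than a real tokenizer.
- `summarize` stands in for an LLM call.

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // assumed precomputed, same dimension as the query embedding
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rough heuristic (~4 characters per token); a real system would use the model's tokenizer
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Intelligent selection: rank by similarity, then greedily fill the token budget
function selectContext(queryEmbedding: number[], chunks: Chunk[], tokenBudget: number): Chunk[] {
  const ranked = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score);

  const selected: Chunk[] = [];
  let used = 0;
  for (const { chunk } of ranked) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > tokenBudget) continue; // too large for the remaining budget; a smaller chunk may still fit
    selected.push(chunk);
    used += cost;
  }
  return selected;
}

// Dynamic compression: keep the last `keepRecent` messages verbatim and
// replace everything older with a single summary produced by `summarize`
function compressHistory(
  messages: string[],
  keepRecent: number,
  summarize: (older: string[]) => string,
): string[] {
  if (messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  return [summarize(messages.slice(0, cut)), ...messages.slice(cut)];
}
```

Skipping a chunk that does not fit, instead of stopping, lets a smaller but less relevant chunk still use the leftover budget. Whether that trade-off is desirable depends on how strongly relevance should dominate.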

8.3 MCP Integration Patterns

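┌─────────────────────────────────────────────────────────────┐
│               MCP Integration Best Practices                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Server design principles                                │
│     • Single responsibility: one concern per Server         │
│     • Standard interface: follow the MCP specification      │
│     • Error handling: handle exceptions gracefully          │
│                                                             │
│  2. Client management                                       │
│     • Connection pool: reuse connections, reduce overhead   │
│     • Tool routing: auto-route by tool name to its Server   │
│     • Lifecycle: open and close connections correctly       │
│                                                             │
│  3. Integration with LangChain                              │
│     • Tool adapters: wrap MCP tools as LangChain Tools      │
│     • Type safety: validate I/O with Zod schemas            │
│     • Error propagation: handle and pass on tool errors     │
│                                                             │
└─────────────────────────────────────────────────────────────┘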
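The "tool routing" and "error propagation" practices can be sketched as a small router. It indexes tools by name as servers register, then turns failures into result objects the agent can read instead of exceptions that abort the run. `ServerConnection` and `ToolRouter` are hypothetical illustrations. They are not the `@modelcontextprotocol/sdk` client API or the project's `mcpManager`. A real implementation would populate `listTools` from each server's tool listing and wrap each route as a LangChain tool with a Zod schema.

```typescript
interface ToolResult {
  ok: boolean;
  content?: unknown;
  error?: string;
}

// Hypothetical stand-in for a connected MCP server
interface ServerConnection {
  name: string;
  listTools: () => string[];
  callTool: (tool: string, args: Record<string, unknown>) => Promise<unknown>;
}

class ToolRouter {
  private readonly routes = new Map<string, ServerConnection>();

  // Build the tool-name → server index; reject names exposed by two servers
  register(server: ServerConnection): void {
    for (const tool of server.listTools()) {
      const existing = this.routes.get(tool);
      if (existing) {
        throw new Error(`Tool "${tool}" is exposed by both ${existing.name} and ${server.name}`);
      }
      this.routes.set(tool, server);
    }
  }

  // Route by tool name and convert failures into a ToolResult instead of throwing
  async call(tool: string, args: Record<string, unknown>): Promise<ToolResult> {
    const server = this.routes.get(tool);
    if (!server) return { ok: false, error: `Unknown tool: ${tool}` };
    try {
      return { ok: true, content: await server.callTool(tool, args) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: `${server.name}/${tool} failed: ${message}` };
    }
  }
}
```

Rejecting duplicate tool names at registration time surfaces configuration conflicts at startup rather than as silent misrouting at call time.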

Summary

This project implements a complete AI research assistant. Its key features:

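Agentic RAG architecture
- Proactive planning of retrieval strategies
- Multi-round iterative retrieval
- Automatic detection and filling of information gaps
- Intelligent synthesis of multi-source information

Multi-agent collaboration
- Research planner: overall coordination
- Literature search Agent: multi-source search
- Summary extraction Agent: information extraction
- Comparison analysis Agent: viewpoint comparison
- Report writing Agent: report generation

MCP tool integration
- arXiv academic search
- Semantic Scholar literature search
- PDF document processing
- Standardized tool interfaces

Context management
- Layered storage strategy
- Intelligent content selection
- Dynamic compression
- Segmented processing of long documents

LangGraph workflow
- State-driven flow
- Conditional routing
- Support for iterative loops
- Error handling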

The Five Core Pillars of the AI Research Assistant

This project shows how to build a complex, production-grade AI research assistant system.

