Troubleshooting Slow Uploads in the File Service

2026-02-13

Background: a large number of file uploads were failing. The file module's logs showed no errors, so the likely cause was upload timeouts, which the frontend displayed as upload failures.

We first added timing logs to the service and deployed the new build. The logs printed:

10:09:57.551 [http-nio-9300-exec-1] INFO c.r.f.c.SysFileController - [uploadBizFile,91] - ========== file upload started ==========
10:10:04.547 [http-nio-9300-exec-1] INFO c.r.f.c.SysFileController - [uploadBizFile,95] - file storage took: 6995ms
10:10:04.954 [http-nio-9300-exec-1] INFO c.r.f.c.SysFileController - [uploadBizFile,105] - database save took: 406ms
10:10:04.955 [http-nio-9300-exec-1] INFO c.r.f.c.SysFileController - [uploadBizFile,114] - total upload time: 7404ms, URL: /file/statics/2026/01/06/pexels-johannes-plenio-1477832_20260106100957A001.jpg

The bottleneck is clearly the file storage step, not the database.
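Timing lines like the ones above come from simple wall-clock measurements around each stage. A minimal sketch of that kind of instrumentation (the stage bodies here are stand-ins, not the project's actual controller code):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class StageTiming {

    /** Times one stage and returns the elapsed milliseconds. */
    static long timeStage(Runnable stage) {
        long start = System.currentTimeMillis();
        stage.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        long total = System.currentTimeMillis();

        // Stage 1: file storage (simulated with a small temp-file write)
        Path tmp = Files.createTempFile("upload-", ".bin");
        long storeMs = timeStage(() -> {
            try {
                Files.write(tmp, new byte[1024]);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        System.out.println("file storage took: " + storeMs + "ms");

        // Stage 2: database save (no-op stand-in)
        long dbMs = timeStage(() -> { });
        System.out.println("database save took: " + dbMs + "ms");

        System.out.println("total: " + (System.currentTimeMillis() - total) + "ms");
        Files.deleteIfExists(tmp);
    }
}
```

Logging per-stage elapsed times this way is what lets the slow stage be isolated without a profiler.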

Looking at the file upload code:

public static final String upload(String baseDir, MultipartFile file, String[] allowedExtension)
        throws FileSizeLimitExceededException, IOException,
        FileNameLengthLimitExceededException, InvalidExtensionException {
    int fileNameLength = Objects.requireNonNull(file.getOriginalFilename()).length();
    if (fileNameLength > FileUploadUtils.DEFAULT_FILE_NAME_LENGTH) {
        throw new FileNameLengthLimitExceededException(FileUploadUtils.DEFAULT_FILE_NAME_LENGTH);
    }
    assertAllowed(file, allowedExtension);
    String fileName = extractFilename(file);
    String absPath = getAbsoluteFile(baseDir, fileName).getAbsolutePath();
    file.transferTo(Paths.get(absPath));
    return getPathFileName(fileName);
}

The code calls file.transferTo(Paths.get(absPath));

This is blocking IO: a single thread writes to disk while the current Tomcat worker thread waits synchronously.

Once concurrency rises, the Tomcat worker pool fills up. Uploads are already slow, so subsequent requests cannot get in, and the service effectively collapses under the cascade.

The first optimization was to use an explicit buffered stream (avoiding an extra copy):

try (InputStream in = file.getInputStream();
     OutputStream out = new BufferedOutputStream(
             Files.newOutputStream(Paths.get(absPath), StandardOpenOption.CREATE_NEW),
             1024 * 1024 /* 1 MB buffer */)) {
    IOUtils.copy(in, out);
}
  • Avoids transferTo's implicit logic
  • Controls the buffer size, reducing system calls
  • Measured a 30%–50% improvement on NFS in testing

But errors continued after this change went live. We then realized the storage directory is an NFS mount, so the actual write path is:

Client
  ↓
Tomcat receives multipart (may already spool to a temp file)
  ↓
http-nio-exec-* thread
  ↓
transferTo()
  ↓
Linux VFS
  ↓
NFS client
  ↓
Network
  ↓
NFS server writes to disk + ACK
  ↓
Only after the ACK returns does the Java method finish

So NFS adds a network hop plus a remote disk write rather than purely local IO. We then ran a test:

As the test showed, writing even 1 MB of data to the mounted disk took 49 s, so the NFS mount itself clearly had a problem.
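For reference, the same kind of check can be scripted. A hedged sketch of a 1 MB write benchmark in Java (point the directory argument at the NFS mount; the default here is the system temp dir):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WriteBench {

    /** Writes 1 MB to the given directory and returns the elapsed milliseconds. */
    static long benchOnce(Path dir) throws Exception {
        Path target = dir.resolve("write-bench.tmp");
        byte[] oneMb = new byte[1024 * 1024];
        long start = System.nanoTime();
        Files.write(target, oneMb,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        Files.deleteIfExists(target);
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        // Pass the mount point to test, e.g. java WriteBench /file/statics
        Path dir = Paths.get(args.length > 0 ? args[0]
                : System.getProperty("java.io.tmpdir"));
        System.out.println("wrote 1MB in " + benchOnce(dir) + "ms");
    }
}
```

Running the same binary against a local directory and the NFS mount makes the latency gap directly comparable.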

We first reported the issue in the group chat, and in parallel optimized the service.

The revised design:

Write to the local disk first, then sync to NFS in the background, combined with async execution and rate limiting.
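The request-side half of this scheme can be sketched as follows. This is a hypothetical illustration, not the project's actual controller: `store` writes only to the local disk on the request thread, and the commented-out hand-off stands in for the `@Async` call into the NFS upload service.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalFirstStore {

    private final Path localTmpDir;

    public LocalFirstStore(Path localTmpDir) {
        this.localTmpDir = localTmpDir;
    }

    /** Writes the upload to local disk only and returns the absolute local path. */
    public Path store(String relativePath, InputStream in) throws Exception {
        Path local = localTmpDir.resolve(relativePath);
        Files.createDirectories(local.getParent());
        // Local-disk write: millisecond-level, no NFS round trip on the request thread
        Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        // In the real service: insert a PENDING meta row, then hand off, e.g.
        // nfsUploadService.uploadToNfs(fileId);
        return local;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("local-first-");
        LocalFirstStore store = new LocalFirstStore(tmp);
        Path saved = store.store("2026/02/13/demo.bin",
                new ByteArrayInputStream(new byte[]{1, 2, 3}));
        System.out.println("stored " + Files.size(saved) + " bytes at " + saved);
    }
}
```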

The HTTP upload request writes only to the local disk, which completes in milliseconds; an async thread then syncs the file to the NFS mount. Relevant code:

The async NFS upload service:

package com.ruoyi.file.service;

import com.baomidou.mybatisplus.core.conditions.update.UpdateWrapper;
import com.ruoyi.file.config.NfsAsyncEnum;
import com.ruoyi.file.repository.service.FileLocalToNfsMetaService;
import com.ruoyi.file.utils.FileUploadUtils;
import com.ruoyi.system.api.domain.FileLocalToNfsMeta;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;

/**
 * @author buguniao
 */
@Slf4j
@Service
public class NfsUploadService {


    private static final int BUFFER_SIZE = 4 * 1024 * 1024;

    @Value("${file.local-tmp-dir}")
    private String localTmpDir;

    @Value("${file.path}")
    private String nfsPath;


    @Resource
    private FileLocalToNfsMetaService fileLocalToNfsMetaService;


    @Async("nfsUploadExecutor")
    public void uploadToNfs(Integer fileId) {

        FileLocalToNfsMeta byId = fileLocalToNfsMetaService.getById(fileId);
        String relativePath = byId.getRelativePath();

        relativePath = FileUploadUtils.getFileUploadPath(relativePath);

        Path local = Paths.get(
                localTmpDir, relativePath);
        Path nfs = Paths.get(
                nfsPath, relativePath);

        try {
            Files.createDirectories(nfs.getParent());

            try (InputStream in = Files.newInputStream(local);
                 OutputStream out = new BufferedOutputStream(
                         Files.newOutputStream(
                                 nfs, StandardOpenOption.CREATE_NEW),
                         BUFFER_SIZE)) {

                IOUtils.copyLarge(in, out);
            }

            Files.deleteIfExists(local);


            UpdateWrapper<FileLocalToNfsMeta> success = new UpdateWrapper<FileLocalToNfsMeta>()
                    .set("status", NfsAsyncEnum.SUCCESS.getCode())
                    .set("update_time", LocalDateTime.now())
                    .eq("file_id", fileId);
            fileLocalToNfsMetaService.update(success);

        } catch (Exception e) {
            log.error("NFS upload failed: {}", relativePath, e);
            // Mark the row FAILED so the retry job can pick it up
            UpdateWrapper<FileLocalToNfsMeta> fail = new UpdateWrapper<FileLocalToNfsMeta>()
                    .set("status", NfsAsyncEnum.FAIL.getCode())
                    .set("last_error", e.getMessage())
                    .set("retry_count", byId.getRetryCount() == null ? 1 : byId.getRetryCount() + 1)
                    .set("update_time", LocalDateTime.now())
                    .eq("file_id", fileId);
            fileLocalToNfsMetaService.update(fail);
        }
    }


}

Thread pool configuration:

package com.ruoyi.file.config;

import com.google.common.util.concurrent.ThreadFactoryBuilder;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.*;

/**
 * @author buguniao
 */
@Configuration
@Slf4j
public class ThreadPoolConfig {

    @Bean("nfsUploadExecutor")
    public Executor nfsUploadExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setMaxPoolSize(4);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("nfs-upload-");
        executor.setRejectedExecutionHandler(
                new ThreadPoolExecutor.CallerRunsPolicy()
        );
        executor.initialize();
        return executor;
    }
}
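One property of this configuration worth noting: with CallerRunsPolicy, once the queue (500) and max pool (4) are exhausted, the submitting thread executes the copy itself, so under extreme load the system degrades back toward synchronous writes instead of silently dropping sync tasks. A standalone demonstration with deliberately tiny capacities so rejection triggers deterministically:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallerRunsDemo {

    /** Fills the pool and queue, then returns how many tasks ran on the caller. */
    static int run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger callerRan = new AtomicInteger();
        String caller = Thread.currentThread().getName();

        Runnable task = () -> {
            if (Thread.currentThread().getName().equals(caller)) {
                // Rejected task: CallerRunsPolicy ran it on the submitting thread
                callerRan.incrementAndGet();
            } else {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        pool.execute(task);  // occupies the single worker (blocks on the latch)
        pool.execute(task);  // fills the queue (capacity 1)
        pool.execute(task);  // rejected -> runs synchronously on this thread

        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return callerRan.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tasks run on caller thread: " + run());
    }
}
```

This back-pressure behavior is usually what you want for a background sync: the producer slows down rather than losing work.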

Local temp file cleanup job:

package com.ruoyi.file.service;

import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.core.conditions.update.UpdateWrapper;
import com.ruoyi.file.config.NfsAsyncEnum;
import com.ruoyi.file.repository.service.FileLocalToNfsMetaService;
import com.ruoyi.file.utils.FileUploadUtils;
import com.ruoyi.system.api.domain.FileLocalToNfsMeta;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.util.List;
import java.util.UUID;
import java.util.stream.Stream;

/**
 * @author buguniao
 * TTL-based cleanup of local temp files (scheduled job)
 */
@Component
@Slf4j
public class LocalTmpFileCleanJob {

    @Resource
    private FileLocalToNfsMetaService fileLocalToNfsMetaService;

    @Resource
    private NfsUploadService nfsUploadService;

    @Value("${file.local-tmp-dir}")
    private String localTmpDir;

    /**
     * Scan the database every ten minutes and remove temp files
     */
    @Scheduled(cron = "0 */10 * * * ?")
    public void clean() {
        String batchId = UUID.randomUUID().toString();
        log.info("Starting temp file cleanup, batch {}, time {}", batchId, LocalDateTime.now());
        LocalDateTime now = LocalDateTime.now();
        // 1. Delete local files that synced successfully but were not yet removed
        deleteByStatus(NfsAsyncEnum.SUCCESS.getCode(), now.minusMinutes(10));
        // 2. Delete failed files that have exceeded their TTL
        deleteByStatus(NfsAsyncEnum.FAIL.getCode(), now.minusHours(48));
        log.info("Temp file cleanup finished, batch {}, time {}", batchId, LocalDateTime.now());
    }

    /**
     * Remove orphan files daily at 3 a.m.
     */
    @Scheduled(cron = "0 0 3 * * ?")
    public void cleanOrphanFiles() {

        Path baseDir = Paths.get(localTmpDir);

        try (Stream<Path> files = Files.walk(baseDir)) {
            files
                    .filter(Files::isRegularFile)
                    .forEach(this::tryDeleteOrphan);
        } catch (IOException e) {
            log.error("Failed to scan local files", e);
        }
    }


    /**
     * Retry failed NFS syncs every 30 minutes
     */
    @Scheduled(cron = "0 */30 * * * ?")
    public void retryNfsUpload() {
        // Fetch rows whose sync failed
        LambdaQueryWrapper<FileLocalToNfsMeta> queryWrapper = new LambdaQueryWrapper<>();
        queryWrapper.eq(FileLocalToNfsMeta::getStatus, NfsAsyncEnum.FAIL.getCode())
                .orderByDesc(FileLocalToNfsMeta::getUpdateTime)
                .last("limit 20");
        List<FileLocalToNfsMeta> retryList = fileLocalToNfsMetaService.list(queryWrapper);

        for (FileLocalToNfsMeta meta : retryList) {
            nfsUploadService.uploadToNfs(meta.getFileId());
            log.info("Re-syncing local file {}", meta.getRelativePath());
        }
    }


    private void deleteByStatus(String status, LocalDateTime expireTime) {

        LambdaQueryWrapper<FileLocalToNfsMeta> lambdaQueryWrapper = new LambdaQueryWrapper<>();
        lambdaQueryWrapper.eq(FileLocalToNfsMeta::getStatus, status)
                .le(FileLocalToNfsMeta::getUpdateTime, expireTime)
                .last("limit 100");
        List<FileLocalToNfsMeta> list = fileLocalToNfsMetaService.list(lambdaQueryWrapper);


        for (FileLocalToNfsMeta meta : list) {
            try {
                String filePath = FileUploadUtils.getFileUploadPath(meta.getRelativePath());
                String tmpPath = localTmpDir + "/" + filePath;
                Path path = Paths.get(tmpPath);
                if (Files.exists(path)) {
                    Files.delete(path);
                    log.info("Temp file {} deleted", tmpPath);
                }
                UpdateWrapper<FileLocalToNfsMeta> updateWrapper = new UpdateWrapper<>();
                updateWrapper.set("status", NfsAsyncEnum.DELETE_SUCCESS.getCode())
                        .eq("file_id", meta.getFileId());
                fileLocalToNfsMetaService.update(updateWrapper);
            } catch (Exception e) {
                log.warn("Local file cleanup failed, fileId={}", meta.getFileId(), e);
            }
        }
    }


    private void tryDeleteOrphan(Path path) {

        try {
            FileTime time = Files.getLastModifiedTime(path);
            long hours =
                    Duration.between(
                            time.toInstant(),
                            Instant.now()
                    ).toHours();
            if (hours > 72) {
                Files.deleteIfExists(path);
                log.warn("Fallback-cleaned orphan file {}", path);
            }

        } catch (Exception ignored) {
        }
    }

}

After the optimization went live, file upload time was kept under 100 ms, and follow-up monitoring showed no further upload failures.

The group chat later announced that the NFS disk issue had been fixed; after retesting, NFS read/write speeds were back to normal. We kept this scheme in place regardless.


Title: Troubleshooting Slow Uploads in the File Service
Author: buguniao
Link: https://thunderdemon.cn/articles/2026/02/13/1770976924780.html
