Paperless-ngx Document Management System (Docker)

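  • I had long been looking for a system that can quickly search the text content of files on a file server by keyword. I recently came across Paperless-ngx, which also has OCR built in, so it can extract the text even from scanned PDFs. That matches the use case I had in mind.
  • Installation environment:
    • VM: 4 vCores / 8 GB RAM / 32 GB (SSD) + 500 GB (HDD)
    • Layout: the 500 GB disk is mounted at /data and used for data storage
  1. Download docker-compose.env and docker-compose.yml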

    wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.env -O docker-compose.env
    wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/dev/docker/compose/docker-compose.postgres.yml -O docker-compose.yml

  2. Edit docker-compose.env

    vi docker-compose.env

    • Add Traditional Chinese OCR support (these are language package names, so they use hyphens)
      PAPERLESS_OCR_LANGUAGES=chi-tra chi-tra-vert
    • Set the site URL, e.g. docs.my.ichiayi.com
      PAPERLESS_URL=https://docs.my.ichiayi.com
    • Set the time zone
      PAPERLESS_TIME_ZONE=Asia/Taipei
    • Set the default OCR languages to Traditional Chinese + English (language codes here use underscores)
      PAPERLESS_OCR_LANGUAGE=chi_tra+eng
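
    The four settings above can also be applied without opening an editor. The following is only a sketch: set_env is a hypothetical helper (not part of Paperless-ngx), and it assumes each key appears at most once in docker-compose.env, either active or as a commented-out "#KEY=..." line. It rewrites that line if present and appends the key otherwise.

```shell
#!/bin/sh
# set_env KEY VALUE FILE  (hypothetical helper, not a Paperless-ngx tool)
# Replace the first "KEY=" or "#KEY=" line in FILE with "KEY=VALUE",
# drop any later duplicates of that key, or append the line if the key is absent.
set_env() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^# ?/, "", line)          # look through a leading comment marker
        }
        index(line, k "=") == 1 {          # exact key match at start of line
            if (!found) { print k "=" v; found = 1 }
            next
        }
        { print }
        END { if (!found) print k "=" v }  # key was not in the file: append it
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example calls using the values from this step:
#   set_env PAPERLESS_OCR_LANGUAGES "chi-tra chi-tra-vert"      docker-compose.env
#   set_env PAPERLESS_URL           https://docs.my.ichiayi.com docker-compose.env
#   set_env PAPERLESS_TIME_ZONE     Asia/Taipei                 docker-compose.env
#   set_env PAPERLESS_OCR_LANGUAGE  chi_tra+eng                 docker-compose.env
```

    Matching on "KEY=" as a prefix keeps PAPERLESS_OCR_LANGUAGE and PAPERLESS_OCR_LANGUAGES apart, since the two names differ only by the trailing S.
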
  3. (Optional) Set up a reverse proxy, e.g. docs.my.ichiayi.com → http://172.16.0.220:8000
  4. Edit docker-compose.yml to add Office format support (Tika and Gotenberg), raise the timeouts, and store data under /data

    vi docker-compose.yml

    version: "3.4"
    services:
      broker:
        image: docker.io/library/redis:7
        restart: unless-stopped
        volumes:
          - redisdata:/data
    
      db:
        image: docker.io/library/postgres:15
        restart: unless-stopped
        volumes:
          - pgdata:/var/lib/postgresql/data
        environment:
          POSTGRES_DB: paperless
          POSTGRES_USER: paperless
          POSTGRES_PASSWORD: paperless
    
      webserver:
        image: ghcr.io/paperless-ngx/paperless-ngx:latest
        restart: unless-stopped
        depends_on:
          - db
          - broker
          - gotenberg
          - tika
        ports:
          - "8000:8000"
        volumes:
          - data:/usr/src/paperless/data
          - media:/usr/src/paperless/media
          - ./export:/usr/src/paperless/export
          - ./consume:/usr/src/paperless/consume
        env_file: docker-compose.env
        environment:
          PAPERLESS_REDIS: redis://broker:6379
          PAPERLESS_DBHOST: db
          PAPERLESS_TIKA_ENABLED: 1
          PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
          PAPERLESS_TIKA_ENDPOINT: http://tika:9998
    
      gotenberg:
        image: docker.io/gotenberg/gotenberg:7.10
        restart: unless-stopped
    
        # The gotenberg chromium route is used to convert .eml files. We do not
        # want to allow external content like tracking pixels or even javascript.
        command:
          - "gotenberg"
          - "--chromium-disable-javascript=true"
          - "--chromium-allow-list=file:///tmp/.*"
          - "--uno-listener-start-timeout=90s"
          - "--api-timeout=900s"
    
      tika:
        image: ghcr.io/paperless-ngx/tika:latest
        restart: unless-stopped
    
    volumes:
      data:
        driver: local
        driver_opts:
          type: 'none'
          o: 'bind'
          device: '/data/web-data'
      media:
        driver: local
        driver_opts:
          type: 'none'
          o: 'bind'
          device: '/data/web-media'
      pgdata:
        driver: local
        driver_opts:
          type: 'none'
          o: 'bind'
          device: '/data/db-data'
      redisdata:
        driver: local
        driver_opts:
          type: 'none'
          o: 'bind'
          device: '/data/broker-data'
  5. Create the data directories under /data

    mkdir -p /data/web-data
    mkdir -p /data/web-media
    mkdir -p /data/db-data
    mkdir -p /data/broker-data
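
    The named volumes in docker-compose.yml are bind mounts (type 'none', o 'bind'), and a bind mount whose device directory does not exist fails when the container starts. As a rough pre-flight check, the sketch below reads every "device:" path from the compose file and reports the ones that are missing. Both function names are hypothetical, and the parsing is a plain text match that assumes one quoted path per "device:" line with no spaces in the path.

```shell
#!/bin/sh
# list_bind_devices FILE: print the path from every "device:" line in FILE.
list_bind_devices() {
    # \047 is the single-quote character; strip either quote style from the path.
    awk '$1 == "device:" { gsub(/["\047]/, "", $2); print $2 }' "$1"
}

# check_bind_devices FILE: report each device path that is not a directory.
# Returns 0 if all exist, 1 if at least one is missing.
check_bind_devices() {
    missing=0
    for dir in $(list_bind_devices "$1"); do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_bind_devices docker-compose.yml && echo "all bind directories exist"
```
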

  6. Pull the Docker images for the first time

    docker compose pull

  7. Create the first Paperless administrator account

    docker compose run --rm webserver createsuperuser

  8. Start the Paperless services

    docker compose up -d
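
    The web UI may take a little while to answer after the containers start. One way to confirm it is up is to poll the published port until it responds. This is a minimal sketch: wait_for_http is a hypothetical helper, it assumes curl is installed on the host, and the address in the example is the one used for the reverse proxy target in step 3.

```shell
#!/bin/sh
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper)
# Poll URL until it answers with a non-error HTTP status, or give up.
wait_for_http() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP status >= 400; redirects count as success.
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no response from $url after $attempts attempt(s)" >&2
    return 1
}

# Example: wait_for_http http://172.16.0.220:8000/
```

    If the check times out, docker compose ps and docker compose logs webserver are the usual next places to look.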

  • Last modified: 2024/02/01 12:14
  • jonathan