TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Jul 17, 2021

Method

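This paper builds on ViT and UNet for semantic segmentation of medical images. The ViT part is exactly the same as in the original ViT paper. The authors first try taking the ViT output and simply bilinearly upsampling it into a segmentation map with N channels (one per class). Because the ViT feature map, H/P × W/P, is usually much smaller than the original H × W image, this does not work very well. The authors therefore propose a hybrid CNN-Transformer encoder to recover low-level details: a CNN extracts a feature map, which then serves as the ViT input. As in UNet, the architecture uses skip-connections, and upsampling is done by learned network layers rather than plain interpolation.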
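To make the resolution gap concrete, here is a minimal sketch of the arithmetic. The 224 × 224 input, the patch size of 16, and the function names `vit_token_grid` and `upsampling_path` are assumptions chosen for illustration. They are not taken from the paper's code.

```python
def vit_token_grid(height, width, patch):
    """Return (rows, cols, sequence_length) for an image cut into patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


def upsampling_path(grid, target, factor=2):
    """List the spatial sizes visited when scaling `grid` by `factor` until it equals `target`."""
    sizes = [grid]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * factor)
    if sizes[-1] != target:
        raise ValueError("target is not reachable with this factor")
    return sizes


# Assumed example values: 224 x 224 input, 16 x 16 patches.
rows, cols, seq_len = vit_token_grid(224, 224, 16)
print(rows, cols, seq_len)            # 14 14 196
print((224 * 224) // seq_len)         # 256 pixels summarized by each token
print(upsampling_path(rows, 224))     # [14, 28, 56, 112, 224]
```

With these assumed numbers, each token has to account for 256 pixels, so a single bilinear upsample from 14 × 14 straight to 224 × 224 has little fine detail to work with. A stepwise path such as 14 → 28 → 56 → 112 → 224 gives a decoder intermediate resolutions at which skip-connections from the CNN can be merged.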

Experiment

Synapse multi-organ segmentation Dataset

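The upper half of the results compares against SOTA models, and the lower half is an ablation study. A decoder of "None" means plain bilinear upsampling. The Hausdorff distance between two sets is the smallest distance by which each set must be expanded in order to contain the other, i.e. the largest distance from any point in one set to its nearest point in the other set.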
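The Hausdorff distance described above can be sketched in a few lines of plain Python. This is a brute-force illustration on small 2D point sets, not the evaluation code used in the paper. The point sets `pred` and `truth` are made up, and published evaluations often use a percentile variant that this sketch does not implement.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical boundary points of a predicted mask and a ground-truth mask.
pred = [(0, 0), (0, 1), (1, 0), (1, 1)]
truth = [(0, 0), (0, 1), (1, 0), (4, 1)]

print(directed_hausdorff(pred, truth))  # 1.0
print(directed_hausdorff(truth, pred))  # 3.0
print(hausdorff(pred, truth))           # 3.0
```

The two directed distances differ because a single outlying point, here (4, 1), dominates one direction. That sensitivity to the worst-case point is why the metric complements overlap-based scores: lower is better, and one stray boundary point is enough to raise it.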

ACDC dataset

Number of skip-connections

Input resolution

Sequence length and patch size

Model scaling

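Essentially, this paper just puts the original ViT into a UNet architecture, without many other changes or experiments. Still, this paper and others suggest that adding convolutions before the ViT works quite well.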

Machine Learning

Computer Vision

Written by Balin

NTUST CSIE
