[부스트캠프 AI Tech / Day31] Today

피어세션 정리

Image Classification 1
- Computer Vision이란?
- Computer Vision의 가장 대표적인 task인 Image Classification에 대한 이해 및 대표적인 network 소개
Annotation data efficient learning
- 적은 수의 데이터로 효율적으로 학습 하는 방법
- Data augmentation, Knowledge distillation, Transfer learning, Self-Training…

후미: Noisy Student Training에서 어떤 부분이 각각 augmentation, teacher-student network, knowledge distillation을 나타내는 걸까?
- knowledge distillation: teacher는 학습하지 않고, student 만 학습한다는 관점에서 봤을 때, 적용되는 듯
펭귄: 2번째 슬라이드 38에서 형광펜 “Semantic information is not considered in distillation”의 의미
- weight를 학습하지 않고, 결과를 학습하고자 함
- 같은 결과를 낼 때, 다른 weight를 사용할 수 있음
후미: 왜 weight를 학습하지 않고, 결과를 따라가도록 학습하게 만들었을까?
- teacher 모델보다 적은 파라미터를 가지는 student 모델을 학습하기 위함
후미 : 2번째 슬라이드 32 페이지에서 output은 무엇일까?
- 구현에 따라 다를 것 같음
- (hard prediction - one hot encoding / soft prediction - softmax probability)
- softmax를 취한 후의 확률 값
  2주차 강의자료 참고
  - - 참고: pytorch KL Divergence Loss 라이브러리 문서
샐리: knowledge distillation의 역할? 목적?
- 경량화
- fine tuning - 새로운 task를 잘 할 수 있도록, 적은 데이터를 통해 학습

과제 팁
- from tqdm import tqdm 사용해보기
pytorch 공부할거리
- https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html
- https://tutorials.pytorch.kr/
Knowledge Distillation 강의 영상
- http://dmqm.korea.ac.kr/activity/seminar/304
vision transformer
- https://engineer-mole.tistory.com/133

Pytorch에서 tensor transform 을 할때 대부분 mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]와 같이 하곤 합니다. 그 이유는 무엇일까요???
https://stackoverflow.com/questions/58151507/why-pytorch-officially-use-mean-0-485-0-456-0-406-and-std-0-229-0-224-0-2
https://discuss.pytorch.org/t/how-does-torchvision-transforms-normalize-work/57670
https://discuss.pytorch.org/t/understanding-transform-normalize/21730]