BERT + KorQuAD
BERT
- https://github.com/google-research/bert (the BERT open-source repository)
- The GitHub repo provides:
- TensorFlow code for the BERT model (BERT-Base, BERT-Large)
- Pre-trained checkpoints for both the lowercase and cased version of BERT-Base and BERT-Large from the paper.
- TensorFlow code for push-button replication of the most important fine-tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC.
- The BERT GitHub README describes the model as follows:
- first unsupervised, deeply bidirectional system for pre-training NLP
<Unsupervised>
: It can take a plain text corpus as input, which means the large amounts of text data available on the web, in many languages, can be used.
<Deeply Bidirectional System>
- BERT was built upon recent work in pre-training contextual representations. (e.g.) Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit
- but crucially these models are all unidirectional or shallowly bidirectional. This means that each word is only contextualized using the words to its left (or right).
- BERT is deeply bidirectional: to represent "bank", for example, it looks at the entire sentence "I made a --- deposit".
- (cf.) context-free embeddings (ignore the surrounding context) vs. contextual embeddings
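A minimal sketch of what "unidirectional vs. deeply bidirectional" means in terms of attention masks. This is not code from the BERT repo; the tokens and masks below are purely illustrative.

```python
# Minimal sketch (not from the BERT repo): the attention-mask difference between
# a unidirectional model and a deeply bidirectional encoder.
import numpy as np

tokens = ["I", "made", "a", "bank", "deposit"]
n = len(tokens)

# Unidirectional (left-to-right): token i may only attend to positions <= i.
causal_mask = np.tril(np.ones((n, n), dtype=int))

# Deeply bidirectional (BERT encoder): every token attends to every position.
bidirectional_mask = np.ones((n, n), dtype=int)

print(causal_mask)         # lower-triangular: "bank" never sees "deposit"
print(bidirectional_mask)  # all ones: "bank" is contextualized by the whole sentence
```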
- How is this possible? By using a <Mask>!
- mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words.
- On top of masking, Next Sentence Prediction (NSP) looks at the relationship between sentences (sketched below).
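A simplified, illustrative sketch of how the two pre-training inputs could be built. This is not the repo's create_pretraining_data.py (the real pipeline also applies an 80%/10%/10% [MASK]/random/keep rule to the selected tokens); the helper names mask_tokens and make_nsp_pair are made up for this sketch.

```python
# Simplified sketch of the two pre-training tasks: Masked LM + Next Sentence Prediction.
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"

def mask_tokens(tokens, mask_prob=0.15):
    """Masked LM: hide ~15% of the tokens; the model predicts only these."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)      # prediction target
        else:
            masked.append(tok)
            labels.append(None)     # not predicted
    return masked, labels

def make_nsp_pair(sent_a, sent_b, corpus_sentences):
    """NSP: 50% of the time the real next sentence, 50% a random sentence."""
    if random.random() < 0.5:
        return [CLS] + sent_a + [SEP] + sent_b + [SEP], "IsNext"
    random_b = random.choice(corpus_sentences)
    return [CLS] + sent_a + [SEP] + random_b + [SEP], "NotNext"

sent_a = ["I", "made", "a", "bank", "deposit"]
sent_b = ["the", "teller", "was", "friendly"]
corpus = [["an", "unrelated", "sentence"], ["another", "one"]]

print(mask_tokens(sent_a))
print(make_nsp_pair(sent_a, sent_b, corpus))
```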
With these pre-training objectives (Masked LM + NSP),
a large model (a 12- to 24-layer Transformer) is trained on a large corpus for a long time,
and then ====>> BERT is born!
** Using BERT has two stages: Pre-training and Fine-tuning
--------------------
<BERT on KorQuAD>
- KorQuAD files
Download the train set, dev set, and evaluation script from KorQuAD v1.0
- Pretrained files
(checkpoint of the pretrained BERT model)
- bert_model.ckpt.meta
- bert_model.ckpt.index
- bert_model.ckpt.data
- vocab
- bert_config
* Replace the pretrained files with a smaller Korean BERT model - https://github.com/MrBananaHuman/KoreanCharacterBert
(3 hidden layers, syllable-level tokenization, vocabulary size of 7,000)
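A rough sketch of what syllable-level tokenization means here. The toy vocabulary, syllable_tokenize, and convert_to_ids below are made up for illustration and are not taken from the linked repo.

```python
# Illustrative sketch of syllable (character)-level tokenization for Korean text.
def syllable_tokenize(text):
    """Split text into individual syllables/characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

# Toy vocabulary; the real smaller Korean BERT uses ~7,000 entries.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4,
         "한": 5, "국": 6, "어": 7}

def convert_to_ids(tokens):
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

tokens = syllable_tokenize("한국어 질문")
print(tokens)                  # ['한', '국', '어', '질', '문']
print(convert_to_ids(tokens))  # syllables not in the vocab map to [UNK]
```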
- bert_files
modeling, optimization, run_squad
tokenization (use the tokenization module provided with the smaller model)
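KorQuAD v1.0 is distributed in the SQuAD 1.1 JSON format, so run_squad can read it directly. Below is a rough sketch of wiring the files above into run_squad; the flag names follow the SQuAD example in the BERT README, while the directory layout, KorQuAD file names, and hyperparameter values are assumptions to adjust for your own setup.

```python
# Rough sketch: launch run_squad.py with the files listed above.
# Flag names follow the SQuAD example in the google-research/bert README;
# the paths, file names, and hyperparameters below are assumptions for illustration.
import subprocess

BERT_DIR = "korean_character_bert"   # smaller Korean BERT checkpoint + vocab + config
KORQUAD_DIR = "korquad"              # KorQuAD v1.0 train/dev JSON files

subprocess.run([
    "python", "run_squad.py",
    f"--vocab_file={BERT_DIR}/vocab.txt",
    f"--bert_config_file={BERT_DIR}/bert_config.json",
    f"--init_checkpoint={BERT_DIR}/bert_model.ckpt",
    "--do_train=True",
    f"--train_file={KORQUAD_DIR}/KorQuAD_v1.0_train.json",
    "--do_predict=True",
    f"--predict_file={KORQUAD_DIR}/KorQuAD_v1.0_dev.json",
    "--max_seq_length=384",
    "--doc_stride=128",
    "--train_batch_size=12",
    "--learning_rate=3e-5",
    "--num_train_epochs=2.0",
    "--output_dir=output/korquad",
], check=True)
```

The resulting dev-set predictions can then be scored with the KorQuAD evaluation script.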
Transformer: "Attention Is All You Need" - https://arxiv.org/abs/1706.03762
BERT: "Pre-training of Deep Bidirectional Transformers for Language Understanding" - https://arxiv.org/pdf/1810.04805.pdf