
Bert + KorQuAD

Tanya 탄야 2019. 11. 7. 21:00

BERT 

 

 - https://github.com/google-research/bert (the BERT open-source repository)

 

 - The GitHub repository contains:

    - TensorFlow code for the BERT model (BERT-Base, BERT-Large)

    - Pre-trained checkpoints for both the lowercase and cased versions of BERT-Base and BERT-Large from the paper.

    - TensorFlow code for push-button replication of the most important fine-tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC.

 

 

 - The BERT GitHub page describes the model as follows:

  • first unsupervised, deeply bidirectional system for pre-training NLP

<Unsupervised>

: this means a plain text corpus can be fed in as input, so the large amounts of text data available on the internet in many languages can all be used.

<Deeply Bidirectional System> 

  • BERT was built upon recent work in pre-training contextual representations, e.g. Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit.
  • But crucially these models are all unidirectional or shallowly bidirectional. This means that each word is only contextualized using the words to its left (or right).
  • BERT is deeply bidirectional: to represent "bank", for example, it looks at the entire sentence "I made a ... deposit", using context from both sides.
  • (cf.) context-free embeddings, which ignore the surrounding context, vs. contextual embeddings
    • How is this possible? By using a <Mask>! (see the sketch after this list)
      • mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words
    • In addition to the masking approach, Next Sentence Prediction looks at the relationship between pairs of sentences.
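
Below is a minimal sketch of the Masked LM idea in plain Python. The 15% figure comes from the description above; the whitespace tokenization and the helper name create_masked_lm_input are illustrative assumptions (real BERT masks WordPiece sub-tokens and applies an 80/10/10 mask/random/keep rule on the selected positions).

```python
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # mask out 15% of the words in the input


def create_masked_lm_input(tokens, mask_prob=MASK_PROB, rng=random):
    """Replace roughly 15% of the tokens with [MASK]; the model must predict the originals."""
    num_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]  # the original word the model has to predict
        masked[pos] = MASK_TOKEN   # (real BERT sometimes keeps or randomizes the
                                   #  token instead: the 80/10/10 rule)
    return masked, labels


tokens = "i made a bank deposit at the river bank".split()
masked, labels = create_masked_lm_input(tokens)
print(masked)  # roughly 15% of the positions become [MASK]; which ones varies per run
print(labels)  # only these masked positions contribute to the pre-training loss
```

Next Sentence Prediction is then a simple binary task on top of this: given a sentence pair (A, B), the model predicts whether B actually followed A in the corpus or was a randomly sampled sentence.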

Once these pre-training objectives (Masked LM + NSP) are in place,

train a large model (a 12- to 24-layer Transformer) on a large corpus for a long time,

and then ====>> BERT is born!

 

** Using BERT has two stages: pre-training and fine-tuning.

 

 

 

 

--------------------

<BERT on KorQuAD>

 - KorQuAD files

  Download the train set, dev set, and evaluation script from KorQuAD v1.0 (a quick look at the file format follows below).
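
KorQuAD v1.0 uses the same JSON schema as SQuAD 1.1, so the downloaded files can be inspected with plain Python before handing them to run_squad. A minimal sketch, assuming the train file was saved locally under the name below (adjust the path to your setup):

```python
import json

# Peek at the KorQuAD v1.0 training file (same JSON layout as SQuAD 1.1).
# The local file name is an assumption; point it at your downloaded copy.
with open("KorQuAD_v1.0_train.json", encoding="utf-8") as f:
    korquad = json.load(f)

article = korquad["data"][0]            # one source article
paragraph = article["paragraphs"][0]    # a context passage
qa = paragraph["qas"][0]                # one question/answer pair

print(korquad["version"])               # dataset version string
print(paragraph["context"][:80])        # the passage the answer comes from
print(qa["question"])                   # the question text
print(qa["answers"][0]["text"],         # gold answer span and its character
      qa["answers"][0]["answer_start"]) # offset inside the context
```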

 - Pretrained files

  (checkpoints of the pretrained BERT model)

 - bert_model.ckpt.meta 

 - bert_model.ckpt.index

 - bert_model.ckpt.data

 - vocab

 - bert_config

  * Replace the pretrained files with a smaller Korean BERT model - https://github.com/MrBananaHuman/KoreanCharacterBert

 

(3 hidden layers, syllable-level tokenization, vocabulary size of 7,000)
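
For a rough picture of what the bert_config of such a small model looks like, here is a sketch. Only num_hidden_layers = 3 and vocab_size = 7000 follow the description above; every other value is an assumed placeholder, not the actual configuration of the linked repository.

```python
import json

# Sketch of a bert_config.json for the small character-level Korean BERT described
# above. Only num_hidden_layers and vocab_size come from the text; all other values
# are assumed placeholders.
bert_config = {
    "vocab_size": 7000,             # syllable-level vocabulary of about 7,000 entries
    "num_hidden_layers": 3,         # 3 Transformer encoder layers (BERT-Base has 12)
    "hidden_size": 256,             # assumed; BERT-Base uses 768
    "num_attention_heads": 4,       # assumed; must divide hidden_size evenly
    "intermediate_size": 1024,      # assumed feed-forward size
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "initializer_range": 0.02,
}

with open("bert_config.json", "w") as f:
    json.dump(bert_config, f, indent=2)  # readable by modeling.BertConfig.from_json_file
```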

 

 - bert_files

modeling, optimization, run_squad

tokenization (use the tokenization provided with the smaller model; see the sketch below)
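
A short sketch of how the tokenization module is used together with the vocab file, assuming it keeps the FullTokenizer interface of google-research/bert's tokenization.py; the file path and the sample sentence are assumptions, and the character-level tokenizer of the smaller model may differ in detail.

```python
import tokenization  # the tokenization.py provided with the smaller Korean model

# Build a tokenizer from the pretrained model's vocab file (path is an assumption).
tokenizer = tokenization.FullTokenizer(
    vocab_file="vocab.txt",
    do_lower_case=False,  # lowercasing is irrelevant for Korean text
)

# run_squad uses the same two calls to turn question/context text into input IDs.
tokens = tokenizer.tokenize("대한민국의 수도는 서울이다.")
input_ids = tokenizer.convert_tokens_to_ids(tokens)

print(tokens)     # syllable-level tokens for the character-based model
print(input_ids)  # integer IDs looked up in the ~7,000-entry vocab
```

Fine-tuning then goes through run_squad.py, pointing --vocab_file, --bert_config_file, and --init_checkpoint at the pretrained files above and --train_file / --predict_file at the KorQuAD JSON files.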

 

 

 

 

 

Transformer

Attention Is All You Need (the Transformer paper): https://arxiv.org/abs/1706.03762

BERT paper: https://arxiv.org/pdf/1810.04805.pdf


 
