NLP 🗣 3

seq2seq (sequence to sequence), attention
seq2seq is a model used in chatbots, machine translation, STT (Speech-to-Text), and similar tasks. Looking more closely, seq2seq consists of an Encoder and a Decoder. The encoder reads all the words of the input in order and, at the end, compresses the information from every word into a context vector. The decoder takes the context vector as input and emits the translated words one at a time, in order. In a more detailed view, the encoder and the decoder are in fact two RNNs. In the encoder, the input sentence is split into word units by word tokenization, and each word token becomes the input to the RNN cell at one time step. After the encoder RNN cell has received every word, the encoder RNN cell's .. 2022. 1. 25.
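The encoder/decoder flow described in this excerpt can be sketched with two small vanilla RNNs in NumPy. This is only an illustrative sketch, not the post's own code: the weights are random and untrained, and the sizes, the token ids, and the names `make_rnn_params`, `rnn_step`, `encode`, and `decode` are all assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and token ids below are hypothetical toy values.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 6, 4, 8
SOS, EOS = 0, 1  # assumed start-of-sequence / end-of-sequence ids


def make_rnn_params():
    """Random, untrained weights for one vanilla RNN with its own embedding."""
    return {
        "E": rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBED_DIM)),
        "Wx": rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN_DIM)),
        "Wh": rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM)),
        "b": np.zeros(HIDDEN_DIM),
    }


encoder = make_rnn_params()
decoder = make_rnn_params()
W_out = rng.normal(scale=0.1, size=(HIDDEN_DIM, VOCAB_SIZE))  # hidden -> vocab scores


def rnn_step(params, token_id, h):
    """One time step: embed the token and update the hidden state."""
    x = params["E"][token_id]
    return np.tanh(x @ params["Wx"] + h @ params["Wh"] + params["b"])


def encode(token_ids):
    """Feed the tokens in order; the last hidden state is the context vector."""
    h = np.zeros(HIDDEN_DIM)
    for token_id in token_ids:
        h = rnn_step(encoder, token_id, h)
    return h


def decode(context, max_len=5):
    """Start from the context vector and emit one token id per step (greedy)."""
    h, token_id, output = context, SOS, []
    for _ in range(max_len):
        h = rnn_step(decoder, token_id, h)
        token_id = int(np.argmax(h @ W_out))
        if token_id == EOS:
            break
        output.append(token_id)
    return output


context_vector = encode([2, 3, 4, 5])  # a made-up tokenized input sentence
output_ids = decode(context_vector)
print(context_vector.shape, output_ids)
```

Because nothing is trained, the emitted ids are meaningless. The point is only the data flow: the whole input is squeezed into one fixed-size vector, which is then the decoder's initial state.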
one-hot encoding
The most basic way to embed words. The size of the vocabulary becomes the number of dimensions: the sentence [the, cat, sat, on, the, mat] contains 5 distinct words, so each word is represented by a 5-dimensional vector.
    from konlpy.tag import Komoran
    import numpy as np

    komoran = Komoran()
    text = "오늘 날씨는 구름이 많아요"  # "The weather is cloudy today"
    nouns = komoran.nouns(text)  # extract the nouns
    print(nouns)

    # Assign each new noun the next free index: 0, 1, 2, ...
    dics = {}
    for word in nouns:
        if word not in dics:
            dics[word] = len(dics)
    print(dics)

    # one-hot encoding
    nb_classes = len(dics)
    targets = list(d..
2022. 1. 21.
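The [the, cat, sat, on, the, mat] example can be worked through without a morphological analyzer. The following is a separate minimal sketch, not a completion of the truncated excerpt; the names `word_to_index` and `one_hot` are chosen here for illustration.

```python
import numpy as np

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Build the vocabulary: each distinct word gets the next free index.
word_to_index = {}
for word in tokens:
    if word not in word_to_index:
        word_to_index[word] = len(word_to_index)

vocab_size = len(word_to_index)  # number of distinct words = vector dimension

# Row i of the identity matrix is the one-hot vector for index i,
# so indexing it with the token ids gives one vector per token.
indices = [word_to_index[word] for word in tokens]
one_hot = np.eye(vocab_size, dtype=int)[indices]

print(word_to_index)
print(one_hot)
```

The result has one row per token (6) and one column per distinct word (5). Both occurrences of "the" map to the same row vector.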
1. Embedding
1.1 What is embedding
The process of converting the natural language that people use into vectors that a machine can process.
1.2 The role of embedding
1) Computing relatedness between words or sentences: once words are represented as vectors, the similarity between them can be computed, e.g. with cosine similarity.
2) Encoding semantic and grammatical information: arithmetic on the vectors makes word analogies testable, e.g. word 1 + word 2 - word 3: daughter + boy - son ≈ girl.
3) Transfer learning: the embedded values of words are used as input to a downstream model.
1.3 Embedding techniques
0) Before embeddings: the thesaurus-based approach
WordNet defines synonyms and hierarchical relations for a large number of words. However, because it is built by hand, it is hard to keep up with changes in language over time, it is expensive to maintain, and it cannot express subtle differences between words. The best-known thesaurus in NLP.. 2021. 12. 10.
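Cosine similarity and analogy arithmetic can both be tried on a handful of hand-made vectors. This is a toy sketch under loud assumptions: the four dimensions (male, female, young, offspring) and the vector values are invented for illustration and do not come from any trained embedding. The analogy is arranged as daughter - son + boy, which lands on girl in this toy setup.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical hand-made "embeddings"; dimensions: [male, female, young, offspring]
vectors = {
    "boy": np.array([1.0, 0.0, 1.0, 0.0]),
    "girl": np.array([0.0, 1.0, 1.0, 0.0]),
    "son": np.array([1.0, 0.0, 0.0, 1.0]),
    "daughter": np.array([0.0, 1.0, 0.0, 1.0]),
}

# 1) Relatedness: compare two words directly.
print(cosine_similarity(vectors["boy"], vectors["girl"]))

# 2) Analogy: remove "son", add "boy", and look for the nearest word.
query = vectors["daughter"] - vectors["son"] + vectors["boy"]
best = max(vectors, key=lambda word: cosine_similarity(query, vectors[word]))
print(best)
```

With real embeddings the match is only approximate, and the input words are usually excluded from the nearest-neighbour search.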