Exercise description
Write a function stemming_by_porter() that extracts word stems using the Porter stemmer algorithm.
● stemming_by_porter() takes a tokenized corpus (tokenized_words) as its parameter.
● It returns a list of tokens reduced to their stems.
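To give a feel for what a Porter stemmer does to each token, here is a minimal sketch of just one of its rule groups, step 1a, which handles plural-style suffixes. The function name porter_step_1a and the sample tokens are made up for illustration. This is not the full algorithm and not NLTK's implementation; PorterStemmer applies several further steps, so its output can differ from this sketch for some words.

```python
# Simplified sketch of Porter step 1a only (plural-style suffixes).
# Assumption: input tokens are already lowercase. The full algorithm
# (and NLTK's PorterStemmer) runs additional steps after this one.

def porter_step_1a(word):
    """Apply the four step-1a rules, checking the longest suffix first."""
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss  (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # ies -> i    (ponies -> poni)
    if word.endswith("ss"):
        return word        # ss -> ss    (caress stays caress)
    if word.endswith("s"):
        return word[:-1]   # s -> ""     (cats -> cat)
    return word            # no rule matched; leave the word unchanged


# Hypothetical sample tokens, not taken from the exercise corpus.
sample_tokens = ["caresses", "ponies", "caress", "cats", "movie"]
step_1a_stems = [porter_step_1a(token) for token in sample_tokens]
print(step_1a_stems)  # ['caress', 'poni', 'caress', 'cat', 'movie']
```

Note that a stem such as "poni" is not a dictionary word. Stemming only strips suffixes by rule, which is why the exercise result contains truncated-looking tokens.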
main.py
# Import the required packages and functions
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from text import TEXT

# Download the tokenizer data that word_tokenize relies on
nltk.download('punkt')

corpus = TEXT
tokenized_words = word_tokenize(corpus)


# Stem extraction with the Porter stemmer
def stemming_by_porter(tokenized_words):
    porter_stemmer = PorterStemmer()
    porter_stemmed_words = []

    for word in tokenized_words:
        # Write your code here
        stem = porter_stemmer.stem(word)
        porter_stemmed_words.append(stem)

    return porter_stemmed_words


# Print the result so it is visible when main.py runs as a script
print(stemming_by_porter(tokenized_words))
text.py
TEXT = """After reading the comments for this movie, I am not sure whether I should be angry, sad or sickened. Seeing comments typical of people who a)know absolutely nothing about the military or b)who base everything they think they know on movies like this or on CNN reports about Abu-Gharib makes me wonder about the state of intellectual stimulation in the world. At the time I type this the number of people in the US military: 1.4 million on Active Duty with another almost 900,000 in the Guard and Reserves for a total of roughly 2.3 million. The number of people indicted for abuses at at Abu-Gharib: Currently less than 20 That makes the total of people indicted .00083% of the total military. Even if you indict every single military member that ever stepped in to Abu-Gharib, you would not come close to making that a whole number. The flaws in this movie would take YEARS to cover. I understand that it's supposed to be sarcastic, but in reality, the writer and director are trying to make commentary about the state of the military without an enemy to fight. In reality, the US military has been at its busiest when there are not conflicts going on. The military is the first called for disaster relief and humanitarian aid missions. When the tsunami hit Indonesia, devestating the region, the US military was the first on the scene. When the chaos of the situation overwhelmed the local governments, it was military leadership who looked at their people, the same people this movie mocks, and said make it happen. Within hours, food aid was reaching isolated villages. Within days, airfields were built, cargo aircraft started landing and a food distribution system was up and running. Hours and days, not weeks and months. Yes there are unscrupulous people in the US military. But then, there are in every walk of life, every occupation. 
But to see people on this website decide that 2.3 million men and women are all criminal, with nothing on their minds but thoughts of destruction or mayhem is an absolute disservice to the things that they do every day. One person on this website even went so far as to say that military members are in it for personal gain. Wow! Entry level personnel make just under $8.00 an hour assuming a 40 hour work week. Of course, many work much more than 40 hours a week and those in harm's way typically put in 16-18 hour days for months on end. That makes the pay well under minimum wage. So much for personal gain. I beg you, please make yourself familiar with the world around you. Go to a nearby base, get a visitor pass and meet some of the men and women you are so quick to disparage. You would be surprised. The military no longer accepts people in lieu of prison time. They require a minimum of a GED and prefer a high school diploma. The middle ranks are expected to get a minimum of undergraduate degrees and the upper ranks are encouraged to get advanced degrees.
"""
Execution result
Source: Codeit