Exercise description
Write a function swn_polarity() that computes a sentiment score for each review in the given file (imdb.tsv).
imdb.tsv
review
0 "Watching Time Chasers, it obvious that it was made by a bunch of friends. Maybe they were sitting around one day in film school and said, \""Hey, let's pool our money together and make a really bad movie!\"" Or something like that. What ever they said, they still ended up making a really bad movie--dull story, bad script, lame acting, poor cinematography, bottom of the barrel stock music, etc. All corners were cut, except the one that would have prevented this film's release. Life's like that."
1 I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses' home and rapes, tortures and kills various women. It is in black and white but saves the colour for one shocking shot. At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. Avoid.
2 Minor Spoilers In New York, Joan Barnard (Elvire Audrey) is informed that her husband, the archeologist Arthur Barnard (John Saxon), was mysteriously murdered in Italy while searching an Etruscan tomb. Joan decides to travel to Italy, in the company of her colleague, who offers his support. Once in Italy, she starts having visions relative to an ancient people and maggots, many maggots. After shootings and weird events, Joan realizes that her father is an international drug dealer, there are drugs hidden in the tomb and her colleague is a detective of the narcotic department. The story ends back in New York, when Joan and her colleague decide to get married with each other, in a very romantic end. Yesterday I had the displeasure of wasting my time watching this crap. The story is so absurd, mixing thriller, crime, supernatural and horror (and even a romantic end) in a non-sense way. The acting is the worst possible, highlighting the horrible performance of the beautiful Elvire Audrey. John Saxon just gives his name to the credits and works less than five minutes, when his character is killed. The special effects are limited to maggots everywhere. The direction is ridiculous. I lost a couple of hours of my life watching 'Assassinio al Cimitero Etrusco'. If you have the desire or curiosity of seeing this trash, choose another movie, go to a pizzeria, watch TV, go sleep, navigate in Internet, go to the gym, but do not waste your time like I did. My vote is two. Title (Brazil): 'O Mistério Etrusco' ('The Etruscan Mystery')
3 I went to see this film with a great deal of excitement as I was at school with the director, he was even a good friend of mine for a while. But sorry mate, this film stinks. I can only talk about what was wrong with the first half because that's when I walked out and went to the pub for a much needed drink: 1) someone's standing on a balcony about to jump and so you send a helicopter to shine a searchlight on them??? I don't think so - nothing would make them more likely to jump. 2) local radio doesn't send reporters to cover people about to attempt suicide - again for fear of pressuring them into jumping - or for fear of encouraging copy-cat instances. 3) whatever the circumstances, radio reporters don't do live broadcasts from the 10th floor of a tower block. Radio cars don't carry leads long enough to connect the microphone and headphones to the transmitter. 4) the stuck in the lift scene was utterly derivative 5) the acting and direction was almost non existent.I could go on, but I won't.
4 "Yes, I agree with everyone on this site this movie is VERY VERY bad. To even call this a movie is an insult to all movies ever made. It's 40 minutes long. Someone compares this movie to an after school special. B-I-N-G-O! That describes is perfectly. The packaging for this movie intentionally is misleading. For example, the title of this movie should describe the movie. Rubberface??? That should be the first hint. It was retitled with a new package of some goofy face Jim probably made in his stand-up days. I was hoping for more stand-up from Jim. If you like Jim now as an actor. You would love him in his stand up days. Still trying to locate the Rodney Dangerfield Young Comedians Special from HBO that featured Jim in his early career days. It isn't even mentioned on this site. I'd love to find anything Jim did stand-up wise. Also Jim Carrey is a supporting actor in this movie. The main character is VERY VERY annoying. She is some girl lacking self confidence but yet wants to be a stand up comedian. Jim is there to say lines like \""That's Funny Janet\"" and \""You really are talented\"". And honestly she is terrible really terrible. And the movie is terrible. Beware of false advertising and a really bad movie."
5 "Jennifer Ehle was sparkling in \""Pride and Prejudice.\"" Jeremy Northam was simply wonderful in \""The Winslow Boy.\"" With actors of this caliber, this film had to have a lot going for it. Even those who were critical of the movie spoke of the wonderful sequences involving these two. I was eager to see it. It is with bitter disappointment, however, that I must report that this flick is a piece of trash. The scenes between Ehle and Northam had no depth or tenderness or real passion; they consisted of hackneyed and unsubtle latter-day cinematic lust--voracious open-mouthed kissing and soft-porn humping. Lust can be entertaining if it's done with originality; this was tasteless and awful. Ehle and Northam have sullied their craft; they should be ashamed. As for the modern part of the romance, I was unnerved by the effeminate appearance of the male lead. Aren't there any masculine men left in Hollywood? The plot was kind of interesting; with a better script and a more imaginative director, it might have worked. 1/10"
6 Amy Poehler is a terrific comedian on Saturday Night Live, but her role in this movie doesn't give her anything to work with. Her character, a publisher's representative guiding a new author on a book tour, is mean, not funny. Susan Sarandon plays the author's mother who is involved with the sadistic gym teacher (Billy Bob Thorton) the author had when he was a chubby junior high student. Unfortunately her role doesn't require a talented actress. The funniest thing is the way she looks in the awful gown she wears as the queen of the corn festival. There is no explanation of why the corn queen is old enough to have a grown son. The plot is the stale one of an author who wrote a best selling self-help book and then adopts behavior that contradicts his advice. Still, it is not the worst movie I've ever seen, and I didn't erase it before watching it.
7 "A plane carrying employees of a large biotech firm--including the CEO's daughter--goes down in thick forest in the Pacific Northwest. When the search and rescue mission is called off, the CEO, Harlan Knowles (Lance Henriksen), puts together a small ragtag group to execute their own search and rescue mission. But just what is Knowles searching for and trying to rescue, and just what is following and watching them in the woods? Oy, what a mess this film was! It was a shame, because for one, it stars Lance Henriksen, who is one of my favorite modern genre actors, and two, it could have easily been a decent film. It suffers from two major flaws, and they're probably both writer/director Jonas Quastel's fault--this film (which I'll be calling by its aka of Sasquatch) has just about the worst editing I've ever seen next to Alone in the Dark (2005), and Quastel's constant advice for the cast appears to have been, \""Okay, let's try that again, but this time I want everyone to talk on top of each other, improvise non-sequiturs and generally try to be as annoying as possible\"".The potential was there. Despite the rip-off aspects (any material related to the plane crash was obviously trying to crib The Blair Witch Project (1999) and any material related to the titular monster was cribbing Predator (1987)), Ed Wood-like exposition and ridiculous dialogue, the plot had promise and potential for subtler and far less saccharine subtexts. The monster costume, once we actually get to see it, was more than sufficient for my tastes. The mixture of character types trudging through the woods could have been great if Quastel and fellow writer Chris Lanning would have turned down the stereotype notch from 11 to at least 5 and spent more time exploring their relationships. The monster's \""lair\"" had some nice production design, specifically the corpse decorations ala a more primitive Jeepers Creepers (2001). 
If it had been edited well, there were some scenes with decent dialogue that could have easily been effective. But the most frightening thing about Sasquatch is the number of missteps made: For some reason, Quastel thinks it's a good idea to chop up dialogue scenes that occur within minutes of each other in real time so that instead we see a few lines of scene A, then a few lines of scene B, then back to A, back to B, and so on. For some reason, he thinks it's a good idea to use frequently use black screens in between snippets of dialogue, whether we need the idea of an unspecified amount of time passing between irrelevant comments or whether the irrelevant comments seem to be occurring one after the other in time anyway. For some reason, he doesn't care whether scenes were shot during the morning, afternoon, middle of the night, etc. He just cuts to them at random. For that matter, the scenes we're shown appear to be selected at random. Important events either never or barely appear, and we're stuck with far too many pointless scenes. For some reason, he left a scene about cave art in the film when it either needs more exposition to justify getting there, or it needs to just be cut out, because it's not that important (the monster's intelligence and \""humanity\"" could have easily been shown in another way). For some reason, there is a whole character--Mary Mancini--left in the script even though she's superfluous. For some reason we suddenly go to a extremely soft-core porno scene, even though the motif is never repeated again. For some reason, characters keep calling Harlan Knowles \""Mr. H\"", like they're stereotypes of Asian domestics. For some reason, Quastel insists on using the \""Blurry Cam\"" and \""Distorto-Cam\"" for the monster attack scenes, even though the costume doesn't look that bad, and it would have been much more effective to put in some fog, a subtle filter, or anything else other than bad cinematography. I could go on, but you get the idea. 
I really wanted to like this film better than I did--I'm a Henriksen fan, I'm intrigued by the subject, I loved the setting, I love hiking and this is basically a hiking film on one level--but I just couldn't. Every time I thought it was \""going to be better from this point until the end\"", Quastel made some other awful move. In the end, my score was a 3 out of 10."
8 A well made, gritty science fiction movie, it could be lost among hundreds of other similar movies, but it has several strong points to keep it near the top. For one, the writing and directing is very solid, and it manages for the most part to avoid many sci-fi cliches, though not all of them. It does a good job of keeping you in suspense, and the landscape and look of the movie will appeal to sci-fi fans. If you're looking for a masterpiece, this isn't it. But if you're looking for good old fashioned post-apoc, gritty future in space sci-fi, with good suspense and special effects, then this is the movie for you. Thoroughly enjoyable, and a good ending.
9 "Incredibly dumb and utterly predictable story of a rich teen girl who, not given love by her parents, starts a girl gang. They rob gas stations, rape guys (!!!) and kill a policeman. All the \""teenagers\"" in this film are easily in their late 20s/early 30s, the acting is all horrible and the script has every cliche imaginable with hilarious dialogue--it comes as no surprise that it was written by the immortal Ed Wood Jr.! Worth seeing for laughs. Best lines--\""They're shooting back!\"" and \""It ain't supposed to be like this.\"""
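The sentiment score of a corpus is the sum of the sentiment scores of its words, and each word's sentiment score is its positive score minus its negative score. Work through the exercise in the following order.
- Get the Synsets for each word based on the word and its POS tag
- Get a single SentiSynset using the name of the first element of the Synsets
- Get the word's sentiment score from the SentiSynset
- Add up the sentiment scores of the words and return the result as the corpus sentiment score
For reference, the POS-tagged data is already prepared as pos_tagged_df.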
import pandas as pd
import nltk
from nltk.tokenize import sent_tokenize
from preprocess import pos_tagger
df = pd.read_csv('imdb.tsv', delimiter = "\t")
df['sent_tokens'] = df['review'].apply(sent_tokenize)
df['pos_tagged_tokens'] = df['sent_tokens'].apply(pos_tagger)
df['pos_tagged_tokens']
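Before looking at the solution, the scoring rule described above (corpus score = sum over words of positive score minus negative score) can be sketched without NLTK. This is only a toy illustration: TOY_LEXICON and toy_polarity are hypothetical names, and the scores in the lexicon are made up rather than taken from SentiWordNet.

```python
# Toy stand-in for SentiWordNet: (word, WordNet POS) -> (positive, negative).
# These entries are invented for illustration, not real SentiWordNet values.
TOY_LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("movie", "n"): (0.0, 0.0),
    ("dull", "a"): (0.125, 0.5),
}


def toy_polarity(tagged_words):
    """Sum (positive - negative) over every word found in the toy lexicon."""
    total = 0.0
    for word, wn_tag in tagged_words:
        scores = TOY_LEXICON.get((word.lower(), wn_tag))
        if scores is None:
            # Words without an entry contribute nothing,
            # just like words for which no synset is found.
            continue
        pos_score, neg_score = scores
        total += pos_score - neg_score
    return total


# (word, WordNet POS) pairs; None marks a tag with no WordNet equivalent.
review = [("a", None), ("really", "r"), ("bad", "a"), ("dull", "a"), ("movie", "n")]
score = toy_polarity(review)
print(score)
```

A negative total means the negative scores outweigh the positive ones. Because the result is a plain sum, longer reviews can reach larger absolute values than short ones.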
Exercise solution
main.py
import pandas as pd
import nltk
from nltk.tokenize import sent_tokenize
from preprocess import pos_tagger
from preprocess import penn_to_wn
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('sentiwordnet')
nltk.download('averaged_perceptron_tagger')
df = pd.read_csv('imdb.tsv', delimiter = "\t")
df['sent_tokens'] = df['review'].apply(sent_tokenize)
df['pos_tagged_tokens'] = df['sent_tokens'].apply(pos_tagger)
def swn_polarity(pos_tagged_sents):
    # Initialize the sentiment score
    sentiment_score = 0
    for word, tag in pos_tagged_sents:
        # Convert the NLTK (Penn Treebank) POS tag to a WordNet POS tag
        wn_tag = penn_to_wn(tag)
        if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV, wn.VERB):
            continue
        # Get the Synsets for the word and its POS tag
        synsets = wn.synsets(word, pos=wn_tag)
        if not synsets:
            continue
        # Get a single SentiSynset using the name of the first Synset
        synset = synsets[0]
        senti_synset = swn.senti_synset(synset.name())
        # Get the word's sentiment score from the SentiSynset
        word_senti_score = (senti_synset.pos_score() - senti_synset.neg_score())
        # Add the word's sentiment score to the corpus sentiment score
        sentiment_score += word_senti_score
    return sentiment_score
# Apply to the DataFrame
df['swn_sentiment'] = df['pos_tagged_tokens'].apply(swn_polarity)
# Test code
df
preprocess.py
from collections import Counter
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
nltk.download('wordnet')
nltk.download('sentiwordnet')
nltk.download('omw-1.4')
# Clean tokens by frequency
def clean_by_freq(tokenized_words, cut_off_count):
    # Count word frequencies with Python's Counter to build the vocabulary
    vocab = Counter(tokenized_words)
    # Collect the set of words whose frequency is cut_off_count or less
    uncommon_words = {key for key, value in vocab.items() if value <= cut_off_count}
    # Build a list of the words that are not in uncommon_words
    cleaned_words = [word for word in tokenized_words if word not in uncommon_words]
    return cleaned_words

# Clean tokens by word length
def clean_by_len(tokenized_words, cut_off_length):
    cleaned_by_freq_len = []
    for word in tokenized_words:
        if len(word) > cut_off_length:
            cleaned_by_freq_len.append(word)
    return cleaned_by_freq_len

# Remove stopwords
def clean_by_stopwords(tokenized_words, stop_words_set):
    cleaned_words = []
    for word in tokenized_words:
        if word not in stop_words_set:
            cleaned_words.append(word)
    return cleaned_words

# Stem words with the Porter stemmer
def stemming_by_porter(tokenized_words):
    porter_stemmer = PorterStemmer()
    porter_stemmed_words = []
    for word in tokenized_words:
        stem = porter_stemmer.stem(word)
        porter_stemmed_words.append(stem)
    return porter_stemmed_words

# POS tagging
def pos_tagger(tokenized_sents):
    pos_tagged_words = []
    for sentence in tokenized_sents:
        # Word tokenization
        tokenized_words = word_tokenize(sentence)
        # POS tagging
        pos_tagged = pos_tag(tokenized_words)
        pos_tagged_words.extend(pos_tagged)
    return pos_tagged_words

# Convert a Penn Treebank POS tag to a WordNet POS tag
# (returns None for tags with no WordNet equivalent)
def penn_to_wn(tag):
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB

# Lemmatization
def words_lemmatizer(pos_tagged_words):
    lemmatizer = WordNetLemmatizer()
    lemmatized_words = []
    for word, tag in pos_tagged_words:
        wn_tag = penn_to_wn(tag)
        if wn_tag in (wn.NOUN, wn.ADJ, wn.ADV, wn.VERB):
            lemmatized_words.append(lemmatizer.lemmatize(word, wn_tag))
        else:
            lemmatized_words.append(word)
    return lemmatized_words
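As a quick check of the tag conversion used by both swn_polarity() and words_lemmatizer(), here is a standalone sketch of penn_to_wn. It assumes the WordNet POS constants are the single letters 'a', 'n', 'r' and 'v' (the values behind wn.ADJ, wn.NOUN, wn.ADV and wn.VERB in NLTK), so it runs without NLTK installed. The sample tags are chosen only for illustration.

```python
# Single-letter WordNet POS constants, mirroring wn.ADJ, wn.NOUN, wn.ADV, wn.VERB.
ADJ, NOUN, ADV, VERB = "a", "n", "r", "v"


def penn_to_wn(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if there is none."""
    if tag.startswith("J"):
        return ADJ
    elif tag.startswith("N"):
        return NOUN
    elif tag.startswith("R"):
        return ADV
    elif tag.startswith("V"):
        return VERB
    return None


# Adjective, plural noun, adverb, past-tense verb, determiner, preposition.
tags = ["JJ", "NNS", "RB", "VBD", "DT", "IN"]
mapped = {tag: penn_to_wn(tag) for tag in tags}
print(mapped)
```

Tags that map to None (determiners, prepositions, punctuation and so on) are the ones swn_polarity() skips with `continue`. The mapping is prefix-based, so any tag starting with 'R', including the particle tag RP, is treated as an adverb.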
Execution result