Review of coffee blend A:
Delicate, sweetly spice-toned. The finish consolidates to notes of date and hazelnut with undertones of cedar.
Review of coffee blend B:
Bold, richly aromatic with a hint of citrus. Dark chocolate, toasted walnut, orange blossom, cedar, and brown sugar in aroma and cup. Brightly sweet with a vibrant acidity; full, velvety mouthfeel. The finish centers on dark chocolate and walnut with a cedar undertone.
Review of coffee blend C:
Lively, tangy with a fruity essence. Raspberry, macadamia, gardenia, bamboo, molasses in aroma and cup. Sweet-tart structure with a brisk acidity; light, silky mouthfeel. The finish is a delightful blend of raspberry and macadamia, complemented by a bamboo note.
Review of coffee blend D:
Robust, earthy with a hint of smokiness. Black currant, hazelnut, hibiscus, oak, treacle in aroma and cup. Deeply sweet with a low acidity; smooth, thick mouthfeel. The finish combines black currant and hazelnut with an oak backdrop.
Review of coffee blend E:
Subtle, delicately spiced with a sweet tone. Date, almond, orchid, cedar, maple syrup in aroma and cup. Sweet-toned structure with gentle, rounded acidity; silky, satiny mouthfeel. The finish is a smooth interplay of date and almond with a cedar undercurrent.
cat << EOF > src/utils.py
import pandas as pd
import numpy as np
import nltk

def cosine_similarity(a, b):
    numerator = np.dot(a, b)
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    return numerator / denominator

def download_nltk_data():
    # Check and download the 'punkt' tokenizer models
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        nltk.download('punkt')
    # Check and download the 'stopwords' corpus
    try:
        nltk.data.find('corpora/stopwords')
    except LookupError:
        nltk.download('stopwords')

def preprocess_text(text):
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize
    # Tokenize text
    tokens = word_tokenize(text)
    # Convert to lower case
    tokens = [word.lower() for word in tokens]
    # Remove punctuation
    words = [word for word in tokens if word.isalpha()]
    # Filter out stop words
    stop_words = set(stopwords.words('english'))
    words = [word for word in words if word not in stop_words]
    # Stemming
    stemmer = PorterStemmer()
    stemmed_words = [stemmer.stem(word) for word in words]
    return ' '.join(stemmed_words)
EOF
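The `cosine_similarity` helper above can be sanity-checked in isolation; a minimal sketch (the vectors are made up for illustration, not real embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Same formula as in src/utils.py: dot product over the product of norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])

print(cosine_similarity(a, b))  # identical direction -> 1.0
print(cosine_similarity(a, c))  # orthogonal -> 0.0
```

Cosine similarity compares direction only, not magnitude, which is why it works well for comparing embedding vectors of different texts.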
import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

# Download necessary NLTK data
nltk.download('wordnet')
nltk.download('omw-1.4')

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# List of words to process
words = [
    'running',
    'runner',
    'jumps',
    'easily',
    'better'
]

# Stemming process
stemmed_words = [stemmer.stem(word) for word in words]

# Lemmatization process with POS specification
# (POS: Part of Speech)
lemmatized_words = []
for word in words:
    if word in ['better']:
        # Treat this example as an adjective
        pos_tag = 'a'
    elif word in ['running', 'jumps']:
        # Treat these examples as verbs
        pos_tag = 'v'
    elif word in ['easily']:
        # Treat this example as an adverb
        pos_tag = 'r'
    else:
        # Treat all other examples as nouns
        pos_tag = 'n'
    lemmatized_word = lemmatizer.lemmatize(word, pos=pos_tag)
    lemmatized_words.append(lemmatized_word)

# Print results
print("Original:   ", words)
print("Stemmed:    ", stemmed_words)
print("Lemmatized: ", lemmatized_words)
# Read user input
input_coffee_name = input("Enter a coffee name: ")

dataset_file_path = os.path.join(
    os.path.dirname(__file__),
    'data',
    'simplified_coffee.csv'
)

# Load the CSV file into a Pandas DataFrame
# (only the first 50 rows for this example)
df = pd.read_csv(dataset_file_path, nrows=50)

# Preprocess the review text
df['preprocessed_review'] = df['review'].apply(preprocess_text)

# The model to use
model = "text-embedding-ada-002"

# Get the embeddings for each review
review_embeddings = []
for review in df['preprocessed_review']:
    review_embeddings.append(get_embedding(review, model=model))

# Get the index of the input coffee name
try:
    input_coffee_index = df[
        df['name'] == input_coffee_name
    ].index[0]
except IndexError:
    print("Please enter a valid coffee name.")
    exit()

# Calculate the cosine similarity between
# the input coffee's review and all other reviews
similarities = []
input_review_embedding = review_embeddings[input_coffee_index]
for review_embedding in review_embeddings:
    similarity = cosine_similarity(input_review_embedding, review_embedding)
    similarities.append(similarity)

# Get the indices of the most similar reviews
# (excluding the input coffee's review itself)
most_similar_indices = np.argsort(similarities)[-6:-1]

# Get the names of the most similar coffees
similar_coffee_names = df.iloc[most_similar_indices]['name'].tolist()

# Print the results
print(f"The most similar coffees to {input_coffee_name} are:")
for coffee_name in similar_coffee_names:
    print(coffee_name)
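The `[-6:-1]` slice deserves a closer look: `np.argsort` returns indices in ascending order of similarity, so the last index is the input coffee itself (its review has similarity 1.0 with itself), and the five before it are the best genuine matches. A toy sketch with made-up similarity scores (index 2 plays the role of the input coffee):

```python
import numpy as np

# Made-up similarity scores; index 2 is the input coffee itself (1.0)
similarities = [0.80, 0.91, 1.00, 0.85, 0.75, 0.88, 0.95]

# argsort is ascending, so the very last index is the input coffee;
# [-6:-1] keeps the five next-highest matches
most_similar_indices = np.argsort(similarities)[-6:-1]
print(most_similar_indices.tolist())  # -> [0, 3, 5, 1, 6]
```

Note the five indices come back in ascending similarity order; reverse the slice if you want the best match printed first.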
cat << EOF > src/app.py
import os
import pandas as pd
import numpy as np
from api import get_embedding
from utils import (
    cosine_similarity,
    download_nltk_data,
    preprocess_text
)

# Download necessary NLTK data
download_nltk_data()

dataset_file_path = os.path.join(
    os.path.dirname(__file__),
    'data',
    'simplified_coffee.csv'
)

# Read user input
input_coffee_name = input("Enter a coffee name: ")

# Load the CSV file into a Pandas DataFrame
# (only the first 50 rows for this example)
df = pd.read_csv(dataset_file_path, nrows=50)

# Preprocess the review text
df['preprocessed_review'] = df['review'].apply(preprocess_text)

# The model to use
model = "text-embedding-ada-002"

# Get the embeddings for each review
review_embeddings = []
for review in df['preprocessed_review']:
    review_embeddings.append(get_embedding(review, model=model))

# Get the index of the input coffee name
try:
    input_coffee_index = df[
        df['name'] == input_coffee_name
    ].index[0]
except IndexError:
    print("Please enter a valid coffee name.")
    exit()

# Calculate the cosine similarity between
# the input coffee's review and all other reviews
similarities = []
input_review_embedding = review_embeddings[input_coffee_index]
for review_embedding in review_embeddings:
    similarity = cosine_similarity(input_review_embedding, review_embedding)
    similarities.append(similarity)

# Get the indices of the most similar reviews
# (excluding the input coffee's review itself)
most_similar_indices = np.argsort(similarities)[-6:-1]

# Get the names of the most similar coffees
similar_coffee_names = df.iloc[most_similar_indices]['name'].tolist()

# Print the results
print(f"The most similar coffees to {input_coffee_name} are:")
for coffee_name in similar_coffee_names:
    print(coffee_name)
EOF
Enter a coffee name: Organic Ethiopia Kirite
The most similar coffees to Organic Ethiopia Kirite are:
El Peñon Nicaragua
Colombia David Gomez 100% Caturra
Panama Auromar Estate Geisha Peaberry
Ethiopia Yirgacheffe Natural G1
Ethiopia Shakiso Mormora
Creating a "Fuzzier" Search
# Get the index of the input coffee name
try:
    # Search for a coffee name in the dataframe
    # that looks like the input coffee name
    input_coffee_index = df[
        df['name'].str.contains(input_coffee_name, case=False)
    ].index[0]
    print(
        "Found a coffee name that looks like "
        f"{df.iloc[input_coffee_index]['name']}. "
        "Using this coffee name instead."
    )
except IndexError:
    print(
        "Sorry, we don't have that coffee name in "
        "our database. Please try again."
    )
    exit()
# Get the indexes of the input coffee name
# Search for all coffee names in the dataframe
# that look like the input coffee name
input_coffee_indexes = df[
    df['name'].str.contains(input_coffee_name, case=False)
].index
# A failed substring match doesn't raise an exception; it just
# yields an empty index, so check for emptiness explicitly
if len(input_coffee_indexes) == 0:
    print(
        "Sorry, we couldn't find any coffee "
        "with that name."
    )
    exit()
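The fuzzy matching above relies on pandas' `str.contains`, which does a case-insensitive substring match when `case=False`. A quick sketch against a toy DataFrame (the names below are taken from the sample output, but the frame itself is invented for illustration):

```python
import pandas as pd

# Toy frame standing in for the coffee dataset
df = pd.DataFrame({'name': [
    'Organic Ethiopia Kirite',
    'Ethiopia Shakiso Mormora',
    'El Penon Nicaragua',
]})

# case=False makes the substring match case-insensitive
matches = df[df['name'].str.contains('ethiopia', case=False)].index
print(matches.tolist())  # -> [0, 1]
```

This catches casing and partial-name typos, but not misspellings like "Ethiopian Kirite", which is what motivates the embedding-based name search next.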
# Get the index of the input coffee name
try:
    input_coffee_index = df[
        df['name'] == input_coffee_name
    ].index[0]
except IndexError:
    # No exact match: get the embeddings for each name
    print("Searching for a similar coffee name...")
    name_embeddings = []
    for name in df['name']:
        name_embeddings.append(get_embedding(name, model=model))
    # Perform a cosine similarity search
    # on the input coffee name
    input_coffee_embedding = get_embedding(
        input_coffee_name,
        model=model
    )
    _similarities = []
    for name_embedding in name_embeddings:
        _similarities.append(
            cosine_similarity(input_coffee_embedding, name_embedding)
        )
    input_coffee_index = _similarities.index(max(_similarities))
except Exception:
    print(
        "Sorry, we don't have that coffee name "
        "in our database. Please try again."
    )
    exit()
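The fallback picks the closest name with a plain `_similarities.index(max(_similarities))`. With mocked low-dimensional embeddings (invented here purely for illustration; real ada-002 vectors have 1536 dimensions), the selection step looks like this:

```python
import numpy as np

def cosine_similarity(a, b):
    # Same helper as in src/utils.py
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Mocked 3-dimensional "embeddings" standing in for API output
name_embeddings = [
    np.array([0.9, 0.1, 0.0]),
    np.array([0.1, 0.9, 0.0]),
    np.array([0.2, 0.2, 0.9]),
]
input_coffee_embedding = np.array([0.8, 0.2, 0.1])

_similarities = [
    cosine_similarity(input_coffee_embedding, e) for e in name_embeddings
]
# Index of the highest-scoring name
input_coffee_index = _similarities.index(max(_similarities))
print(input_coffee_index)  # -> 0
```

`np.argmax(_similarities)` would do the same job in one call; the list-based form just mirrors the style of the surrounding code.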
cat << EOF > src/app.py
import os
import pandas as pd
import numpy as np
from api import get_embedding
from utils import (
    cosine_similarity,
    download_nltk_data,
    preprocess_text
)

# Download necessary NLTK data
download_nltk_data()

dataset_file_path = os.path.join(
    os.path.dirname(__file__),
    'data',
    'simplified_coffee.csv'
)

# Read user input
input_coffee_name = input("Enter a coffee name: ")

# Load the CSV file into a Pandas DataFrame
# (only the first 50 rows for this example)
df = pd.read_csv(dataset_file_path, nrows=50)

# Preprocess the review text
df['preprocessed_review'] = df['review'].apply(preprocess_text)

# The model to use
model = "text-embedding-ada-002"

# Get the embeddings for each review
review_embeddings = []
for review in df['preprocessed_review']:
    review_embeddings.append(get_embedding(review, model=model))

# Get the index of the input coffee name
try:
    input_coffee_index = df[
        df['name'] == input_coffee_name
    ].index[0]
except IndexError:
    # No exact match: get the embeddings for each name
    print("Searching for a similar coffee name...")
    name_embeddings = []
    for name in df['name']:
        name_embeddings.append(get_embedding(name, model=model))
    # Perform a cosine similarity search on the input coffee name
    input_coffee_embedding = get_embedding(input_coffee_name, model=model)
    _similarities = []
    for name_embedding in name_embeddings:
        _similarities.append(
            cosine_similarity(input_coffee_embedding, name_embedding)
        )
    input_coffee_index = _similarities.index(max(_similarities))
except Exception:
    print(
        "Sorry, we don't have that coffee name "
        "in our database. Please try again."
    )
    exit()

# Calculate the cosine similarity between
# the input coffee's review and all other reviews
similarities = []
input_review_embedding = review_embeddings[input_coffee_index]
for review_embedding in review_embeddings:
    similarities.append(
        cosine_similarity(input_review_embedding, review_embedding)
    )

# Get the indices of the most similar reviews
# (excluding the input coffee's review itself)
most_similar_indices = np.argsort(similarities)[-6:-1]

# Get the names of the most similar coffees
similar_coffee_names = df.iloc[most_similar_indices]['name'].tolist()

# Print the results
print(f"The most similar coffees to {input_coffee_name} are:")
for coffee_name in similar_coffee_names:
    print(coffee_name)
EOF
Enter a coffee name: Ethiopian Kirite
Searching for a similar coffee name...
The most similar coffees to "Ethiopian Kirite" are:
El Peñon Nicaragua
Colombia David Gomez 100% Caturra
Panama Auromar Estate Geisha Peaberry
Ethiopia Yirgacheffe Natural G1
Ethiopia Shakiso Mormora
Predicting News Category: Zero-Shot Classification with Embeddings
# Define a function to classify a sentence
def classify_sentence(sentence, model):
    # Get the embedding of the sentence
    sentence_embedding = get_embedding(sentence, model=model)
    # Calculate the similarity score
    # between the sentence and each category
    similarity_scores = {}
    for category in categories:
        category_embedding = get_embedding(category, model=model)
        similarity_scores[category] = cosine_similarity(
            sentence_embedding,
            category_embedding
        )
    # Return the category with the highest
    # similarity score
    return max(similarity_scores, key=similarity_scores.get)

# Classify a list of sentences
sentences = [
    "1 dead and 3 injured in El Paso, Texas, mall shooting",
    "Director Owen Kline Calls Funny Pages His ‘Self-Critical’ Debut",
    "15 spring break ideas for families that want to get away",
    "The US is preparing to send more troops to the Middle East",
    "Bruce Willis' 'condition has progressed' to frontotemporal dementia, "
    "his family says",
    "Get an inside look at Universal’s new Super Nintendo World",
    "Barcelona 2-2 Manchester United: Marcus Rashford shines but "
    "Raphinha salvages draw for hosts",
    "Chicago bulls win the NBA championship",
    "The new iPhone 12 is now available",
    "Scientists discover a new dinosaur species",
    "The new coronavirus vaccine is now available",
    "The new Star Wars movie is now available",
    "Amazon stock hits a new record high",
]

model = "text-embedding-ada-002"
for sentence in sentences:
    category = classify_sentence(sentence, model=model)
    print(f"'{sentence[:50]}..' => {category}")
    print()
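Because `classify_sentence` only needs one embedding per text, its logic can be checked offline with a stub in place of `get_embedding`. The vectors below are invented 2-dimensional stand-ins chosen so the tech headline lands nearest the TECH category; this is a sketch of the selection logic only, not of real embedding geometry:

```python
import numpy as np

categories = ['TECH', 'SPORTS']

# Invented stub embeddings standing in for API output
_fake_embeddings = {
    'TECH':   np.array([1.0, 0.0]),
    'SPORTS': np.array([0.0, 1.0]),
    'The new iPhone 12 is now available': np.array([0.9, 0.1]),
}

def get_embedding(text, model=None):
    # Stub: look up a canned vector instead of calling the API
    return _fake_embeddings[text]

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify_sentence(sentence, model):
    # Same logic as in the chapter: score every category,
    # return the one with the highest cosine similarity
    sentence_embedding = get_embedding(sentence, model=model)
    similarity_scores = {}
    for category in categories:
        similarity_scores[category] = cosine_similarity(
            sentence_embedding, get_embedding(category, model=model)
        )
    return max(similarity_scores, key=similarity_scores.get)

print(classify_sentence('The new iPhone 12 is now available', model=None))
# -> TECH
```

Stubbing the embedding call like this is also a cheap way to unit-test the classifier without spending API credits.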
cat << EOF > src/app.py
from api import get_embedding
from utils import cosine_similarity

categories = [
    'U.S. NEWS',
    'COMEDY',
    'PARENTING',
    'WORLD NEWS',
    'CULTURE & ARTS',
    'TECH',
    'SPORTS'
]

# Define a function to classify a sentence
def classify_sentence(sentence, model):
    # Get the embedding of the sentence
    sentence_embedding = get_embedding(sentence, model=model)
    # Calculate the similarity score
    # between the sentence and each category
    similarity_scores = {}
    for category in categories:
        category_embedding = get_embedding(category, model=model)
        similarity_scores[category] = cosine_similarity(
            sentence_embedding,
            category_embedding
        )
    # Return the category with the highest
    # similarity score
    return max(similarity_scores, key=similarity_scores.get)

# Classify a list of sentences
sentences = [
    "1 dead and 3 injured in El Paso, Texas, mall shooting",
    "Director Owen Kline Calls Funny Pages His ‘Self-Critical’ Debut",
    "15 spring break ideas for families that want to get away",
    "The US is preparing to send more troops to the Middle East",
    "Bruce Willis' 'condition has progressed' to frontotemporal dementia, "
    "his family says",
    "Get an inside look at Universal’s new Super Nintendo World",
    "Barcelona 2-2 Manchester United: Marcus Rashford shines but "
    "Raphinha salvages draw for hosts",
    "Chicago bulls win the NBA championship",
    "The new iPhone 12 is now available",
    "Scientists discover a new dinosaur species",
    "The new coronavirus vaccine is now available",
    "The new Star Wars movie is now available",
    "Amazon stock hits a new record high",
]

model = "text-embedding-ada-002"
for sentence in sentences:
    category = classify_sentence(sentence, model=model)
    print(f"'{sentence[:50]}..' => {category}")
    print()
EOF
'1 dead and 3 injured in El Paso..' category is => WORLD NEWS
'Director Owen Kline Calls Funny..' category is => COMEDY
'15 spring break ideas for families..' category is => PARENTING
'The US is preparing to send more troops..' category is => WORLD NEWS
'Bruce Willis' 'condition has progressed'..' category is => WORLD NEWS
'Get an inside look at Universal’s new..' category is => WORLD NEWS
'Barcelona 2-2 Manchester United: Marcus..' category is => SPORTS
'Chicago bulls win the NBA championship..' category is => SPORTS
'The new iPhone 12 is now available..' category is => TECH
'Scientists discover a new dinosaur..' category is => WORLD NEWS
'The new coronavirus vaccine is now..' category is => WORLD NEWS
'The new Star Wars movie is now..' category is => WORLD NEWS
'Amazon stock hits a new record..' category is => WORLD NEWS