Skip to content

Adarve999/Financial_Sentiment_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Financial Sentiment Analysis: From Simple to Advanced Methods

Step 1: Exploratory Data Analysis (EDA)

  • Data cleaning (handling missing values, duplicates).
  • Distribution of sentiment classes (positive, neutral, negative).
  • Most frequent words in each category.
  • Word cloud visualization for different sentiment classes.

Step 2: Simple Methods (Baseline Models)

1. Bag-of-Words (BoW) + Logistic Regression / Naïve Bayes

  • Convert text into a frequency matrix using CountVectorizer.
  • Train models:
    • Naïve Bayes classifier
    • Logistic Regression classifier
  • Compare performance using precision, recall, and F1-score.

Step 3: Intermediate Models (Machine Learning & Topic Modeling)

3.1 TF-IDF + Machine Learning Models

  • Use TF-IDF Vectorization to represent text.
  • Train models:
    • Support Vector Machine (SVM)
    • Random Forest
    • Gradient Boosting (XGBoost, LightGBM)
  • Compare performance using precision, recall, and F1-score.

3.2 Dimensionality Reduction with LSA (Latent Semantic Analysis)

  • Apply Truncated SVD on TF-IDF matrix.
  • Reduce feature dimensionality while preserving sentiment-related patterns.
  • Train Logistic Regression, SVM, or Random Forest on the LSA-transformed data.

3.3 Topic Modeling with LDA (Latent Dirichlet Allocation)

  • Use LDA to discover latent topics in financial news titles.
  • Check whether sentiment classes align with discovered topics.
  • Use topics as additional features for sentiment classification.

Step 4: Advanced Models (Deep Learning & Transformers)

4.1 Word Embeddings + Deep Learning

  • Convert text to embeddings (Word2Vec, FastText, or GloVe).
  • Train a LSTM, GRU, or BiLSTM for sentiment classification.

4.2 Transformer-Based Models (State-of-the-Art)

  • Fine-tune a pre-trained financial language model:
    • BERT
    • FinBERT (Financial Sentiment BERT)
    • RoBERTa
  • Use these models for sentiment classification.

Step 5: Evaluation & Insights

  • Compare model performance across:
    • Baseline models (Lexicon, BoW)
    • Machine Learning models (SVM, Random Forest)
    • LSA, LDA-enhanced models
    • Deep Learning & Transformer models
  • Use confusion matrices, ROC curves, and SHAP analysis for feature importance.

Step 6: Generative Modeling for Controlled Text Generation

6.1 Fine-Tuning a Pretrained Language Model (GPT-2)

  • Using our dataset for transfer learning the model learns to generate realistic financial headlines conditioned on the sentiment and instruction.

6.2 Prompt Engineering for Entity-Enforced Generation

  • At inference time, generate headlines with prompts like:
    Sentiment: positive
    Instruction: Generate a financial headline that includes at least one financial entity.
    Headline:
    
  • This guides the model to include companies, indexes, or sectors in its output, addressing issues with empty or generic responses.

6.3 Multiple Generations per Sentiment

  • Generate multiple headlines per sentiment (e.g., 10 positive, 10 negative, 10 neutral) for content diversification.
  • Useful for data augmentation, report writing, or synthetic dataset creation.

About

Esto es una practica que va a tratar sobre análisis de sentimientos de noticias financieras y generación de noticias

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors