This repo seeds the reference account dataset used to drive style extraction for automated X content creation.
Data files:

- `data/reference_accounts.json` contains the reference account set (username, niche, follower count, avg engagement rate, weight).
- `data/reference_tweets.json` contains the most recently fetched tweets for style analysis.
- `data/topic_stream.json` stores raw tweets collected for topic discovery.
- `data/topic_candidates.json` stores extracted topic candidates prior to validation.
- `data/topic_signals.json` stores scored topic signals with detailed scoring metadata.
- `data/approved_topics.json` stores approved topics that pass the scoring threshold.
- `data/recent_topics.json` stores recently used topics for deduplication.
- `data/topic_discovery_config.json` holds the topic discovery configuration.
- `data/candidate_viral_tweets.json` stores raw candidates before engagement normalization filtering.
- `data/viral_tweet_dataset.json` stores normalized, labeled, high-performing tweets for training.
- `data/viral_tweet_config.json` holds the viral tweet dataset configuration.
- `data/tweet_patterns.json` stores extracted structural templates for tweet generation.
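A record in `data/reference_accounts.json` might look like the following. This is an illustrative sketch: the fields match the list above, but the exact key names and value types are assumptions, not a schema taken from the repo.

```json
{
  "username": "example_account",
  "niche": "indie hacking",
  "follower_count": 48200,
  "avg_engagement_rate": 0.0031,
  "weight": 1.0
}
```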
The fetcher uses the X API v2. Set `X_BEARER_TOKEN` and run:

```bash
python scripts/fetch_reference_tweets.py --max-results 10 --update-accounts
```

Engagement rate is computed as:

```
avg_engagement_rate = avg((likes + replies + retweets + quotes) / follower_count) across recent tweets
```
Accounts stay in `pending_engagement_verification` until fresh tweets are pulled and rates are computed.
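The engagement-rate formula above can be sketched in Python. The per-tweet field names (`likes`, `replies`, `retweets`, `quotes`) are assumptions drawn from the formula, not the fetcher's actual data model.

```python
def avg_engagement_rate(tweets, follower_count):
    """Mean per-tweet engagement divided by follower count, per the formula above."""
    if not tweets or follower_count <= 0:
        return 0.0  # account stays pending until real numbers exist
    rates = [
        (t["likes"] + t["replies"] + t["retweets"] + t["quotes"]) / follower_count
        for t in tweets
    ]
    return sum(rates) / len(rates)

# Illustrative numbers only
recent = [
    {"likes": 120, "replies": 10, "retweets": 30, "quotes": 5},
    {"likes": 80, "replies": 5, "retweets": 20, "quotes": 0},
]
rate = avg_engagement_rate(recent, follower_count=50_000)
```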
The topic discovery pipeline scans reference accounts and keyword searches to surface early, controversial, or high-engagement discussions. It writes the topic_stream, topic_candidates, topic_signals, and approved_topics datasets.
Run with:
```bash
python scripts/topic_discovery.py
```

You can adjust keywords, the lookback window, signal thresholds, or scoring weights in `data/topic_discovery_config.json`.
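A `data/topic_discovery_config.json` along these lines would cover the tunables mentioned above. The key names and values are illustrative assumptions, not the file's real schema:

```json
{
  "keywords": ["ai agents", "indie saas"],
  "lookback_hours": 24,
  "signal_thresholds": {
    "min_engagement": 50,
    "min_score": 0.6
  },
  "scoring_weights": {
    "recency": 0.4,
    "engagement": 0.4,
    "controversy": 0.2
  }
}
```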
Build the viral tweet dataset from manual CSV input with:
```bash
python scripts/build_viral_tweet_dataset.py
```

The script reads `data/manual_viral_tweets.csv`, writes `data/candidate_viral_tweets.json`, and promotes tweets into `data/viral_tweet_dataset.json` after normalization.
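A hedged sketch of the engagement-normalization promotion step: candidates are scored by engagement per follower so tweets from small and large accounts compare on equal footing, and only candidates above a threshold are promoted. The `0.01` threshold is an invented placeholder, not a value from the script.

```python
MIN_NORMALIZED_ENGAGEMENT = 0.01  # illustrative cutoff, not the real config value

def normalize(candidate):
    """Attach engagement-per-follower so accounts of any size compare fairly."""
    engagement = (candidate["like_count"] + candidate["reply_count"]
                  + candidate["retweet_count"] + candidate["quote_count"])
    candidate["normalized_engagement"] = engagement / max(candidate["author_followers"], 1)
    return candidate

def promote(candidates):
    """Keep only candidates whose normalized engagement clears the threshold."""
    scored = [normalize(dict(c)) for c in candidates]
    return [c for c in scored if c["normalized_engagement"] >= MIN_NORMALIZED_ENGAGEMENT]

dataset = promote([
    {"like_count": 900, "reply_count": 40, "retweet_count": 200, "quote_count": 10,
     "author_followers": 20_000},   # 0.0575 -> promoted
    {"like_count": 15, "reply_count": 1, "retweet_count": 2, "quote_count": 0,
     "author_followers": 100_000},  # 0.00018 -> filtered out
])
```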
Fill `data/manual_viral_tweets.csv` with tweets you collect manually from the X UI and run:

```bash
python scripts/build_viral_tweet_dataset.py
```

Required columns: `text`, `author_followers`, `like_count`, `reply_count`, `retweet_count`, `quote_count`. Optional columns: `tweet_id`, `url`, `created_at`, `author_username`, `author_id`, `impression_count`, `source_type`, `source_query`.
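A quick way to sanity-check a hand-filled CSV before running the build script is to verify the required columns are present. The sample row is invented for illustration; the column names come from the list above.

```python
import csv
import io

REQUIRED = {"text", "author_followers", "like_count", "reply_count",
            "retweet_count", "quote_count"}

# Stand-in for open("data/manual_viral_tweets.csv")
sample = io.StringIO(
    "text,author_followers,like_count,reply_count,retweet_count,quote_count\n"
    '"Ship small, ship often.",12000,450,22,88,5\n'
)

reader = csv.DictReader(sample)
missing = REQUIRED - set(reader.fieldnames)  # empty set means the header is valid
rows = list(reader)
```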
Once the viral dataset is populated, extract reusable structure templates with:
```bash
python scripts/build_tweet_patterns.py
```

Rank generated tweet candidates with:

```bash
python scripts/rank_tweets.py --top 12
```
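The `--top N` flag suggests a rank-and-truncate step along these lines. The scoring field is a stand-in assumption; how `rank_tweets.py` actually scores candidates is not documented here.

```python
def rank_tweets(candidates, top):
    """Sort candidates by score (highest first) and keep the top N, as --top implies."""
    scored = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return scored[:top]

# Invented candidates with placeholder scores
candidates = [
    {"text": "a", "score": 0.9},
    {"text": "b", "score": 0.4},
    {"text": "c", "score": 0.7},
]
top2 = rank_tweets(candidates, top=2)
```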