MoneyLoop

The Loop

All orders are placed inside the for-loop in main.py. This is where the money is made, hence the project name MoneyLoop. After each iteration we wait 0.05 seconds before starting the next one.
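As a minimal sketch of this structure (not the actual contents of main.py), the loop can be thought of as a step function followed by a short pause. The names `run_loop`, `step` and `LOOP_DELAY_SECONDS` are hypothetical and only used for illustration.

```python
import time

LOOP_DELAY_SECONDS = 0.05  # pause between iterations, as stated in the text


def run_loop(step, iterations, delay=LOOP_DELAY_SECONDS, sleep=time.sleep):
    """Call `step` once per iteration, then pause before the next one.

    `step` is a hypothetical callable that would hold the trading logic
    (arbitrage, quoting, volatiles). `sleep` is injectable for testing.
    """
    for i in range(iterations):
        step(i)
        sleep(delay)
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.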

Arbitrage

If the same instrument is listed on two different exchanges and the lowest ask price on exchange A is lower than the highest bid price on exchange B, we perform arbitrage: we buy on A and sell on B. We check both directions, so the same logic applies from B to A. The following snippet determines the volume of our order:

# Calculate the maximum volume we can trade
max_volume = min(
    book_a.asks[0].volume,
    book_b.bids[0].volume,
    dynamic_volume(potential_profit_per_share),
    POSITION_LIMIT - positions_a,
    POSITION_LIMIT + positions_b,
)

The dynamic volume ensures that we increase our positions more when there is more to earn per share:

def dynamic_volume(profit_per_stock: float):
    if profit_per_stock < 0.3:
        return 10
    if profit_per_stock < 0.4:
        return 30
    if profit_per_stock < 0.6:
        return 60
    return 70

Now we can submit our orders:

# Trade the maximum volume at the best prices
exchange.insert_order(
    instrument_id=instrument_a.instrument_id,
    price=lowest_ask_a,
    volume=max_volume,
    side="bid",
    order_type="ioc",
)
exchange.insert_order(
    instrument_id=instrument_b.instrument_id,
    price=highest_bid_b,
    volume=max_volume,
    side="ask",
    order_type="ioc",
)
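The check that triggers these orders is described at the start of this section but not shown. A minimal sketch, assuming the best bid and ask prices of both order books are already known, could look as follows. The function name `arbitrage_direction` and its return format are assumptions made for illustration, as is the idea that `potential_profit_per_share` is the raw price difference.

```python
from typing import Optional, Tuple


def arbitrage_direction(
    ask_a: float, bid_a: float, ask_b: float, bid_b: float
) -> Optional[Tuple[str, str, float]]:
    """Return (buy_on, sell_on, profit_per_share), or None without an opportunity.

    Assumes profit per share is simply the price difference, ignoring fees.
    """
    if ask_a < bid_b:
        # Buy at the lowest ask on A, sell at the highest bid on B.
        return ("A", "B", bid_b - ask_a)
    if ask_b < bid_a:
        # Same logic in the opposite direction.
        return ("B", "A", bid_a - ask_b)
    return None
```

The third element of the result is the kind of value that could be passed to dynamic_volume.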

Quoting

We start by grouping instruments that represent the same financial instrument on different exchanges. The theoretical price of the instrument is then calculated as the average of the best bid and ask prices of the two instruments. The spread is calculated as the difference between the best bid and ask prices of the two instruments.
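The description above leaves some room for interpretation. One possible reading, sketched here under that assumption, merges the two order books: the combined best bid is the higher of the two bids, the combined best ask is the lower of the two asks, the theoretical price is their midpoint and the spread is their difference. The function name `quote_reference` is hypothetical.

```python
from typing import Tuple


def quote_reference(
    bid_a: float, ask_a: float, bid_b: float, ask_b: float
) -> Tuple[float, float]:
    """Return (theoretical_price, spread) for one instrument on two exchanges.

    Assumption: both values are derived from the merged top of book.
    """
    best_bid = max(bid_a, bid_b)  # highest price anyone is willing to pay
    best_ask = min(ask_a, ask_b)  # lowest price anyone is willing to accept
    theoretical_price = (best_bid + best_ask) / 2
    spread = best_ask - best_bid
    return theoretical_price, spread
```

Under this reading the spread becomes negative exactly when an arbitrage opportunity exists between the two exchanges.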

Volatiles

If an instrument is talked about on social media, we classify it as volatile for the next 18 seconds. For each volatile instrument, we store the time at which it is released again, the probability of the sentiment being positive, and the impact. During that time, we do the following:

overcompensation_amount = int(OVERCOMPENSATION_AMOUNT * impact_factor)

# ...


# we own stock in A and assume it will fall:
if position_a > 0 and prob_pos < 0.45:
    exchange.insert_order(
        instrument_id=instrument_a.instrument_id,
        price=instrument_order_book_a.bids[0].price - instrument_a.tick_size * OFFSET_TICKS,
        volume=position_a + overcompensation_amount,
        side="ask",
        order_type="ioc",
    )
    print(f'Inserted SELL order for {instrument_a.instrument_id} at price {instrument_order_book_a.bids[0].price}')
    time.sleep(0.01)

# we are short A and assume it will rise:
if position_a < 0 and prob_pos > 0.55:
    exchange.insert_order(
        instrument_id=instrument_a.instrument_id,
        price=instrument_order_book_a.asks[0].price + instrument_a.tick_size * OFFSET_TICKS,
        volume=-position_a + overcompensation_amount,
        side="bid",
        order_type="ioc",
    )
    print(f'Inserted BUY order for {instrument_a.instrument_id} at price {instrument_order_book_a.asks[0].price}')
    time.sleep(0.01)

# we own stock in B and assume it will fall:
if position_b > 0 and prob_pos < 0.45:
    exchange.insert_order(
        instrument_id=instrument_b.instrument_id,
        price=instrument_order_book_b.bids[0].price - instrument_b.tick_size * OFFSET_TICKS,
        volume=position_b + overcompensation_amount,
        side="ask",
        order_type="ioc",
    )
    print(f'Inserted SELL order for {instrument_b.instrument_id} at price {instrument_order_book_b.bids[0].price}')
    time.sleep(0.01)

# we are short B and assume it will rise:
if position_b < 0 and prob_pos > 0.55:
    exchange.insert_order(
        instrument_id=instrument_b.instrument_id,
        price=instrument_order_book_b.asks[0].price + instrument_b.tick_size * OFFSET_TICKS,
        volume=-position_b + overcompensation_amount,
        side="bid",
        order_type="ioc",
    )
    print(f'Inserted BUY order for {instrument_b.instrument_id} at price {instrument_order_book_b.asks[0].price}')
    time.sleep(0.01)
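The per-instrument bookkeeping mentioned at the start of this section (release time, probability of positive sentiment, impact) is not shown. A minimal sketch of how it could be stored, assuming timestamps in seconds, is given here. The names `VolatileState`, `mark_volatile` and `VOLATILE_SECONDS` are hypothetical.

```python
from dataclasses import dataclass

VOLATILE_SECONDS = 18.0  # duration of the volatile window, as stated in the text


@dataclass
class VolatileState:
    release_time: float  # timestamp at which the instrument is released again
    prob_pos: float      # probability that the sentiment is positive
    impact: float        # impact factor of the news source

    def is_active(self, now: float) -> bool:
        """True while the instrument still counts as volatile."""
        return now < self.release_time


def mark_volatile(now: float, prob_pos: float, impact: float) -> VolatileState:
    """Create the state for an instrument that was just mentioned in a post."""
    return VolatileState(now + VOLATILE_SECONDS, prob_pos, impact)
```

In the loop, `is_active(time.time())` would decide whether the position-closing orders above are considered for that instrument.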

Impact

Based on the provided training.csv, we calculated how impactful a social media post from a given source will be. For that, we grouped the table by source (the @tags) and calculated the average absolute impact each source had on the instruments. We then sorted the table in ascending order, set all zero values to the lowest value above zero, normalized the entries, and stored them in a new CSV file, impact_per_source.csv. In the money loop, we gather all the relevant news sources and look up their impact on the market in this table. For further calculations, we map the impact to 1 if it is in the top 15% of the table, to 0.5 if it is in the top 40%, and to 0.2 otherwise.
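The final bucketing step can be sketched as follows. This is an illustration only: it assumes the table has been loaded into a dictionary from source tag to normalized impact, and that "top 15%" refers to rank within the table rather than to the impact value itself. The function name `bucket_impacts` is hypothetical.

```python
from typing import Dict


def bucket_impacts(impact_per_source: Dict[str, float]) -> Dict[str, float]:
    """Map each source to 1.0 (top 15%), 0.5 (top 40%) or 0.2 (rest) by rank."""
    # Highest impact first, so rank 0 is the most impactful source.
    ranked = sorted(impact_per_source, key=impact_per_source.get, reverse=True)
    n = len(ranked)
    factors = {}
    for rank, source in enumerate(ranked):
        # Integer arithmetic avoids floating-point edge cases at the cutoffs.
        if rank * 100 < 15 * n:
            factors[source] = 1.0
        elif rank * 100 < 40 * n:
            factors[source] = 0.5
        else:
            factors[source] = 0.2
    return factors
```

With 20 sources, for example, the three most impactful ones map to 1.0, the next five to 0.5 and the remaining twelve to 0.2.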

Matching Posts and Instruments

To identify financial instruments mentioned in posts, we implemented a two-step process. Initially, we generated TF-IDF vectors for all posts and instruments, supplemented by word count features for each post. For instance, the word count features for the financial instrument 'ING' specifically highlighted the co-occurrence of terms like 'ing' and 'bank'.

Subsequently, we trained a logistic regression model to predict the specific instrument referenced in a given post. To address the imbalance between mentions of various instruments and posts where no instrument was mentioned, our training focused exclusively on posts that referenced an instrument.

In the prediction phase, we applied the trained model to each post. If the model assigned a probability of 0.5 or higher to an instrument, we treated that instrument as being mentioned in the post. If the probability fell below this threshold, we categorized the post as 'no instrument mentioned'. The threshold is what lets a model trained only on posts that mention an instrument still reject posts that mention none.
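The thresholding rule can be sketched as follows, assuming the model's output for one post is available as a dictionary from instrument name to probability. The function name `pick_instrument` and the example instrument names other than 'ING' are hypothetical.

```python
from typing import Dict, Optional

MENTION_THRESHOLD = 0.5  # threshold stated in the text


def pick_instrument(
    probabilities: Dict[str, float], threshold: float = MENTION_THRESHOLD
) -> Optional[str]:
    """Return the most likely instrument, or None for 'no instrument mentioned'."""
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= threshold else None
```

For example, `pick_instrument({"ING": 0.7, "ASML": 0.3})` selects 'ING', while a post whose highest probability is 0.4 is treated as mentioning no instrument.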

Sentiment Analysis

Profitable trading decisions depend on accurate predictions of post sentiment. Just as important is calibration: the model should express high uncertainty in cases where it is likely to be wrong. To achieve this, we used a bootstrap ensemble of ten logistic regression models. Each model was trained on a different subset of the training data, using a combination of TF-IDF and word count features to promote diversity. We also augmented our dataset with several sentiment analysis datasets tailored to financial text. Together, these steps are intended to make the sentiment predictions more robust for trading.

The calibrated sentiment predictions feed directly into how we assess the market impact of social media posts and adjust our positions. If a post exhibits positive sentiment, we may increase our position in the related stock. Conversely, negative sentiment may prompt us to reduce or close our positions. This lets us react to sentiment on social media as it appears.
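A minimal sketch of the ensemble mechanics, without the actual model training, is given here. It assumes that each of the ten models is trained on a resample drawn with replacement, that the member probabilities are averaged, and that the result is compared against the 0.45 and 0.55 thresholds used in the Volatiles code. The averaging step and all function names are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev
from typing import List, Tuple

N_MODELS = 10  # ensemble size stated in the text


def bootstrap_indices(n_samples: int, seed: int = 0) -> List[List[int]]:
    """Draw one resample (with replacement) of row indices per ensemble member."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(N_MODELS)
    ]


def aggregate(member_probs: List[float]) -> Tuple[float, float]:
    """Return (mean probability, spread across members) for one post."""
    return mean(member_probs), pstdev(member_probs)


def signal(prob_pos: float) -> str:
    """Translate the averaged probability into a coarse trading signal."""
    if prob_pos > 0.55:
        return "positive"
    if prob_pos < 0.45:
        return "negative"
    return "uncertain"
```

When the members disagree, the averaged probability moves towards 0.5 and falls into the "uncertain" band, so no position change is triggered.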