Inspiration
We decided to build this project after noticing the increasing number of phishing and spam calls targeting the elderly and other vulnerable individuals. Many people are unaware of the tactics spam callers use and easily give away information such as credit card numbers, Social Security numbers, and personal details, putting them at risk of financial fraud and even physical danger. Furthermore, most advice on dealing with scam calls is preventative: it focuses on blocking callers from large databases of identified numbers, or on teaching template tips that quickly go stale and are far outrun by the pace at which scammers change strategies and roll out new scams.
As such, our goal was to build a tool that lets individuals actively control the process of engaging with a scam call, rather than regretting it afterward or falling for the new and inventive measures scammers employ to catch their next victims. Harnessing LLM inference and real-time data allowed us to do this quickly enough for users to act while the call is still in progress.
What it does
Our application safeguards users from being scammed by introducing an intelligent security layer that estimates, while a call is in progress, the likelihood that the caller is a scammer. Our interactions are rooted entirely in user consent: individuals have full autonomy over their information and can choose to give out their Twilio phone number to individuals they are less close to, marketers, and others they may not trust.
All you have to do as a user is install our app and create a Twilio phone number to use as your proxy. ShaScam then sits over any potentially unwanted spam calls and warns you before you slip into a scammer's trap, with well-founded analysis and suggested steps for engaging with the person on the other end of the line.
How we built it
We made use of Twilio's API to create proxy phone numbers that users of our application can give to individuals they are less close to. With user consent, we sit over a call initiated by a potential scammer, rerouting it to the user's Twilio proxy number and transcribing real-time audio into text with Google Cloud's Speech-to-Text integration, fed by audio packets streamed from Twilio's API. Next, we progressively passed cohesive 12-20 word chunks of the transcribed output, as it streamed in, to a 13B-parameter Llama 2 model for inference on the likelihood that the caller was a scammer. Finally, we built an intuitive, minimalistic interface where users can create a Twilio number and receive push notifications that alert them to a suspected spam call, explain why the call is suspicious, and suggest how to respond to maximize their safety and ensure the person on the other end of the line is not malicious.
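The chunking step above can be sketched roughly as follows. This is an illustrative reconstruction, not our actual code: the class name and the sentence-boundary heuristic are assumptions, while the 12-20 word bounds come from our pipeline.

```python
class TranscriptChunker:
    """Buffers streamed transcript words and emits 12-20 word chunks
    suitable for passing to the LLM for scam-likelihood inference."""

    MIN_WORDS = 12
    MAX_WORDS = 20

    def __init__(self):
        self.buffer = []

    def feed(self, partial_transcript):
        """Add newly transcribed text; return any chunks ready to send.

        A chunk is emitted when the buffer hits MAX_WORDS, or when it has
        at least MIN_WORDS and ends on sentence-final punctuation."""
        self.buffer.extend(partial_transcript.split())
        chunks = []
        while self.buffer:
            n = len(self.buffer)
            if n >= self.MAX_WORDS:
                cut = self.MAX_WORDS
            elif n >= self.MIN_WORDS and self.buffer[-1].endswith((".", "?", "!")):
                cut = n
            else:
                break
            chunks.append(" ".join(self.buffer[:cut]))
            self.buffer = self.buffer[cut:]
        return chunks

    def flush(self):
        """Emit whatever remains (e.g. when the call ends)."""
        if not self.buffer:
            return None
        chunk = " ".join(self.buffer)
        self.buffer = []
        return chunk
```

Buffering like this keeps request volume bounded even when the speech-to-text stream produces many small, noisy partial results.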
Challenges we ran into
We wanted to ensure that our model would not bias itself toward labeling every interaction as a scam. This is why we set out to build or find a dataset of spam calls. We found that several researchers had approached this problem, but there was no settled answer on where to source such datasets: they were either inaccessible or lacked the density of data we needed to make inferences and fine-tune an existing LLM to inform them.
Initially, we also set out to make the experience of using our product as user-friendly as possible, and tried call forwarding through iOS/Android phones, which seemed like an excellent way to sit over, transcribe, and analyze a call. We soon realized, however, that this reduced users' autonomy over when their calls were monitored and introduced latency and platform challenges on Android and iOS. We quickly pivoted to creating proxy numbers whose use, monitoring, and distribution users control themselves, giving them ownership of their own security while enabling stricter scam-call prevention for those who want it. We even included voice prompts, generated with Text-to-Speech through Twilio, that adhere to regulations on consent to data collection and use by confirming the caller is comfortable with being recorded.
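The consent prompt works along the lines of the sketch below: before the call is connected, Twilio speaks a short recording notice via TwiML's `<Say>` verb and then forwards the call with `<Dial>`. The wording and the function name are illustrative assumptions, not our production code; `<Say>`, `<Dial>`, and the `record` attribute are standard TwiML.

```python
from xml.sax.saxutils import escape

def consent_twiml(forward_to):
    """Build a TwiML response (illustrative wording) that announces
    recording before dialing the user's real number."""
    notice = (
        "This call may be recorded and analyzed for your safety. "
        "By staying on the line, you consent to this monitoring."
    )
    return (
        "<?xml version='1.0' encoding='UTF-8'?>"
        "<Response>"
        f"<Say>{escape(notice)}</Say>"
        f"<Dial record='record-from-answer'>{escape(forward_to)}</Dial>"
        "</Response>"
    )
```

A Flask webhook would return this XML to Twilio when the proxy number receives an inbound call.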
Furthermore, we wanted to fine-tune the model we were using to reduce the generalization errors we noticed: the models were overly selective and overfit to the definition and examples of scam calls we provided. We set out to find datasets of both normal phone conversations and recorded scam calls, and hit a significant obstacle with the latter, since victims are rarely able to record the scam calls they are on and phone recordings are hard to obtain due to privacy concerns. While we eventually sourced a dataset of chatbot-recorded scam calls, the Lenny dataset, and transcribed it, we hit many issues trying to fine-tune with the Together API and MonsterAPI. We instead pivoted to a larger-parameter model that we could experiment with and prompt-engineer to yield better results than our zero-shot approach with Llama 2.
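The prompt-engineering pivot can be sketched as below: rather than fine-tuning, each transcribed chunk is wrapped in a few-shot classification prompt. The example snippets here are invented placeholders (not transcripts from the Lenny dataset), and the exact prompt wording is an assumption.

```python
# Invented placeholder examples for illustration only.
FEW_SHOT_EXAMPLES = [
    ("Your social security number has been suspended, press one now.", "SCAM"),
    ("Hey, it's Sam, are we still on for lunch tomorrow?", "NOT_SCAM"),
    ("We detected a virus on your computer, we need remote access.", "SCAM"),
]

def build_prompt(chunk):
    """Assemble a few-shot classification prompt for one transcript chunk."""
    lines = [
        "You are screening a live phone call for scam behavior.",
        "Label each snippet SCAM or NOT_SCAM and briefly explain why.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Snippet: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Snippet: {chunk}")
    lines.append("Label:")
    return "\n".join(lines)
```

The completed prompt ends at "Label:", so the model's continuation is the classification itself, which keeps responses short and easy to parse.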
Accomplishments that we're proud of
We are most proud of the progress we've made with LLM-based analysis of scam calls: no existing solution, per our research, can parse real-time audio into text and monitor calls as they happen, before a scammer's intentions have already harmed the user.
What we learned
The problem we are solving with this project has existed for a long time, but its solutions have been suboptimal and leave people unsafe. A key learning from ideating for this hackathon was that long-standing problems as simple to describe as scam-call detection have layers of underlying detail and complexity that, once unpacked, can lead to higher levels of security.
Streaming real-time data into an LLM and communicating its continuously generated results through a clean, frictionless interface for our target demographic (senior citizens, who are among the most susceptible to these telephone attacks) was a huge lesson in threading, parallel programming, and integrating inference models with real-time, dynamic data, as opposed to the static datasets each of us had typically used in projects outside this hackathon.
In particular, the platform restrictions on call handling in iOS vs. Android, tightly controlled by carriers, pushed us to innovate in our UI/UX flow and the control flow of our core program. Taking small pivots when we hit roadblocks, like call forwarding on iOS, and chunking data intelligently to avoid flooding our inference model with requests from streamed, noisy data taught us a lot about handling volatile in-flight data and finding technical workarounds that move us quickly toward a solution.
What's next for ShaScam
Alongside explaining why a call may be from a scammer, we want to categorize the scams users encounter by the recency of particular scripts and on-call tricks. This would gradually educate users about the trends they are experiencing and are most susceptible to, introducing a form of personalization around the risks they may face in such conversations.
Another area we want to continue exploring is fine-tuning. Specifically, we want to fine-tune an LLM on clean and diverse data that, due to time and resource constraints, we weren't able to create or find during the hackathon.
Built With
- flask
- google-cloud
- python
- react-native
- together-ai
- twilio