Data Mines

Graph Visualization
LSA Results

Inspiration

Finra's copious amounts of data.

What it does

Data Mines detects firms at potentially high risk of committing fraud. It also tracks people through their positions at different companies to measure the extent to which the likelihood of fraud follows these people.
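To make the people-tracking idea concrete, here is a minimal Python sketch. It is not the code we ran at HackGT. It assumes a hypothetical input format: employment history as (person, firm) pairs, plus a set of firms that are already flagged (for example, for prior disciplinary events). Each firm is scored by the share of its people who have also worked at a different, flagged firm. That is one simple way to ask whether risk follows people between employers.

```python
from collections import defaultdict


def firm_risk_from_people(employment, flagged_firms):
    """Score each firm by the share of its people who also worked at another flagged firm.

    employment: iterable of (person_id, firm_id) pairs (hypothetical schema).
    flagged_firms: set of firm_ids assumed to be flagged already.
    """
    firms_of = defaultdict(set)   # person -> every firm they have worked at
    people_of = defaultdict(set)  # firm -> every person who has worked there
    for person, firm in employment:
        firms_of[person].add(firm)
        people_of[firm].add(person)

    scores = {}
    for firm, people in people_of.items():
        # A person counts as "exposed" if any of their *other* employers is flagged.
        exposed = sum(1 for p in people if (firms_of[p] - {firm}) & flagged_firms)
        scores[firm] = exposed / len(people)
    return scores


employment = [
    ("alice", "A"), ("alice", "B"),
    ("bob", "B"), ("bob", "C"),
    ("carol", "C"),
]
print(firm_risk_from_people(employment, flagged_firms={"A"}))
# {'A': 0.0, 'B': 0.5, 'C': 0.0}
```

Firm B scores 0.5 because one of its two people (alice) also worked at flagged firm A. A fuller version might also weight each link by how recent the move was or how serious the flag is.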

How I built it

We parsed the enormous dataset using algorithms we designed, then applied Latent Semantic Analysis (LSA) and our custom-designed graph algorithm to find connections in the dataset and evaluate risk potential.
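For readers unfamiliar with LSA, the core step can be sketched as follows. This is a toy, dependency-free Python illustration with made-up documents, not our pipeline. It builds a TF-IDF document-term matrix X and then finds the top singular directions of X by power iteration on the document Gram matrix X X^T. Each document becomes a short latent vector, and cosine similarity in that space reflects shared vocabulary. On real data, a library routine such as numpy.linalg.svd or scikit-learn's TruncatedSVD would be the usual choice.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return a documents-by-terms TF-IDF matrix (a list of rows) and its vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return rows, vocab


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lsa_doc_vectors(rows, k, iters=1000):
    """Map each document to k latent dimensions (the rows of U_k * Sigma_k).

    The eigenvectors of the document Gram matrix X X^T are the left singular
    vectors U of X, and its eigenvalues are the squared singular values, so the
    top k pairs are found by power iteration followed by deflation.
    """
    n = len(rows)
    gram = [[dot(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic, non-uniform start vector
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        eigval = dot(v, [dot(row, v) for row in gram])
        sigma = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][dim] = v[i] * sigma
        # Deflate: subtract the component just found before looking for the next one.
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


# Made-up example documents, not records from the FINRA data.
docs = [
    "unauthorized trading customer complaint",
    "customer complaint unauthorized trading fraud",
    "passed license exam",
    "license exam registration renewal",
]
rows, vocab = tfidf_matrix(docs)
coords = lsa_doc_vectors(rows, k=2)
# Documents 0 and 1 share vocabulary, so they should score far higher than 0 and 2.
print(cosine(coords[0], coords[1]), cosine(coords[0], coords[2]))
```

The Gram-matrix route is only practical for a small number of documents, because X X^T is n-by-n. For a large corpus, a sparse truncated SVD would replace it.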

Challenges I ran into

So much data...

Accomplishments that I'm proud of

Finding meaningful measures of risk potential for companies, and applying relevant natural language processing to a large dataset. Parsing and filtering the copious amounts of data was also a good challenge.

What I learned

We learned about NLP and some of its applications, as well as the importance of accessible data. We also learned how to parse data effectively and quickly extract the important and relevant pieces of information.
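A common way to keep parsing tractable at this scale is to stream and filter records instead of loading everything into memory at once. Here is a minimal Python sketch of that pattern using only the standard csv module. The column names and the filter condition are hypothetical placeholders, not the actual FINRA schema.

```python
import csv
import io


def iter_relevant_rows(lines, wanted_columns, keep):
    """Yield only the wanted columns of CSV rows for which keep(row) is true.

    lines can be any iterable of CSV text lines, such as a file opened with
    open(path, newline=""), so rows are read one at a time.
    """
    for row in csv.DictReader(lines):
        if keep(row):
            yield {col: row[col] for col in wanted_columns}


# Hypothetical sample data standing in for one large file.
sample = io.StringIO(
    "person_id,firm_id,event_type,notes\n"
    "1,A,complaint,late payment\n"
    "2,B,none,\n"
    "3,C,complaint,churning\n"
)
for record in iter_relevant_rows(sample, ["person_id", "firm_id"],
                                 keep=lambda row: row["event_type"] == "complaint"):
    print(record)
```

Because the function is a generator, downstream steps (counting, joining, building a graph) can consume matching rows as they arrive.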

What's next for Data Mines

We would like to expand our LSA solution to provide a more tangible end result. Then we would like to connect the two solutions, using the results from our LSA to further increase the validity of our graph-based risk analysis, in addition to normalizing the risk scores. After that, we could use the more specific timestamps in the employment history data to identify connections between groups of employees and their collective movement over time.
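The timestamp idea could start from something as simple as grouping identical firm-to-firm moves and pairing people whose moves fall close together in time. This is a hypothetical Python sketch: the (person, from_firm, to_firm, date) record format and the 90-day window are assumptions for illustration, not properties of the FINRA data.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def co_movements(moves, window_days=90):
    """Pair people who made the same firm-to-firm move within window_days of each other.

    moves: iterable of (person_id, from_firm, to_firm, move_date) tuples (hypothetical schema).
    Returns {(from_firm, to_firm): sorted list of (person, person) pairs}.
    """
    by_route = defaultdict(list)
    for person, src, dst, when in moves:
        by_route[(src, dst)].append((when, person))

    result = {}
    for route, entries in by_route.items():
        # Compare every pair of moves along the same route; fine for a sketch,
        # but quadratic in the number of moves per route.
        pairs = {
            tuple(sorted((p1, p2)))
            for (d1, p1), (d2, p2) in combinations(entries, 2)
            if p1 != p2 and abs((d1 - d2).days) <= window_days
        }
        if pairs:
            result[route] = sorted(pairs)
    return result


moves = [
    ("alice", "A", "B", date(2016, 1, 10)),
    ("bob", "A", "B", date(2016, 2, 1)),
    ("carol", "A", "B", date(2017, 6, 1)),
    ("dave", "C", "D", date(2016, 1, 10)),
]
print(co_movements(moves))  # {('A', 'B'): [('alice', 'bob')]}
```

Pairs found this way could become extra edges in a risk graph, linking people who appear to move between firms as a group.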

Submitted to

HackGT 2017
- Winner Best Analysis and Development of Deep Connections With Finra's Dataset

Created by

I worked on solidifying the use of the LSA algorithm and also helped create the graph structure that stored and calculated risk potential.

Sherry Sarkar
Interest in Algorithms, Theoretical Machine Learning
I worked on parsing and figuring out how to filter the large datasets. I also used natural language processing techniques to find potential clusters in one of the datasets.

Daniel Hathcock
I helped parse the large amount of data, understand the results from tools such as latent semantic analysis, and build the displayed graph representing risky corporations.

Shyamal Patel

Updates

Sherry Sarkar started this project — Oct 15, 2017 08:07 AM EDT
