Inspiration

FINRA's copious amounts of data.

What it does

Detects firms at potentially high risk of committing fraud. It also tracks people through their positions at different companies to measure how far the likelihood of fraud follows them from firm to firm.

How I built it

We parsed the enormous dataset using algorithms we designed, then applied Latent Semantic Analysis (LSA) and our custom-designed graph algorithm to find connections in the dataset and evaluate risk potential.
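As an illustration of the graph side only, here is a minimal sketch of one way to score firms through the people they share. It is not our actual algorithm: the function name `firm_risk_scores`, the `(person, firm)` record format, and the sample names are all hypothetical. It assumes a set of firms has already been flagged, and scores every firm by the share of its people who also worked at a different flagged firm.

```python
from collections import defaultdict


def firm_risk_scores(employment, flagged_firms):
    """Score each firm by the share of its people who also worked at a flagged firm.

    employment: iterable of (person, firm) pairs (hypothetical record format).
    flagged_firms: set of firm names already considered high risk (assumed input).
    """
    firms_by_person = defaultdict(set)
    people_by_firm = defaultdict(set)
    for person, firm in employment:
        firms_by_person[person].add(firm)
        people_by_firm[firm].add(person)

    scores = {}
    for firm, people in people_by_firm.items():
        # A person is "exposed" if they worked at a flagged firm other than this one.
        exposed = sum(
            1 for p in people
            if (firms_by_person[p] - {firm}) & flagged_firms
        )
        scores[firm] = exposed / len(people)
    return scores


# Made-up sample data, for illustration only.
employment = [
    ("alice", "FirmA"), ("alice", "FirmB"),
    ("bob", "FirmB"), ("bob", "FirmC"),
    ("carol", "FirmC"),
]
scores = firm_risk_scores(employment, flagged_firms={"FirmA"})
print(scores)
```

In this toy data only FirmB employs someone who previously worked at the flagged firm, so it is the only firm with a nonzero score.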

Challenges I ran into

So much data...

Accomplishments that I'm proud of

Finding a meaningful risk potential for companies, and applying relevant natural language processing to a large dataset. Parsing and filtering the copious amounts of data was also a good challenge.

What I learned

We learned about NLP and some of its applications, as well as the importance of accessible data. We also practiced parsing data effectively and quickly extracting the important and relevant pieces of information.

What's next for Data Mines

We would like to expand on our LSA solution to provide a more tangible end result. Then we would like to connect the two solutions, using the results from our LSA to further increase the validity of our graph-based risk analysis, in addition to normalizing the risk scores. After that, we could work on using more specific timestamps from the employment history data in order to identify connections between groups of employees and their collective movement over time.
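The timestamp idea could be sketched roughly as follows. This is a hypothetical illustration, not something we have built: the `(person, firm, start, end)` record format, the function names, and the sample dates are all assumptions. It pairs up people whose employment periods overlapped at the same firm, then keeps the pairs that overlapped at two or more firms as a simple signal of moving together.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def shared_firms(stints):
    """Map each pair of people to the firms where their stints overlapped in time."""
    by_firm = defaultdict(list)
    for person, firm, start, end in stints:
        by_firm[firm].append((person, start, end))

    shared = defaultdict(set)
    for firm, rows in by_firm.items():
        for (p1, s1, e1), (p2, s2, e2) in combinations(rows, 2):
            # Two date ranges overlap when each one starts before the other ends.
            if p1 != p2 and s1 <= e2 and s2 <= e1:
                shared[tuple(sorted((p1, p2)))].add(firm)
    return shared


def moved_together(stints, min_firms=2):
    """Keep only the pairs who overlapped at min_firms or more distinct firms."""
    return {
        pair: firms
        for pair, firms in shared_firms(stints).items()
        if len(firms) >= min_firms
    }


# Made-up sample data, for illustration only.
stints = [
    ("alice", "FirmA", date(2010, 1, 1), date(2012, 12, 31)),
    ("bob", "FirmA", date(2011, 1, 1), date(2012, 12, 31)),
    ("carol", "FirmA", date(2010, 1, 1), date(2011, 6, 30)),
    ("alice", "FirmB", date(2013, 1, 1), date(2015, 12, 31)),
    ("bob", "FirmB", date(2013, 1, 1), date(2014, 12, 31)),
    ("carol", "FirmC", date(2012, 1, 1), date(2014, 12, 31)),
]
pairs = moved_together(stints)
print(pairs)
```

Extending this from pairs to larger groups, and deciding how much overlap should count, are open design questions.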
