Inspiration
FINRA's copious amounts of data.
What it does
Detects firms at potentially high risk of committing fraud. It also tracks people through their positions at different companies to measure how much the likelihood of fraud follows them.
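To make the idea of fraud risk "following" people concrete, here is a minimal sketch, not our actual algorithm. It assumes hypothetical (person, firm) employment records and a hypothetical set of already-flagged firms; none of these names come from the FINRA data. Each person gets an exposure score (the share of their firms that are flagged), and each firm gets the mean exposure of the people who have worked there.

```python
from collections import defaultdict

# Hypothetical records: (person_id, firm_id). Not the real FINRA schema.
EMPLOYMENT = [
    ("p1", "A"), ("p1", "B"),
    ("p2", "B"), ("p2", "C"),
    ("p3", "C"),
]
# Assumption: firms already known to have a fraud-related disclosure.
FLAGGED_FIRMS = {"A"}


def person_exposure(employment, flagged):
    """Fraction of each person's firms that are flagged."""
    firms_by_person = defaultdict(set)
    for person, firm in employment:
        firms_by_person[person].add(firm)
    return {
        person: len(firms & flagged) / len(firms)
        for person, firms in firms_by_person.items()
    }


def firm_risk(employment, flagged):
    """Mean exposure of the people who have worked at each firm."""
    exposure = person_exposure(employment, flagged)
    people_by_firm = defaultdict(set)
    for person, firm in employment:
        people_by_firm[firm].add(person)
    return {
        firm: sum(exposure[p] for p in people) / len(people)
        for firm, people in people_by_firm.items()
    }


risk = firm_risk(EMPLOYMENT, FLAGGED_FIRMS)
```

In this toy data, firm B is never flagged itself but still picks up some risk because p1 also worked at flagged firm A.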
How I built it
We parsed the enormous dataset using algorithms we designed, then applied Latent Semantic Analysis (LSA) and our custom-designed graph algorithm to find connections in the dataset and evaluate risk potential.
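As a rough illustration of the LSA step only, the sketch below builds a term-document count matrix, reduces it with a truncated SVD, and compares documents by cosine similarity. It assumes NumPy is available; the sample texts and function names are invented for the example and are not taken from our dataset or code. The graph algorithm is not shown here.

```python
import numpy as np

# Invented sample texts standing in for disclosure descriptions.
DOCS = [
    "customer complaint unauthorized trading",
    "unauthorized trading customer dispute",
    "annual registration renewal filed",
]


def lsa_embed(docs, k=2):
    """Project documents into a k-dimensional latent space via SVD."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    # Rows are documents, columns are raw term counts.
    counts = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word in doc.split():
            counts[row, index[word]] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Keep the k strongest latent dimensions.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vecs = lsa_embed(DOCS)
sim_related = cosine(vecs[0], vecs[1])    # documents sharing vocabulary
sim_unrelated = cosine(vecs[0], vecs[2])  # documents with no shared terms
```

Documents that share vocabulary end up close together in the reduced space, while the unrelated filing lands on a separate dimension. A real pipeline would likely weight terms (for example with TF-IDF) before the SVD.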
Challenges I ran into
So much data...
Accomplishments that I'm proud of
Finding a meaningful risk potential for companies and applying relevant natural language processing to a large dataset. Parsing and filtering the copious amounts of data was also a good challenge.
What I learned
We learned about NLP and some of its applications, as well as the importance of accessible data. We also learned how to parse data effectively and quickly extract the important, relevant pieces of information.
What's next for Data Mines
We would like to expand on our LSA solution to provide a more tangible end result. Then we would like to connect the two solutions, using the results from our LSA to further increase the validity of our graph-based risk analysis, in addition to normalizing the risk scores. After that, we could work on using more specific timestamps from the employment history data to identify connections between groups of employees and their collective movement over time.
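Two of these next steps can be sketched in a few lines. This is a sketch under stated assumptions, not an implemented feature. It assumes min-max scaling as the normalization and hypothetical employment stints of the form (person, firm, start, end). People are treated as moving together when their stints overlap in time at two or more firms.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical stints: (person, firm, start, end). Not real FINRA records.
STINTS = [
    ("p1", "A", date(2010, 1, 1), date(2012, 6, 30)),
    ("p2", "A", date(2011, 1, 1), date(2012, 7, 31)),
    ("p1", "B", date(2012, 8, 1), date(2015, 1, 1)),
    ("p2", "B", date(2012, 9, 1), date(2014, 1, 1)),
    ("p3", "B", date(2016, 1, 1), date(2017, 1, 1)),
]


def normalize(scores):
    """Min-max scale raw risk scores into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def shared_firms(stints):
    """Map each pair of people to the firms where their stints overlapped."""
    by_firm = defaultdict(list)
    for person, firm, start, end in stints:
        by_firm[firm].append((person, start, end))
    together = defaultdict(set)
    for firm, rows in by_firm.items():
        for (p, s1, e1), (q, s2, e2) in combinations(rows, 2):
            # Two date ranges overlap when each starts before the other ends.
            if p != q and s1 <= e2 and s2 <= e1:
                together[tuple(sorted((p, q)))].add(firm)
    return dict(together)


def co_movers(stints, min_firms=2):
    """Pairs of people who overlapped at min_firms or more firms."""
    return {
        pair: firms
        for pair, firms in shared_firms(stints).items()
        if len(firms) >= min_firms
    }


movers = co_movers(STINTS)
scaled = normalize({"A": 0.5, "B": 0.25, "C": 0.0})
```

Here p1 and p2 overlap at both A and B, so they are reported as a pair that moved together, while p3 joined B after both had left and is not linked to either.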