Inspiration
Ever since I started really feeling the AGI, I have wondered who is at the core of this brewing intelligence explosion, and how many of them there are. I wanted to find data sources that would let me visualize it (arXiv FTW).
What it does
There's a 5 GB JSON bundle with metadata on all 2 million-plus arXiv papers from the past 19 years. The app parses that bundle, correlates the papers, and then assigns an acceleration coefficient.
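To make the parsing step concrete, here is a minimal sketch of streaming the metadata one record at a time instead of loading all 5 GB into memory. It assumes the bundle is stored as JSON Lines (one paper per line) and that each record has `update_date` and `abstract` fields; those field names, the function names, and the plain substring matching are illustrative assumptions, not necessarily what the app does. The sketch stops at per-year term counts.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Set


def iter_papers(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one paper record per non-empty line (assumes JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def term_counts_by_year(papers: Iterable[dict], terms: Set[str]) -> Dict[str, Counter]:
    """Count, per year, how many abstracts mention each term.

    `update_date` and `abstract` are assumed field names; terms are
    expected in lowercase and matched as plain substrings.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for paper in papers:
        year = paper.get("update_date", "")[:4]
        abstract = paper.get("abstract", "").lower()
        for term in terms:
            if term in abstract:
                counts[year][term] += 1
    return counts
```

Because a file object is itself an iterable of lines, something like `term_counts_by_year(iter_papers(open(path)), terms)` keeps only one record in memory at a time (here `path` and `terms` are placeholders).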
How I built it
I built it in Cursor. I use o3 for the architecting, Gemini 2.5 as my main driver, and Claude 3.7 Sonnet for debugging when 2.5 gets stuck. The app is almost entirely Python, with a React front end.
Challenges I ran into
Oh my god, where do I start? First of all, 5 GB of JSON is a lot to handle on a MacBook Pro with 16 GB of RAM. Cursor crashed multiple times. But I learned a lot about multi-core parallelization and RAM optimization, and I was able to get the processing run time down from over 2 hours to about 30 minutes.
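For anyone curious what the parallelization and RAM optimization can look like, here is a minimal sketch, not the app's actual pipeline. It assumes JSON Lines input with an `update_date` field, and all function names are made up for illustration. Lines are read in fixed-size chunks, and only one "wave" of chunks (one per worker) is in flight at a time, so memory stays bounded while all cores stay busy.

```python
import json
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_years(chunk: List[str]) -> Counter:
    """Worker: parse one chunk of lines and count papers per year."""
    counts: Counter = Counter()
    for line in chunk:
        if line.strip():
            record = json.loads(line)
            # `update_date` is an assumed field name.
            counts[record.get("update_date", "")[:4]] += 1
    return counts


def count_years_parallel(lines: Iterable[str], chunk_size: int = 10_000,
                         workers: int = 1) -> Counter:
    """Merge per-chunk counts, using a process pool when workers > 1."""
    chunks = chunked(lines, chunk_size)
    total: Counter = Counter()
    if workers <= 1:
        # Serial fallback: same logic, no subprocesses.
        for partial in map(count_years, chunks):
            total.update(partial)
        return total
    with ProcessPoolExecutor(max_workers=workers) as pool:
        while True:
            # Pull only `workers` chunks at a time to bound memory use.
            wave = list(islice(chunks, workers))
            if not wave:
                break
            for partial in pool.map(count_years, wave):
                total.update(partial)
    return total
```

On macOS, where new processes are spawned rather than forked, the call with `workers > 1` should sit under an `if __name__ == "__main__":` guard.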
I ran into git issues because I'm very new to dev work and didn't even know how to undo a simple change. The program overwrote my list of acceptable technical terms to index: it replaced my 2,000-plus terms with a list of 5,000 that included a lot of plain English. Terrible, terrible.
As usual, at the end I ran into a ton of React and Node issues while trying to get the front end going. But it's 6:12 now and I'm really trying to finish, so bye.
Accomplishments that I'm proud of
I'm really proud of the architecting that I did this time. tysm Karpathy. It really gave me a great roadmap that was a lot easier to follow. It went okay this time: I tried to follow the roadmap and had the agent constantly edit it, updating our progress and any changes we made along the way because of issues we ran into. So I'm getting better at tracking things and maintaining my own context as well as the context for the agent.
What I learned
AGI isn't here yet
What's next for AGI VIZ
It would be cool to pull in extra data sources and get the front-end visualization working.
Built With
- Python 3.11 for the ETL and API (FastAPI + Uvicorn, orjson, streaming flat-file JSON)
- Data wrangling via ijson, networkx, tqdm, and scikit-learn
- A zero-DB, Docker-optional back end
- A Vite / React 18 front end powered by react-three-fiber, drei, zustand, react-query, WebGL, and leva
- Everything runs offline on the 5 GB arXiv OAI metadata snapshot