Demo gallery captions:
- Ex 1: Asking the chatbot to draft a report for PTB
- Drag and drop an image into the chat
- Drafted report
- Main agent delegates the task to a subagent to email the report
- Email sent and received
- Ex 2: The chatbot is asked about an intracranial hemorrhage and whether it is an EMERGENCY
- The chatbot catches the intracranial hemorrhage and calls it an EMERGENCY
- Main agent delegates to a subagent to email the report
- Email sent and received
Inspiration
I'm a radiologist/doctor by profession. Social media is full of posts saying that AI will learn radiology and eventually replace radiologists, but does it always have to be this way? Is the endgame always AI replacing humans, in all professions? What if it works the other way around? Can't the "human" learn AI, and be empowered by AI to do more and be more than what they were trained to do? But what would a radiologist want to build? What kind of agentic workflow would I want for myself?
Radiologists are always buried in work, but we still want to answer the questions of our fellow health workers. Unfortunately, we can't answer them all. But what if we had a radiology assistant: a chatbot that accepts images from people (fellow doctors, nurses, rad techs, hospital personnel), analyzes them, searches for differentials if needed, and, based on those interactions, emails me accordingly, for example to escalate an emergency or to draft a preliminary report so that all I have to do is check it. A built-in Google Search tool can also help the assistant find differentials to bundle with the emailed draft.
In my country, another problem we face is underdiagnosed pulmonary tuberculosis (PTB). An AI radiology assistant that can pre-screen chest X-rays and push suspicious ones to a radiologist's email could help with PTB screening and triage.
My goals:
- to learn not just vibe coding with Google Gemini but actual agentic workflow deployment with google-adk
- to do this not just on my local machine but in the cloud, from backend to frontend
- to build a minimum viable product (MVP) for the chatbot
- to do all of this in just two weeks (the time I had left in the hackathon)
My problems:
- I had no prior experience with actual agentic workflows (save for the YouTube videos and articles I really enjoy watching and reading, but had never actually put into practice).
- I had no prior experience serving large language models (LLMs) in the cloud.
- I had only two weeks to do it all (the time I had left in the hackathon).
- I'm a doctor by profession. Programming is not second nature to me.
Improbable odds, but still made possible with...
- Google Gemini, and
- Google ADK (google-adk)
What my multi-agent chatbot app does
- You can drag and drop images (screenshotted JPEGs) onto the chat and ask the chatbot about them.
- Ask it whether there is something pertinent or emergent that a radiologist needs to see.
- If the chatbot thinks so, you can have it immediately send an email to the radiologist for escalation.
- The chatbot can write a simple FYI message, do a Google search for differentials, or even write a sample radiology report, then add all of these to the email (for the radiologist to double-check and confirm).
- Given the open nature of google-adk, the chatbot can be hosted in the cloud or locally. It can use free local models like Google Gemma (for privacy) or more powerful cloud models like Gemini.
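To make the local-versus-cloud choice above concrete, here is a minimal sketch of how a backend might pick a model id from an environment variable. The variable name `RAD_GEMINI_BACKEND` and both model strings are illustrative placeholders, not the project's actual configuration.

```python
import os

def pick_model() -> str:
    """Choose a model id based on deployment mode (illustrative only)."""
    backend = os.environ.get("RAD_GEMINI_BACKEND", "cloud")
    # Local open models (e.g. Gemma via an Ollama-style runner) keep
    # images on-premises; cloud Gemini is more capable. Both model
    # strings here are placeholders.
    return "ollama/gemma3" if backend == "local" else "gemini-2.0-flash"
```

The same agent definitions can then be pointed at either model id without touching the rest of the workflow.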
How I built it
- I first used google-adk to create a main agent called Agent_1, which delegates tasks to subagents.
- The first subagent, called Async_Email_Agent, handles email. It uses a Gmail MCP (Model Context Protocol) server.
- The second agent is a google_search agent, but it is appended as an AgentTool under the main agent. It uses Gemini's built-in Google Search tool.
Challenges I ran into
- It took me over five days just to figure out how to serve an actual AI model in the cloud using Python, with real FastAPI endpoints that a frontend webpage could access. I didn't even know what FastAPI was until now, let alone how to use it. I faced entire blocks of errors I probably would never have understood within two weeks. But by putting them into Google Gemini, I was able to figure out what went wrong much faster. The best part of choosing google-adk was my instinct that Gemini would know a lot about the ADK it was built around. Amazingly, Gemini gave quick, to-the-point answers to questions about google-adk that I couldn't find in YouTube videos or even the official docs.
- Some internet tutorials from barely a month ago were already outdated. Still, Google Gemini helped me figure out the few changes my code base needed.
- With all the time lost just figuring out the backend (again, I'm a doctor, not a professional programmer), I had very little time to write the frontend. Fortunately, I could vibe-code one using Gemini Canvas: I told it to make a chatbot web app UX/UI that supports drag and drop (for images) and can communicate with a FastAPI endpoint for google-adk. The frontend didn't work well at first, but with a few vibe-coding tweaks I got it working with the backend.
Accomplishments that I'm proud of
As a doctor (not a native programmer), I successfully built a minimum viable product (MVP) with the chatbot agents working dynamically in just two weeks. Given the time constraint, this would never have been possible without such an extensive framework in google-adk and without Google Gemini as my coding assistant. Don't get me wrong: I still believe that just as we need professional radiologists for professional medical diagnostic work, we need professional programmers for important large-scale AI projects. It's just that seeing agentic workflows become possible even for a simple AI enthusiast like me, given an AI coding agent like Gemini and a framework like google-adk, is mind-blowing.
What I learned
In two weeks I learned the following:
- google-adk agents, multi-agents (subagents), tool use, and MCP (Model Context Protocol)
- FastAPI servers
- connecting a frontend HTML webpage to a FastAPI server across different origins
- how to programmatically generate a "program architecture" using Google Gemini (yes, Gemini can vibe-code program architectures)
- the pros and cons of running google-adk locally versus in the cloud, and of serving free open-source models versus Google Gemini. Gemini just works flawlessly. For local use I had to switch between LLMs, since some models were good at vision while others were good at tool use; with Gemini, one model does it all.
- Perhaps the most important lesson from this endeavor: vibe coding is still not enough. I couldn't just vibe-code my way to an actual product without actually understanding the code and what I was doing. AI is not a replacement for actually learning a new skill.
- Finally, about AI taking over and taking our jobs away (in radiology or in any profession): I believe that if used properly, as a means to boost learning and not as a crutch for lazy thinking, AI is not our replacement. AI is not in our way. AI is OUR way in… to the future.
What's next for Rad_Gemini
I believe that, given a little more time and effort, I could have added more agents to my workflow, given them more tools, more functionality, more complexity… But this project is not a testament to complexity. It is a testament to possibility, to sheer potential and viability. The fact that a person like myself was able to learn and build this in just under two weeks, with the help of Google Gemini and Google ADK, opens an entire world of opportunity.
Potential TODO List:
- Improve the UX/UI; I didn't have time for this so close to the end of the hackathon. I'd like to work on a mobile-first web app that can leverage the phone's camera for streaming. I might try Google Stitch.
- Add more MCP server tools and agents to my team of agents, populating it with medical MCP tools that focus on sites like PubMed.
- Explore more features from Google Cloud and Vertex AI, which I didn't have time for, to give the AI agents more granular control.
- Consider adding live streaming or even voice input for the chatbot.
- Work on the system prompts of my AI agents for better guardrails when dealing with the "user".
- Convert the chatbot web app into a reusable web component (I'll ask Gemini Canvas to do this), so I can use the chatbot in other apps I might consider building... like a full-blown DICOM reader for radiologic images.
- Consider end-to-end encryption for chat conversations.
Built With
- css
- fastapi
- gemini
- gemini-canvas
- google-adk
- html
- javascript
- notebooklm
- python