A three-tier distributed web application demonstrating core cloud computing concepts: VM-to-VM networking, remote script execution over SSH, and database-backed caching. A Flask front end on one Ubuntu VM orchestrates a Wikipedia lookup that executes remotely on an Amazon EC2 instance, with a MySQL cache in Docker on a separate VM between them. Repeat queries are served from the cache in a fraction of the time.
| Search form | First search (fetched from Wikipedia) | Repeat search (served from cache) |
|---|---|---|
| ![]() | ![]() | ![]() |
The Source line on each result page tells you where the answer came from. Searching for Apollo 8 the first time goes all the way out to Wikipedia via Amazon EC2; searching for Apollo 8 a second time comes straight back from the local database cache in a fraction of the time.
| Row stored in MySQL | Cache container running | EC2 instance running |
|---|---|---|
| ![]() | ![]() | ![]() |
- Takes a search term from a user via a simple web form
- Checks a local database to see if the same question has been asked before
- If not, reaches out to a remote machine running on Amazon Web Services, runs a Wikipedia lookup there, and brings the answer back
- Stores the answer so the next time someone asks the same question, it comes back instantly instead of making the round trip all over again
- Handles errors gracefully so the browser always gets a useful response
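
The check-then-fetch-then-store behaviour in the list above is the cache-aside pattern. The sketch below illustrates it with a plain dictionary standing in for the database and an injected function standing in for the remote lookup. The name `lookup` and its parameters are hypothetical and are not taken from `main.py`; only the two source labels come from this README.

```python
from typing import Callable, Dict, Tuple


def lookup(query: str,
           cache: Dict[str, str],
           fetch_remote: Callable[[str], str]) -> Tuple[str, str]:
    """Return (answer, source) using a cache-aside read path."""
    if query in cache:
        # Cache hit: answer immediately, no remote round trip.
        return cache[query], "Cache"
    # Cache miss: do the expensive lookup, then remember the answer.
    answer = fetch_remote(query)
    cache[query] = answer
    return answer, "Remote Wikipedia via EC2"


if __name__ == "__main__":
    remote_calls = []

    def fake_remote(q: str) -> str:
        # Stand-in for the SSH call to EC2; records how often it runs.
        remote_calls.append(q)
        return f"summary of {q}"

    store: Dict[str, str] = {}
    print(lookup("Apollo 8", store, fake_remote))  # first search goes remote
    print(lookup("Apollo 8", store, fake_remote))  # repeat search hits the cache
    print(len(remote_calls))                       # remote ran only once
```

Passing the cache and the remote fetcher in as arguments keeps the read path testable without a database or an EC2 instance.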
```mermaid
flowchart LR
    Host["Host Browser<br/>(Windows)"] -->|HTTP :8888| Flask
    subgraph FlaskVM["Flask VM (Ubuntu)"]
        Flask["Flask app<br/>main.py"]
    end
    subgraph CacheVM["Cache VM (Ubuntu)"]
        MySQL[("MySQL 8<br/>:7888 to :3306<br/>Docker container")]
    end
    subgraph AWS["Amazon Web Services"]
        EC2["EC2 Ubuntu<br/>wiki.py"]
    end
    Flask -->|PyMySQL :7888| MySQL
    Flask -->|Paramiko SSH :22| EC2
    EC2 -->|HTTPS| Wikipedia[("Wikipedia API")]
```
Three independent machines, three network boundaries, one cache-aside read path. The Flask VM is the orchestrator in the middle, the Cache VM is the memory, and the EC2 instance is the hands reaching out to Wikipedia on the open internet.
- You submit a query through the web form at `/`.
- Flask asks the database: have we seen this question before?
- If yes, the stored answer comes straight back, tagged `Source: Cache`.
- If no, Flask opens a secure SSH connection to the EC2 instance, runs a small Python script (`wiki.py`) out there with your query as an argument, captures whatever it prints, writes the result into the database for next time, and sends it back to you tagged `Source: Remote Wikipedia via EC2`.
- If anything along the way fails, the error is caught and shown on a friendly error page rather than crashing the app.
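
Because the query travels to EC2 as a command-line argument, it has to survive the remote shell intact. The sketch below shows one way to build that command line and to turn the finished run into either a result or an error message. `build_remote_command` and `interpret_result` are hypothetical helpers, not functions from `main.py`, and the error wording is assumed.

```python
import shlex
from typing import Tuple


def build_remote_command(python_path: str, script_path: str, query: str) -> str:
    """Build a shell-safe command line that passes the query as one argument."""
    # shlex.join quotes each part, so spaces or shell metacharacters in the
    # query cannot split the argument or start another command.
    return shlex.join([python_path, script_path, query])


def interpret_result(exit_status: int, stdout: str, stderr: str) -> Tuple[bool, str]:
    """Map a finished remote run to (ok, message)."""
    if exit_status != 0:
        detail = stderr.strip() or f"remote script exited with status {exit_status}"
        return False, detail
    text = stdout.strip()
    if not text:
        return False, "remote script produced no output"
    return True, text


if __name__ == "__main__":
    print(build_remote_command(
        "/home/ubuntu/ct5169-wiki/venv/bin/python",
        "/home/ubuntu/ct5169-wiki/wiki.py",
        "Apollo 8",
    ))
```

With Paramiko, a string like this could be passed to `SSHClient.exec_command()`, and the exit status read from `stdout.channel.recv_exit_status()` once the output has been consumed.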
Deep dive: setup, configuration, troubleshooting
```text
.
├── main.py                  # Flask app: routes, cache-aside, Paramiko call
├── paramiko_test.py         # Standalone SSH smoke test for EC2
├── requirements.txt         # Flask, paramiko, pymysql, wikipedia
├── ec2/
│   └── wiki.py              # Remote Wikipedia lookup (runs on EC2)
├── db/
│   ├── schema.sql           # MySQL table definition (auto-applied on first boot)
│   └── docker-compose.yml   # MySQL 8 cache container on port 7888
├── docs/
│   └── screenshots/         # Images used in this README
├── .gitignore
└── README.md
```
- Host (Windows/macOS/Linux) with a browser
- Flask VM: Ubuntu 24.04, Python 3.12, pip, virtualenv, network access to the Cache VM and to EC2 over SSH
- Cache VM: Ubuntu 24.04, Docker Engine, Docker Compose, host port 7888 reachable from the Flask VM
- EC2 instance: Ubuntu, SSH access, `python3-pip`, the `wikipedia` package, and a security group allowing inbound SSH from the Flask VM's public address
main.py resolves every configurable value from the environment with a placeholder default, so the committed code is safe to share. Real values are supplied at runtime on the Flask VM.
| Variable | Purpose | Example |
|---|---|---|
| `EC2_INSTANCE_IP` | Public IP of the EC2 instance | `52.0.0.0` |
| `EC2_KEY_FILE` | Path to the PEM key on the Flask VM | `/home/stephendaly/ct5169-ca1/CT5169.pem` |
| `REMOTE_PYTHON` | Python interpreter path on EC2 | `/home/ubuntu/ct5169-wiki/venv/bin/python` |
| `REMOTE_SCRIPT` | Path to `wiki.py` on EC2 | `/home/ubuntu/ct5169-wiki/wiki.py` |
| `REMOTE_USERNAME` | SSH user on EC2 | `ubuntu` |
| `CACHE_DB_HOST` | Cache VM IP reachable from Flask VM | `192.168.29.5` |
| `CACHE_DB_PORT` | MySQL host port on the Cache VM | `7888` |
| `CACHE_DB_NAME` | Database name | `wiki_cache` |
| `CACHE_DB_USER` | MySQL user | `ct5169` |
| `CACHE_DB_PASSWORD` | MySQL user password | (kept out of version control) |
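
A minimal sketch of how environment-driven configuration with placeholder defaults might look, covering a subset of the variables above. `Settings`, `load_settings`, and `unset_placeholders` are hypothetical names rather than code from `main.py`. Of the placeholder strings, only `YOUR_CACHE_VM_IP` appears in this README; the other `YOUR_...` values are assumed for illustration.

```python
import os
from dataclasses import dataclass, fields
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    ec2_instance_ip: str
    ec2_key_file: str
    cache_db_host: str
    cache_db_port: int
    cache_db_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each value from the environment, falling back to a placeholder."""
    return Settings(
        ec2_instance_ip=env.get("EC2_INSTANCE_IP", "YOUR_EC2_IP"),          # assumed placeholder
        ec2_key_file=env.get("EC2_KEY_FILE", "YOUR_KEY_FILE"),              # assumed placeholder
        cache_db_host=env.get("CACHE_DB_HOST", "YOUR_CACHE_VM_IP"),
        cache_db_port=int(env.get("CACHE_DB_PORT", "7888")),
        cache_db_password=env.get("CACHE_DB_PASSWORD", "YOUR_DB_PASSWORD"),  # assumed placeholder
    )


def unset_placeholders(settings: Settings) -> List[str]:
    """Names of fields that still hold a YOUR_... placeholder."""
    return [
        f.name for f in fields(settings)
        if str(getattr(settings, f.name)).startswith("YOUR_")
    ]


if __name__ == "__main__":
    missing = unset_placeholders(load_settings())
    if missing:
        print("Still using placeholders for:", ", ".join(missing))
```

A check like `unset_placeholders` at startup turns a forgotten `export` into an explicit message before any connection is attempted.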
On the Flask VM:

```bash
git clone https://github.com/sdaly-ie/distributed-flask-wiki-cache.git ct5169-ca1
cd ct5169-ca1
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Put the EC2 private key in place and tighten permissions
cp /path/to/CT5169.pem ./CT5169.pem
chmod 400 CT5169.pem
```

On the Cache VM:

```bash
cd /path/to/ct5169-ca1/db

# Supply real passwords via a .env file (gitignored)
cat > .env <<EOF
MYSQL_ROOT_PASSWORD=your_strong_root_password
MYSQL_PASSWORD=your_strong_user_password
EOF

# Start the container. schema.sql is auto-applied on first boot.
docker compose up -d
docker compose ps
```

On the EC2 instance:

```bash
sudo apt update
sudo apt install -y python3-venv python3-pip
mkdir -p ~/ct5169-wiki && cd ~/ct5169-wiki
python3 -m venv venv
source venv/bin/activate
pip install wikipedia
# Copy ec2/wiki.py from the repo into ~/ct5169-wiki/wiki.py
```

Security group: open inbound SSH (port 22) from the Flask VM's public address only. Don't leave SSH open to 0.0.0.0/0.
On the Flask VM, from the project directory with the virtualenv active:
```bash
export EC2_INSTANCE_IP="<real-ec2-ip>"
export EC2_KEY_FILE="/home/stephendaly/ct5169-ca1/CT5169.pem"
export CACHE_DB_HOST="<real-cache-vm-ip>"
export CACHE_DB_PASSWORD="<real-mysql-password>"
python3 main.py
```

Flask starts on 0.0.0.0:8888. From the host browser:

```text
http://<flask-vm-host-only-ip>:8888/
```
The host-only adapter IP is typically in the 192.168.x.x range and is visible on the Flask VM via ip -4 addr show.
```sql
CREATE DATABASE IF NOT EXISTS wiki_cache
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE wiki_cache;

CREATE TABLE IF NOT EXISTS searches (
  id INT AUTO_INCREMENT PRIMARY KEY,
  query_text VARCHAR(255) NOT NULL UNIQUE,
  result_text MEDIUMTEXT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ON UPDATE CURRENT_TIMESTAMP
);
```

The `UNIQUE` constraint on `query_text` is what makes the `INSERT ... ON DUPLICATE KEY UPDATE` in `save_result_to_cache()` a correct upsert. Without it, no key conflict would ever be detected, so saving the same term again would insert a second row instead of updating the existing one.
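
The upsert behaviour can be tried without a MySQL server. The sketch below is a stand-in, not the project's code: it uses Python's built-in `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or later), which plays the same role as MySQL's `ON DUPLICATE KEY UPDATE`. The helper names `save_result` and `get_result` are hypothetical, and the table is trimmed to the columns that matter here.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_text TEXT NOT NULL UNIQUE,
    result_text TEXT NOT NULL
)
"""

# SQLite analogue of MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
UPSERT = """
INSERT INTO searches (query_text, result_text) VALUES (?, ?)
ON CONFLICT(query_text) DO UPDATE SET result_text = excluded.result_text
"""


def save_result(conn: sqlite3.Connection, query: str, result: str) -> None:
    """Insert a new row, or overwrite the result if the query already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (query, result))


def get_result(conn: sqlite3.Connection, query: str) -> Optional[str]:
    """Return the cached result for a query, or None on a miss."""
    row = conn.execute(
        "SELECT result_text FROM searches WHERE query_text = ?", (query,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_result(conn, "Apollo 8", "first summary")
    save_result(conn, "Apollo 8", "refreshed summary")  # updates, no second row
    print(get_result(conn, "Apollo 8"))
    print(conn.execute("SELECT COUNT(*) FROM searches").fetchone()[0])
```

Both statements use parameter placeholders rather than string formatting, so the search term is never interpolated into the SQL text.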
Address already in use on Flask startup

A previous Flask process is still bound to the port. Kill it:

```bash
pkill -f "python3 main.py"
```

Can't connect to MySQL server on 'YOUR_CACHE_VM_IP'

The placeholder host in the message means the environment variables weren't set in the shell that launched Flask. Verify with `env | grep CACHE_DB_` before launching Flask.

Permission denied (publickey) on SSH to EC2

Check that the PEM key file is readable only by its owner (mode 400):

```bash
chmod 400 CT5169.pem
```

Confirm `EC2_KEY_FILE` points at the correct absolute path.

permission denied while trying to connect to the Docker API

Your user isn't in the `docker` group on the Cache VM. Either run Docker commands with `sudo`, or add yourself to the group and log out/in:

```bash
sudo usermod -aG docker $USER
```

Host browser can't reach the Flask VM

`127.0.0.1` from the host points at the host, not the VM. Use the VM's host-only adapter IP instead: run `ip -4 addr show` on the Flask VM and look for the 192.168.x.x address on `enp0s8` or equivalent.
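
Several of the problems above come down to whether one machine can open a TCP connection to another. As a rough aid, the sketch below checks a single host and port using only the standard library. `port_is_open` is a hypothetical helper that is not part of the repository, and the address in the usage example is the sample value from the configuration table.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example: from the Flask VM, check the MySQL host port on the Cache VM.
    print(port_is_open("192.168.29.5", 7888))
```

A `True` result only shows that something is listening on that port; it says nothing about credentials or about which service answered.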
If you're reviewing this for marking or assessment, here's the five-minute tour:
- `main.py`: cache-aside strategy in the `search()` route (the `try:` block with `get_cached_result()` then `fetch_wikipedia_result()` then `save_result_to_cache()`).
- `db/schema.sql`: `UNIQUE` constraint on `query_text` that enables the upsert.
- `db/docker-compose.yml`: MySQL 8 container on port 7888 with `schema.sql` mounted into `docker-entrypoint-initdb.d` so the table is auto-created on first boot, and a named volume so cached rows survive container restarts.
- `ec2/wiki.py`: handles `DisambiguationError` and `PageError` so the Flask side sees useful messages rather than unhandled exceptions.
- `paramiko_test.py`: standalone SSH smoke test, useful for verifying EC2 connectivity independently of Flask.





