Sefik Ilkin Serengil | https://sefiks.com/ | Code wins arguments

Introducing Brand New Face Recognition in DeepFace
https://sefiks.com/2026/01/01/introducing-brand-new-face-recognition-in-deepface/
Thu, 01 Jan 2026

The post Introducing Brand New Face Recognition in DeepFace appeared first on Sefik Ilkin Serengil.

As humans, recognizing a person is often similar to matching someone we see with the people we already know or meet, and keep in our mental database. In computer vision, however, face recognition has a much more precise definition. From an academic perspective, face recognition is fundamentally a face verification problem: given a pair of face images, the task is to classify whether they belong to the same person or to two different individuals. Almost all face recognition models are trained and evaluated under this formulation, using benchmark datasets such as LFW, which consist of face pairs labeled as same or different. This is also the principle behind systems we often use in daily life. For example, Face ID on smartphones verifies whether the face captured by the front camera belongs to the same person as the one previously enrolled on the device.

Woman in Beige Shirt and Blue Denim Jeans Standing in Front of Books by pexels

There is, however, another closely related but practically different problem: searching for a person within a crowd. This is the scenario we usually mean in real-world face recognition applications. For example, in a forensic or surveillance setting, individuals appearing in CCTV footage are searched against a database of known or wanted people. Conceptually, this problem can be reduced to applying face verification multiple times: the query face is compared against every identity stored in the database.

In this post, we will focus on how face recognition has been handled in DeepFace so far, and why this approach needed to evolve for large-scale and production use cases. We will first briefly revisit the traditional, directory-based find function and its limitations, especially in stateful and API-driven environments. We will then introduce the new register, build_index, and search functions added in DeepFace v0.9.7, and explain how they enable scalable, stateless face search backed by databases and approximate nearest neighbor indexing.

How Face Recognition Has Been Handled So Far

In DeepFace, face verification is handled by the verify function. If you want to know whether two images belong to the same person, you simply pass the image pair to this function and receive a similarity decision.

from deepface import DeepFace

result: dict = DeepFace.verify(
    img1_path = "img1.jpg",
    img2_path = "img2.jpg",
)

For face recognition (i.e., searching within a dataset), DeepFace traditionally provided the find function. With find, you pass a target image, and a directory containing reference images.

from typing import List

import pandas as pd
from deepface import DeepFace

dfs: List[pd.DataFrame] = DeepFace.find(
    img_path = "img1.jpg",
    db_path = "C:/my_db",
)

DeepFace extracts embeddings for images in that directory and stores them on disk in pickle format. Once embeddings are computed, subsequent searches are fast, and only newly added or removed images are processed in later runs — not the entire dataset from scratch.

First run of find function takes minutes
Embeddings are stored in a pickle in the same folder after 1st run
Find function performs much faster from its second run
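Conceptually, what find does once the embeddings are cached is a brute-force nearest-neighbour scan over the stored vectors. The toy sketch below illustrates that idea with made-up 4-dimensional vectors standing in for real model embeddings; the names and values are illustrative, not DeepFace internals:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# pretend these were extracted once and pickled by a first run
stored = {
    "img1.jpg": [0.9, 0.1, 0.0, 0.2],
    "img2.jpg": [0.1, 0.8, 0.3, 0.0],
    "img3.jpg": [0.88, 0.12, 0.05, 0.18],
}

query = [0.91, 0.09, 0.02, 0.21]  # embedding of the target image

# brute-force scan: O(n) comparisons against every stored embedding
matches = sorted(stored, key=lambda name: cosine_distance(query, stored[name]))
print(matches[0])  # img1.jpg, the closest identity
```

Real models produce embeddings with hundreds or thousands of dimensions, but the comparison logic is the same.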

However, there was an important limitation. DeepFace ships not only as a Python library but also with a REST API. While the API exposes functions such as verify, analyze, and represent, the find function could not be exposed because it is stateful by design.

Introducing brand new register and search

Starting from DeepFace v0.9.7, three new core functions were introduced:

  • register
  • build_index
  • search

These functions enable a stateless and scalable face recognition pipeline. With register, you can extract face embeddings and store them directly in a database. With search, you can query a target image against the stored embeddings. In other words, the stateful find function is now complemented by a stateless search alternative, making large-scale deployments much easier. Also, unlike find, search responds quickly even on the very first request, since embeddings are already registered in the database.

# register images into the database
DeepFace.register(
    img = "img1.jpg"
)
DeepFace.register(
    img = ["img2.jpg", "img3.jpg"]
)

# perform exact search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg"
)

Currently supported backend databases are:

  • postgres
  • mongo
  • pgvector
  • weaviate
  • neo4j

We will add more backends in the short term, but the principle will stay the same. The default backend database is postgres. In other words, if you haven’t specified anything, embeddings will be stored in postgres. However, you can set the database type in the input arguments of these functions.

# register images into the database
DeepFace.register(
    img = "img1.jpg",
    database_type = "mongo",
)

# perform exact search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg",
    database_type = "mongo",
)
Result of exact nearest neighbour

Wise Search

If your database size is n, performing a brute-force search has a time complexity of O(n). This is manageable for small datasets (n ~ 10K), but as computer scientists, we usually try to avoid O(n) solutions when n grows large. For databases containing millions or even billions of embeddings, brute-force search quickly becomes impractical.

The original find function relied on brute-force search, and by default, the new search function behaves the same way. However, search introduces a powerful option: Approximate Nearest Neighbor (ANN) search.

By configuring the search_type argument, you can switch from exact brute-force search to ANN search, reducing the complexity from O(n) to approximately O(log n). This allows searches to complete in seconds even when the database contains tens of millions of faces.

# perform approximate nearest neighbour search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg",
    search_type = "ann"
)
Result of approximate nearest neighbour
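To see why ANN helps, here is a toy version of the inverted-file (IVF) idea behind FAISS: stored vectors are bucketed by their nearest coarse centroid, and a query scans only its closest bucket instead of the whole database. This is an illustration of the principle only, not DeepFace's or FAISS's actual code (real IVF trains the centroids and usually probes several buckets):

```python
import random

random.seed(0)

def dist2(a, b):
    # squared Euclidean distance between two 2-d points
    return sum((x - y) ** 2 for x, y in zip(a, b))

# two fixed "centroids" stand in for a trained coarse quantizer
centroids = [(0.0, 0.0), (10.0, 10.0)]

# a database with two clearly separated clusters of vectors
database = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(500)] + \
           [(random.uniform(9, 11), random.uniform(9, 11)) for _ in range(500)]

# assign each stored vector to the bucket of its nearest centroid
buckets = {0: [], 1: []}
for vec in database:
    nearest = min(range(len(centroids)), key=lambda i: dist2(vec, centroids[i]))
    buckets[nearest].append(vec)

def ann_search(query):
    # probe only the single closest bucket: half the database is never touched
    bucket = min(range(len(centroids)), key=lambda i: dist2(query, centroids[i]))
    return min(buckets[bucket], key=lambda vec: dist2(query, vec))

result = ann_search((9.5, 9.7))
```

With more buckets, each query touches only a small fraction of the stored vectors, which is where the large speedups over brute force come from.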

Indexing Embeddings and Vector Databases

For ANN search, indexing is required:

  • If you use Postgres or Mongo as the backend, embeddings are stored in those databases, and you need to call build_index to index them. This uses FAISS in the background, and the resulting index is stored in the database as well.
  • The build_index function is resumable and should be run again whenever new identities are added to the database.
  • If you use a vector database such as Weaviate, Neo4j, or pgvector, indexing is handled internally, and calling build_index is not necessary.

# build index on registered embeddings (for postgres and mongo only)
DeepFace.build_index()

As a rule of thumb:

  • Tens of thousands of embeddings → exact (brute-force) search is acceptable
  • Millions of embeddings → ANN search with FAISS is recommended
  • Tens of millions and beyond → FAISS may require GPU acceleration
  • 10M+ scale → a vector database becomes the most suitable choice

API Support

Because register, build_index, and search are stateless, they can be safely exposed through the DeepFace REST API. This makes it possible to build scalable face recognition services without relying on local directories, pickle files, or in-memory state — a key requirement for production-grade systems.

# register facial images and embeddings to db
$ curl -X POST http://localhost:5005/register \
   -H "Content-Type: application/json" \
   -d '{"model_name":"Facenet", "img":"img1.jpg"}'

# index embeddings (for postgres and mongo only)
$ curl -X POST http://localhost:5005/build/index \
   -H "Content-Type: application/json" \
   -d '{"model_name":"Facenet"}'

# search an identity in database
$ curl -X POST http://localhost:5005/search \
   -H "Content-Type: application/json" \
   -d '{"img":"target.jpg", "model_name":"Facenet"}'
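The same endpoints can also be called from Python with only the standard library. The sketch below just builds the request objects (pass one to urllib.request.urlopen to actually send it); the host, port, and payloads mirror the curl calls above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5005"  # a locally running DeepFace API

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    # build a POST request carrying a JSON body for a DeepFace API endpoint
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

register_req = build_request("/register", {"model_name": "Facenet", "img": "img1.jpg"})
search_req = build_request("/search", {"model_name": "Facenet", "img": "target.jpg"})
```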

Conclusion

Face recognition in DeepFace has evolved from a directory-based, stateful approach toward a more scalable and production-ready architecture. While face verification remains the core academic problem, real-world face recognition requires efficient search across large collections of identities. With the introduction of register, build_index, and search, DeepFace now supports stateless face search backed by databases and approximate nearest neighbor indexing. This shift enables DeepFace to scale from small, local datasets to millions of faces while remaining compatible with REST-based deployments. As a result, DeepFace can now serve both research-oriented use cases and large-scale, real-world face recognition systems more effectively.

Using Peer Reviews as Evidence in US & UK Extraordinary Talent Visas
https://sefiks.com/2025/10/19/using-peer-reviews-as-evidence-in-extraordinary-talent-visas/
Sun, 19 Oct 2025

The post Using Peer Reviews as Evidence in US & UK Extraordinary Talent Visas appeared first on Sefik Ilkin Serengil.

When applying for extraordinary talent visas such as the US EB-1A, EB-2 NIW, or the UK Global Talent Visa, applicants often focus on their publications, citations, and awards. However, one of the most underrated yet powerful forms of evidence is your peer review activity — serving as a reviewer or referee for academic journals, conferences, or research funding programs. Peer reviewing is not just an act of academic service. It is a clear sign that your expertise is trusted by your peers and that recognized international institutions value your judgment. Immigration and endorsement bodies see this as objective proof of extraordinary ability and professional recognition within your field. In this article, we will explore how to effectively present your peer review work as evidence in both US and UK visa frameworks, how to verify your reviews on Web of Science or ORCID, and how to strengthen your case by documenting the journals’ indexing information (e.g. SCI-E).

Woman Referee by Unsplash

Peer Review as Evidence in US EB-1A and EB-2 NIW

In the EB-1A (Extraordinary Ability) visa, the U.S. Citizenship and Immigration Services (USCIS) outlines ten criteria for eligibility. Peer review activities fall directly under Criterion 4:

The person’s participation, either individually or on a panel, as a judge of the work of others in the same or an allied field of specialization for which classification is sought.

USCIS explicitly mentions examples such as:

  • Reviewing manuscripts for scholarly journals
  • Evaluating abstracts or papers for academic conferences
  • Serving on Ph.D. dissertation committees
  • Acting as a reviewer for government-funded research programs

To satisfy this criterion, the petitioner must demonstrate both an invitation to review and proof that the review was actually completed — typically through invitation and confirmation emails.

Peer Review in the UK Global Talent Visa

The UK Global Talent Visa also recognizes peer review experience as a sign of international recognition and contribution to the field. While it is not listed as a formal criterion like in the EB-1A, it fits perfectly under the following endorsement categories:

  • Recognition for expertise and contribution to the field
  • Evidence of exceptional talent
  • You’ve contributed to the digital technology sector outside of work, for example mentoring or collaborative projects
  • Recognition beyond immediate employment

Peer review evidence demonstrates that your expertise is valued by leading journals and organizations, which strengthens your case before endorsement bodies such as the Royal Society, Royal Academy of Engineering, or Tech Nation.

You Don’t Have to Be an Academic to Be a Reviewer

Many assume that only professors or Ph.D. holders can be invited to review academic papers. In reality, any professional with recognized publications or domain expertise can serve as a reviewer.

For instance, although I hold a Master’s degree, I have published several peer-reviewed papers. Because of this, I occasionally receive review invitations from international journals without holding a Ph.D. or academic position.

How to Become and Verify Your Peer Review Activity

If you want to make your peer review work verifiable and credible for immigration or endorsement purposes, here’s a step-by-step guide.

1. Proof of Invitations and Completion

When you complete a review, journals often send confirmation or thank-you emails.
Keep these emails as documentary proof; we will use them later when creating verified peer review records.

Thank-you mail from the journal

2. Verify Reviews on Web of Science

You can manually add completed reviews to your Web of Science profile:

  • Go to your WoS profile’s Peer Review section.
  • If you don’t have any verified peer reviews yet, you will see an “Add a Review” button. If you already have verified peer reviews, click “Manage” first, and then select “Add a Review”.
  • Enter the journal name, review date, manuscript title, and manuscript ID of the paper you recently reviewed.

Adding a review

Thereafter, WoS will assign a Review ID to your entry.

Unverified peer review added

Once you add your peer review to WoS, it will have unverified status. Unverified reviews are not shown in your public WoS profile; they are visible only to you. You must verify them to have them listed publicly.

Unverified peer review added

You need to forward the journal’s thank-you email to reviews at webofscience dot com, including the WoS Review ID in the email body.

Forwarding thank you mail to WoS with WoS Review ID

Verification takes some time depending on the current queue. You will first receive a confirmation email acknowledging receipt, which includes an approximate processing time.

Received confirmation mail

Thereafter, you will be informed by email when your peer review is verified.

Notification sent when your peer review is verified

After verification, the review will appear as “Verified” on your public profile. You can check out my WoS here.

Peer reviews are shown publicly in my WoS profile

You can then show your Web of Science profile with its peer review section in your EB-1A, EB-2 NIW, or UK Global Talent Visa application. Endorsement bodies can independently confirm your record via Web of Science.

Verified Reviews with ORCID

Some publishers (like Springer) integrate directly with ORCID. Once authorized, your completed reviews can automatically appear on your ORCID profile as trusted, verifiable entries, although this does not happen instantly.

ORCID Reviewer Recognition

On the other hand, some publishers, like IEEE Access, push your peer reviews directly to Web of Science instead of ORCID.

IEEE Access pushes peer reviews to WoS

It does not matter how your peer reviews are added to WoS: they can be pushed by a publisher, or you can add them manually and forward the thank-you email. Once your peer reviews are verified on WoS, you can export them to ORCID via: Profile Settings > ORCID Syncing > Export Peer Reviews to ORCID now.

Exporting peer reviews to ORCID manually

Then, your ORCID profile will display your verified peer reviews.

Although I authorized Springer to add my peer reviews to ORCID and also manually triggered the “Sync Peer Reviews from WoS to ORCID Now” button, my peer reviews have not appeared on ORCID. I had to raise a ticket to Clarivate under Global Customer Support Center > Product or Technical Question. They exported my peer reviews to ORCID by using their internal tool.

You can check out my ORCID profile here to see how peer reviews are seen on ORCID.

My Verified Peer Reviews Listed on ORCID

Similar to WoS, ORCID shows only verified peer reviews retrieved from trusted sources. ORCID additionally shows the review dates of your peer reviews, along with more information about the journal, such as its ISSN.

How to Present Peer Review Evidence in Your Application

When preparing your evidence portfolio, include the following:

  • A summary table listing your completed reviews (journal name, date, topic or manuscript title, and indexing information such as SCI, SCI-E, or ESCI).
  • Screenshots of verified reviews from Web of Science or ORCID.

If you have verified reviews on your WoS or ORCID profiles, you do not have to include the invitation or completion emails from the journals.

Some journals such as Elsevier provide a cool certificate for your peer reviews.

Exporting Peer Review Certificates from Elsevier Reviewer Hub
A Review Certificate

How to Be a Reviewer

You can create a Web of Science (WoS) account. Here, you can also set your reviewer interests.
This helps editors discover you when they need reviewers in your area.

Peer Review Interests on WoS

Moreover, if you create an Elsevier Reviewer Hub account, you can specify up to 20 journals you would like to review for.

Volunteering as a reviewer in Elsevier

Additionally, you can contact the editors of any journal for which you would like to become a reviewer.

Why Peer Review Matters

Peer reviewing is one of the most direct and objective indicators of expert recognition.
It shows that your peers and international institutions trust your judgment and expertise — exactly the kind of evidence immigration authorities look for when assessing extraordinary talent or ability.

Whether you are pursuing the US EB-1A / EB-2 NIW or the UK Global Talent Visa, maintaining verified records of your reviews on Web of Science and ORCID will significantly strengthen your case.

While putting this article together, it became clear that completed peer reviews, although strong supporting evidence, are not sufficient on their own to secure an EB-1A, EB-2 NIW, or UK Global Talent Visa.

Conclusion

Peer review activity is much more than a professional courtesy — it’s a tangible indicator that your expertise is trusted and recognized by the international academic community. For immigration and endorsement bodies, such as USCIS or the UK Home Office, this trust directly translates into evidence of extraordinary ability and global recognition.

By maintaining verified records of your reviews on Web of Science or ORCID, and by clearly documenting the journals’ indexing information (SCI, SCIE, ESCI, Scopus), you can transform your academic service into a strong, verifiable component of your EB-1A, EB-2 NIW, or UK Global Talent Visa application.

Ultimately, every peer review you complete is not just a contribution to science — it’s also a testament to your standing as an expert in your field.

7 Myths About the UK Global Talent and US Extraordinary Ability Visas
https://sefiks.com/2025/10/11/7-myths-about-the-uk-global-talent-and-us-extraordinary-ability-visas/
Sat, 11 Oct 2025

The post 7 Myths About the UK Global Talent and US Extraordinary Ability Visas appeared first on Sefik Ilkin Serengil.

Applying for a talent-based visa can be confusing. Whether you’re thinking about the UK Global Talent Visa or the US Extraordinary Ability Visas (like EB1-A or EB2-NIW), there’s a lot of information online — and not all of it is true. Many applicants hear things that sound convincing but are actually myths. In this post, we’ll go through seven of the most common misconceptions about these visas and explain what’s really true. If you’re planning to apply, this guide will help you focus on what actually matters and feel more confident about your application journey.

Albert Einstein showing the tongue artwork on the wall from unsplash

Myth 1: You need to be a genius like Einstein

Many people think you need to be a genius — someone like Albert Einstein — to qualify for these talent-based visas. That’s not true at all.

In fact, the US EB1-A visa is often nicknamed the “Einstein visa”, but this doesn’t mean you have to be a genius like Einstein himself. The nickname simply highlights that it’s meant for people who are extraordinary in their field — whether it’s science, arts, business, or technology.

The UK Global Talent Visa follows the same idea. You don’t need to be world-famous or the absolute best in history — you just need to show that you have exceptional talent or achievements and are recognized for them in your area.

In short, being truly outstanding is enough. You don’t have to be Einstein-level famous to qualify.

Myth 2: You must work for a top-tier company

Many people believe that to qualify for a talent-based visa, you must work for a top-tier company — for example, one of the MAG 7 (Meta, Apple, Google, Amazon, Microsoft, NVIDIA, or Tesla).

This is a common misunderstanding. While it’s true that many applicants come from these big tech companies, the approval rate among them is not higher than others. In fact, the rejection rate is also quite high for MAG 7 employees.

Working for a well-known brand does not increase your chances of getting endorsed or approved. There’s no correlation between being employed at a MAG 7 company and receiving a positive endorsement for the UK Global Talent Visa or approval for the US EB1-A.

What truly matters is your personal achievements, recognition, and impact — not your employer’s name. You can work at a startup, a mid-size company, or even as an independent professional. As long as your work demonstrates exceptional ability and influence in your field, you can still qualify.

Myth 3: A high salary is required

Some people assume that earning a high salary is essential to qualify for the visa. While a strong income can help support your case, it’s not a requirement.

A high salary might show that your skills are valued in your field, but it’s only one piece of evidence—and not the most important one.

In fact, someone with a modest salary but significant contributions, leadership, or innovation in their area might have a stronger application than someone earning a large income. The focus is always on talent and impact, not paychecks.

Myth 4: You need a Nobel Prize

This one sounds funny, but it’s something many people wonder about.

Yes, having a Nobel Prize automatically makes you eligible for the UK Global Talent Visa without endorsement. However, the vast majority of successful applicants don’t have global awards like that. Instead, they go through an endorsement process by an approved body in their field (such as Tech Nation for digital technology).

So you can absolutely qualify by showing strong evidence of your achievements, recognition, and contributions in your field.

Myth 5: A PhD is required

Having a PhD can be helpful in some fields, especially in academia or research, but it’s not a universal requirement.

If you’re applying in areas like digital technology, arts or culture, you don’t need a PhD at all. Even if you have one, your PhD itself cannot be used as evidence unless it directly shows your achievements or impact.

The visa focuses on what you’ve done and how your work has been recognized—not the degree you hold.

Myth 6: You must have graduated from a top university

Many applicants believe that you must graduate from a prestigious university, like Oxford or Cambridge, to qualify. That’s another myth.

The Global Talent Visa doesn’t consider your university’s ranking. You don’t even need to include your degree in your evidence package.

In fact, there’s a different visa route for top university graduates—it’s called the High Potential Individual (HPI) visa.

For the Global Talent Visa, your university name doesn’t matter. What matters are your professional achievements, the recognition you’ve received, and your influence in your field.

Myth 7: Your major or department must be prestigious

Similar to the previous myth, some people think that their university department must be famous or top-ranked. Again, this is not true.

You don’t need to have studied in a “prestigious” department. You don’t even need a degree in the same field you’re applying for. For example, if you’re now a successful software engineer, you don’t have to hold a degree in Computer Science.

The visa is about your real-world achievements and impact, not your academic background or the reputation of your department.

Conclusion

Applying for a talent-based visa can feel overwhelming, especially with so many myths and mixed opinions online. But as you can see, success doesn’t depend on working for a big company, having a high salary, or graduating from a top-ranked school.

What really matters — for both the UK Global Talent Visa and the US Extraordinary Ability Visas (EB1-A and EB2-NIW) — is your personal achievements, recognition, and impact in your field.

If you’ve built a strong record of contributions, led meaningful work, or influenced your industry in a positive way, you already have a solid foundation. Focus on collecting clear evidence of your success and telling your professional story with confidence.

Remember, these visas are not about luck or prestige — they’re about proven talent and potential.

Need a Coach For Your UK Global Talent Visa?

Are you in the digital technology field and planning to apply for the UK Global Talent Visa? I offer one-on-one 60-minute coaching sessions designed to help you understand your eligibility based on your skills, experience, and achievements.

✅ During our session, we’ll review your background in detail and discuss how your strengths align with the endorsement criteria.
❌ Please note that this service does not include writing your evidence documents or managing your visa application.
❌ I am not an immigration lawyer or adviser, and I am not affiliated with Tech Nation or any official endorsing body.

My coaching focuses only on Stage 1 (endorsement). It’s a guidance session to help you build clarity and confidence — not a guarantee of endorsement, but a roadmap to make your application stronger.

A Practical Guide to Graph Traversal in Data Structures and Algorithms
https://sefiks.com/2025/09/16/a-practical-guide-to-graph-traversal-in-data-structures-and-algorithms/
Tue, 16 Sep 2025

The post A Practical Guide to Graph Traversal in Data Structures and Algorithms appeared first on Sefik Ilkin Serengil.

Graphs are a fundamental data structure in computer science, used to model relationships between objects in a wide variety of applications—from social networks and transportation systems to recommendation engines and network routing. Understanding how to navigate and traverse graphs is a crucial skill, not only for solving real-world problems but also for acing data structure and algorithm interviews. In this guide, we’ll explore two core graph traversal techniques: Depth-First Search (DFS) and Breadth-First Search (BFS). We’ll implement them in Python to understand how they work under the hood. Later, we’ll take a step further and see how graph traversal can be performed efficiently in a database context using Neo4j, where a single query can return the shortest path between nodes.

Abstract Geometric Structure with Dark Polygon Design by pexels

Graph Structure

To make our discussion concrete, let’s consider a simple graph of cities. Each city is a node, and the roads connecting them are edges. Here’s an example in Python using a dictionary to represent the graph:

graph = {
    'Chicago': ['Denver', 'Dallas'],
    'Denver': ['Chicago', 'New York', 'Houston'],
    'Dallas': ['Chicago', 'Miami'],
    'New York': ['Denver', 'Los Angeles', 'Houston'],
    'Houston': ['Denver'],
    'Miami': ['Dallas', 'Los Angeles'],
    'Los Angeles': ['New York', 'Miami'],
    'Seattle': ['Portland', 'San Francisco'],
}

In this representation, each key is a city (node), and the corresponding list contains all neighboring cities directly connected to it. For example, Chicago is connected to Denver and Dallas.

Finding Whether a Path Exists

Depth-First Search (DFS) is a traversal technique that explores a graph as far as possible along each branch before backtracking. This makes it useful for finding a path from a start node to a target node, but it doesn’t guarantee the shortest path.

Here’s a Python implementation of DFS that finds a path from one city to another:

def find_path(start_city, end_city, visited=None, path=None):
    if visited is None:
        visited = set()
    if path is None:
        path = [start_city]

    visited.add(start_city)

    if start_city == end_city:
        return path
   
    if start_city not in graph:
        return None

    for neighbor in graph.get(start_city):
        if neighbor not in visited:
            # print(f"{path} -> {neighbor}")
            new_path = find_path(neighbor, end_city, visited, path + [neighbor])
            if new_path:
                return new_path
    # print("no way out")
    return None

Let’s explain how it works. We start from the start_city and mark it as visited. Recursively, we visit each neighboring city that hasn’t been visited yet. If we reach the end_city, we return the current path. Otherwise, we backtrack and continue exploring other neighbors.

For example, suppose we want to check if there is a path from New York to Houston.

start_city = 'New York'
end_city = 'Houston'
path = find_path(start_city, end_city)

Using DFS, our function will return:

['New York', 'Denver', 'Houston']

Even though there is a direct connection between New York and Houston, DFS explores the neighbors in the order they appear and stops at the first complete path it finds.

This demonstrates one of DFS’s limitations: it doesn’t guarantee the shortest path. While the path it gave is valid, the more direct route is obviously shorter.

Finding All Possible Paths

While DFS explores a graph deeply and returns the first path it finds, Breadth-First Search (BFS) explores the graph level by level. This makes it ideal for finding the shortest path in terms of the number of edges.

However, if we want to consider all possible paths—for example, to account for distances, traffic, or other costs—we can use a modified approach that enumerates every path between two nodes. Here’s a Python function that does exactly that:

def find_all_paths(start_node, end_node, path=None, all_paths=None, visited=None):
    if path is None:
        path = [start_node]
    if all_paths is None:
        all_paths = []
    if visited is None:
        visited = {start_node}

    if start_node == end_node:
        all_paths.append(list(path))
        return all_paths

    for neighbor in graph.get(start_node, []):
        if neighbor not in visited:
            find_all_paths(neighbor, end_node, path + [neighbor], all_paths, visited | {neighbor})

    return all_paths

Using the function above, we can enumerate all paths from New York to Houston as:

find_all_paths("New York", "Houston")

This will return every path that starts from New York and ends at Houston:

[
   ['New York', 'Denver', 'Houston'],
   ['New York', 'Los Angeles', 'Miami', 'Dallas', 'Chicago', 'Denver', 'Houston'],
   ['New York', 'Houston'],
]

Notice that the direct connection from New York to Houston is included alongside longer paths.

Why This Is Useful

Having all possible paths gives us the flexibility to:

  • Choose the shortest path by number of edges.
  • Consider weighted criteria like distance, traffic, or cost between cities.
  • Select the optimal route according to any custom metric.

For instance, if we had a distances dictionary storing kilometers between each city pair, we could compute the total distance of each path and pick the one with the lowest value. Similarly, we could account for traffic delays or other travel constraints.
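As a concrete sketch, the snippet below scores the three paths found above against a symmetric distance table and picks the cheapest one (the distance values are assumptions for illustration):

```python
# Hypothetical symmetric distance table (illustration values)
distances = {
    frozenset({'New York', 'Houston'}): 1625,
    frozenset({'New York', 'Denver'}): 1800,
    frozenset({'Denver', 'Houston'}): 880,
    frozenset({'New York', 'Los Angeles'}): 2800,
    frozenset({'Los Angeles', 'Miami'}): 2700,
    frozenset({'Miami', 'Dallas'}): 1300,
    frozenset({'Dallas', 'Chicago'}): 925,
    frozenset({'Chicago', 'Denver'}): 1000,
}

def path_distance(path):
    # Sum the edge weights along consecutive city pairs of a path
    return sum(distances[frozenset({a, b})] for a, b in zip(path, path[1:]))

paths = [
    ['New York', 'Denver', 'Houston'],
    ['New York', 'Los Angeles', 'Miami', 'Dallas', 'Chicago', 'Denver', 'Houston'],
    ['New York', 'Houston'],
]

best = min(paths, key=path_distance)
print(best, path_distance(best))  # ['New York', 'Houston'] 1625
```

Swapping `distances` for a table of travel times or tolls changes the optimization criterion without touching the path enumeration.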

Neo4j Graph Database

While Python implementations like DFS and BFS are great for understanding graph traversal, in real-world applications, graphs can be large and complex, and manually enumerating paths becomes inefficient. This is where graph databases like Neo4j come in.

Neo4j is designed specifically for storing and querying graph data efficiently. Once your graph is in Neo4j, you can perform complex queries—like finding the shortest path between two nodes—with just a single line of Cypher query.

Creating the Graph in Neo4j

Let’s take our city graph as an example. In Neo4j, each city is a node, and each road is a relationship. You can create the nodes like this:

CREATE (Chicago:City {name: 'Chicago'}),
       (Denver:City {name: 'Denver'}),
       (Dallas:City {name: 'Dallas'}),
       (NewYork:City {name: 'New York'}),
       (Houston:City {name: 'Houston'}),
       (Miami:City {name: 'Miami'}),
       (LosAngeles:City {name: 'Los Angeles'}),
       (Seattle:City {name: 'Seattle'}),
       (SF:City {name: 'San Francisco'}),
       (SP:City {name: 'Portland'})

Once the nodes are created, you can create the edges between them as follows:

MATCH (a:City {name: 'Chicago'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 1000}]->(b);

MATCH (a:City {name: 'Chicago'}), (b:City {name: 'Dallas'})
MERGE (a)-[:CONNECTED_TO {distance: 925}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'Chicago'})
MERGE (a)-[:CONNECTED_TO {distance: 1000}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'New York'})
MERGE (a)-[:CONNECTED_TO {distance: 1800}]->(b);

MATCH (a:City {name: 'Denver'}), (b:City {name: 'Houston'})
MERGE (a)-[:CONNECTED_TO {distance: 880}]->(b);

MATCH (a:City {name: 'Dallas'}), (b:City {name: 'Chicago'})
MERGE (a)-[:CONNECTED_TO {distance: 925}]->(b);

MATCH (a:City {name: 'Dallas'}), (b:City {name: 'Miami'})
MERGE (a)-[:CONNECTED_TO {distance: 1300}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 1800}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Los Angeles'})
MERGE (a)-[:CONNECTED_TO {distance: 2800}]->(b);

MATCH (a:City {name: 'New York'}), (b:City {name: 'Houston'})
MERGE (a)-[:CONNECTED_TO {distance: 1625}]->(b);

MATCH (a:City {name: 'Houston'}), (b:City {name: 'Denver'})
MERGE (a)-[:CONNECTED_TO {distance: 880}]->(b);

MATCH (a:City {name: 'Miami'}), (b:City {name: 'Dallas'})
MERGE (a)-[:CONNECTED_TO {distance: 1300}]->(b);

MATCH (a:City {name: 'Miami'}), (b:City {name: 'Los Angeles'})
MERGE (a)-[:CONNECTED_TO {distance: 2700}]->(b);

MATCH (a:City {name: 'Los Angeles'}), (b:City {name: 'New York'})
MERGE (a)-[:CONNECTED_TO {distance: 2800}]->(b);

MATCH (a:City {name: 'Los Angeles'}), (b:City {name: 'Miami'})
MERGE (a)-[:CONNECTED_TO {distance: 2700}]->(b);

MATCH (a:City {name: 'Seattle'}), (b:City {name: 'Portland'})
MERGE (a)-[:CONNECTED_TO {distance: 175}]->(b);

MATCH (a:City {name: 'Seattle'}), (b:City {name: 'San Francisco'})
MERGE (a)-[:CONNECTED_TO {distance: 680}]->(b);

This will locate nodes by their names and connect them with an additional property representing the distance in miles. Once the relationships are established, we can inspect the structure of the graph as follows:

Graph

I didn’t realize the graph had two disconnected components until I visualized it. For example, there’s no route from New York to San Francisco in this graph. And I didn’t even need to call something like find_path to figure this out.

Finding the Shortest Path

Once the graph is set up, finding the shortest path between Houston and Miami is straightforward:

MATCH (start:City {name: 'Houston'}), (end:City {name: 'Miami'})
MATCH p = shortestPath((start)-[:CONNECTED_TO*]-(end))
RETURN p
  • shortestPath automatically finds the minimal number of hops between the two nodes.
  • You no longer need to enumerate all paths manually.
  • Neo4j handles large graphs efficiently and scales much better than a pure Python solution for real-world data.
Shortest Path From Houston to Miami

However, it didn’t take the distance values of the routes into account when going from Houston to Miami. We’ll cover this in the Dijkstra’s algorithm section.

Finding (Almost) All Paths

In some cases, you may want to explore all possible paths between two nodes, or at least paths up to a certain length. While Neo4j doesn’t natively return every single path like our Python find_all_paths function, you can use variable length patterns to find paths up to a specific number of hops.

MATCH (start:City {name:'New York'}), (end:City {name:'Houston'})
MATCH p = (start)-[:CONNECTED_TO*1..2]-(end)
RETURN p
All Paths

Dijkstra’s Algorithm – Finding the Shortest Paths In A Weighted Graph

Neo4j’s built-in shortest path function returned the path Houston → New York → LA → Miami. However, the total cost of this path is 1,625 + 2,800 + 2,700 = 7,125 miles. The shortest path function ignored the actual distances between nodes—in other words, it treated the graph as unweighted, even though it is weighted.

We will use Neo4j’s GDS (Graph Data Science) module to handle weighted graphs. GDS does not come with Neo4j by default. If you are using Neo4j Desktop, you can install it by going to Local Instances → your instance → three dots → Plugins → Graph Data Science → Download. I always install both GDS and APOC. If you are running Neo4j directly (not via Desktop), please follow the official installation tutorial.

First, we will create our graph and specify that distances in the relationships should be taken into account during calculations.

MATCH (source:City)-[r:CONNECTED_TO]->(target:City)
RETURN gds.graph.project(
    'cityGraph',
    source,
    target,
    { relationshipProperties: r { .distance } }
)

Once the graph is created, you can take distance costs into account when finding paths.

MATCH (source:City {name: 'Houston'}), (target:City {name: 'Miami'})
CALL gds.shortestPath.dijkstra.stream('cityGraph', {
    sourceNode: source,
    targetNodes: target,
    relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN
    index,
    gds.util.asNode(sourceNode).name AS sourceNodeName,
    gds.util.asNode(targetNode).name AS targetNodeName,
    totalCost,
    [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodeNames,
    costs,
    nodes(path) AS path
ORDER BY index

Recall that the plain shortestPath function ignored the distance costs and gave us the Houston → New York → LA → Miami path, with a total cost of 7,125 miles.

Dijkstra’s Result

Now, it gives us the path Houston → Denver → Chicago → Dallas → Miami. Although we traverse five cities instead of four, the total cost of this route is 880 + 1,000 + 925 + 1,300 = 4,105 miles. This shows that it actually considered the distance costs between the city nodes when finding the route—in other words, it returned the true shortest path.
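If you want to verify this number without a Neo4j instance, Dijkstra’s algorithm can be sketched in plain Python with heapq, mirroring the distances from the CONNECTED_TO relationships above (treated as bidirectional edges):

```python
import heapq

# Bidirectional weighted edges mirroring the CONNECTED_TO relationships above
edges = [
    ('Chicago', 'Denver', 1000), ('Chicago', 'Dallas', 925),
    ('Denver', 'New York', 1800), ('Denver', 'Houston', 880),
    ('Dallas', 'Miami', 1300), ('New York', 'Los Angeles', 2800),
    ('New York', 'Houston', 1625), ('Miami', 'Los Angeles', 2700),
    ('Seattle', 'Portland', 175), ('Seattle', 'San Francisco', 680),
]

weighted = {}
for a, b, d in edges:
    weighted.setdefault(a, []).append((b, d))
    weighted.setdefault(b, []).append((a, d))

def dijkstra(start, end):
    # Priority queue of (total cost so far, path); the cheapest
    # partial path is always expanded first.
    heap = [(0, [start])]
    best_cost = {start: 0}
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == end:
            return cost, path
        for neighbor, dist in weighted.get(node, []):
            new_cost = cost + dist
            if new_cost < best_cost.get(neighbor, float('inf')):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, path + [neighbor]))
    return None

print(dijkstra('Houston', 'Miami'))
# (4105, ['Houston', 'Denver', 'Chicago', 'Dallas', 'Miami'])
```

The result agrees with what GDS computes, which is a useful sanity check when tuning a production query.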

Conclusion

Graphs are everywhere in computer science, and knowing how to traverse them efficiently is essential. In this guide, we explored Depth-First Search (DFS) and Breadth-First Search (BFS), two fundamental traversal algorithms, and implemented them in Python to see how they work in practice. We also discussed how all possible paths can be enumerated, allowing us to select the shortest or optimal path based on different criteria such as distance, cost, or traffic.

Finally, we demonstrated how these concepts translate to a real-world scenario using Neo4j, where finding the shortest path between nodes can be achieved with a single query, combining the power of graph databases with algorithmic thinking.

By understanding both the Python implementations and Neo4j queries, you gain a comprehensive view of graph traversal, from coding interviews to practical applications in software development. With these skills, you are well-equipped to tackle graph-related problems efficiently and confidently.


The post A Practical Guide to Graph Traversal in Data Structures and Algorithms appeared first on Sefik Ilkin Serengil.

Building an Atomic, High-Throughput Election System: A System Design Case Study https://sefiks.com/2025/09/15/building-an-atomic-high-throughput-election-system-a-system-design-case-study/ https://sefiks.com/2025/09/15/building-an-atomic-high-throughput-election-system-a-system-design-case-study/#respond Mon, 15 Sep 2025 16:57:07 +0000 https://sefiks.com/?p=17824 It’s election night. Millions of citizens have cast their votes, and across the country, hundreds of thousands of ballot boxes … More

The post Building an Atomic, High-Throughput Election System: A System Design Case Study appeared first on Sefik Ilkin Serengil.

It’s election night. Millions of citizens have cast their votes, and across the country, hundreds of thousands of ballot boxes are being counted simultaneously. Broadcasters, news websites, and political parties all want instant access to the vote totals — both nationally and per city. Every second matters, and delays in reporting can lead to confusion or mistrust. Building a system to handle this scenario requires balancing speed, concurrency, and reliability. The system must aggregate thousands of updates per second using atomic operations, provide real-time visibility for dashboards and live feeds, and at the same time maintain a persistent history for auditing and post-election analysis.

White Ballot Box on White Background by pexels

In this post, we’ll explore a solution that uses Redis for fast, atomic real-time vote counting and PostgreSQL for durable logging. This hybrid approach allows us to deliver instant insights while keeping complete historical records, demonstrating a pattern that is widely applicable to other high-concurrency, real-time systems.

Real-Time Vote Counting System

To handle real-time vote counting effectively, we use a hybrid architecture: Redis provides fast counters for the headline results, while PostgreSQL handles persistent logging. This allows us to deliver instant national and city-level totals while maintaining full historical records.

Redis for Atomic Fast Counters

Each political party and city combination has its own Redis key. When a ballot box reports results, we bulk update the counters using INCRBY:

# `redis` here is assumed to be an already-connected client, e.g. redis.Redis()
votes = {
    "tories": 250,
    "labour": 200,
    "libdems": 45,
    "reform": 20,
}
city = "london"

for party, count in votes.items():
    # Increment national total
    redis.incrby(f"election:2025:party:{party}", count)
    # Increment city-level total
    redis.incrby(f"election:2025:city:{city}:party:{party}", count)

Key Naming Strategy:

  • National totals: election:2025:party:<PARTY_NAME>
  • City totals: election:2025:city:<CITY_NAME>:party:<PARTY_NAME>

This design has several benefits:

  • Atomic updates: Multiple ballot boxes can update totals concurrently without conflicts.
  • Instant querying: A simple GET retrieves the current vote count for a political party at the national or city level.
  • Scalability: Adding new cities or parties requires no schema changes.

Example: Three UK Cities

Let’s take London, Manchester, and Edinburgh with parties tories, labour, libdems, and reform. When ballot boxes report votes:

  • London: Tories 120, Labour 200, Lib Dems 45, Reform 20
  • Manchester: Tories 90, Labour 150, Lib Dems 30, Reform 10
  • Edinburgh: Tories 50, Labour 80, Lib Dems 40, Reform 5

The system updates both national and city-level counters in Redis, providing real-time totals for dashboards:

tories_total = redis.get("election:2025:party:tories")
labour_london = redis.get("election:2025:city:london:party:labour")
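These counters are easy to sanity-check without a running Redis server. The sketch below uses a plain dictionary as a stand-in for Redis (the report_ballot_box helper is an illustration-only assumption), replaying the three cities above:

```python
from collections import defaultdict

counters = defaultdict(int)  # stand-in for the Redis counters

def report_ballot_box(city, votes):
    # Same key-naming strategy as the Redis version
    for party, count in votes.items():
        counters[f"election:2025:party:{party}"] += count
        counters[f"election:2025:city:{city}:party:{party}"] += count

report_ballot_box("london", {"tories": 120, "labour": 200, "libdems": 45, "reform": 20})
report_ballot_box("manchester", {"tories": 90, "labour": 150, "libdems": 30, "reform": 10})
report_ballot_box("edinburgh", {"tories": 50, "labour": 80, "libdems": 40, "reform": 5})

print(counters["election:2025:party:tories"])              # 260
print(counters["election:2025:city:london:party:labour"])  # 200
```

Swapping the dictionary writes for redis.incrby calls gives the production version with the same key layout.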

RDBMS for Persistent Logging

While Redis handles live counts, a relational database management system such as PostgreSQL stores detailed logs of every ballot box report:

ballot_id | city      | party    | votes | timestamp
-----------------------------------------------------
1001      | london    | tories   | 120   | 2025-09-15 18:12
1001      | london    | labour   | 200   | 2025-09-15 18:12

This allows:

  • Historical reporting per city or party
  • Auditability for election verification
  • Analysis of reporting patterns (e.g., which ballot boxes reported first)

Why Not Only RDBMS?

Instead of Redis, it’s possible to rely solely on PostgreSQL, using SELECT … FOR UPDATE to increment counters:

BEGIN;
SELECT votes FROM vote_count
WHERE party='tories' AND city='london' FOR UPDATE;
UPDATE vote_count SET votes = votes + 120
WHERE party='tories' AND city='london';
COMMIT;

However, this approach has serious drawbacks under high load:

  • Row-level locking creates contention with hundreds of concurrent updates
  • Real-time dashboards may lag
  • Each update requires a full transaction, which is much slower than Redis’s atomic counters

Using Redis for real-time counting and PostgreSQL for persistent logging combines speed and reliability, making it suitable for large-scale, high-concurrency systems like elections.
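Atomicity is what makes the INCRBY approach safe in the first place. The toy simulation below (plain threads, unrelated to Redis or Postgres specifics) forces every thread to read the shared counter before any of them writes it back, which is exactly the lost-update scenario that atomic increments and row locks exist to prevent:

```python
import threading

N = 8
counter = 0
barrier = threading.Barrier(N)

def unsafe_increment():
    global counter
    value = counter    # every thread reads the old value...
    barrier.wait()     # ...before any thread gets to write it back
    counter = value + 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 1, not 8: seven of the eight increments were lost

# An atomic (locked) increment does not lose updates
safe_counter = 0
lock = threading.Lock()

def safe_increment():
    global safe_counter
    with lock:
        safe_counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(safe_counter)  # 8
```

Redis gives you the second behavior without any explicit locking, because each INCRBY executes atomically inside the server.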

Redis Cluster Sharding for Production

In a production-grade election system, a single Redis instance is not enough to handle high concurrency and large datasets. That’s why we use a Redis Cluster, which automatically spreads data across multiple nodes.

From the user or developer perspective, it’s very simple: every key behaves as if it’s in a single Redis instance. Whether you SET, INCRBY, or GET a key, the cluster automatically routes your request to the correct node behind the scenes. You don’t need to worry about which node actually stores the key — the client library handles it.

This setup ensures that real-time vote counts remain fast, reliable, and scalable, even when thousands of ballot boxes report results simultaneously.

Deferred Processing of Ballot Box Requests with Idempotency

In a high-concurrency election system, ballot box requests can sometimes fail due to network issues, worker crashes, or database errors. To handle this reliably, we use a deferred processing pattern with idempotency, storing the raw votes for all parties in the database.

Workflow

1- Store the request in the database: When a ballot box reports results, save the entire request as JSON:

id | ballot_box_id | city  | votes_json                                      | completed | created_at
------------------------------------------------------------------------------------------------
1  | BB123         | london | {"tories":120,"labour":200,"libdems":45,"reform":20} | false     | 2025-09-15 18:12

completed is initially set to false.

2- Process the request: For each party in votes_json, increment Redis counters using INCRBY and insert or update the Postgres logs. If both succeed, mark completed = true.

3- Retry for pending requests: If a request remains completed = false for more than 10 minutes, consider it pending and retry processing.

The Redis increment and Postgres update steps are idempotent: if either has already happened for a given request, it is skipped to avoid double counting.

request = get_request_from_db(ballot_box_id)

if not request.completed:
    # Redis update
    for party, count in request.votes_json.items():
        redis_idempotency_key = f"processed:{request.id}:{party}:redis"

        if not rc.exists(redis_idempotency_key):
            rc.incrby(f"election:2025:party:{party}", count)
            rc.set(redis_idempotency_key, 1)

    # Postgres update
    postgres_idempotency_key = f"processed:{request.id}:postgres"
    if not rc.exists(postgres_idempotency_key):
        insert_vote_log(request)
        rc.set(postgres_idempotency_key, 1)

    # Mark as completed
    mark_completed(request)
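The core of this idempotency logic can be exercised without Redis or Postgres at all. In the toy sketch below, a set stands in for the idempotency keys, a dict for the counters, and a list for the log table (all names are illustration-only stand-ins); processing the same ballot box request twice changes nothing on the second pass:

```python
counters = {}       # stand-in for Redis counters
processed = set()   # stand-in for Redis idempotency keys
vote_log = []       # stand-in for the Postgres log table

def process_request(request_id, votes):
    # Increment each party counter at most once per request
    for party, count in votes.items():
        key = f"processed:{request_id}:{party}:redis"
        if key not in processed:
            counters[party] = counters.get(party, 0) + count
            processed.add(key)
    # Insert the log row at most once per request
    pg_key = f"processed:{request_id}:postgres"
    if pg_key not in processed:
        vote_log.append((request_id, votes))
        processed.add(pg_key)

votes = {"tories": 120, "labour": 200}
process_request("BB123", votes)
process_request("BB123", votes)  # retry: must be a no-op

print(counters)       # {'tories': 120, 'labour': 200}
print(len(vote_log))  # 1
```

This is why a stuck request can simply be retried after ten minutes: a replay can never double-count.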

Advantages

  • Reliable: Failed or stuck requests can be retried without affecting vote totals.
  • Idempotent: Each ballot box request is processed exactly once per party.
  • Scalable: Redis handles fast atomic counters, while Postgres maintains a durable audit trail.
  • Raw Data Storage: Storing the complete votes JSON allows for flexible reporting, analytics, and auditing.

Conclusion

Designing a real-time election result system highlights the importance of combining speed, scalability, and reliability. Using Redis for atomic counters allows us to aggregate votes instantly, whether at the national level or broken down by city. Even when hundreds of ballot boxes report results simultaneously, the system can safely increment totals without race conditions or delays.

At the same time, PostgreSQL provides a durable, persistent store for detailed logs, capturing which ballot box reported which votes and when. This ensures complete auditability and allows analysts to generate historical reports, verify results, and answer questions like “Which city reported the most votes between 6 PM and 7 PM?”

While it would be possible to rely solely on a relational database using SELECT FOR UPDATE to increment totals, this approach quickly becomes a bottleneck under high concurrency. Row-level locking slows down updates and makes real-time dashboards lag. By combining Redis for real-time counting with PostgreSQL for persistent logging, we achieve both speed and reliability.

This hybrid pattern is not only applicable to elections but also to any scenario that requires real-time metrics under high concurrency: live dashboards, active user counters, distributed job queues, and more. It’s a clean, scalable solution that balances performance with durability — exactly the kind of design interview solution that demonstrates practical understanding of system design principles.

The post Building an Atomic, High-Throughput Election System: A System Design Case Study appeared first on Sefik Ilkin Serengil.

A Practical Guide to Dependency Injection and Event-Driven Architecture in FastAPI and Kafka https://sefiks.com/2025/09/10/a-practical-guide-to-dependency-injection-and-event-driven-architecture-in-fastapi-and-kafka/ https://sefiks.com/2025/09/10/a-practical-guide-to-dependency-injection-and-event-driven-architecture-in-fastapi-and-kafka/#respond Wed, 10 Sep 2025 10:01:08 +0000 https://sefiks.com/?p=17811 FastAPI has quickly become one of the most popular frameworks in the Python ecosystem because of its modern design, async-first … More

The post A Practical Guide to Dependency Injection and Event-Driven Architecture in FastAPI and Kafka appeared first on Sefik Ilkin Serengil.

FastAPI has quickly become one of the most popular frameworks in the Python ecosystem because of its modern design, async-first approach, and developer-friendly features such as automatic validation and interactive API documentation. But building real-world applications involves more than just creating endpoints—it requires designing software that is scalable, maintainable, and testable.

Two architectural ideas that support this goal are dependency injection and event-driven design. Dependency injection makes it possible to separate concerns by removing hardcoded dependencies and allowing them to be provided from the outside. This approach improves flexibility and makes testing significantly easier. Event-driven architecture, on the other hand, encourages components to communicate through events instead of direct calls, leading to systems that are loosely coupled and easier to scale.

When combined, these two patterns offer a powerful foundation for building robust applications with FastAPI. In this post, we will explore how to take advantage of FastAPI’s built-in dependency injection system and how to integrate it with a lightweight event bus to create APIs that are clean, testable, and ready for production.

Flash From DC

Vlog

What is Event Driven Design?

Have you ever wondered how Starbucks manages its queues so efficiently? You walk in, and the first barista takes your name and order, then writes it on a cup. That’s their role—quick and focused. The cup is then passed to another barista, whose job is to prepare your coffee. Meanwhile, someone else may be restocking coffee beans to ensure supplies never run out.

Writing your name on a paper cup is quick and effortless, much like the producing phase of event-driven architecture. Brewing the coffee, on the other hand, takes more time, similar to the consuming and processing phase. And just like the resumable infrastructure of event-driven systems, even if the baristas change shifts, the next barista can simply pick up the cups waiting on the shelf and continue brewing without any disruption.

What is Dependency Injection?

Imagine you walk into Starbucks again. When you order, you don’t bring your own coffee machine, beans, milk, and cups. The barista already has everything they need provided for them. Their job is just to prepare your coffee — they don’t need to know where the beans came from, who supplied the milk, or how the machine works.

That’s exactly what dependency injection is in software. Instead of having your code create and manage all its dependencies by itself (like the barista growing their own coffee beans before making your latte), those dependencies are provided to it from the outside. The function, class, or module just focuses on its own responsibility.

In FastAPI, for example, if an endpoint needs access to a database, you don’t make it connect directly inside the function. Instead, you “inject” the database connection as a dependency. This keeps your code clean, testable, and easy to maintain — just like a barista can easily switch from one brand of beans to another without having to change how they make coffee.
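Outside of any framework, the idea looks like this (a toy sketch; the Barista class and the bean suppliers are invented for the analogy):

```python
class Barista:
    # The bean supplier is injected from outside instead of
    # being created inside the class.
    def __init__(self, bean_supplier):
        self.bean_supplier = bean_supplier

    def make_latte(self):
        return f"latte made with {self.bean_supplier()} beans"

# Production wiring: inject the real supplier
barista = Barista(bean_supplier=lambda: "arabica")
print(barista.make_latte())  # latte made with arabica beans

# Test wiring: swap in a mock without touching Barista
test_barista = Barista(bean_supplier=lambda: "mock")
print(test_barista.make_latte())  # latte made with mock beans
```

Because Barista never constructs its own supplier, replacing the dependency for tests requires zero changes to the class itself.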

Why Is Event-Driven Architecture So Important?

In production, we generally try to avoid long-running for loops. Python behaves differently from JavaScript in this regard: JavaScript’s async model can interleave awaited operations concurrently, while a plain Python for loop runs its iterations serially. You can use multiprocessing to parallelize Python tasks, but that approach is limited by the number of cores on a single machine.

Event-driven architecture (EDA) allows us to scale beyond a single machine. Instead of processing everything sequentially, we can distribute work across multiple servers and take full advantage of many cores on each machine. A common guideline is to run 2–4 worker processes per core. For example, the machine I’m writing this on has 6 physical cores (12 virtual cores), which means I could run 24–48 workers for experimentation.

If you need even more processing power, you can simply add more servers to your architecture. With modern cloud infrastructure, scaling up is often just a matter of changing configuration. In short, adopting EDA removes the bottleneck of single-threaded execution and makes scalability a much easier problem to solve.

Use Case

Similar to the experiment we did with Flask and Kafka, we will implement the same use case: analyzing facial attributes, such as age and gender, for multiple faces in a single photo. The source code for this experiment has already been pushed to GitHub.

Variables and Modules

The purpose of dependency injection in this experiment serves two main goals: loading environment variables just once and loading modules according to the principle of separation of concerns, also only once.

I prefer to store environment variables in a .env file. Since there’s no sensitive information in this project, I pushed the .env file to Git. However, if your environment variables contain sensitive data, make sure to add .env to your .gitignore file and commit only the keys in a .env.example file.

# .env

EDA_ACTIVATED=1
DETECTOR_BACKEND=mtcnn
KAFKA_URI=localhost:9093
TF_CPP_MIN_LOG_LEVEL=2

Then, I load them during application initialization using the load_dotenv function from the python-dotenv package.

# src/app.py

from dotenv import load_dotenv
load_dotenv()

I have a Variables class that loads environment variables and assigns them to attributes, setting default values when a variable does not exist. In this class, I also define some constants, such as topic names, for easier reference throughout the application.

# src/dependencies/variables.py

# built-in dependencies
import os


# pylint: disable=too-few-public-methods
class Variables:
    def __init__(self):
        self.kafka_uri = os.getenv("KAFKA_URI", "CHANGEME")
        self.detector_backend = os.getenv("DETECTOR_BACKEND", "opencv")
        self.is_eda_activated = os.getenv("EDA_ACTIVATED", "0") == "1"
        self.topics = ["faces.extracted"]

Next, I’ll create a singleton Container class. This class will initialize application modules according to the principle of separation of concerns and provide them with the necessary variables.

# src/dependencies/container.py

from dependencies.variables import Variables
# Logger, DeepFaceService, KafkaService and CoreService are project
# modules; their imports are omitted here for brevity.


class Container:
    _instance = None
    _initialized = False

    def __new__(cls, variables=None):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, variables: Variables):
        if self._initialized:
            # singleton is already built; skip re-initialization
            return
        self.logger = Logger()
        self.deepface_service = DeepFaceService(
            logger=self.logger,
            detector_backend=variables.detector_backend,
        )
        self.event_service = KafkaService(
            logger=self.logger,
            server_uri=variables.kafka_uri,
        )

        for topic_name in variables.topics:
            self.event_service.create_topic_if_not_exists(
                topic_name=topic_name
            )

        self.core_service = CoreService(
            logger=self.logger,
            deepface_service=self.deepface_service,
            event_service=self.event_service,
            is_eda_activated=variables.is_eda_activated,
        )

        self._initialized = True
        self.logger.info("Container initialized")
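Stripped of the application services, the singleton mechanics can be verified in isolation (SingletonContainer below is a simplified stand-in, not the real Container):

```python
class SingletonContainer:
    _instance = None
    _initialized = False

    def __new__(cls, *args, **kwargs):
        # Create the instance only once; later calls return the same object
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, value=None):
        if self._initialized:
            return  # construction after the first is a no-op
        self.value = value
        self._initialized = True

a = SingletonContainer(value=42)
b = SingletonContainer(value=99)  # same instance comes back; 99 is ignored
print(a is b, b.value)  # True 42
```

The guard in `__init__` matters: Python calls `__init__` on whatever `__new__` returns, so without it every construction would re-run the expensive service wiring.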

Dependency Injection – Traditional Way

We could create the container during application startup and attach it to the router.

# src/app.py

def create_app() -> FastAPI:
    app = FastAPI()
    # enable CORS
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    variables = Variables()
    container = Container(variables=variables)

    # dependency injection to router directly
    router.container = container

    app.include_router(router)
    container.logger.info("router registered")

    # startup event for FastStream
    @app.on_event("startup")
    async def start_faststream():
        # FastStream is blocking; we are starting it as a background task
        asyncio.create_task(faststream_app.run())
        container.logger.info("FastStream broker started")

    return app

Then, we will access the container from the router within the routes, whether we are exposing web service endpoints or consuming Kafka topics.

# src/modules/core/routes.py

# 3rd party dependencies
from fastapi import APIRouter, BackgroundTasks
from pydantic import BaseModel

# project dependencies
from dependencies.container import Container
from modules.core.bus import broker


router = APIRouter()


class AnalyzeRequest(BaseModel):
    image: str

@router.post("/analyze")
async def analyze(
    payload: AnalyzeRequest,
    background_tasks: BackgroundTasks,
):
    container: Container = router.container  # attached directly to the router


class AnalyzeExtractedFaceRequest(BaseModel):
    face_id: str
    request_id: str
    face_index: int
    encoded_face: str
    shape: tuple

@broker.subscriber("faces.extracted")
async def analyze_extracted_face_kafka(
    input_value: AnalyzeExtractedFaceRequest,
):
    container: Container = router.container  # attached directly to the router

This approach allows the router to access all required services and objects via the container. In other words, any service your endpoints need is immediately available through router.container. But this approach comes with some advantages and disadvantages.

Pros

  • Simplicity: It’s straightforward and works well in small projects.
  • Centralized access: All services are stored in one container, so you don’t have to pass them individually.

Cons / Limitations

  1. Not the standard FastAPI DI: FastAPI’s recommended dependency injection system is based on Depends(), which integrates with the request lifecycle. Directly attaching a container bypasses this system.
  2. No request-scoped management: Services are not automatically tied to a request lifecycle. You have to manage instantiation, cleanup, and thread-safety manually.
  3. Testing and mocking are harder: Since dependencies are not injected per-request, replacing services with mocks requires extra boilerplate.
  4. Reduced type safety: You lose FastAPI’s type-based dependency resolution, which can help catch errors at design time.
  5. Less scalable: This pattern can work for small projects but becomes harder to maintain in large, complex applications.

In short: Assigning a container directly to a router is convenient, but it trades off the safety, flexibility, and scalability offered by FastAPI’s built-in Depends system. For production or larger applications, it’s generally recommended to use Depends for dependency injection.

Dependency Injection – FastAPI’s Way

FastAPI provides a built-in dependency injection system using Depends(), which allows you to define dependencies at the request level.

Pros

  1. Request-scoped dependencies: Each request can get its own instance of a dependency, which is important for things like database sessions or user-specific services.
  2. Type safety: FastAPI uses Python type hints to validate and resolve dependencies, reducing runtime errors.
  3. Automatic lifecycle management: FastAPI can handle initialization and cleanup of dependencies, including async context management.
  4. Easier testing and mocking: Dependencies can be overridden for testing using dependency_overrides, making it easy to inject mocks or stubs.
  5. Better scalability: This approach works well in large, complex applications because dependencies are explicit and managed consistently.

Cons / Limitations

  1. Slightly more boilerplate: You need to define dependency functions or classes and use Depends in each endpoint.
  2. Indirect access: Compared to attaching a container directly, you cannot access all services globally; everything must go through dependency injection.
  3. Request-scoped by default: Dependencies are resolved per request. This means if you use a container or a heavy service as a dependency, it will be recreated for each request. If you want it to behave like a singleton, you have to manage that explicitly.

In short: Depends provides a structured, scalable, and testable way to manage dependencies in FastAPI. While it requires a bit more setup than directly attaching a container, it improves safety, maintainability, and alignment with FastAPI best practices.
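Regarding the third limitation above: one common way to make the container behave like a singleton while still resolving it through Depends is to memoize the provider function. A minimal sketch with functools.lru_cache (the Container class here is a simplified stand-in for the real one):

```python
from functools import lru_cache

class Container:
    """Stand-in for the real dependency container."""
    def __init__(self):
        self.services = {}

@lru_cache(maxsize=1)
def get_container() -> Container:
    # Built once; every Depends(get_container) resolution reuses the instance.
    return Container()

# Every call returns the exact same object:
assert get_container() is get_container()
```

This keeps the explicit Depends wiring while avoiding the per-request construction cost for heavy objects.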

Dependency Injection

Previously, we were creating the container in src/app.py. In this approach, however, we won’t initialize the container there; instead, we’ll handle everything in src/modules/core/routes.py.

When defining the function for the HTTP POST endpoint /analyze, we will include the container as an input argument using FastAPI’s Depends. The container initialization logic is encapsulated in a get_container function, which is then passed to Depends to provide the container to the endpoint.

Similarly, we will stream the faces.extracted Kafka topic, and whenever a message arrives, the analyze_extracted_face_kafka function will be triggered. The container will be passed as an input argument using FastStream’s Depends. The container initialization logic is encapsulated in the get_container function, which is then provided to Depends to make the container available to the streaming process.

# src/modules/core/routes.py


# 3rd party dependencies
from fastapi import APIRouter, BackgroundTasks, Depends
from faststream import Depends as FastStreamDepends
from pydantic import BaseModel

# project dependencies
from dependencies.variables import Variables
from dependencies.container import Container
from modules.core.bus import broker


router = APIRouter()



def get_container() -> Container:
    variables = Variables()
    return Container(variables=variables)



class AnalyzeRequest(BaseModel):
    image: str


@router.post("/analyze")
async def analyze(
    payload: AnalyzeRequest,
    background_tasks: BackgroundTasks,
    container: Container = Depends(get_container), # via FastAPI Depends
):
    container.logger.info("POST /analyze endpoint is called")



class AnalyzeExtractedFaceRequest(BaseModel):
    face_id: str
    request_id: str
    face_index: int
    encoded_face: str
    shape: tuple


@broker.subscriber("faces.extracted")
async def analyze_extracted_face_kafka(
    input_value: AnalyzeExtractedFaceRequest,
    container: Container = FastStreamDepends(get_container),  # via FastStream Depends
):
    container.logger.info("STREAM faces.extracted triggered")

Event Driven Architecture with FastAPI

While setting up the infrastructure for dependency injection, we also built the structure to listen to the Kafka topic. In other words, the consumer part of the event-driven architecture was partially designed as well.

Thanks to FastStream, the analyze_extracted_face_kafka function is triggered automatically whenever a message is published to the faces.extracted Kafka topic, just like a web service endpoint. Without FastStream, using the standard Kafka Python package would require running a long-lived application, which is much harder to debug and test.

However, in its current state, it won’t actually listen to the topic because the broker we imported at the routes level is not yet fully initialized.

# src/modules/core/bus.py


# 3rd party dependencies
from faststream import FastStream
from faststream.kafka import KafkaBroker

# project dependencies
from dependencies.variables import Variables

variables = Variables()

broker = KafkaBroker(variables.kafka_uri)
faststream_app = FastStream(broker)

Later, when we start our FastAPI application, we will also need to start the FastStream application.

# src/app.py


# built-in dependencies
import asyncio

# 3rd party dependencies
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

# project dependencies
from modules.core.routes import router
from modules.core.bus import faststream_app


def create_app() -> FastAPI:
    app = FastAPI()
    # enable CORS
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # dependency injection to router directly
    # variables = Variables()
    # container = Container(variables=variables)
    # router.container = container

    app.include_router(router)
    print("router registered")

    # startup event for FastStream
    @app.on_event("startup")
    async def start_faststream():
        # FastStream's run() is blocking, so we start it as a background task
        asyncio.create_task(faststream_app.run())
        print("FastStream broker started")

    return app

service = create_app()

Now, our application is ready to consume incoming messages from the Kafka topics.

Finally, we can start our application using Uvicorn. Here, app refers to the src/app.py module, and service is the FastAPI instance created in it.

uvicorn app:service --host 0.0.0.0 --port 5000 --workers 2

Producing Messages

In our core service, the event service is already referenced. We simply need to call its produce method to send messages to a Kafka topic.

# src/modules/core/service.py

def analyze(self, image: str, request_id: str):
    faces = self.deepface_service.extract_faces(image)
    self.logger.info(f"extracted {len(faces)} faces")
    for idx, face in enumerate(faces):
        encoded_face = base64.b64encode(face.tobytes()).decode("utf-8")
        self.event_service.produce(
            topic_name="faces.extracted",
            key="extracted_face",
            value={
                "face_id": uuid.uuid4().hex,
                "face_index": idx,
                "encoded_face": encoded_face,
                "request_id": request_id or "N/A",
                "shape": face.shape,
            },
        )
        self.logger.info(
            f"{idx+1}-th face sent to kafka topic faces.extracted"
        )

FastAPI vs Flask

There’s a long-standing debate in the Python community about which is better: Flask or FastAPI. To be honest, both are excellent frameworks, but FastAPI has become the more popular choice in recent years. That said, if you’re a Flask fan, you can check out this post to see the Flask equivalent: Dependency Injection and Event-Driven Architecture with Flask and Kafka.

Conclusion

Designing applications with FastAPI goes beyond defining endpoints—it’s about building systems that are resilient, testable, and ready to grow. By embracing dependency injection, we gain flexibility and the ability to swap or mock components without rewriting business logic. By applying event-driven principles, we decouple responsibilities and make it easier for services and features to evolve independently.

When used together, these patterns create a foundation where FastAPI applications remain clean and maintainable while also being prepared for the demands of production environments. Whether you start with a simple in-memory event bus or integrate with a distributed message broker like Kafka or RabbitMQ, the combination of dependency injection and event-driven architecture can significantly improve the quality and scalability of your projects.

The key takeaway is that good architecture doesn’t just solve today’s problems—it ensures the codebase is ready for tomorrow’s challenges. With FastAPI, dependency injection, and event-driven design working hand in hand, you’ll be well equipped to build applications that stand the test of time.

I pushed the source code of this study to GitHub. You can support this work by starring the repo.

The post A Practical Guide to Dependency Injection and Event-Driven Architecture in FastAPI and Kafka appeared first on Sefik Ilkin Serengil.

A Gentle Introduction to Event Driven Architecture in Python, Flask and Kafka https://sefiks.com/2025/09/04/a-gentle-introduction-to-event-driven-architecture-in-python-flask-and-kafka/ https://sefiks.com/2025/09/04/a-gentle-introduction-to-event-driven-architecture-in-python-flask-and-kafka/#respond Thu, 04 Sep 2025 12:17:37 +0000 https://sefiks.com/?p=17721 In today’s fast-paced and scalable application development landscape, event-driven architecture (EDA) has emerged as a powerful pattern for building systems … More

The post A Gentle Introduction to Event Driven Architecture in Python, Flask and Kafka appeared first on Sefik Ilkin Serengil.

In today’s fast-paced and scalable application development landscape, event-driven architecture (EDA) has emerged as a powerful pattern for building systems that are decoupled, reactive, and highly extensible. Instead of relying on tightly coupled and synchronous workflows, EDA enables services to communicate through events, promoting flexibility, resilience, and parallel processing. One of the key advantages of EDA is that messages can seamlessly travel between different modules and even across domain boundaries, while their flow can be monitored and traced much more effectively than traditional logs—providing deeper visibility into system behavior and data flow.

Photo of Paper Cup on Top of the Table by Pexels

In this blog post, we’ll explore how to implement an event-driven system in Python and Flask using Kafka as a message broker. Kafka allows us to produce and consume messages efficiently, and it enables scalable, parallel processing across multiple machines, overcoming the limitations of threading, which confines execution to a single machine and its CPU cores. While Kafka is a robust and popular choice, we’ll also touch on alternative queueing mechanisms like RabbitMQ and Celery, demonstrating how EDA can be implemented flexibly depending on your stack and use case. Whether you’re building microservices, data pipelines, or real-time applications, adopting EDA can bring significant gains in scalability, observability, and maintainability.


Events in Real Life

Have you ever thought about how Starbucks manages queues so efficiently? You walk in, and one barista takes your name and your order, then writes it on a paper cup. That’s their job — fast and focused. Then that cup moves on to another barista who makes your coffee. Meanwhile, someone else might be restocking coffee beans from the supply chain to make sure nothing runs out.

Each person is doing a different part of the job — at their own pace — but everything flows together smoothly. That’s not a coincidence. It’s a system where each step is triggered by an event: a new customer, a new order, a low inventory alert. That’s basically how event-driven architecture works in software.

If you’re preparing for system design interviews, you’ll hear about this model a lot — and for good reason. It’s a smart way to build systems that scale easily, respond quickly, and handle real-world complexity with elegance.

Key Concepts in Event Driven Architecture

There are four main parts in event-driven systems:

  • Event: This is a message that says something happened — for example, “user registered.”
  • Producer: The part of the system that creates and sends the event.
  • Consumer: The part that listens for events and reacts to them.
  • Broker: Usually, there’s a middleman like Kafka or RabbitMQ that holds these events and delivers them to the right consumers.

One cool thing about events is that they can be designed as chains — where one event triggers another, which triggers the next, and so on.

Or sometimes, when you get an event, you can split it into multiple smaller tasks. Each task can be processed independently and in parallel. This helps break down complex work into manageable pieces and makes scaling easier.
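The concepts above, event, producer, consumer, and broker, can be sketched with a tiny in-memory broker. This is a toy stand-in for Kafka to show the moving parts, not production code:

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy broker: keeps subscribers per topic and delivers events to them."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):       # consumer registration
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):           # the producer side
        for handler in self.subscribers[topic]:
            handler(event)

broker = InMemoryBroker()
seen = []

# Chained events: one consumer's reaction produces the next event.
broker.subscribe("user.registered", lambda e: broker.publish("email.requested", e))
broker.subscribe("email.requested", lambda e: seen.append(f"welcome mail to {e['user']}"))

broker.publish("user.registered", {"user": "alice"})
assert seen == ["welcome mail to alice"]
```

A real broker adds persistence, partitioning, and delivery guarantees on top of this shape, but the publish/subscribe contract is the same.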

More Reasonable In Python

In Python, we don’t like long-running for loops because they run serially, one after another, which slows things down. While threading or multiprocessing can help, it depends on the number of CPU cores you have. Event-driven systems take a different approach. You can design a system with many servers and many workers running in parallel.

Scalability is simple — if you add a new server that consumes messages from a topic, your app can process more events at the same time without changing your code.

Decoupling Request Handling from Processing

Another big advantage is how your app handles requests. In a traditional system, when you send a request, you often have to wait until the server finishes all the processing before getting a response.

In event-driven architecture, your app can respond immediately, saying, “Request received,” and give you a unique ID. Behind the scenes, the request is stored as an event in a topic.

Another job or worker listens for these events and does the actual processing, like sending emails or updating a database. To keep users informed, you can expose another endpoint where they can check the status of their request using the ID you provided. This way, your main app stays responsive all the time because it’s not doing the heavy lifting immediately.

Even if the worker goes down temporarily, events stay safely in the topic and get processed later when the worker is back up. This makes your system much more reliable.
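The request/worker split described above can be sketched without any framework: the handler only records the request and returns an ID, a worker drains the queue later, and a status endpoint reads the shared store. All names here are hypothetical, and the in-process queue and dict stand in for a Kafka topic and a database:

```python
import queue
import uuid

jobs = {}                 # request_id -> status; stand-in for a database
events = queue.Queue()    # stand-in for a Kafka topic

def handle_request(payload: str) -> str:
    """Respond immediately with an ID; the heavy work happens later."""
    request_id = uuid.uuid4().hex
    jobs[request_id] = "pending"
    events.put((request_id, payload))
    return request_id

def worker_step():
    """One consumer iteration: process a single queued event."""
    request_id, payload = events.get()
    jobs[request_id] = f"done: {payload.upper()}"   # the 'heavy lifting'

def get_status(request_id: str) -> str:
    return jobs.get(request_id, "unknown")

rid = handle_request("hello")
assert get_status(rid) == "pending"   # instant response, work not done yet
worker_step()
assert get_status(rid) == "done: HELLO"
```

If the worker is down, events simply accumulate in the queue and are processed once it recovers, which is exactly the reliability property described above.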

Common Tools and Technologies

Back in the 90s, enterprise applications used heavy and complex message queues, called MQs.

Once, I had to implement an event-driven-like system using a temporary database table and a trigger. Whenever a new record was added to the main table, the trigger would insert its metadata into the temporary table. I was continuously polling this temporary table to detect new records. After processing a record, I would delete its corresponding entry from the temporary table.

Today, things are much lighter and easier to use. Popular tools include Kafka, RabbitMQ, and Celery.

Personally, I like consuming messages with Flask. When you build a message bus this way, it feeds incoming events to web service methods exposed in Flask. This approach is great because it makes it easy to monitor, debug, and test your event consumers using familiar HTTP endpoints.

Real-Life Example Scenario

Let’s look at a real-life example involving CCTV cameras and facial recognition. Imagine a busy public center with hundreds of people walking in. The CCTV records images continuously — each new image is fed as an event into the system.

A job consumes the image event and detects faces in the image. For each detected face, it creates a new event and puts it into another topic. Once done, that job’s task is complete.

Another job consumes these face events, runs facial recognition, and converts the faces into numerical representations called embeddings. These embeddings are sent as new events to yet another topic.

The final job listens for these embeddings and searches them against a database of wanted people. If it finds a match, it triggers an event that alerts the authorities.

Finally, criminals are reported to the police with their latest location — all done asynchronously and efficiently through a chain of events.

Traditional Approach

Consider the following snippet. When the analyze method receives an image, it first calls DeepFace's extract_faces function, which returns a list of faces. Then it calls DeepFace's analyze function for each pre-detected, pre-extracted face.

def analyze(self, image: str):
    faces = self.deepface_service.extract_faces(image)
    self.logger.info(f"extracted {len(faces)} faces")
    for idx, face in enumerate(faces):
        demography = self.deepface_service.analyze(face)
        self.logger.info(
            f"{idx+1}-th face analyzed: {demography['age']} years old "
            f"{demography['dominant_gender']} "
        )

Python’s standard for loops are synchronous and blocking, meaning each iteration completes before the next one starts. In contrast, JavaScript for loops can start asynchronous operations (like Promises) in parallel, allowing multiple tasks to run concurrently.

In Python, achieving parallel or asynchronous execution requires asyncio, threading, or multiprocessing, because a plain for loop will not run tasks in parallel. Even then, execution is limited to the cores of a single machine, so it still does not scale across multiple machines.
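The in-process alternatives just mentioned look like this: a concurrent.futures sketch that parallelizes the loop body, but only across the workers of one machine (analyze_face is a trivial placeholder for the expensive per-face work):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_face(face):
    # placeholder for the expensive per-face work
    return face * 2

faces = [1, 2, 3, 4]

# Serial: each iteration blocks the next one.
serial = [analyze_face(f) for f in faces]

# Parallel, but confined to this machine; scaling beyond it is
# where the event-driven approach below comes in.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(analyze_face, faces))

assert serial == parallel == [2, 4, 6, 8]
```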

Event Driven Approach

Instead, we will publish each item from the for loop to a Kafka topic, and a separate job will consume and process them.

def analyze(self, image: str):
    faces = self.deepface_service.extract_faces(image)
    self.logger.info(f"extracted {len(faces)} faces")
    for idx, face in enumerate(faces):
        encoded_face = base64.b64encode(face.tobytes()).decode("utf-8")
        self.event_service.produce(
            topic_name="faces.extracted",
            key="extracted_face",
            value={
                "face_index": idx,
                "encoded_face": encoded_face,
                "shape": face.shape,
            },
        )


Producing to Kafka will be very fast because it is an asynchronous operation — the message is queued, but it is not verified whether it has actually been written to the topic (unless explicitly requested).

With the Flask-Kafka package, we can listen to a Kafka topic as if it were a web service. You can also test the service using an HTTP POST with the payload of a message placed on the topic. In other words, it’s the same whether you put a message on the faces.extracted Kafka topic or send an HTTP POST to the localhost:5000/analyze/extracted/face endpoint with the same payload.

@bus.handle("faces.extracted")
@blueprint.route("/analyze/extracted/face", methods=["POST"])
def analyze_extracted_face(input_args):
    event = json.loads(input_args.value)
    container: Container = blueprint.container

    try:
        container.core_service.analyze_extracted_face(
            face_index=event["face_index"],
            encoded_face=event["encoded_face"],
            shape=event["shape"],
        )
        return {"status": "ok", "message": "analyzing face asynchronously"}, 200
    except Exception as err:  # pylint: disable=broad-exception-caught
        container.logger.error(
            f"Exception while analyzing single face - {err}"
        )
        container.logger.error(traceback.format_exc())
        return {"status": "error", "detail": str(err)}, 500

This way, the analyze_extracted_face function will be triggered whenever a message is placed on the faces.extracted topic, and it will perform the analysis for that individual face.

    def analyze_extracted_face(
        self,
        face_index: int,
        encoded_face: str,
        shape: Tuple[int, int, int],
    ):
        decoded_face = base64.b64decode(encoded_face)
        face = np.frombuffer(decoded_face, dtype=np.float64).reshape(shape)

        demography = self.deepface_service.analyze(face)
        self.logger.info(
            f"{face_index+1}-th face analyzed: {demography['age']} years old "
            f"{demography['dominant_gender']} "
        )

When starting the service with Gunicorn, faces will be analyzed in parallel according to the number of workers specified in the command (in my experiments, I used 2 workers). If we run this service on multiple machines, each machine will serve with the same number of workers. In other words, for scaling, it will be sufficient to adjust the number of workers and the partition count of the Kafka topic through configuration.

Conclusion

Event-driven architecture represents a shift from rigid, tightly coupled systems toward flexible, scalable, and resilient designs. By leveraging tools like Kafka, RabbitMQ, or Celery, developers can decouple services, enable parallel processing, and gain better visibility into the flow of data across their applications. While implementing EDA may introduce new concepts such as brokers, producers, and consumers, the long-term benefits in terms of scalability, maintainability, and fault tolerance make it a valuable investment for modern software systems. Whether you’re orchestrating microservices, handling high-throughput data streams, or building responsive user experiences, adopting an event-driven mindset can help future-proof your architecture and keep your systems ready for growth.

I pushed the source code of this study to GitHub. I strongly recommend pulling the repo and running the service locally; the README in the repo explains the steps to get the service up. Finally, you can support this work by starring the repo.

The post A Gentle Introduction to Event Driven Architecture in Python, Flask and Kafka appeared first on Sefik Ilkin Serengil.

How to Calculate Percentage-Based Confidence Scores from Similarities of Embedding Models https://sefiks.com/2025/09/02/from-embeddings-to-confidence-scores-converting-similarity-to-percentages/ https://sefiks.com/2025/09/02/from-embeddings-to-confidence-scores-converting-similarity-to-percentages/#respond Tue, 02 Sep 2025 13:24:53 +0000 https://sefiks.com/?p=17789 Embedding models have become a cornerstone of modern machine learning, powering everything from recommendation engines and document search to verification … More

The post How to Calculate Percentage-Based Confidence Scores from Similarities of Embedding Models appeared first on Sefik Ilkin Serengil.

Embedding models have become a cornerstone of modern machine learning, powering everything from recommendation engines and document search to verification systems like signature or image matching. At their core, these models transform complex data—whether it’s text, images, audio, or even behavioral patterns—into high-dimensional numerical vectors, known as embeddings. These embeddings capture the essential features of the data, allowing us to compare items mathematically.

Black and White Dartboard by Pexels

To determine similarity between embeddings, we typically use distance or similarity metrics such as cosine similarity or Euclidean distance. Based on these measurements, we often implement hard classification rules: if the distance between two vectors is below a predefined threshold, we classify them as belonging to the same group; otherwise, they are considered different. This binary approach works well for applications like reverse image search, facial recognition, or fraud detection, where a yes/no decision is sufficient.

However, hard classification has a major limitation—it doesn’t provide a measure of how confident we are in that decision. For instance, knowing that two embeddings are “the same” doesn’t tell us whether the similarity is borderline or nearly identical. This lack of interpretability can be a problem when stakeholders or end-users need a more intuitive understanding of similarity.

In this post, we’ll explore a simple yet effective approach to bridge this gap: converting embedding distances and similarity scores into percentage-based confidence scores. Using a straightforward logistic regression model, we’ll demonstrate how to transform raw distance measurements from any embedding model into interpretable percentages. This not only provides more nuanced insights into similarity but also makes embedding-based systems more transparent and user-friendly.

Use Case

We will use DeepFace to obtain distance scores between vector embeddings for pairs of images, both of the same person and of different people. Additionally, we will leverage the unit test data provided by the DeepFace library.

When we run the verify functionality in DeepFace, it returns both a distance score and a hard classification: True if the pair belongs to the same person, and False if they are different.

# !pip install deepface
from deepface import DeepFace
result = DeepFace.verify("img1.jpg", "img2.jpg")

The result payload will look like this:

{
   "verified": True,
   "distance": 0.41,
   "threshold": 0.68,
}

Preparing The Dataset

The unit test folder contains numerous facial images, each accompanied by its identity information.

identities = {
    "Angelina": ["img1.jpg", "img2.jpg", "img4.jpg", "img5.jpg",
                 "img6.jpg", "img7.jpg", "img10.jpg", "img11.jpg"],
    "Scarlett": ["img8.jpg", "img9.jpg"],
    "Jennifer": ["img3.jpg", "img12.jpg"],
    "Mark": ["img13.jpg", "img14.jpg", "img15.jpg"],
    "Jack": ["img16.jpg", "img17.jpg"],
    "Elon": ["img18.jpg", "img19.jpg"],
    "Jeff": ["img20.jpg", "img21.jpg"],
    "Marissa": ["img22.jpg", "img23.jpg"],
    "Sundar": ["img24.jpg", "img25.jpg"]
}

First, let's create a Pandas DataFrame containing only same-person labeled instances by cross-matching the images of each identity.

positives = []
for key, values in identities.items():
    for i in range(0, len(values) - 1):
        for j in range(i + 1, len(values)):
            positives.append([values[i], values[j]])

positives = pd.DataFrame(positives, columns=["file_x", "file_y"])
positives["actual"] = "Same Person"

Then, we'll add different-person labeled instances by performing cross-sampling across identities.

samples_list = list(identities.values())

negatives = []
for i in range(0, len(identities) - 1):
    for j in range(i + 1, len(identities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        for cross_sample in cross_product:
            negatives.append([cross_sample[0], cross_sample[1]])

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["actual"] = "Different Persons"

Now, let’s concatenate the same-person and different-person labeled instances into a single Pandas DataFrame.

df = pd.concat([positives, negatives]).reset_index(drop = True)
 
df.file_x = "../tests/dataset/"+df.file_x
df.file_y = "../tests/dataset/"+df.file_y

Now, we have a Pandas DataFrame containing the image pair names along with their labels.

Dataframe with pair’s names and labels

Generate Embeddings

I prefer to store the vector embeddings of each image in a dictionary, since the same image may appear multiple times in our Pandas DataFrame. This way, we avoid storing duplicate embeddings and reduce unnecessary repetition.

pivot = {}

model_name = "Facenet"
detector_backend = "mtcnn"

def represent(img_name: str):
    if pivot.get(img_name) is None:
        embedding_objs = DeepFace.represent(
            img_path=img_name, model_name=model_name, detector_backend=detector_backend
        )

        if len(embedding_objs) > 1:
            raise ValueError(f"{img_name} has more than one face!")
            
        pivot[img_name] = [embedding_obj["embedding"] for embedding_obj in embedding_objs]
    return pivot[img_name]

Then, we can represent each item in the Pandas DataFrame using its corresponding vector embedding.

img1_embeddings = []
img2_embeddings = []
for index, instance in tqdm(df.iterrows(), total=df.shape[0]):
    img1_embeddings = img1_embeddings + represent(instance["file_x"])
    img2_embeddings = img2_embeddings + represent(instance["file_y"])

df["img1_embeddings"] = img1_embeddings
df["img2_embeddings"] = img2_embeddings

Now, we have a Pandas DataFrame that contains the image pair names along with their corresponding embeddings, structured as follows:

Dataframe with pair’s names, labels and embeddings

Distance Calculation

In each row of the DataFrame, we have two vector embeddings. From these, we can compute the distance for each row as follows:

from deepface.modules.verification import find_distance, find_threshold

distance_metrics = [
    "cosine", "euclidean", "euclidean_l2", "angular",
]

for distance_metric in distance_metrics:
    distances = []
    for index, instance in tqdm(df.iterrows(), total=df.shape[0]):
        img1_embeddings = instance["img1_embeddings"]
        img2_embeddings = instance["img2_embeddings"]

        distance = find_distance(
            alpha_embedding=img1_embeddings,
            beta_embedding=img2_embeddings,
            distance_metric=distance_metric
        )
        distances.append(distance)
    
    df[distance_metric] = distances

This adds the distances for each metric as new columns in the Pandas DataFrame, as shown below:

Dataframe with distances

Hard Classification

Once we have the distance scores, we can classify each pair as same-person or different-person by comparing the distance against a pre-tuned threshold.

for distance_metric in distance_metrics:
    threshold = find_threshold(model_name=model_name, distance_metric=distance_metric)
    df[f"{distance_metric}_threshold"] = threshold
    df[f"{distance_metric}_decision"] = 0
    idx = df[df[distance_metric] <= threshold].index
    df.loc[idx, f"{distance_metric}_decision"] = 1

This adds the pre-tuned threshold and a hard prediction column, with 1 indicating a same-person pair and 0 indicating a different-person pair.

Dataframe with hard predictions

Logistic Regression Model

Next, we will build a logistic regression model to convert distance scores into confidence scores. The distance values will serve as the input features, while the hard predictions will be used as the target labels.

We need to normalize the input distances to the [0, 1] range before feeding them into the model, since logistic regression uses a sigmoid function, which saturates for inputs below about -4 or above +4.
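The saturation mentioned above is easy to verify numerically: outside roughly [-4, +4] the sigmoid is already within about 2% of its asymptotes, so an unnormalized Euclidean distance near 18 would collapse to essentially zero, while distances scaled into [0, 1] stay in the responsive region:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

assert sigmoid(4) > 0.98 and sigmoid(-4) < 0.02   # already near-saturated
assert sigmoid(-18) < 1e-7   # an unnormalized distance of ~18 is indistinguishable from 0

# After scaling distances into [0, 1], nearby inputs still map to
# distinguishable outputs:
assert abs(sigmoid(0.5) - sigmoid(0.4)) > 0.02
```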

confidence_metrics = {}

for distance_metric in distance_metrics:
    max_value = df[distance_metric].max()

    X = df[distance_metric].values.reshape(-1, 1)

    # normalize the distance values before feeding them to the model
    if max_value > 1:
        X = X / max_value

    y = df[f"{distance_metric}_decision"].values

    model = LogisticRegression().fit(X, y)

    w = model.coef_[0][0]
    b = model.intercept_[0]

    confidence_metrics[distance_metric] = {
        "w": w,
        "b": b,
        "normalizer": max_value,
    }

    confidences = []
    for index, instance in df.iterrows():
        distance = instance[distance_metric]

        if max_value > 1:
            distance = distance / max_value

        z = w * distance + b
        confidence = 100 / (1 + math.exp(-z))

        confidences.append(confidence)
    
    df[distance_metric + "_confidence"] = confidences

    confidence_metrics[distance_metric]["denorm_max_true"] = df[df[f"{distance_metric}_decision"] == 1][distance_metric + "_confidence"].max()
    confidence_metrics[distance_metric]["denorm_min_true"] = df[df[f"{distance_metric}_decision"] == 1][distance_metric + "_confidence"].min()

    confidence_metrics[distance_metric]["denorm_max_false"] = df[df[f"{distance_metric}_decision"] == 0][distance_metric + "_confidence"].max()
    confidence_metrics[distance_metric]["denorm_min_false"] = df[df[f"{distance_metric}_decision"] == 0][distance_metric + "_confidence"].min()

After training, we will obtain the coefficient and intercept of the logistic regression model, which define the slope and position of the sigmoid curve:

{'cosine': {'w': -6.502269165856082,
  'b': 1.679048923097668,
  'normalizer': 1.206694,
  'denorm_max_true': 77.17253153662926,
  'denorm_min_true': 41.790002608273234,
  'denorm_max_false': 20.618350202170916,
  'denorm_min_false': 0.7976712344840693},
 'euclidean': {'w': -6.716177467853723,
  'b': 2.790978346203265,
  'normalizer': 18.735288,
  'denorm_max_true': 74.76412617567517,
  'denorm_min_true': 40.4423755909089,
  'denorm_max_false': 25.840858374979504,
  'denorm_min_false': 1.9356150486888306},
 'euclidean_l2': {'w': -6.708710331202137,
  'b': 2.9094193067398195,
  'normalizer': 1.553508,
  'denorm_max_true': 75.45756719896039,
  'denorm_min_true': 40.4509428022908,
  'denorm_max_false': 30.555931000001184,
  'denorm_min_false': 2.189644991619842},
 'angular': {'w': -6.371147050396505,
  'b': 0.6766460615182355,
  'normalizer': 0.56627,
  'denorm_max_true': 45.802357900723386,
  'denorm_min_true': 24.327312950719133,
  'denorm_max_false': 16.95267765757785,
  'denorm_min_false': 5.063533287198758}}

I also stored the denormalization minimum and maximum values to map false predictions to the range [0, 49] and true predictions to [51, 100]. This step is purely optional but can make the confidence scores more intuitive.
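With the fitted parameters above, computing a raw confidence for a new pair no longer needs the training data, only w, b, and the normalizer. A sketch using the cosine values copied from the dictionary above (this is the pre-denormalization score; the [0, 49] / [51, 100] mapping comes next):

```python
import math

# Fitted parameters for the cosine metric, taken from the trained model above.
w, b, normalizer = -6.502269165856082, 1.679048923097668, 1.206694

def raw_confidence(distance: float) -> float:
    """Map a cosine distance to a 0-100 confidence via the fitted sigmoid."""
    z = w * (distance / normalizer) + b
    return 100.0 / (1.0 + math.exp(-z))

# Smaller distance means higher confidence that it is the same person.
assert raw_confidence(0.2) > raw_confidence(0.6)
assert 0.0 < raw_confidence(0.41) < 100.0
```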

Confidence Scores

Now, we can use the trained logistic regression model to convert distance scores into confidence scores for each row, and we can do this separately for each distance metric.

for distance_metric in distance_metrics:
    for index, instance in df.iterrows():
        current_distance = instance[distance_metric]
        threshold = find_threshold(model_name=model_name, distance_metric=distance_metric)

        prediction = "same person" if current_distance <= threshold else "different persons"

        # denormalize same-person predictions into [51, 100]
        if prediction == "same person":
            min_original = confidence_metrics[distance_metric]["denorm_min_true"]
            max_original = confidence_metrics[distance_metric]["denorm_max_true"]
            min_target = max(51, min_original)
            max_target = 100
        # and different-person predictions into [0, 49]
        else:
            min_original = confidence_metrics[distance_metric]["denorm_min_false"]
            max_original = confidence_metrics[distance_metric]["denorm_max_false"]
            min_target = 0
            max_target = min(49, max_original)

        confidence = instance[f"{distance_metric}_confidence"]

        # min-max rescale from the observed range to the target range
        confidence_new = (
            (confidence - min_original) / (max_original - min_original)
        ) * (max_target - min_target) + min_target

        confidence_new = float(confidence_new)

        df.loc[index, f"{distance_metric}_confidence"] = confidence_new

We now have confidence scores computed for each pair.

Dataframe with confidence scores

Distributions

Next, let’s plot the distributions of confidence scores for same-person and different-person pairs. Ideally, we should see scores in the range 0–49 for different-person pairs and 51–100 for same-person pairs.

for distance_metric in distance_metrics:
    df[df.actual == "Same Person"][f"{distance_metric}_confidence"].plot.kde(label="Same Person")
    df[df.actual == "Different Persons"][f"{distance_metric}_confidence"].plot.kde(label="Different Persons")
    plt.legend()
    plt.show()

The confidence scores are indeed distributed within the expected ranges.

Distributions

With this approach, we were able to convert continuous distance values into finite confidence scores ranging from 0 to 100. Since we used distances and predictions from a dataset for this conversion, we also accounted for how a small decrease in distance affects the confidence score—similar to how a derivative measures sensitivity. In other words, instead of just meaningless distance values, we now have meaningful, interpretable, and actionable confidence scores on a 0–100 scale. At this point, you can implement additional actions, such as taking automatic action for classifications with confidence above 75, while sending scores in the 51–75 range for human review.
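As a sketch of that last idea, such a routing policy could look like the following. The thresholds (75 and 51) come from the example above; the function itself is illustrative, not part of the study.

```python
def route_decision(confidence: float) -> str:
    """Route a prediction based on its 0-100 confidence score."""
    if confidence > 75:
        return "auto-accept"   # strong match: act automatically
    if confidence >= 51:
        return "human-review"  # weak same-person match: ask a human
    return "reject"            # classified as different persons

print(route_decision(90), route_decision(60), route_decision(30))
```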

In Summary

Converting embedding distances and similarity scores into percentage-based confidence scores adds a layer of interpretability that hard classification cannot provide. Instead of relying solely on true/false decisions, we can now understand not just whether two items are considered the same, but how strong or fuzzy that classification is. Even a simple approach like logistic regression allows us to transform raw embedding metrics into intuitive, human-understandable percentages. This added nuance makes similarity-based systems more transparent, informative, and user-friendly, bridging the gap between powerful machine learning models and actionable insights.

I pushed the source code of this study to GitHub. You can support this work by starring the repo.

The post How to Calculate Percentage-Based Confidence Scores from Similarities of Embedding Models appeared first on Sefik Ilkin Serengil.

What Are Vector Embeddings? And Why They Matter in AI https://sefiks.com/2025/06/28/what-are-vector-embeddings-and-why-they-matter-in-ai/ https://sefiks.com/2025/06/28/what-are-vector-embeddings-and-why-they-matter-in-ai/#respond Sat, 28 Jun 2025 13:22:15 +0000 https://sefiks.com/?p=17768 If you’ve spent time exploring machine learning or AI recently, you’ve probably heard the term “embedding” — vector embedding, word … More

If you’ve spent time exploring machine learning or AI recently, you’ve probably heard the term “embedding” — vector embedding, word embedding, image embedding… But what exactly are embeddings? And why are they everywhere?

In this post, we’ll break down what vector embeddings really are, why they’re useful, and how we as humans can make sense of them. Along the way, we’ll use facial recognition models as a concrete example — but embeddings are also critical in systems like reverse image search, recommendation engines, and large language models (LLMs) like GPT.

Robot Pointing On A Wall By Pexels

What Is a Vector Embedding?

In traditional machine learning, we often build classification models that are trained to recognize a fixed set of categories.

For example:
– In binary classification, we might ask: Is this a hot dog or not?
– In multiclass classification, we might extend that to: Is this a hot dog, pizza, burger, or pasta?

These models can be very powerful. But they have a major limitation: if you want to add a new category (e.g., taco), you need to retrain the model from scratch — often using a large dataset.

That’s where vector embeddings come in.

Instead of assigning inputs to rigid class labels, a model can output a vector embedding — a list of numbers that captures the essence or features of the input (whether it’s a food image, a face, a sentence, etc.). You can think of this as a classification model with an effectively infinite number of classes.

For instance, if you pass an image of a burger through a deep neural network, it might produce a vector like:

embedding = [0.21, -0.57, 0.89, ..., 0.04]  # e.g., 512 dimensions

This vector doesn’t just say “burger” — it encodes visual and semantic features (like round shape, color, texture, etc.).

Here’s the powerful part: once you have embeddings, you don’t need to retrain the model to handle new examples. If you introduce a new food item (say taco), you can extract its embedding and compare it to other embeddings — using simple distance metrics.

So instead of retraining a classification model every time the world changes, you can use a pre-trained embedding model that generalizes to any kind of food, person, sentence, etc.
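The comparison itself is plain arithmetic on the vectors. Below is a minimal sketch using made-up 4-dimensional embeddings (real models produce hundreds of dimensions) compared with cosine distance; the values are invented purely for illustration.

```python
import math

def cosine_distance(a: list, b: list) -> float:
    """Cosine distance: 0 means identical direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

burger = [0.21, -0.57, 0.89, 0.04]
taco = [0.25, -0.50, 0.80, 0.10]    # hypothetical new food item
pizza = [-0.90, 0.10, -0.30, 0.70]  # hypothetical dissimilar item

# the taco embedding sits much closer to the burger than the pizza does
print(cosine_distance(burger, taco) < cosine_distance(burger, pizza))
```

No retraining happened here: the new "taco" class was handled by nothing more than a distance comparison.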

Where Are Embeddings Used?

Embeddings are used everywhere in modern AI. Here are a few real-world examples:

1. Facial Recognition

A deep learning model extracts embeddings from face images. These embeddings represent unique facial features in a numeric form. When two embeddings are similar, the system considers the faces to belong to the same person.

2. Reverse Image Search

When you upload an image to search for similar ones (e.g., Google Lens), the system converts your image into an embedding. It then compares that vector to millions of others to find visually similar results.

3. Large Language Models (LLMs)

LLMs like GPT use embeddings to represent words, sentences, and even entire documents. For example, the phrase “I’m feeling great today!” is turned into a vector that captures its sentiment and meaning — making it comparable to similar phrases like “I’m doing well.”

Why Are Embeddings Hard to Understand?

Here’s the tricky part: embeddings often live in high-dimensional spaces — with hundreds or even thousands of dimensions.

Humans naturally think in 2D and 3D (or 4D when you include time). We understand maps, graphs, and physical space. But can you imagine 512 dimensions? It’s impossible. Our brains aren’t wired for that.

Yet, for a computer, operating in high-dimensional spaces is totally normal. It can easily calculate similarities, distances, and clusters in those spaces using linear algebra.
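For example, a Euclidean distance in 512 dimensions uses exactly the same formula as in 2D, just with a longer sum. The vectors below are random stand-ins for real embeddings:

```python
import math
import random

random.seed(42)

# two random 512-dimensional "embeddings"
a = [random.gauss(0, 1) for _ in range(512)]
b = [random.gauss(0, 1) for _ in range(512)]

# the Pythagorean formula generalizes unchanged to any dimension
distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(distance)
```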

So while embeddings are intuitive in what they represent, they’re not directly interpretable or visualizable — unless we simplify them.

Let’s Make Them Understandable

To help us see embeddings, we can use dimensionality reduction techniques like PCA (Principal Component Analysis). PCA takes high-dimensional vectors and projects them down to a lower-dimensional space (2D in our experiment) — preserving the most important relationships between them.

When we apply PCA to a set of embeddings, we can plot them on a graph. Here’s what typically happens:

  • Similar items form tight clusters.
  • Different items spread out across the graph.

This gives us a visual intuition for how the model understands data.

BTW, PCA doesn’t have to reduce the dimension to 2D or 3D. For instance, VGG-Face represents facial images as 4096-dimensional vectors, then VGG researchers used a dimension reduction method (possibly PCA) to reduce it to 1024-dimensions before verification.

Example: Face Embeddings in 2D

Let’s take facial recognition as an example.

Say we extract 512-dimensional embeddings for multiple face images using the FaceNet model. Then we apply PCA to reduce the vectors to 2D.

When we plot them:

  • All embeddings from the same person appear as a tight cluster.
  • Embeddings from different people are distant from each other.

This confirms the key idea: similar data → similar vectors → close points. And this pattern holds whether you’re dealing with faces, texts, products, or users.

Preparing Embeddings

I will use the unit test images of the DeepFace library. Normally, images don’t come labeled with identities, but I labeled them for you.

database = {
    "angelina_jolie": [
        "dataset/img1.jpg",
        "dataset/img2.jpg",
        "dataset/img4.jpg",
        "dataset/img5.jpg",
        "dataset/img6.jpg",
        "dataset/img7.jpg",
        "dataset/img10.jpg",
        "dataset/img11.jpg",
    ],
    "jennifer_aniston": [
        "dataset/img3.jpg",
        "dataset/img12.jpg",
        "dataset/img53.jpg",
        "dataset/img54.jpg",
        "dataset/img55.jpg",
        "dataset/img56.jpg",
    ],
    "scarlett_johansson": [
        "dataset/img9.jpg",
        "dataset/img47.jpg",
        "dataset/img48.jpg",
        "dataset/img49.jpg",
        "dataset/img50.jpg",
        "dataset/img51.jpg",
    ],
    "mark_zuckerberg": [
        "dataset/img13.jpg",
        "dataset/img14.jpg",
        "dataset/img15.jpg",
        "dataset/img57.jpg",
        "dataset/img58.jpg",
    ],
    "jack_dorsey": [
        "dataset/img16.jpg",
        "dataset/img17.jpg",
        "dataset/img59.jpg",
        "dataset/img61.jpg",
        "dataset/img62.jpg",
    ],
    "elon_musk": [
        "dataset/img18.jpg",
        "dataset/img19.jpg",
        "dataset/img67.jpg",
    ],
    "marissa_mayer": [
        "dataset/img22.jpg",
        "dataset/img23.jpg",
    ],
    "sundar_pichai": [
        "dataset/img24.jpg",
        "dataset/img25.jpg",
    ],
    "katty_perry": [
        "dataset/img26.jpg",
        "dataset/img27.jpg",
        "dataset/img28.jpg",
        "dataset/img42.jpg",
        "dataset/img43.jpg",
        "dataset/img44.jpg",
        "dataset/img45.jpg",
        "dataset/img46.jpg",
    ],
    "matt_damon": [
        "dataset/img29.jpg",
        "dataset/img30.jpg",
        "dataset/img31.jpg",
        "dataset/img32.jpg",
        "dataset/img33.jpg",
    ],
    "leonardo_dicaprio": [
        "dataset/img34.jpg",
        "dataset/img35.jpg",
        "dataset/img36.jpg",
        "dataset/img37.jpg",
    ],
    "george_clooney": [
        "dataset/img38.jpg",
        "dataset/img39.jpg",
        "dataset/img40.jpg",
        "dataset/img41.jpg",
    ],
}

Then, we will use DeepFace to represent those images as vector embeddings.

from deepface import DeepFace
from tqdm import tqdm

model_name = "Facenet512"

# store each identity's many embeddings in vector_database dict
vector_database = {}
for identity, images in tqdm(database.items()):
    target_embeddings = []
    for image in images:
        emb = DeepFace.represent(
            img_path=image,
            model_name=model_name,
            detector_backend="mtcnn"
        )[0]["embedding"]
        target_embeddings.append(emb)
    vector_database[identity] = target_embeddings

# store all identities' embeddings in single target_identities list
target_identities = []
target_embeddings = []
for identity, embeddings in vector_database.items():
    for embedding in embeddings:
        target_embeddings.append(embedding)
        target_identities.append(identity)

# store corresponding identity for target_identities
image_sources = []
for identity, images in tqdm(database.items()):
    for image in images:
        image_sources.append(image.split("/")[-1].split(".")[0])

PCA

Once we feed all the embeddings into the PCA model, it gives us an x and y coordinate for each embedding.

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
vectors_2d = pca.fit_transform(target_embeddings)
x, y = zip(*vectors_2d)

Visualizing

Let’s plot each embedding in 2D space, using a different color for each identity. That way, we will be able to see the clusters.

import matplotlib.pyplot as plt

# a distinct color per identity (skip if you already defined a colors list)
colors = plt.cm.tab20.colors

printed_labels = set()

plt.figure(figsize=(8, 6))
for i, (x, y) in enumerate(vectors_2d):
    target_identity = target_identities[i]

    target_idx = None
    for idx, (identity, _) in enumerate(database.items()):
        if target_identity == identity:
            target_idx = idx

    plt.scatter(x, y, color=colors[target_idx])

    # plt.text(x + 0.02, y + 0.02, image_sources[i], fontsize=12, color=colors[target_idx])
    if target_identity not in printed_labels:
        plt.text(x + 0.02, y + 0.02, target_identity, fontsize=12, color=colors[target_idx])
        printed_labels.add(target_identity)

plt.title(f"PCA-reduced 2D representations for {model_name} with indices")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.grid(True)
plt.show()

When we run this experiment on the same dataset with different models, we see obvious clusters.

FaceNet512
FaceNet128
VGG-Face
ArcFace
GhostFaceNet

TL;DR — Key Takeaways

Vector embeddings are numeric representations of complex data like images or text. They allow computers to compare and reason about data using simple math. Embeddings are used in facial recognition, reverse image search, LLMs, recommendation systems, and more.

Although embeddings live in high-dimensional spaces, we can project them into 2D, 3D or even 4D using PCA or t-SNE to visualize their structure. Similar embeddings should form clusters — making it easier to interpret how models “see” the world.

Final Thoughts

Embeddings are one of the most elegant ideas in AI — turning messy, unstructured data into structured, comparable vectors. Understanding how they work — and how to visualize them — is a big step toward understanding modern machine learning.

I pushed the source code of this experiment to GitHub. You can support this work by starring the repo.

The post What Are Vector Embeddings? And Why They Matter in AI appeared first on Sefik Ilkin Serengil.

Digital Signatures In Python https://sefiks.com/2025/04/19/digital-signatures-in-python/ https://sefiks.com/2025/04/19/digital-signatures-in-python/#respond Sat, 19 Apr 2025 22:26:31 +0000 https://sefiks.com/?p=17749 Digital signatures play a crucial role in securing data integrity and authenticity across modern systems. Whether it’s signing documents, verifying … More

Digital signatures play a crucial role in securing data integrity and authenticity across modern systems. Whether it’s signing documents, verifying transactions, or securing communication channels, digital signatures ensure that messages come from legitimate sources and haven’t been tampered with. In this post, we’ll explore how to implement digital signatures in Python using the LightDSA library—a lightweight and flexible cryptographic toolkit that supports multiple signature algorithms and elliptic curve configurations.

Person Holding Fountain Pen By Pexels

Vlog

✍ What is LightDSA?

LightDSA is a Python library designed for generating and verifying digital signatures. It supports a variety of signature schemes including:

  • RSA
  • DSA
  • ECDSA (Elliptic Curve Digital Signature Algorithm)
  • EdDSA (Edwards-Curve Digital Signature Algorithm)

What sets LightDSA apart is its configurability, especially when it comes to elliptic curve–based algorithms like ECDSA and EdDSA.

ECDSA & EdDSA Curve Support

LightDSA provides three elliptic curve forms: Weierstrass, Edwards, and Koblitz. Each form supports hundreds of pre-defined curves. For example, the Bitcoin protocol uses ECDSA over the secp256k1 curve, which is a Weierstrass-form curve.

Here’s how to use custom curves in LightDSA:

# import library
from lightdsa import LightDSA

# build the curve used in bitcoin
dsa = LightDSA(
    algorithm_name = "ecdsa",
    form_name = "weierstrass", # or koblitz, edwards
    curve_name = "secp256k1" # see supported curves
)

For EdDSA:

# import library
from lightdsa import LightDSA

# build an edwards curve based eddsa
dsa = LightDSA(
    algorithm_name = "eddsa",
    form_name = "edwards", # or weierstrass, koblitz
    curve_name = "ed25519" # see supported curves
)

You can also use Edwards curves in ECDSA and Weierstrass curves in EdDSA, but this is not common practice.

RSA and DSA

For RSA:

# import library
from lightdsa import LightDSA

# build rsa cryptosystem
dsa = LightDSA(
    algorithm_name = "rsa",
)

For DSA:

# import library
from lightdsa import LightDSA

# build dsa cryptosystem
dsa = LightDSA(
    algorithm_name = "dsa",
)

Customizing Key Sizes

For RSA and DSA algorithms, you can increase the key size to build stronger cryptosystems. For instance, upgrading from a 2048-bit RSA key to a 4096-bit one dramatically enhances security—though it also increases computation time.

# import library
from lightdsa import LightDSA

# build rsa cryptosystem
dsa = LightDSA(
    algorithm_name = "rsa", # or dsa
    key_size=7680
)

Consider this table before setting key sizes:

Key Size Comparison

In contrast, with ECDSA and EdDSA, security is primarily dictated by the order of the elliptic curve—the number of points it defines—rather than the key size itself. This is shown in the “n (bits)” column of the supported curves.

Exporting Private and Public Keys

Once you have built the cryptosystem, you can export the private and public keys:

# export private key
dsa.export_keys("secret.txt")

# export public key
dsa.export_keys("public.txt", public = True)

You must keep your private key secret.

Restoring Cryptosystems

You can restore the cryptosystem from a given secret or public key file:

signer_dsa = LightDSA(
    algorithm_name = algorithm_name,
    form_name = form_name,
    curve_name = curve_name,
    key_file = "secret.txt"
)

verifier_dsa = LightDSA(
    algorithm_name = algorithm_name,
    form_name = form_name,
    curve_name = curve_name,
    key_file = "public.txt"
)

Here, you should pass the same algorithm name, form name, and curve name that you used when creating the cryptosystem.

Signing

Signing a message is very straightforward. You must have the private key to sign a message.

# sign a message
message = "Hello, world!"
signature = dsa.sign(message)

Verification

Verification is also very straightforward. You must have the public key to verify a message.

verifier_dsa.verify(message, signature)

Why Use LightDSA?

  • Lightweight and easy to use
  • Fully configurable cryptographic backend
  • Supports modern cryptographic standards
  • Great for learning, prototyping, and even production usage

Conclusion

LightDSA makes it easy to experiment with different digital signature algorithms and elliptic curve configurations. Whether you’re developing secure systems or simply learning how modern cryptography works, it’s a fantastic tool to have in your Python toolkit.

You can support this study by starring its GitHub repo!

The post Digital Signatures In Python appeared first on Sefik Ilkin Serengil.
