Table-Detection

Table Extraction and OCR for Persian Documents 📄✨

This project provides a Python-based solution for detecting table structures in images and extracting Persian text using Optical Character Recognition (OCR). It uses OpenCV for table detection and Tesseract OCR for text extraction, with proper rendering of Persian text. 🚀

Features 🌟

Table Detection 📊: Identifies table cells in images with OpenCV, using adaptive thresholding and morphological operations to find the table lines.
OCR Support 🔍: Extracts Persian text from table cells using Tesseract OCR with Persian language support.
Data Structuring 📈: Organizes extracted text into a Pandas DataFrame for easy analysis.
Visualization 🎨: Displays detected table cells, line masks, and intersection points for verification.

Requirements 🛠️

To run this project, you need the following dependencies:
- Python 3.7+ 🐍
- OpenCV (cv2)
- NumPy
- Matplotlib
- Pandas
- Pytesseract (for Tesseract OCR)
- Tesseract-OCR with Persian language support (tesseract-ocr-fas)

Install the dependencies using:

pip install opencv-python numpy matplotlib pandas pytesseract

For Tesseract OCR:

apt-get install -y tesseract-ocr tesseract-ocr-fas
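
Before running the script, it can help to confirm that Tesseract is on the PATH and that the Persian language data is actually installed. The sketch below is not part of the project: it shells out to `tesseract --list-langs` and looks for `fas`. The helper names `parse_langs` and `installed_langs` are hypothetical.

```python
import shutil
import subprocess


def parse_langs(listing):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the header, e.g. "List of available languages (3):"
        if not line or line.lower().startswith("list of"):
            continue
        langs.add(line)
    return langs


def installed_langs():
    """Return the languages Tesseract reports, or an empty set if it is missing."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Depending on the Tesseract version, the listing may arrive on stdout
    # or on stderr, so both streams are parsed.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

A quick check would be `"fas" in installed_langs()`; if it is false, the `tesseract-ocr-fas` package is probably missing.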

Usage 🚀

Clone the Repository 📂:

git clone https://github.com/shahin-ro/table-extraction-ocr.git
cd table-extraction-ocr

Prepare an Image 🖼️:
- Ensure you have an image containing a table with Persian text (e.g., a scanned document or screenshot).
- Place the image in the project directory or provide the path to the script.
Run the Script ▶️:
- The script (jadval.py) processes the image, detects table cells, extracts text, and visualizes the results.
- Run the script:
```
python jadval.py
```
Output 📜:
- The script outputs:
  - A count of detected table cells ✅.
  - Extracted text for each cell with coordinates 📍.
  - A Pandas DataFrame representing the table structure 🗃️.
  - Visualizations showing detected cells, line masks, and intersection points 🖼️.

How It Works 🧠

Table Detection 📏:
- Uses OpenCV to preprocess the image (grayscale, adaptive thresholding, morphological operations).
- Detects horizontal and vertical lines to identify table boundaries.
- Clusters line intersections to determine cell coordinates.
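
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the script: the function names, the kernel scale, the threshold parameters, and the clustering tolerance are all assumptions, and the cell builder assumes a complete, regular grid with no merged cells.

```python
def detect_joints(image_path, scale=20):
    """Return (line_mask, joint_points) for a table image.

    Assumption: line kernels are 1/scale of the image size; tune for your scans.
    """
    import cv2  # imported here so the pure-Python helpers below need no OpenCV

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Invert first so dark table lines become white foreground pixels.
    binary = cv2.adaptiveThreshold(
        cv2.bitwise_not(gray), 255,
        cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2,
    )
    height, width = binary.shape
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(1, width // scale), 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(1, height // scale)))
    # Opening keeps only pixel runs at least as long as the kernel.
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    line_mask = cv2.add(horizontal, vertical)
    joints = cv2.bitwise_and(horizontal, vertical)  # pixels on both line types
    found = cv2.findNonZero(joints)
    points = [] if found is None else [(int(x), int(y)) for x, y in found.reshape(-1, 2)]
    return line_mask, points


def cluster_positions(values, tol=10):
    """Group sorted 1-D coordinates, starting a new group when the gap to the
    previous value exceeds `tol`; return the integer mean of each group."""
    centers, group = [], []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centers.append(sum(group) // len(group))
            group = []
        group.append(v)
    if group:
        centers.append(sum(group) // len(group))
    return centers


def cells_from_joints(points, tol=10):
    """Turn joint pixels into cell boxes (x1, y1, x2, y2), row by row."""
    xs = cluster_positions([x for x, _ in points], tol)
    ys = cluster_positions([y for _, y in points], tol)
    # Each pair of adjacent grid lines bounds one cell.
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(len(ys) - 1)
        for c in range(len(xs) - 1)
    ]
```

Clustering each axis separately is a simplification: it is enough when every line spans the whole table, but tables with merged cells would need the joints to be matched pairwise instead.
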
Text Extraction 📝:
- Crops each detected cell and processes it with Tesseract OCR (lang='fas') for Persian text extraction.
- Stores text and coordinates for each cell.
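
A hedged sketch of the per-cell OCR step is shown below. The helper names, the 2-pixel padding, and the `--psm 6` page segmentation mode are assumptions chosen for illustration; only `lang='fas'` comes from the description above.

```python
def clamp_box(box, width, height, pad=2):
    """Shrink a cell box by `pad` pixels (to skip the border lines) and clip it
    to the image bounds."""
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 + pad), max(0, y1 + pad)
    x2, y2 = min(width, x2 - pad), min(height, y2 - pad)
    return x1, y1, x2, y2


def clean_text(raw):
    """Collapse the whitespace, line breaks, and form feed Tesseract leaves
    around short cell text."""
    return " ".join(raw.split())


def ocr_cells(image, boxes, lang="fas"):
    """Run Tesseract on each cell crop of an OpenCV image (a NumPy array)."""
    import pytesseract  # imported here so the helpers above need no Tesseract

    height, width = image.shape[:2]
    results = []
    for box in boxes:
        x1, y1, x2, y2 = clamp_box(box, width, height)
        if x2 <= x1 or y2 <= y1:
            continue  # degenerate cell, nothing to read
        crop = image[y1:y2, x1:x2]
        # Assumption: --psm 6 (a single uniform block of text) suits table cells;
        # the script may use a different mode.
        raw = pytesseract.image_to_string(crop, lang=lang, config="--psm 6")
        results.append({"text": clean_text(raw), "box": box})
    return results
```

Trimming a couple of pixels from each side keeps the table rules out of the crop, which otherwise tend to be read as stray characters.
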
Data Structuring 📚:
- Maps extracted text to a grid based on cell positions.
- Creates a Pandas DataFrame to represent the table structure.
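
One way to do this mapping is sketched below, assuming each cell is a dict with `text` and `box` keys (a hypothetical structure, as are the function names and the 10-pixel tolerance). Cells are assigned to rows and columns by the top-left corner of their box.

```python
def build_grid(cells, tol=10):
    """cells: list of {"text": str, "box": (x1, y1, x2, y2)}.
    Returns a list of rows, each a list of cell texts."""

    def levels(values):
        # Keep one representative per distinct position (within `tol` pixels).
        found = []
        for v in sorted(values):
            if not found or v - found[-1] > tol:
                found.append(v)
        return found

    def nearest(found, v):
        # Index of the representative closest to v.
        return min(range(len(found)), key=lambda i: abs(found[i] - v))

    cols = levels(c["box"][0] for c in cells)
    rows = levels(c["box"][1] for c in cells)
    grid = [["" for _ in cols] for _ in rows]
    for c in cells:
        x1, y1, _, _ = c["box"]
        grid[nearest(rows, y1)][nearest(cols, x1)] = c["text"]
    return grid


def to_dataframe(cells, tol=10):
    """Wrap the grid in a DataFrame with default integer row and column labels."""
    import pandas as pd  # imported here so build_grid works without pandas

    return pd.DataFrame(build_grid(cells, tol))
```

Column 0 here is the leftmost column in the image, and positions with no detected cell are left as empty strings.
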
Visualization 🖌️:
- Displays three plots:
  - Detected Cells 🟢: Original image with green rectangles around table cells.
  - Line Mask ⚪: Inverted mask showing detected horizontal and vertical lines.
  - Joints 🔲: Intersection points of table lines.

Example 📋

# Example output for a table with 6 cells
✅ Detected 6 cells.
متن سلول 1: نام
مختصات: (50, 30, 150, 80)
---
متن سلول 2: سن
مختصات: (150, 30, 250, 80)
---
...
جدول استخراج شده (متن داخل سلول‌ها):
     0    1    2
0  نام  سن  شغل
1  علی  30  مهندس

Notes 📌

Tesseract OCR 🔍: Requires tesseract-ocr-fas for Persian language support.
Colab Compatibility ☁️: The script is designed to work in Google Colab, with file upload support and Tesseract installation commands.
Image Quality 🖼️: OCR accuracy depends on clear table lines and readable text.

Limitations ⚠️

The table detection algorithm assumes well-defined table lines, so borderless tables or tables with faint or broken rules may not be detected correctly.
OCR accuracy depends on image quality and text clarity.
Persian text rendering in visualizations may require additional font support for non-Colab environments.

Contributing 🤝

Contributions are welcome! Please submit a pull request or open an issue for bug reports, feature requests, or improvements. 🙌

License 📜

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments 💖

OpenCV for image processing 🖼️.
Tesseract OCR for Persian text extraction 🔍.
Pandas for data structuring 📚.
