Ankush Gupta / SynthText / Issues / #101 (Closed)
Issue created Feb 07, 2018 by Shangbang Long (@Jyouhou)

Some errors in the 800K datasets: oversized word/char box, missing labels

Hi, I am running into problems similar to those discussed in #13 and #15.

I am using the pre-generated 800K dataset to train a model, and I found the following issues:

(1) Some word/char boxes are oversized, as discussed in #13 and #15.
(2) Some word recognition annotations are wrong.
(3) Some bounding box coordinates are confusing: negative values, coordinates that fall outside the image boundary, and char boxes that consist of only 2 distinct vertices, each repeated (e.g. p1, p1, p2, p2, where 4 different points are expected).
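For anyone who wants to screen for the box problems in (1) and (3), here is a rough sketch of the checks I have in mind. It is not part of SynthText. It assumes each image's boxes are stored as an array of shape (2, 4, N) (x/y, 4 vertices, N boxes); `check_boxes` and the `max_area_frac` threshold are names and values made up for illustration, and "oversized" is approximated as "covers more than a given fraction of the image", which may not match every case from #13 and #15.

```python
import numpy as np


def check_boxes(boxes, img_w, img_h, max_area_frac=0.5):
    """Flag suspicious quadrilateral boxes.

    boxes: array of shape (2, 4, N) -> (x/y, vertex, box). This layout is an
           assumption; adapt it if your annotations are stored differently.
    max_area_frac: hypothetical threshold; a box covering more than this
           fraction of the image is reported as oversized.
    Returns a dict of boolean masks, each of length N.
    """
    boxes = np.asarray(boxes, dtype=float)
    if boxes.ndim == 2:  # a single box stored as (2, 4)
        boxes = boxes[:, :, None]
    x, y = boxes[0], boxes[1]  # each has shape (4, N)

    # Any vertex with a negative coordinate.
    negative = (x < 0).any(axis=0) | (y < 0).any(axis=0)

    # Any vertex beyond the right or bottom edge of the image.
    out_of_bounds = (x > img_w).any(axis=0) | (y > img_h).any(axis=0)

    # Fewer than 4 distinct vertices, e.g. p1, p1, p2, p2.
    pts = np.round(boxes.transpose(2, 1, 0), 3)  # (N, 4, 2)
    n_unique = np.array([len(np.unique(p, axis=0)) for p in pts], dtype=int)
    degenerate = n_unique < 4

    # Polygon area via the shoelace formula, compared to the image area.
    x_next = np.roll(x, -1, axis=0)
    y_next = np.roll(y, -1, axis=0)
    area = 0.5 * np.abs((x * y_next - x_next * y).sum(axis=0))
    oversized = area > max_area_frac * img_w * img_h

    return {
        "negative": negative,
        "out_of_bounds": out_of_bounds,
        "degenerate": degenerate,
        "oversized": oversized,
    }
```

Calling `check_boxes(char_boxes, width, height)` per image and OR-ing the masks gives the indices of boxes to drop or inspect by hand. Wrong word recognition labels, issue (2), cannot be caught this way.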
