Closed
Issue created May 18, 2022 by Administrator @root (Owner)

Improved memory utilization in API

Created by: stephenroller

🚀 API Performance Improvements

The current API implementation seems to have a memory leak, preventing larger batch sizes from being used. While I haven't instrumented this yet, my gut instinct is that it's coming from memory fragmentation.

In particular, I believe the beam reordering steps of generation are causing fragmentation, whenever we reorder the incremental state or the logprobs:

https://github.com/facebookresearch/metaseq/blob/b2de089f7f1ddc938120ca5210767d9c6926c1db/metaseq/sequence_generator.py#L274-L280

It's likely that we're creating partial views that keep their underlying tensors alive, so memory is held long after that state is no longer needed.
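
As a minimal sketch of this hypothesis (illustrative shapes, not the actual metaseq call sites): a small non-contiguous view pins the entire buffer it was sliced from until the view itself is materialized.

```python
# Illustrative sketch of the view-retention hypothesis; shapes are made up.
import torch

base = torch.randn(4096, 4096)   # ~64 MiB of float32
col = base[:, :1]                # 4096-element view into that buffer
del base                         # buffer is NOT freed: `col` still shares it
print(col.storage().size())      # 16777216 -- storage spans the full buffer

col = col.contiguous()           # copy out only the 4096 elements we need
print(col.storage().size())      # 4096 -- the old 64 MiB buffer is now freeable
```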

Furthermore, the returned logits are also likely highly fragmented:

https://github.com/facebookresearch/metaseq/blob/b2de089f7f1ddc938120ca5210767d9c6926c1db/metaseq/sequence_generator.py#L501-L514

In all likelihood, one or two carefully placed .contiguous() calls would significantly relieve memory pressure.
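
To illustrate the pattern (hedged: assumed shapes and a simplified loop, not the real decoding code), collecting small per-step slices as views keeps every full logits tensor alive, while materializing them breaks that link:

```python
# Hedged sketch: gathering tiny per-step slices as views of the full logits
# keeps every full (beam x vocab) tensor alive. Shapes are assumptions.
import torch

beam, vocab, steps = 8, 50_000, 64
kept = []
for _ in range(steps):
    logits = torch.randn(beam, vocab)  # full step output, ~1.6 MB
    kept.append(logits[:, :1])         # tiny view, pins all of `logits`

# Every full logits tensor is still reachable through the views:
print(sum(t.storage().size() for t in kept))  # steps * beam * vocab elements

kept = [t.contiguous() for t in kept]         # copy out just what we keep
print(sum(t.storage().size() for t in kept))  # steps * beam elements
```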

As a first step, we can add a special-case check for whether the reordering indices are the identity permutation [0, 1, 2, 3, ...]. Reordering with those indices is expensive but produces a tensor identical to the input. If we are doing such an identity reordering, we should avoid calling reorder_incremental_state at all.
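
A sketch of that guard (the helper name and the commented call site are assumptions, not existing metaseq code):

```python
import torch

def is_identity_order(new_order: torch.Tensor) -> bool:
    # [0, 1, 2, ..., n-1] means every beam keeps its current slot,
    # so a reorder would just copy the state onto itself.
    return torch.equal(
        new_order,
        torch.arange(
            new_order.numel(), device=new_order.device, dtype=new_order.dtype
        ),
    )

# Hypothetical call site, mirroring the generator's reorder step:
# if reorder_state is not None and not is_identity_order(reorder_state):
#     model.reorder_incremental_state(incremental_states, reorder_state)
```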
