Issue created Apr 18, 2017 by Administrator (@root)

Integration with csvdedupe

Created by: fgregg

On Twitter, @jpmckinney, @hunterowens, and I discussed integrating csvdedupe and csvlink with csvkit.

I'd really like to see closer connections between these projects, but this could take a number of forms.

  1. Complete Integration (csvdedupe and csvlink would be subsumed into csvkit)
    1. Pros
      1. Seamless experience for users
      2. Pooling of developer time
    2. Cons
      1. Current core devs of csvkit would need to become somewhat familiar with csvdedupe
      2. The complicated stuff that csvdedupe is doing may not fit within the csvkit philosophy
    3. Neutral
      1. A few years ago, it was pretty hard to install dedupe, but Python packaging has gotten a lot better. I think this is not a serious disadvantage at present.
  2. Interface compatibility, with each independent project publicizing the other. csvdedupe and csvlink would need to provide csvkit's common arguments.
    1. Pros
      1. Better discoverability for users (more benefit for csvdedupe than csvkit obviously)
      2. No need for csvkit core devs to know anything about csvdedupe
      3. Users need to learn less to use csvdedupe
    2. Cons
      1. Harder for users
  3. Only publicizing each other's projects
    1. Pros
      1. Better discoverability for users (more benefit for csvdedupe than csvkit obviously)
      2. No need for csvkit core devs to know anything about csvdedupe
    2. Cons
      1. Harder for users
  4. Do Nothing (status quo)
    1. Pros
      1. Easiest for core devs
    2. Cons
      1. No advantages of 1, 2, or 3
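
To make option 2 a bit more concrete, here is a minimal sketch of what "providing csvkit's common arguments" could look like on the csvdedupe side. It is not taken from either code base. The helper names `build_common_parser` and `reader_kwargs` are hypothetical, and only a small subset of csvkit-style input flags (`-d`, `-t`, `-q`, `-e`) is shown, mapped onto the standard library's `csv.reader`.

```python
import argparse
import csv
import io


def build_common_parser(description):
    """Hypothetical helper: a parser accepting a few csvkit-style input flags."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("-d", "--delimiter", dest="delimiter", default=",",
                        help="Delimiting character of the input CSV file.")
    parser.add_argument("-t", "--tabs", dest="tabs", action="store_true",
                        help="Treat the input as tab-delimited (overrides -d).")
    parser.add_argument("-q", "--quotechar", dest="quotechar", default='"',
                        help="Character used to quote strings in the input.")
    parser.add_argument("-e", "--encoding", dest="encoding", default="utf-8",
                        help="Encoding of the input CSV file.")
    return parser


def reader_kwargs(args):
    """Hypothetical helper: turn parsed flags into csv.reader keyword arguments."""
    return {
        "delimiter": "\t" if args.tabs else args.delimiter,
        "quotechar": args.quotechar,
    }


# Usage: parse a semicolon-delimited input the same way a csvkit user would ask for it.
args = build_common_parser("demo").parse_args(["-d", ";"])
rows = list(csv.reader(io.StringIO("a;b\n1;2\n"), **reader_kwargs(args)))
print(rows)
```

The point of the sketch is only that a user who already knows the csvkit flags would not have to relearn them; which flags to support, and how they interact with the existing csvdedupe options, would still need to be decided.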

We, the core devs of csvdedupe, would be interested in options 1, 2, and 3.

Beyond @jpmckinney and @onyxfish, @mbauman and @hunterowens might also be interested in this conversation.
