Integration with csvdedupe
Created by: fgregg
On Twitter, @jpmckinney, @hunterowens, and I discussed integrating csvdedupe and csvlink with csvkit.
I'd really like to see closer connections between these projects, but this could take a number of forms.
1. Complete integration (csvdedupe and csvlink would be subsumed into csvkit)
   - Pros
     - Seamless experience for users
     - Pooling of developer time
   - Cons
     - Current core devs of csvkit would need to become somewhat familiar with csvdedupe
     - The complicated stuff that csvdedupe is doing may not fit within the csvkit philosophy
   - Neutral
     - A few years ago, it was pretty hard to install dedupe, but Python packaging has gotten a lot better. I think this is not a serious disadvantage at present.
2. Interface compatibility and publicizing each other's projects, while the projects stay independent. csvdedupe and csvlink would need to provide csvkit's common arguments.
   - Pros
     - Better discoverability for users (more benefit for csvdedupe than csvkit, obviously)
     - No need for csvkit core devs to know anything about csvdedupe
     - Users need to learn less to use csvdedupe
   - Cons
     - Harder for users
3. Only publicizing each other's projects
   - Pros
     - Better discoverability for users (more benefit for csvdedupe than csvkit, obviously)
     - No need for csvkit core devs to know anything about csvdedupe
   - Cons
     - Harder for users
4. Do nothing (status quo)
   - Pros
     - Easiest for core devs
   - Cons
     - No advantages of 1, 2, or 3
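
To make the interface-compatibility option a bit more concrete, here is a minimal sketch of what "providing csvkit's common arguments" might look like from the csvdedupe side. It uses only the standard library. The flag names (`-d`, `-t`, `-q`, `-e`, `-H`) mirror a subset of csvkit's documented common arguments, but the helper names `add_common_csv_arguments` and `reader_kwargs`, the program name, and the default encoding are hypothetical and not taken from either code base.

```python
import argparse


def add_common_csv_arguments(parser):
    """Hypothetical helper: attach a subset of csvkit-style common flags."""
    parser.add_argument('-d', '--delimiter', dest='delimiter',
                        help='Delimiting character of the input CSV file.')
    parser.add_argument('-t', '--tabs', dest='tabs', action='store_true',
                        help='Treat the input as tab-delimited; overrides -d.')
    parser.add_argument('-q', '--quotechar', dest='quotechar',
                        help='Character used to quote strings in the input.')
    # Default encoding here is an assumption for the sketch.
    parser.add_argument('-e', '--encoding', dest='encoding', default='utf-8',
                        help='Encoding of the input CSV file.')
    parser.add_argument('-H', '--no-header-row', dest='no_header_row',
                        action='store_true',
                        help='Input has no header row.')
    return parser


def build_parser():
    """Build a stand-in parser for a csvdedupe-like command."""
    parser = argparse.ArgumentParser(prog='csvdedupe-sketch')
    parser.add_argument('input', nargs='?', help='CSV file to process.')
    add_common_csv_arguments(parser)
    return parser


def reader_kwargs(args):
    """Translate parsed flags into keyword arguments for csv.reader."""
    kwargs = {}
    if args.tabs:
        kwargs['delimiter'] = '\t'
    elif args.delimiter:
        kwargs['delimiter'] = args.delimiter
    if args.quotechar:
        kwargs['quotechar'] = args.quotechar
    return kwargs
```

With something like this in place, a user who already knows `csvcut -t -q "'" in.csv` could pass the same flags to csvdedupe, which is the "users need to learn less" benefit listed above.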
We, the core devs of csvdedupe, would be interested in options 1, 2, and 3.
Beyond @jpmckinney and @onyxfish, @mbauman and @hunterowens might also be interested in this conversation.