Hey HN, I'd appreciate feedback on a version-control-for-data toolset I'm building, creatively called the Data Manager.

When working with large data repositories, full checkouts are problematic. To my knowledge, many git-for-data solutions create a new copy of the entire dataset for each commit, and none of them let you contribute to a data repo without a full checkout. The video presents a workflow that does not require full checkouts of the datasets and still lets you commit changes through Git. Specifically, it becomes possible to check out only kilobytes in order to commit changes to a 130 gigabyte repository, versions included.

Note that only diffs are committed, at row, column, and cell level. As a result, the diff view in a Git GUI will look odd: the GUI treats the previously committed diff as the old version of a file and the newly committed diff as the new one, when in fact both are just diffs.

The goal of the Data Manager is to version datasets and structured data in general, in a storage-efficient way,...
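To make the row/column/cell-level diffing concrete, here is a minimal sketch in Python of what storing only changed cells could look like. This is purely illustrative and not the Data Manager's actual format: it assumes a table is a dict mapping row keys to {column: value} dicts, and cell_diff is a hypothetical helper name.

    # Illustrative sketch only, not the Data Manager's real on-disk format.
    # A table is modeled as {row_key: {column: value}}.

    def cell_diff(old, new):
        """Return only added rows, removed row keys, and changed cells."""
        diff = {"added": {}, "removed": [], "changed": {}}
        for key, row in new.items():
            if key not in old:
                diff["added"][key] = row  # entirely new row
            else:
                # keep only the cells whose values actually changed
                changed = {c: v for c, v in row.items() if old[key].get(c) != v}
                if changed:
                    diff["changed"][key] = changed
        diff["removed"] = [k for k in old if k not in new]
        return diff

    v1 = {1: {"name": "Ada", "score": 10}, 2: {"name": "Bob", "score": 7}}
    v2 = {1: {"name": "Ada", "score": 12}, 3: {"name": "Cyd", "score": 9}}
    print(cell_diff(v1, v2))
    # {'added': {3: {'name': 'Cyd', 'score': 9}}, 'removed': [2],
    #  'changed': {1: {'score': 12}}}

Because each commit stores only the touched cells rather than a fresh copy of the table, a small edit to a huge dataset stays in the kilobyte range, which is what makes the 130 gigabyte example above workable without a full checkout.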