You raise a great point. There’s no single rule for deciding whether a library is proven enough to use. We will adopt a new library if it solves a problem that no existing library does and does it better than code we would write ourselves. As an example, we use codecov even though it only has a handful of stars, because there are no other great options for automatic reporting of code coverage in Python. In other words, we don’t rule out adding new libraries to our data science platform, but there has to be a strong reason for each one.
Moreover, we want to see that the library is active, meaning it has commits in the last 90 days and its issues are being addressed. Sorry for the ambiguous answer, but on the positive side, the standard libraries for data science in Python (pandas, numpy, scipy, and sklearn) are adding more functionality over time. As for your specific question about dask, given the number of GitHub stars, the ongoing work, the continuing resolution of pull requests, and the amount of time it’s been in use, I’d say it’s proven.