Data Engineers: How do you promote your open-source tools?
News & discussion on Data Engineering topics, including but not limited to: data pipelines, databases, data formats, storage, data modeling, data governance, cleansing, NoSQL, distributed systems, streaming, batch, Big Data, and workflow engines.
What's your experience growing an open-source project?
A subreddit for everything open source related (for this context, we go off the definition of open source here http://en.wikipedia.org/wiki/Open_source)
How Do You Handle Data Quality in Spark?
News & discussion on Data Engineering topics, including but not limited to: data pipelines, databases, data formats, storage, data modeling, data governance, cleansing, NoSQL, distributed systems, streaming, batch, Big Data, and workflow engines.
If you love Spark but hate PyDeequ – check out SparkDQ (early but promising)
Articles and discussion regarding anything to do with Apache Spark.
PyDeequ frustrated me — so I built SparkDQ (feedback wanted!)
A subreddit for everything open source related (for this context, we go off the definition of open source here http://en.wikipedia.org/wiki/Open_source)
I built a PySpark data validation framework to replace PyDeequ — feedback welcome
The official Python community for Reddit! Stay up to date with the latest news, packages, and meta information relating to the Python programming language. --- If you have questions or are new to Python use r/LearnPython
Goodbye PyDeequ: A new take on data quality in Spark
News & discussion on Data Engineering topics, including but not limited to: data pipelines, databases, data formats, storage, data modeling, data governance, cleansing, NoSQL, distributed systems, streaming, batch, Big Data, and workflow engines.