machinelearning.to

Data Collection: Questions to Ask

What is “good” data?

  • Defined consistently (the definition of the labels y is unambiguous)
  • Covers the important cases (good coverage of the inputs x)
  • Gets timely feedback from production data (the distribution reflects data drift and concept drift)
  • Sized appropriately
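
The first two criteria above can be checked mechanically. The following is a minimal sketch, not a definitive method: it assumes the data is a list of (input, label) pairs, and the function names, the toy dataset, the required case list, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def conflicting_labels(examples):
    """Return inputs that appear with more than one distinct label y."""
    labels_by_input = defaultdict(set)
    for x, y in examples:
        labels_by_input[x].add(y)
    return {x: sorted(ys) for x, ys in labels_by_input.items() if len(ys) > 1}


def underrepresented_cases(cases, required_cases, min_count):
    """Return required cases that occur fewer than min_count times."""
    counts = Counter(cases)  # a Counter returns 0 for cases never seen
    return {c: counts[c] for c in required_cases if counts[c] < min_count}


# Hypothetical toy dataset of (input, label) pairs.
examples = [
    ("scratch on screen", "defect"),
    ("scratch on screen", "ok"),  # same input, conflicting label
    ("clean screen", "ok"),
    ("cracked case", "defect"),
]

# Consistency: the same input should not carry different labels.
conflicts = conflicting_labels(examples)

# Coverage: every case we care about should appear often enough.
gaps = underrepresented_cases(
    [y for _, y in examples],
    required_cases=["defect", "ok", "unknown"],
    min_count=2,
)

print(conflicts)
print(gaps)
```

Here coverage is measured over labels only to keep the example short; in practice the same count could be taken over any attribute of the inputs x that defines an important case.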

What kind of problem are we trying to solve?

What data sources already exist?

What privacy concerns are there?

Is the data public?

Where should we store the data?

©2023 machinelearning.to