Previously, I’ve posted other social media data compilations. Today, we will focus on the world’s most popular forum site, Reddit.

This guide will introduce the top 10 Reddit datasets for machine learning.

Known as “the front page of the internet,” Reddit is a forum/social media site where users can post virtually anything and everything. Unlike Facebook, Twitter, or Instagram, the majority of Reddit users remain anonymous. Reddit moderators strictly censor and curate the subforums, known as subreddits. However, anonymity allows people to say what they want in whatever manner they wish. Therefore, Reddit comments and posts are perfect for testing and training numerous natural language processing (NLP) models. Some of these models include content moderation models and sentiment classifiers.

Best Reddit Datasets for Machine Learning

Warning: Some of the datasets below were compiled specifically for the training of content moderation models. Therefore, the data may include explicit content. …


Many of these datasets have been made public to allow people to contribute and add valuable insight into the way the climate is changing and its causes.

Data is a central piece of the climate change debate. With the climate change datasets on this list, many data scientists have created visualizations and models to measure and track the change in surface temperatures, sea ice levels, and more.

We hope this collection provides you with a jumping-off point to use your skills to contribute to one of the biggest and most important challenges of our time.

Global Climate Change Datasets

1. Berkeley Earth Surface Temperature Data: From the Berkeley Earth Data page, this dataset is made up of temperature recordings from the Earth’s surface.


The Promised Neverland is a must-watch anime that every anime fan needs to check out. Below, I have compiled a list of the 5 best places where you can watch The Promised Neverland online. I will also tell you guys where you can purchase a Blu-ray set to enjoy this anime in the best way possible.

About The Anime

The Promised Neverland is a shounen anime that gives us a beautiful blend of horror and mystery. With stunning animation and a well-built plot, The Promised Neverland has managed to win the hearts of many fans all over the world. This suspense-filled masterwork follows the story of a group of orphans who live a relatively happy life in Grace Field House. The caretaker in this orphanage is a kind and loving woman named Isabella, whom the children refer to as Mama. …


Studies have shown that self-agreement checks are as important or even more important than inter-annotator agreement when evaluating your annotation team for quality.

Finding, creating, and annotating training data is one of the most intricate and painstaking tasks in machine learning (ML) model development. Many crowdsourced data annotation solutions employ inter-annotator agreement checks to make sure their labeling team understands the labeling tasks well and is performing up to the client’s standards. However, some studies have shown that self-agreement checks are as important or even more important than inter-annotator agreement when evaluating your annotation team for quality.

In this article, we will explain what self-agreement is and introduce an ML study where self-agreement checks were crucial to the quality of the team’s training data and the accuracy of their model. …
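As a rough illustration of the difference between the two checks, the sketch below scores one annotator against their own earlier labels (self-agreement) and against a second annotator (inter-annotator agreement). It assumes Cohen's kappa as the agreement measure, which the article does not specify, and all labels and names are made-up examples.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Fraction of items where the two passes gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each pass's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both passes used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: the same annotator labels the same six items twice,
# and a second annotator labels them once.
first_pass = ["pos", "neg", "pos", "neg", "pos", "neg"]
second_pass = ["pos", "neg", "pos", "pos", "pos", "neg"]
other_annotator = ["pos", "neg", "neg", "neg", "pos", "neg"]

self_agreement = cohen_kappa(first_pass, second_pass)
inter_agreement = cohen_kappa(first_pass, other_annotator)

print(round(self_agreement, 2))   # 0.67
print(round(inter_agreement, 2))  # 0.67
```

A low self-agreement score suggests the annotator is inconsistent with themselves, which inter-annotator checks alone would not reveal.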


Data Science, Machine Learning

Looking for information on the different image annotation types? In the world of AI and machine learning, data is king. Without data, there can be no data science. For AI developers and researchers to achieve the ambitious goals of their projects, they need access to enormous amounts of high-quality data. With regard to image data, one major field of machine learning that requires large amounts of annotated images is computer vision.

Table of Contents

  1. What is Computer Vision?
  2. What is Image Annotation?
  3. Common Image Annotation Types
  4. 2D Bounding Boxes
  5. 3D Bounding Boxes / Cuboids
  6. Polygons
  7. Lines and Splines
  8. Semantic Segmentation

Don’t have time to read the entire article?


Product categorization/product classification is the organization of products into their respective departments or categories. A large part of the process is also the design of the product taxonomy as a whole.

Product categorization was initially a text classification task that analyzed the product’s title to choose the appropriate category. However, numerous methods have been developed which take into account the product title, description, images, and other available metadata. The following papers on product categorization represent essential reading in the field and offer novel approaches to product classification tasks.

1. Don’t Classify, Translate

In this paper, researchers from the National University of Singapore and the Rakuten Institute of Technology propose and explain a novel machine translation approach to product categorization. The experiment uses the Rakuten Data Challenge and Rakuten Ichiba datasets. Their method translates or converts a product’s description into a sequence of tokens which represent a root-to-leaf path to the correct category. Using this method, they are also able to propose meaningful new paths in the taxonomy. …
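To make the idea of a root-to-leaf path concrete, here is a minimal sketch of the target representation only, not the paper's translation model. The taxonomy, category names, and helper functions are all hypothetical.

```python
# Hypothetical toy taxonomy: each key is a category, each value its children.
TAXONOMY = {
    "Electronics": {
        "Audio": {"Headphones": {}, "Speakers": {}},
        "Cameras": {"Lenses": {}},
    },
    "Home": {"Kitchen": {"Cookware": {}}},
}


def root_to_leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path in the taxonomy as a tuple of tokens."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from root_to_leaf_paths(children, path)
        else:
            yield path


def is_existing_path(tokens, tree=TAXONOMY):
    """Return True if the token sequence is a root-to-leaf path in the tree."""
    node = tree
    for token in tokens:
        if token not in node:
            return False
        node = node[token]
    return not node  # the walk must end on a leaf (no children left)


paths = list(root_to_leaf_paths(TAXONOMY))
for path in paths:
    print(" > ".join(path))

# A sequence a model might emit that is not yet in the taxonomy.
print(is_existing_path(["Home", "Audio", "Speakers"]))  # False
```

In this framing, a model outputs one such token sequence per product, and a sequence that does not match any existing path can be read as a candidate new path in the taxonomy.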


Reviewing the bHaptics suit for Oculus Quest

Haptic suits represent the next step towards true immersion in virtual reality gaming. Virtual reality works by establishing a space that can stimulate our senses enough to create the illusion of being in a different world. The current VR headsets on the market create this illusion by stimulating our sense of sight (through 6DoF visuals) and our sense of hearing (through binaural 3D audio), along with slight vibration feedback from the controllers.

However, with the emergence of haptic feedback accessories, VR games now have the ability to activate a third human sense, our sense of touch.

What is Haptic Feedback?

Put simply, haptic feedback is the use of vibration to convey another layer of information to the user. In VR, we can use haptic feedback in the controllers to simulate things like the kick from firing a pistol or the impact of hitting an enemy with your fists. When we start looking at top-of-the-line accessories that can generate more sophisticated forms of haptic feedback, the amount of immersion VR games provide can increase drastically. …


Many data scientists claim that around 80% of their time is spent on data preprocessing, and for good reason; collecting, annotating, and formatting data are crucial tasks in machine learning. This article will help you understand the importance of these tasks, as well as learn methods and tips from other researchers.

Below, we will highlight academic papers from reputable universities and research teams on various training data topics. The topics include the importance of high-quality human annotators, how to create large datasets in a relatively short time, ways to securely handle training data that may include private information, and more.

1. How Important are Human Annotators?


Weathering With You (Tenki No Ko) is the latest of Makoto Shinkai’s impressive anime films, and many fans consider it as good as, or even better than, Your Name. If you are a Shinkai fan looking to watch his latest film, there are a variety of online platforms where you can stream the film straight to your computer or mobile devices. This article will introduce some of the best places to watch Weathering With You online, as well as where you can buy the DVD or Blu-ray.

Please note: This is NOT an article about where you can watch Weathering With You for free. …


With AI often thrown around as a buzzword in business circles, people tend to forget that machine learning is a means to an end, rather than an end in itself. For most companies, building an AI is not the true goal. Instead, AI implementation can provide you with the tools to meet your goals, be it better customer service through an intuitive chatbot or streamlining video production through synthetic voiceovers.

To help shed light on some real-world applications of machine learning, this article introduces five innovative AI software products that you should keep an eye on throughout 2020.

1. Scanta

About

Limarc Ambalina

Owner of Jpbound.com, Editor at Hackernoon.com, Content Writer for ZenMarket.jp and Lionbridge.ai | Specializing in AI, tech, VR, and pop culture.
