Conquering the clutter: Data Civilizer can sift through heaps of information

Big data is a big deal. With these huge data sets, analysts can gain unprecedented insight into the hidden patterns of fields like physics, healthcare, and finance. Collecting and analyzing this data has become a relatively easy part of the process. Aggregating and organizing it all has proven to be more difficult.

“An oft-cited statistic is that data scientists spend 80 percent of their time finding, preparing, integrating, and cleaning data sets,” Dong Deng, a postdoctoral associate at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), told Digital Trends. “The remaining 20 percent is spent doing the desired analytic tasks.” Deng suspects that 80 percent may even be a lowball estimate, citing Mark Schreiber, a data officer from Merck, who claimed his data scientists spend 98 percent of their time on “grunt work.”

To minimize this grunt work and help conquer the clutter of big data, Deng created a system called Data Civilizer along with a team of researchers from CSAIL, the Technical University of Berlin, Nanyang Technological University, the University of Waterloo, and the Qatar Computing Research Institute.

To tame data, the system requires that the information be arranged in tables. From there, it analyzes every column in each table to create a statistical summary of that column, such as its range of values or its most frequently occurring words. It then compares the column summaries against one another to find similar ranges or sets of words, and builds a map to represent the connections.
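The column-summary idea can be illustrated with a small Python sketch. This is not Data Civilizer's code: the function names (`summarize`, `similarity`, `build_map`), the similarity measures (range overlap for numbers, word-set overlap for text), the 0.5 threshold, and the sample tables are all assumptions made for illustration. The sketch also assumes every column is non-empty.

```python
from collections import Counter
from itertools import combinations


def summarize(values, top_k=3):
    """Summarize one column: a value range for numbers, frequent words for text."""
    if all(isinstance(v, (int, float)) for v in values):
        return {"kind": "numeric", "range": (min(values), max(values))}
    words = Counter(w.lower() for v in values for w in str(v).split())
    return {"kind": "text", "words": {w for w, _ in words.most_common(top_k)}}


def similarity(a, b):
    """Score two summaries between 0 and 1; summaries of different kinds score 0."""
    if a["kind"] != b["kind"]:
        return 0.0
    if a["kind"] == "numeric":
        (lo1, hi1), (lo2, hi2) = a["range"], b["range"]
        span = max(hi1, hi2) - min(lo1, lo2)
        if span == 0:  # both columns hold the same single value
            return 1.0
        overlap = min(hi1, hi2) - max(lo1, lo2)
        return max(overlap, 0) / span
    union = a["words"] | b["words"]
    return len(a["words"] & b["words"]) / len(union) if union else 0.0


def build_map(tables, threshold=0.5):
    """Return (column, column, score) links between columns of different tables."""
    summaries = {
        (table_name, column_name): summarize(column)
        for table_name, columns in tables.items()
        for column_name, column in columns.items()
    }
    edges = []
    for (key1, s1), (key2, s2) in combinations(summaries.items(), 2):
        if key1[0] == key2[0]:
            continue  # only link columns that live in different tables
        score = similarity(s1, s2)
        if score >= threshold:
            edges.append((key1, key2, round(score, 2)))
    return edges


# Hypothetical sample data: two tables, each a dict of column name -> values.
tables = {
    "patients": {"age": [34, 51, 67], "city": ["Boston", "Boston", "Doha"]},
    "trials": {"subject_age": [30, 45, 70], "site": ["Boston", "Berlin", "Boston"]},
}
linked = build_map(tables)
```

With this sample data, only the two age columns are linked, because their ranges overlap heavily. The `city` and `site` columns share just one of three distinct words, which falls below the assumed threshold.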

“Data Civilizer helps users discover interesting data, stitch together relevant data from multiple sources, clean the desired data, and output it to the recipient,” Deng said.

Deng and his team are working to make Data Civilizer into a more scalable module, which means refining the system to include more automated functions. “Data cleaning cannot be a manual process,” he said, “because that will not scale. Hence, we are investigating semi-supervised algorithms for more scalable data cleaning.”

The team is also planning a more approachable user interface that can be used easily by non-programmers. They expect their system to be available sometime in 2017.
