Sat. Aug 17, 2019 - Fri. Aug 23, 2019

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

Summary: Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise-grade analytics, it provides a solution for the full lifecycle of data at massive scale.

Big Data 100

Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

KDnuggets

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

Coding 123

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets—feedback we …

Building 110

Applying Netflix DevOps Patterns to Windows

Netflix Tech

Baking Windows with Packer, by Justin Phelps and Manuel Correa. Customizing Windows images at Netflix was a manual, error-prone, and time-consuming process. In this blog post, we describe how we improved the methodology, which technologies we leveraged, and how this has improved service deployment and consistency. Artisan Crafted Images: In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of t

AWS 83

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

How We Reduced DynamoDB Costs by Using DynamoDB Streams and Scans More Efficiently

Rockset

Many of our users implement operational reporting and analytics on DynamoDB using Rockset as a SQL intelligence layer to serve live dashboards and applications. As an engineering team, we are constantly searching for opportunities to improve their SQL-on-DynamoDB experience. For the past few weeks, we have been hard at work tuning the performance of our DynamoDB ingestion process.

Bytes 52

Is Kaggle Learn a “Faster Data Science Education?”

KDnuggets

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. Courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.

Education 114

More Trending

Data is Not the New Oil. Data is Water!

Teradata

If you work in data analytics or a related field, you’ve probably heard the mantra that data is the new oil. But data is not oil, it's water. Find out why.

Data 45

Top Handy SQL Features for Data Scientists

KDnuggets

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy, quick-to-learn features to organize and retrieve data, as well as perform actions on it in order to gain useful insights.

SQL 111

The Kafka Connect Plugin for Rockset and How It Works

Rockset

Rockset continuously ingests data streams from Kafka, without the need for a fixed schema, and serves fast SQL queries on that data. We created the Kafka Connect Plugin for Rockset to export data from Kafka and send it to a collection of documents in Rockset. Users can then build real-time dashboards or data APIs on top of the data in Rockset. This blog covers how we implemented the plugin.

Kafka 40

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Building Transactional Systems Using Apache Kafka

Confluent

Traditional relational database systems are ubiquitous in software systems. They are surrounded by a strong ecosystem of tools, such as object-relational mappers and schema migration helpers. Relational databases also provide strong guarantees in the form of ACID transactions, which are loved by developers for their all-or-nothing semantics. Today’s businesses, however, want to process ever-increasing amounts of data.
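
The all-or-nothing semantics mentioned in this summary can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `make_db`, `balances`, and `transfer` helpers, and the sample balances are all hypothetical and are not taken from the Confluent post; the point is only that a failure partway through a transaction leaves no partial write behind.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical two-row accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction.

    Returns True on commit and False if the transaction was rolled back.
    """
    try:
        # Used as a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))   # succeeds
print(transfer(conn, "alice", "bob", 500), balances(conn))  # rolled back, balances unchanged
```

The credit is deliberately issued before the debit so that the failing case has already written one row when the constraint fires; the rollback then removes that partial update.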

Kafka 22

Teradata Earns Spot (Again x2!) on Constellation ShortList for Hybrid Cloud

Teradata

Teradata is named yet again to the Constellation ShortList™ for “Hybrid and Multi-Cloud Relational Database Management Systems." Read more!

Cloud 15

Order Matters: Alibaba’s Transformer-based Recommender System

KDnuggets

Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.

Systems 103

Optimizing Bulk Load in RocksDB

Rockset

What’s the fastest we can load data into RocksDB? We were faced with this challenge because we wanted to enable our customers to quickly try out Rockset on their big datasets. Even though the bulk load of data in LSM trees is an important topic, not much has been written about it. In this post, we’ll describe the optimizations that increased RocksDB’s bulk load performance by 20x.

Bytes 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

A Guide to the Confluent Verified Integrations Program

Confluent

When it comes to writing a connector, there are two things you need to know how to do: write the code itself, and help the world know about your new connector. This post specifically outlines the process by which we verify partner integrations, and is a means of letting the world know about our partners’ contributions to our connector ecosystem.

Detecting stationarity in time series data

KDnuggets

Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.
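
As a rough illustration of what checking for stationarity means, the sketch below compares the mean and variance of the first and second halves of a series. This is a crude heuristic chosen for brevity, not the method from the linked article and not a formal test such as the Augmented Dickey-Fuller test; the function names and the `mean_tol` and `var_ratio_max` thresholds are arbitrary assumptions.

```python
from statistics import mean, pvariance


def split_half_summary(series):
    """Return ((mean, variance), (mean, variance)) for the two halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def looks_stationary(series, mean_tol=0.5, var_ratio_max=2.0):
    """Heuristic check: do both halves share a similar mean and variance?

    mean_tol is the allowed shift in the mean, measured in units of the
    overall standard deviation; var_ratio_max is the allowed ratio between
    the larger and the smaller half variance. Both defaults are arbitrary.
    """
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    (m1, v1), (m2, v2) = split_half_summary(series)
    overall_std = max(pvariance(series) ** 0.5, 1e-12)
    mean_shift = abs(m1 - m2) / overall_std
    var_ratio = max(v1, v2) / max(min(v1, v2), 1e-12)
    return mean_shift <= mean_tol and var_ratio <= var_ratio_max


trend = list(range(100))   # steadily increasing mean
flat = [1, -1] * 50        # constant mean and variance
print(looks_stationary(trend), looks_stationary(flat))
```

A series with a trend fails the mean comparison, while a series that fluctuates around a fixed level passes; a proper analysis would use a statistical test with a stated null hypothesis.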

Data 102

Understanding Decision Trees for Classification in Python

KDnuggets

This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.
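
To give a feel for how a classification tree chooses a split, here is a minimal pure-Python sketch of Gini impurity and a single-feature threshold search. It is an illustration under simplifying assumptions (one numeric feature, thresholds taken from observed values), not the tutorial's code and not scikit-learn's implementation; `gini` and `best_threshold` are hypothetical helper names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split `value <= threshold` with the lowest weighted Gini impurity.

    Returns (threshold, weighted_gini). The threshold is None when no
    candidate split improves on the unsplit node.
    """
    n = len(labels)
    best = (None, gini(labels))
    # The largest observed value is skipped because it would leave the
    # right-hand side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


print(gini(["a", "a", "b", "b"]))
print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

A full tree repeats this search over every feature and recurses into each side of the chosen split until a stopping rule such as a maximum depth is reached.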

Python 98

An Overview of Python’s Datatable package

KDnuggets

Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases and corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Deep Learning for NLP: Creating a Chatbot with Keras!

KDnuggets

Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?

Gender Diversity in AI Research

KDnuggets

Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.

89

Proptech and the proper use of technology for house sales prediction

KDnuggets

Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated values of property. We developed an optimal prediction model from correlations in the time and status of ownership as well as the time of the year of sales fluctuations.

Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?

KDnuggets

What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out the approach that best fits your organization’s needs and the factors that influence it.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Automate Stacking In Python: How to Boost Your Performance While Saving Time

KDnuggets

Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.

Python 86

Comparing Decision Tree Algorithms: Random Forest vs. XGBoost

KDnuggets

Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.

Which skills / knowledge areas do you currently have, and which do you want to add or improve?

KDnuggets

New KDnuggets survey looks to find out what skills our readers currently use, and which they are looking to add or improve. Take a few minutes to participate.

Math for Programmers

KDnuggets

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

Crafting an Elevator Pitch for your Data Science Startup

KDnuggets

If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.

eBook: How to Enhance Privacy in Data Science

KDnuggets

Check out this eBook, How to Enhance Privacy in Data Science, to equip yourself with the tools to enhance privacy in data science, including transforming data in a manner that protects privacy, an overview of the challenges and opportunities of privacy-aware analytics, and more.

Artificial Intelligence Is Not Intelligence – Interview With Andy Cotgreave (Keynote Speaker at Crunch Conf)

KDnuggets

Crunch is coming to Budapest, Hungary on 16-18 Oct. Use code KDNuggets to save on Data Science, Data Engineering, or BI tracks. But first, read this interview with keynote speaker Andy Cotgreave.

BI 70

Lincoln Clean Energy: Director, Asset Performance [Austin, TX]

KDnuggets

Seeking an Asset Performance Director, a role that requires an individual who possesses a strong technical skill set and the ability to communicate findings effectively throughout the organization.

61

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.