Sat. Nov 12, 2022 - Fri. Nov 18, 2022

Introduction to Pandas for Data Science

KDnuggets

The Pandas library is core to any Data Science work in Python. This introduction walks you through the basics of data manipulation and covers many of Pandas' important features.

Who is Still Hiring Software Engineers and EMs?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. This article was updated in December 2022. In the midst of gloomy news about hiring freezes and layoffs, let's highlight companies that are growing and hiring.

Trending Sources

A Diatribe against Data Contracts and their Abuses.

Confessions of a Data Guy

Ok, so I don’t really mean all that. Or do I? I have no idea what the future holds. Sometimes it’s easy to pick out the winners, like Databricks and Snowflake, you can see, feel, and taste the results of those data products, a delicious and delectable bounty to feast upon. Other things are harder […] The post A Diatribe against Data Contracts and their Abuses. appeared first on Confessions of a Data Guy.

Data 130

Enabling The People, Enabling The Data with Kulani Likotsi

Jesse Anderson

My guest this week is Kulani Likotsi, the Head of Data Management and Data Governance at one of the four biggest banks in Africa. Her career has risen steadily, from analyst to Business Intelligence developer, then to the data warehouse team and the data governance team. I was impressed with Kulani’s volunteer spirit. Whenever there was a need, she volunteered.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

If I Had To Start Learning Data Science Again, How Would I Do It?

KDnuggets

While different ways to learn Data Science for the first time exist, the approach that works for you should be based on how you learn best. One powerful method is to evolve your learning from simple practice into complex foundations, as outlined in this learning path recommended by a physicist who turned into a Data Scientist.

The Scoop: Tech Layoffs in 2022

The Pragmatic Engineer

I get a lot of scoop sent by readers (thank you!). Sadly, in 2022, a good part of the scoop is about companies laying off people. Some of this scoop has not been reported before. I don't want to broadcast layoffs on Twitter or LinkedIn continuously, but also don't want this information to be lost. This page collects scoops I receive, some of which might not have been reported elsewhere.

More Trending

For your eyes only: improving Netflix video quality with neural networks

Netflix Tech

By Christos G. Bampis, Li-Heng Chen and Zhi Li. When you are binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best possible video quality to your eyes. To do so, we continuously push the boundaries of streaming video quality and leverage the best video technologies. For example, we invest in next-generation, royalty-free codecs and sophisticated video encoding optimizations.

Media 120

Git for Data Science Cheatsheet

KDnuggets

Knowing git is no longer an option for data professionals. Grab this handy reference sheet now and make sure you know how to git the job done.

Doing More with Less: 5 Ways Leading Organizations Maximize the Value of their Data

Teradata

“Doing more with less” is a familiar refrain echoing through the halls of many organizations. To answer this call, businesses are searching for efficiency gains and turning to data to unlock savings.

Data 98

Taking A Look Under The Hood At CreditKarma's Data Platform

Data Engineering Podcast

Summary: CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they need a reliable data platform that empowers all of the organization’s stakeholders. In this episode Vishnu Venkataraman shares the journey that he and his team have taken to build and evolve their systems and improve the product offerings that they are able to support.

MongoDB 100

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Helping VFX studios pave a path to the cloud

Netflix Tech

By: Peter Cioni (Netflix), Alex Schworer (Netflix), Mac Moore (Conductor Tech.), Rachel Kelley (AWS), Ranjit Raju (AWS). Rendering is core to the VFX process. VFX studios around the world create amazing imagery for Netflix productions. Nearly every show that is produced today includes digital visual effects, from the creatures in Stranger Things to recreating historic London in Bridgerton.

Cloud 116

What To Expect for AI Quality Trends In 2023

KDnuggets

Based on the recent discussions with dozens of Fortune 500 data science teams, we can expect to see a continued spotlight on AI model quality in 2023.

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

As the leading financial institution of Pakistan, Habib Bank Limited (HBL) is at the forefront of all development initiatives, which include growth of priority sectors and targeting the unbanked population in the country. HBL remains committed to its objective of client-centric innovation and financial inclusion for all segments of society. HBL was the first Pakistani commercial bank to be established in Pakistan in 1947.

Banking 85

Write What You Know: Turning Your Apache Kafka® Knowledge into a Technical Talk

Confluent

The call for papers for Kafka Summit London 2023 has opened, and we’re looking to hear about your experiences using and working with Kafka. If you’re stuck looking for ideas on what to talk about, write what you know.

Kafka 83

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Vulnerability Management at Lyft: Enforcing the Cascade [Part 1]

Lyft Engineering

Converting container scan data into tickets, linked with automated pull requests. Abstract: Over the past 2 years, we’ve built a comprehensive vulnerability management program at Lyft. This blog post will focus on the systems we’ve built to address OS and OS-package level vulnerabilities in a timely manner across hundreds of services run on Kubernetes.

Research Papers for NLP Beginners

KDnuggets

Read research papers on neural models, word embedding, language modeling, and attention & transformers.

Process 159

#Clouderalife Volunteer Spotlight: Glaucia Esppenchutz

Cloudera

Cloudera’s November Volunteer Spotlight is Glaucia Esppenchutz, staff data engineer, based in Lisbon, Portugal. Glaucia volunteers with Free Code Camp, an organization founded in 2014 that helps aspiring technicians learn to code for free. Through the creation and publication of videos, articles, and interactive coding lessons (all freely available to the public), Free Code Camp is able to reach and train millions of people annually.

Coding 85

Move faster, wait less: Improving code review time at Meta

Engineering at Meta

Code reviews are one of the most important parts of the software development process. At Meta we’ve recognized the need to make code reviews as fast as possible without sacrificing quality. We’re sharing several tools and steps we’ve taken at Meta to reduce the time waiting for code reviews. When done well, code reviews can catch bugs, teach best practices, and ensure high code quality.

Coding 56

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

DataOps Observability: Taming the Chaos (Part 3)

DataKitchen

Part 3: Considering the Elements of Data Journeys. This is the third post in DataKitchen’s four-part series on DataOps Observability. Observability is a methodology for providing visibility of every journey that data takes from source to customer value across every tool, environment, data store, team, and customer so that problems are detected and addressed immediately.

7 SQL Concepts You Should Know For Data Science

KDnuggets

The post explains all the key elements of SQL that you must know as a data science practitioner.

Once Upon a Time in the Land of Data

Cloudera

I recently had the privilege of attending the CDAO event in Boston hosted by Corinium. Tracks represented financial services, insurance, retail and consumer packaged goods, and healthcare. Overall, it struck me that while data science is not new, most firms are still defining the mission of the data office and data officer. It’s clear firms seek to leverage data and embrace its potential insights, but most are forging ahead in largely uncharted territory.

Artificial Intelligence (AI) in Cloud Computing

U-Next

Introduction. Artificial Intelligence (AI) is a process of programming computers to make decisions for themselves. This technology creates intelligent applications capable of reasoning, learning, and acting independently. Among many things, AI finds innumerable applications in cloud computing. Cloud computing delivers computing services (including servers, storage, databases, networking, software, analytics, and intelligence) over the Internet (“the cloud”) to offer faster innovation

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Question: What is the difference between Data Quality and DataOps Observability?

DataKitchen

Question: What is the difference between Data Quality and Observability in DataOps? Data Quality is static. It is the measure of data sets at any point in time. Data Observability is dynamic: it is the testing of data, integrated data, and tools acting upon data as it is processed, checking for flow rates and data errors.
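
The static versus dynamic distinction can be illustrated with a minimal Python sketch. It is not DataKitchen's implementation; the names `null_rate`, `BatchObserver`, and `min_rows_per_batch`, along with the sample batches, are hypothetical and chosen only for illustration. The first function measures a data set at one point in time, while the observer checks each batch as it is processed for flow rate and data errors.

```python
from dataclasses import dataclass, field


def null_rate(rows, column):
    """Static data-quality measure: share of rows where `column` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


@dataclass
class BatchObserver:
    """Dynamic check: inspects every batch while it moves through processing."""
    min_rows_per_batch: int  # hypothetical flow-rate threshold
    alerts: list = field(default_factory=list)

    def observe(self, batch_id, rows, column):
        # Flow-rate check: too few rows suggests an upstream problem.
        if len(rows) < self.min_rows_per_batch:
            self.alerts.append((batch_id, "low flow rate"))
        # Data-error check: flag missing values as the batch passes by.
        if any(row.get(column) is None for row in rows):
            self.alerts.append((batch_id, "data error: missing " + column))
        return rows


# Made-up sample data for illustration only.
batches = {
    "b1": [{"amount": 10}, {"amount": 12}, {"amount": 9}],
    "b2": [{"amount": None}],
}

# Static view: one score for the whole data set at a point in time.
snapshot = [row for rows in batches.values() for row in rows]
static_score = null_rate(snapshot, "amount")

# Dynamic view: alerts raised per batch during processing.
observer = BatchObserver(min_rows_per_batch=2)
for batch_id, rows in batches.items():
    observer.observe(batch_id, rows, "amount")
```

The static score says only that a quarter of the values are missing overall, whereas the observer pinpoints which batch slowed down and which batch carried the error.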

Data 52

How LinkedIn Uses Machine Learning To Rank Your Feed

KDnuggets

In this post, you will learn to clarify business problems & constraints, understand problem statements, select evaluation metrics, overcome technical challenges, and design high-level systems.

Unlocking HBase on S3 With the New Store File Tracking Feature

Cloudera

CDP Operational Database (COD) is a real-time auto-scaling operational database powered by Apache HBase and Apache Phoenix. It is one of the main data services that run on Cloudera Data Platform (CDP) Public Cloud. You can access COD from your CDP console. The cost savings of cloud-based object stores are well understood in the industry. Applications whose latency and performance requirements can be met by using an object store for the persistence layer benefit significantly with lower cost of o

How Does AI Aid in Creating Sound Business Strategies?

U-Next

Introduction. The usage of AI technology has been on the rise in the business world, especially when it comes to creating business strategies. Artificial Intelligence (AI) and Machine Learning are currently used by businesses to make their operations more efficient, improve customer experience and achieve better results. As per Artificial Intelligence Statistics 2022, AI adoption by businesses around the globe continued at a steady pace in 2022, with more than a third of companies (35%) re

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How Often Does the Best Team Win the Title?

Elder Research

The post How Often Does the Best Team Win the Title? appeared first on Elder Research.

52

9 Free Resources to Master Python

KDnuggets

Python is the most popular general-purpose language and you can learn it for free.

Python 145

Enriching Streams with Hive tables via Flink SQL

Cloudera

Introduction. Stream processing is about creating business value by applying logic to your data while it is in motion. Many times that involves combining data sources to enrich a data stream. Flink SQL does this and directs the results of whatever functions you apply to the data into a sink. Business use cases, such as fraud detection, advertising impression tracking, health care data enrichment, augmenting financial spend information, GPS device data enrichment, or personalized customer commun
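
The idea of enriching a stream from a second data source and directing the result into a sink can be sketched in plain Python. This is not Flink SQL and does not use Hive; the function `enrich`, the `customers` lookup table, and the `clicks` events are hypothetical stand-ins that only illustrate the lookup-join pattern.

```python
def enrich(events, lookup, key, sink):
    """Join each streaming event with a lookup table and write it to a sink."""
    for event in events:
        # Events with no match in the lookup table pass through unenriched.
        extra = lookup.get(event[key], {})
        sink.append({**event, **extra})


# Hypothetical dimension table (stands in for a Hive table).
customers = {"c1": {"tier": "gold"}, "c2": {"tier": "basic"}}

# Hypothetical event stream (stands in for data in motion).
clicks = [
    {"customer_id": "c1", "page": "/home"},
    {"customer_id": "c3", "page": "/cart"},
]

sink = []  # stands in for the destination the results are directed to
enrich(iter(clicks), customers, "customer_id", sink)
```

Passing unmatched events through unchanged mirrors a left join; dropping them instead would correspond to an inner join.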

SQL 59

Snowflake SSO Login with Azure Active Directory

Cloudyard

SSO Login with Azure Active Directory: In this post we will discuss how to configure SSO (single sign-on) to connect with Snowflake via Azure Active Directory. With SSO enabled, your users authenticate through an external, SAML 2.0-compliant identity provider (IdP). Once authenticated by this IdP, users can securely initiate one or more sessions in Snowflake for the duration of their IdP session without having to log into Snowflake.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m