Data Cleaning, The Make it or Break it Step to BI Data Analytics

So, your company is planning to use predictive analytics to climb ahead of the market? Smart move.

But before you start to play with those shiny machine learning algorithms, you need to collect and clean up your data. 

It’s not everyone’s favorite step…I mean who really likes cleaning? In fact, 60% of data scientists view data preparation and data cleaning as the least enjoyable part of their work. 

But data cleaning is vital to effective data analytics. 

Looking for a partner to help with your data clean up or data analytics?


Before we dive into the ins and outs of good data cleanup, we have to ask…

What can predictive analytics do for your business? 

Powered by machine learning, predictive analytics provides actionable insights. Armed with this knowledge, decisions can drive business growth and grow customer loyalty.

With effective data, an organization can:

  • upsell to customers
  • predict industry trends
  • implement strategies for employee satisfaction
  • understand customer feedback

…and much more.

Predictive Analytics in Practice

Have you ever gone to the grocery store for one thing, a loaf of bread…only to leave $100 later with a shopping cart full of food?

It’s not luck that you walk past the peanut butter and jelly from the bread aisle to the checkout. Because of predictive analytics, grocery stores understand consumer buying patterns. When a store is organized around these patterns, you are more likely to make a spontaneous purchase.

If this causes most customers to buy even one unintended item, this can account for major revenue.

The practice has been in place for years. But with machine learning, companies can be more strategic than ever.

  • Streaming services use analytics to recommend new songs or shows you might like. 
  • Analytics prompts those crazy specific ads that pop up on your social media. 
  • Predictive analytics even plays cupid, suggesting your matches on online dating apps. 

The use-cases are endless. But without clean data, it does not matter how advanced your machine learning algorithms are. Without clean data, predictive analytics is useless. 

You Can’t Have Good Analytics without Data Cleaning

Cleaning your data is a crucial step to prepare data for analytics. Did you know data preparation accounts for 80% of a data scientist’s work?

Why? Unnecessary noise in a dataset can lead to the wrong conclusions. So you end up with not only inaccurate information but also wasted time. 

Presenting inaccurate data to upper management will cause them to lose faith in your analytics. 

Knowing how powerful good insights can be, we want to make sure we have accurate data to fuel them.  

But what constitutes clean data? Clean Data is:  

  • Accurate 
  • Complete 
  • Consistent 
  • Valid 

Let’s explore this idea more with our grocery store example:

Each of your purchases gives stores invaluable data about your buying patterns. That data, combined with data from hundreds of other customers and stores, makes for some pretty accurate insights.

But…let’s say you have data from two major US-based grocery store chains, Kroger and Publix. Kroger may label a loaf of white bread as whitebread001, while Publix calls it bread-white1.

While we humans understand that these labels refer to similar items, a computer might not.

Of course, you can put a team together to sort through the data. But think of how many people buy bread each year at Kroger’s 2,800 stores or Publix’s 1,200.

That’s a lot of data…and manually sorting through it all is not an effective use of time.

But that is where machine learning comes in. You don’t need to wait until the data is clean to leverage this powerful tool. Engineers can write machine learning algorithms to sort through the data.

Want to know more about preparing your data? We have a post that goes over the ins and outs of a good extract, transform, load (ETL) solution.

Things to Keep in Mind During Data Cleaning: 

So, what do you need to do to ensure your data is clean and ready to analyze? 


Fill in missing data

You may find that there are gaps in your data or one dataset includes a variable that another doesn’t. In this case, it is important to fill in any missing information to ensure your data is complete and insights accurate. 
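
As a rough illustration of this step, the sketch below fills gaps in a numeric field with the median of the known values. The records, the field name units_sold, and the choice of the median are all assumptions made for the example; the right fill strategy depends on your data. It uses only the Python standard library.

```python
from statistics import median

# Hypothetical sales records; None marks a missing value.
records = [
    {"store": "A", "item": "bread", "units_sold": 12},
    {"store": "B", "item": "bread", "units_sold": None},
    {"store": "C", "item": "bread", "units_sold": 20},
]

def fill_missing(rows, field):
    """Return copies of rows with missing `field` values replaced by the median.

    Assumes at least one row has a known value for `field`.
    """
    known = [row[field] for row in rows if row.get(field) is not None]
    fallback = median(known)  # assumption: the median is a reasonable stand-in
    return [
        {**row, field: fallback if row.get(field) is None else row[field]}
        for row in rows
    ]

filled = fill_missing(records, "units_sold")
```

The function returns new dictionaries rather than editing the originals, so the raw data stays available if you later decide a different fill strategy is better.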

Filter out data that you don’t need 

Not only will this step make it easier for you to navigate through your data, it saves processing time. In the age of big data–think thousands of terabytes or petabytes–this is vital. 
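
Here is a minimal sketch of filtering, assuming hypothetical records and field names. It keeps only the columns needed for the analysis and only the rows that match a condition.

```python
# Hypothetical records; the field names are made up for illustration.
rows = [
    {"item": "bread", "units_sold": 12, "cashier_note": "n/a", "region": "US"},
    {"item": "jam", "units_sold": 3, "cashier_note": "promo", "region": "CA"},
]

KEEP_FIELDS = ("item", "units_sold")

def filter_rows(rows, keep_fields, predicate):
    """Keep rows where predicate(row) is true, trimmed to keep_fields."""
    return [
        {field: row[field] for field in keep_fields}
        for row in rows
        if predicate(row)
    ]

# Keep only US sales, and only the two fields the analysis needs.
us_sales = filter_rows(rows, KEEP_FIELDS, lambda row: row["region"] == "US")
```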


Eliminate duplicates 

Do not let a record’s sneaky evil twin skew your analytics. 
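
One simple way to catch duplicates is to decide which fields identify a record and keep only the first record seen for each combination. The sketch below assumes a hypothetical transaction_id field serves as that identifier.

```python
# Hypothetical transactions; the second row is an exact repeat of the first.
transactions = [
    {"transaction_id": "T1", "item": "bread", "units_sold": 1},
    {"transaction_id": "T1", "item": "bread", "units_sold": 1},
    {"transaction_id": "T2", "item": "jam", "units_sold": 2},
]

def drop_duplicates(rows, key_fields):
    """Return rows in original order, keeping the first row per key."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

deduped = drop_duplicates(transactions, ["transaction_id"])
```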

Clean up data  

For qualitative data, you will want to remove punctuation, special characters, transform all data to lowercase, etc. 
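
A small sketch of that kind of text cleanup, using only the Python standard library. The sample review text is made up, and note that string.punctuation covers ASCII punctuation only, so curly quotes and other Unicode symbols would need extra handling.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_text(text):
    """Lowercase text, drop ASCII punctuation, and collapse extra whitespace."""
    lowered = text.lower()
    stripped = lowered.translate(_PUNCT_TABLE)
    return " ".join(stripped.split())

cleaned = clean_text("  Great BREAD!!! Would buy again :)  ")
```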


Standardize naming 

In the example we mentioned earlier, we showed you the issue of different naming formats. For this step, you will want to ensure that data representing the same thing has the same name.
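
For the bread labels from the grocery example, a lookup table is one straightforward way to do this. The canonical name white_bread and the helper function are hypothetical; in practice the mapping might be maintained by hand or suggested by a matching algorithm.

```python
# Maps each chain-specific label to one canonical name.
# "white_bread" is a made-up canonical label for this sketch.
CANONICAL_NAMES = {
    "whitebread001": "white_bread",
    "bread-white1": "white_bread",
}

def standardize(label, mapping=CANONICAL_NAMES):
    """Return the canonical name for a label, or the label unchanged if unknown."""
    return mapping.get(label.strip().lower(), label)

kroger_name = standardize("whitebread001")
publix_name = standardize("bread-white1")
```

Unknown labels are passed through unchanged rather than dropped, so they can be reviewed and added to the mapping later.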

Organize your data with clean columns

Create clean column names; this will make analyzing data much easier. For example, change columns labeled “Current STATUS” to “current_status.” 
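
A possible helper for this, sketched with a regular expression. The snake_case convention follows the example above; the function name is an assumption.

```python
import re

def clean_column_name(name):
    """Convert a column label to lowercase snake_case."""
    name = name.strip().lower()
    # Replace each run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")

columns = ["Current STATUS", " Order ID# "]
cleaned_columns = [clean_column_name(column) for column in columns]
```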


Rectify outliers

Data visualization is a powerful tool to help you identify outliers. Run basic descriptive statistics (like range, mean, median, and standard deviation) on quantitative datasets. From there you can identify outliers that might skew your analytics. 
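
The sketch below computes those descriptive statistics with Python's statistics module and then flags outliers using the interquartile range (IQR) rule. The 1.5 x IQR cutoff is a common convention rather than something prescribed here, and the sample values are made up.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical daily units sold; 95 looks suspicious.
units_sold = [10, 12, 11, 13, 12, 11, 95]

# Basic descriptive statistics for a first look at the data.
summary = {
    "min": min(units_sold),
    "max": max(units_sold),
    "mean": mean(units_sold),
    "median": median(units_sold),
    "stdev": stdev(units_sold),
}

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value for value in values if value < low or value > high]

outliers = iqr_outliers(units_sold)
```

Whether a flagged value should be corrected, removed, or kept is a judgment call that depends on where the data came from.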

And there you have it! Take your clean data set and let those machine learning algorithms get to work on those business insights! 

If you are working with a microservices architecture, check out this post that explains how microservices handle big data.

Looking for a partner to handle your data clean up or the entire ETL process? KMS engineers are here to help. Learn more about our data analytics offering.


Part Three – Story Point Estimation in Scrum: Does it Still Work?

To continue from the second part in our series, let’s dive into the final Pros & Cons of Story Point Estimation:


Pro:  Story Points empower teams to improve quality.

Story Point Estimation gives Dev teams empowerment and control over quality, and that’s something that needs to be preserved no matter how big the engagement or project. A core principle of Scrum is ‘self-managing teams,’ where it is the team who decides if/when there’s a need to slow down in order to meet the Definition of Done. A team may have to slow down in order to improve code quality and lower technical debt (e.g., important defects left open, ignoring code refactor, etc.).

Organizations can use Story Points as a method to track, analyze and challenge areas of uncertainty, which are only growing bigger. DevOps, Automation, CI/CD, IOT and the demand for digital apps continue to accelerate and complicate delivery; defects are becoming more costly, while end-users expect more features faster. Story Points help teams to anticipate Dependencies earlier and budget enough time to improve quality Sprint over Sprint.

Con:  People don’t understand Story Points.

As we mentioned earlier, non-technical stakeholders and managers who came from waterfall mistakenly equate or try to translate Story Points into hours/effort. They see a number, and suddenly it’s a definitive target. Agile says don’t do it, but humans do anyway. The bigger the chunk of work being planned in hours, the deeper the “you-know-what” managers will step in.


Let’s say a manager runs up against a large module and realizes there is a gap in how the backlog was sized across multiple teams. Now teams must get together to re-estimate and agree. Agilists will say, “You can’t do this! Let the team size their own way!” But for the manager, this is a big problem. He/she made a commitment for a release date, but due to varying interpretations of a Story Point across teams, Velocity is off. One team didn’t meet their expectations, and the whole team becomes unstable. The stakeholder doesn’t give a whoop. They just want what was promised.

Truth is, a Story Point is only understood by the team who uses it. Even within a team, Story Points can be subjective from one member to the next. A good Release/Epic/Portfolio Manager must consider that every team has their own flavor of technical skills, experiences, infrastructure availability, etc. They must view a Story Point estimate as a common denominator for “good enough”. That’s it.

Pro:  Story Points can help facilitate continuous improvement.

In the scenario described above, teams must get together after every Sprint to inspect and adapt their Estimation process. Scrum is about learning. Adaptation. Challenging each other. Get everyone together in a regularly scheduled Retrospective session to look at every aspect of your Estimation processes transparently. Talk about gaps, why they happened, and how the team can get better:

  • What dependencies can be reduced next time to prevent schedule slippage?
  • Did we establish enough feedback loops to be able to respond to change?
  • What cross-functional integration issues should we anticipate next time?
  • How can we leverage collaboration and bottom-up intelligence to improve the accuracy of Story Point Estimation next time?
  • How can we share these learnings with managers and other teams in the organization?

An Example

We had a large customer migrate from a functional waterfall-based team structure to 10+ cross-module Scrum teams. The Board needed to see a Roadmap, Timeline and high-level cost with contingency buffer for the entire MVP backlog across multiple modules. We had the customer pull one development lead representative from each team to form an Estimation Team, which was asked to estimate the MVP backlog into Epics using t-shirt sizes S – M – L – XL. Once the Epics were put into sizes, the Estimation Team picked two sample candidates from each size for the Story Point Estimates.  Son Tang, KMS Sr. Engineering Manager, recommends choosing the simplest – albeit well defined (or concise) – Epics for the sample candidates.

Next, the sample Epics were assigned to the Scrum teams to be worked on. The assignment was randomly done at this point, since no single team was an expert at any particular module or domain. The Estimation Team members went back to their own Scrum teams for Sprint Planning, where each team was encouraged to challenge their ambassador. Is the information accurate? Are there dependencies which he/she missed? And of course, the team had a Retrospective after the Sprint. Each team worked the Epics across two or three sprints with Story Point Estimation. At the end of this, the teams had a clear baseline understanding of their average total Velocity, as well as total Story Points, for the entire backlog. The timeline was calculated from an average Estimation by consensus of the group and used to define baseline Story Points Velocity for the remaining backlog.   
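
To make the arithmetic concrete, here is a rough sketch of how a baseline Velocity can turn a backlog size into a timeline for a single team. The backlog total and per-sprint velocities are invented numbers, not figures from this engagement; the two-week sprint length matches the six weeks and three sprint cycles mentioned here.

```python
import math

SPRINT_LENGTH_WEEKS = 2  # six weeks / three sprint cycles

def sprints_remaining(backlog_points, sprint_velocities):
    """Estimate sprints needed from the average of observed velocities."""
    average_velocity = sum(sprint_velocities) / len(sprint_velocities)
    # Round up: a partly used sprint still takes a whole sprint on the calendar.
    return math.ceil(backlog_points / average_velocity)

# Hypothetical numbers for one team after three baseline sprints.
observed_velocities = [28, 32, 30]  # Story Points completed per sprint
remaining_backlog = 240             # Story Points left in the backlog

sprints = sprints_remaining(remaining_backlog, observed_velocities)
weeks = sprints * SPRINT_LENGTH_WEEKS
```

Because each team's Story Points mean something different, a calculation like this only makes sense per team, using that team's own velocity and its own share of the backlog.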

At the end, teams were able to give the executive team more accurate estimates for Timeline, Cost and Story Points capacity within six weeks (three sprint cycles). The six weeks were well worth the wait: happy team, happy board, happy customer!

In terms of the managers, Son says it’s best to provide the duration of the Timeline to the customer or stakeholder. Stick with the base information, so that inquiries about the Epic can be updated accordingly. You can always add or decrease teams to maintain the right capacity depending on the Timeline, but your approach should remain the same. And again, inspect and adapt any gaps in your Estimation early and often.

So What Do YOU Think?

Have Story Points become too misunderstood/misused to use in enterprise Scrum? Can we break down high-level projects during Epic Estimation in a way that is consistent across many teams, or does scaling Scrum compromise the fundamental values of Agile?

KMS has led many large/global organizations in scaling Scrum. There’s nothing we haven’t seen. If you need help with software development or scaling Story Points for Epic Estimation in your Scrum organization, contact us.


Part Two – Story Point Estimation in Scrum: Does it Still Work?

Epic Estimation is Hard

To continue from the first part in our series, let’s dive into the next Pro & Con of Story Point Estimation:

Pro:  Story Points set a critical baseline for obtaining budget.

Story Points give managers a baseline data point for estimating and calculating Velocity across multiple teams, which everyone knows is a necessary evil if you expect to get funding for any large project.

In waterfall, managers could talk to stakeholders in terms of numbers (hours, days, weeks, months). Everyone could understand numbers. Nothing took less than one hour. When the Internet and Agile came along, hours could no longer be used because so many Tasks could be completed faster than the meeting would take to plan it! On the other hand, some stories have longer Timelines and Dependencies which need to be documented with stakeholders. Story Points give managers a baseline idea of where a story falls on that spectrum, relative to a team’s past experiences, and how they are packaging it.

Our CTO, Kaushal Amin, compares Epic Estimation to asking how many boxes (Effort) it would take to handle your move:

Company A tells me they can get the job done in 150 boxes. Company B can do it in 100. On the surface it seems like Company B is much more efficient, but what if Company B is just using bigger boxes? The effort required to do the work could be exactly the same; it’s just that the companies estimate in different ‘bucket sizes’.

At the Program level, you can’t enforce the same bucket sizes across teams. Story Points can, however, help teams break down work to the smallest functional piece so that, over time, high-level planners can learn and understand what bucket size each team is using to estimate.


Con: Story Points imply more than what Scrum intended them to.

That same Pro is also a Con here. Story Points were never meant to become a performance measure. Story Point Estimates are just that, estimates; not forecasts. Yet, they are being used to compare the capacity of different teams against each other.

Scrum never intended to hold teams to a “do-or-die” commitment. Story Points represent a “good enough” guesstimate of the scope involved in a project. If the whole intent of Agility is to allow organizations to respond to change over following a plan, then we’re not getting the full value of Scrum.  


Make sure to check back with us next week for our final post on Story Point Estimation. Part three will cover how Story Points empower teams to improve quality, and why certain roles don’t understand Story Points from a business perspective.

Partner with KMS for your next software development project.


Part One – Story Point Estimation in Scrum: Does it Still Work?

We all remember the first time we read The Agile Manifesto. It brought out the “Hell-Yeah, Man!” in people. Its values motivated and empowered teams. It created deeper value for customers and kept the malarkey out of software development. Stuff got done.

Fast-forward to 2018, and “Houston, we have a problem.”

Scrum works, and stuff still gets done, but… Rapid adoption of Agile and Scrum has created a monster for enterprise-level planners responsible for Epic/Portfolio Estimation. The industry took a practice that was created for small teams working closely together in a uniform way – then retrofitted it for the masses.

One Scrum team became multiple teams, or dozens, each with their own understanding of “what is a Story Point?” Today, no two teams have the same experiences, infrastructure or skill sets; thus, no two teams define Story Points the same.

All of a sudden you have high-level managers trying to divvy up the Epic backlog based on 25+ different interpretations of a Story Point. The more teams that are working on the same product, the bigger the mountain is to climb. [Pardon the irony]


What would those 17 guys at Snowbird say?

Organic Scrum would say don’t even try to estimate an Epic backlog because it is effort wasted and full of inaccuracy. Detail will inevitably change, and things will get reprioritized last minute. But tell this to the Epic/Portfolio Manager on the hook to provide a roadmap timeline for funding, and you’ll get the stink eye!

Epic estimation deviates from the days of single Scrum teams when Estimation was super straightforward. Dependencies, available skill sets, and tools to complete the work were rinse-and-repeat. Individual team members knew each other’s capabilities and limitations. When someone said a Story was five points, everyone on the team knew exactly what he/she meant. As Scrum grew legs, the meaning of “five points” grew increasingly fuzzy. The more teams that were added, the harder it was for managers to interpret Story Points. At 10 or 20 Scrum teams… Utter Planning Chaos! If there is no center line for estimation, the whole team can become unstable very quickly.

It raises the question, “Are Story Points still an effective way to estimate Epics in Scrum?”

Some argue yes, you can still use Story Points with the right scaling framework. Others believe enterprise Scrum has wandered too far from Agile principles because the work cannot break down. As Scrum scales up, how can teams establish a consistent standard for estimating Epic backlogs so that high-level managers, stakeholders and other teams are all on the same page?


The Pros and Cons of Story Points

We had a little internal debate and came up with these pros and cons to using Story Points for high-level Estimation in Scrum. Read the list, then send us your feedback on the topic. What do you disagree with? What are we missing?


Pro:  Story Points help break down large backlogs.

Story Points can work well as part of a scaling framework such as SAFe and NEXUS. These frameworks provide scalable estimation techniques for breaking down work at the Epic, Program and Portfolio levels:

  • SAFe looks at the big picture of how work flows from Product Management through Governance, Program teams and Dev teams, and out to customers.
  • NEXUS is a more bottom-up approach that focuses on unifying multiple Scrum teams working on the same product. Through transparency, NEXUS seeks to protect and strengthen connections between teams and keep scaling as uniform as possible.

From a tactical perspective, one technique recommended by coaches is to hold the team who did the Estimate accountable for meeting the Estimate. That same team always works on the same module so that the meaning of Story Points remains clear and consistent from Sprint to Sprint. The downside is that your capacity-per-module within each release is fixed, since teams can’t work across modules. You are locked in with the amount of work that can get completed in a release for each module.


Another approach is to designate one or two representatives from each Scrum team to join a formal Estimation session. These ambassadors can make decisions about story sizes without taking the whole team off their jobs. This technique assumes that each representative has knowledge of how their team defines a Story Point, so they can share, discuss and find common ground for estimating stories. The downside here is that each Scrum team has to trust and live with whatever their chosen ambassadors come up with for each story Estimate. Also, you run the risk of ambassadors getting influenced by the Estimation approach of ambassadors from other teams.


Con:  This is not Scrum!

Hardcore Agilists will argue that these techniques do not represent Scrum. If the same team always works on the same module, does this threaten scalability? What about the concept of self-managing teams? From a capacity perspective, limiting a team to one module does not promote scalability. It is no longer scaled agile. From a team perspective, if you only have one representative Estimating in a vacuum, you risk team instability because either (1) the representative did not have the right knowledge about a particular dependency or integration issue, or (2) he/she could become influenced by another team’s experiences which do not match how their own team works. In either case, people fear things are becoming too top-down, causing Velocity gaps and a big ugly mess in the PMO.

Looking for a partner for software development?


Stay tuned for next week’s entry! Part two will cover the next Pro and Con: Story Points set a critical baseline for obtaining budget, and Story Points imply more than what Scrum intended them to.

Integration Fueling App Modernization

Businesses have plenty of reasons to move on from legacy apps and platforms. In many cases, companies spend heavily on keeping legacy systems up and running, so much so that innovation ends up stifled because maintenance tasks end up dominating development and testing activities. In some instances, security risks become an issue. While both of these problems offer plenty of incentive to update legacy products and platforms, data integration may be an even bigger issue. The problem isn’t simply that legacy apps don’t easily communicate with contemporary solutions. Instead, legacy platforms are often built on underlying architectures that don’t work well in cloud environments, can’t support big data, or don’t support modern solutions such as Blockchain.

With organizations becoming more dependent on interconnected digital ecosystems to support operations, gaps that force users to work exclusively from certain device types, work from a specific location or jump through similar logistical hoops can slow operations to a crawl. Furthermore, the maintenance work and security issues we just discussed often come into play when businesses try to migrate information from modern apps into legacy solutions.

Integration presents a major problem for organizations contending with legacy apps. Creating APIs and implementing open architectures across a platform can ease many integration challenges. These strategies may solve immediate issues, but they won’t resolve underlying architectural issues that limit how legacy systems work alongside cloud services or systems existing in highly virtualized environments. Complete app modernization is often necessary to resolve integration challenges fully. However, an app modernization project is a huge undertaking. Ramping up your organization’s development capabilities can go a long way in dealing with modernization challenges, making it easier to take on such a large development and testing initiative.

Need a partner with extensive experience in legacy modernization?


Looking at the modernization problem

Consider today’s cloud apps and platforms. These systems rely on highly automated, orchestrated and interconnected ecosystems incorporating virtual operating systems, servers, storage machines and network assets. While these setups are complex, they also allow for a greater degree of data sharing and integration between systems as solutions hosted within the same platform can share data seamlessly and apps hosted in separate clouds can often use APIs and similar solutions to talk to one another.

“Integration is a major impetus behind legacy system modernization.”

In contrast, legacy apps often reside on traditional servers, possibly specialized legacy systems that couldn’t easily be replaced, and operate in a data center and network model that doesn’t naturally interconnect with the cloud. As businesses embrace hybridized IT environments, with modern, highly virtualized data center setups that work hand-in-hand with public and private cloud assets, legacy systems end up sticking out like a sore thumb.

In this setup, data integration becomes a nightmare, and changing the app is problematic. TechBeacon pointed out that many legacy apps are built as monolithic systems, meant to work within closed-off configurations and operate in isolated silos.

On top of this major architectural limitation, most legacy platforms will struggle to work with systems such as Azure, AWS, Google cloud services and Salesforce, and the complexity involved in aligning a legacy solution with such cloud platforms can slow time to market to a crawl. Testing also becomes a nightmare because any component that changes in order to work with modern systems will need regression testing. In short, you don’t just need to write new code or find a modern app; you need to completely retrofit the solution to bring it up to date.

As data integration becomes acutely necessary in the modern enterprise, companies increasingly must update their legacy applications to keep up in a cloud-focused world. Organizations striving to keep pace in this environment need to ramp up their development efficiency. This may sound overwhelming, but businesses aren’t alone. At KMS Technology, we offer development teams and training to help organizations take on app and platform modernization initiatives. We also offer consulting services to make sure the project aligns with your needs.


How to Motivate Software Developers

If you want to make sure that you ship a quality piece of software, and boost your chances of delivering within budget and on-schedule, then you need to motivate your development team. So how do you get developers fired up and ready to work? What makes for a determined, passionate, and committed team? Let’s take a look at some of the key contributing factors that can really spur motivation.

The right job, the right team, the right way

You have to start out with a task that will inspire developers. They want to work on things that actually matter. If you’re asking them to build something that doesn’t seem to offer any real value and benefit to end users, then you can expect a tepid response. It also doesn’t make sense from a wider business stance.

The schedule and budget have to be realistic. You should set out goals that are going to stretch them, but not be impossible. If you force engineers to cut corners and turn in work that they don’t feel pride in, then you’ll have a negative impact on morale. They should be equipped with the tools and time they need to do the job properly. It’s one thing to work over the weekend because of an unexpected technological challenge, but quite another to be toiling on a Sunday because the schedule was poorly planned from the outset.

Developers should expect an environment that enables them to do a good job, but you should also expect effort and quality from them. Any engineer who falls below par, is consistently shoddy or late with code, or doesn’t meet the quality standards of their peers, perhaps should be let go.  It’s not fair to expect others to carry the weight of a poor performer.

Creativity, learning and recognition

Software developers may prefer a clear vision and a solid plan but they can be creative, too. Give them some input in the design phase and let them bring their experience to the table. They may have useful ideas about what direction the software should take, and they’ll come at it from a different angle than the other stakeholders.

Give them an opportunity to learn and develop new skills, and they’ll be motivated and happy. You’ll also benefit on future projects, if you encourage a continuous learning approach. It’s also vital to recognize outstanding efforts and praise engineers when they do a good job. Being recognized and praised can have a huge impact on morale, and really boost motivation for the whole team.

Motivated developers are always going to do a better job, so it’s worth making some effort. If you want to drill down a bit deeper, and expand on these ideas, then you may want to read “7 Tips for Motivating Software Development Teams” just published in