In skin care, star ratings are often treated as shorthand for product quality, and professionals rely on them to gauge whether a product is worth trying. A 4.0-star rating, in particular, is frequently treated as the baseline for quality – an easy indicator that a product is performing well among users. But here’s the problem: that standard isn’t actually standard.
In a recent analysis, the team at Yogi found that average star ratings vary widely depending on the type of skin care product being reviewed. A 4.0 might be below average in one category and above average in another. Facial cleansers, for example, tend to hover around 4.31 stars, while men’s moisturizers and sunscreens routinely score lower, even when reviewers are generally positive.
The takeaway? A flat benchmark like 4.0 creates blind spots. Skin care professionals who cling to it risk overvaluing products that aren’t truly standouts in their category or overlooking options that are outperforming peers despite what appears to be a lower score.
The smarter approach is to view ratings within the right context, understanding what’s typical for each product type and price point. That’s how professionals can better assess product quality, manage client expectations, and make more confident recommendations.
REACH FOR THE STARS?
Star ratings may seem like a universal metric, but the reality is they’re deeply contextual. In the skin care space, products in different categories are judged by different consumer standards, and it shows up clearly in the numbers.
To uncover these dynamics, the team at Yogi analyzed nearly 400,000 product reviews across beauty and personal care categories. The dataset spans January 2018 to January 2025 and includes only full-text, nonincentivized reviews. That means “ratings-only” submissions – which tend to be disproportionately positive – were excluded, along with any reviews tied to coupons, free samples, or seeding programs. What’s left is a more honest look at how satisfied consumers really are.
Across the board, the average star rating was 4.03, but that number hides major discrepancies. Facial cleansers, as mentioned, average 4.31 stars, meaning a product at 4.0 is actually underperforming relative to its category. On the other hand, deodorants average just 3.63 stars. In that category, a 3.9 might be enough to signal a market leader.
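The category-relative comparison described here can be sketched in a few lines of Python. The lookup table below echoes the category averages cited in the analysis; the function itself is a hypothetical illustration, not part of Yogi’s methodology:

```python
# Hypothetical lookup of category averages (in stars), echoing
# the figures cited in the analysis above.
CATEGORY_AVG = {
    "facial cleanser": 4.31,
    "deodorant": 3.63,
}

def benchmark(rating: float, category: str) -> str:
    """Label a rating relative to its category average, not a flat 4.0."""
    avg = CATEGORY_AVG[category]
    delta = rating - avg
    label = "outperforming" if delta > 0 else "underperforming"
    return f"{label} category average by {abs(delta):.2f} stars"

# A 4.0 cleanser trails its category; a 3.9 deodorant leads its category.
print(benchmark(4.0, "facial cleanser"))
print(benchmark(3.9, "deodorant"))
```

Against a flat 4.0 benchmark, both products look similar; against their own categories, the conclusions flip.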
For skin care professionals, these gaps matter. When you use a single rating benchmark to evaluate products – whether for your retail shelves or client recommendations – you risk misreading a product’s true performance. The real insight isn’t in the absolute number; it’s in how that number compares to what’s typical for that specific product category. For professionals making decisions that directly affect client outcomes and satisfaction, understanding this context is essential.
BEHIND THE DISCREPANCIES
So, what’s driving the variation in star ratings across skin care categories? A mix of product expectations, consumer behavior, and external factors – all of which influence how customers rate their experience, regardless of how well the product performs.
Take customer expectations. They’re rarely uniform. A $60 facial serum is held to a very different standard than a $6 body wash. With premium products, customers expect a more noticeable effect and an overall elevated experience. Lower-cost items, on the other hand, are often rated more generously when they meet basic expectations. The same star rating can reflect very different levels of satisfaction depending on price point and perceived value.
Then there’s packaging. Damaged packaging, leaky containers, or malfunctioning pumps are some of the most common complaints in skin care reviews. Even when the formula itself performs well, a poor unboxing experience can drag a product’s rating down. These issues often fall outside the product team’s control, but they still shape the narrative around quality.
Customer misunderstanding can also play a role. In categories like exfoliators, for instance, reviewers often conflate chemical exfoliants with physical scrubs – even though they serve different purposes and work on different timelines. When product labeling or descriptions aren’t clear, customers may penalize a product for not doing something it was never designed to do. That disconnect between expectation and functionality shows up in the score, even when the product is doing its job.
Skin sensitivities add another layer of complexity. A product that causes irritation in a small group of users—often due to fragrance or active ingredients—can quickly see its ratings suffer. This is especially true in facial skin care, where consumers are more vocal about adverse reactions. The result isn’t always a reflection of product quality; it’s often about how well the product aligns with individual skin needs.
All these variables contribute to the wide range of average ratings across skin care categories, and they matter for skin care professionals when it comes to making product recommendations. In a category where the average rating is 4.3, a product holding steady at 4.0 could be falling behind, potentially leading to disappointing results for clients or missed opportunities in your retail strategy.
Overreacting is just as risky. If a product earns a 3.9 in a category where the average is 3.7, it’s outperforming the field. But if that context is missing, professionals may rush to change vendors or remove solid products from their offerings based on a perceived problem that doesn’t actually exist.
CATEGORY-SPECIFIC
To move beyond a flat 4.0-star goal, skin care professionals need to get more specific and more strategic about how they evaluate product performance. That starts with reframing the question from “Is this a good rating?” to “Is this a good rating for this type of product, at this price point, and for this customer?”
Start by identifying the right peer group. That means focusing on products in the same category, at a similar price point, targeting a comparable audience. For instance, an $80 luxury eye cream shouldn’t be benchmarked against a drugstore moisturizer, even if they both technically fall under skin care.
If the top-rated products in a category average 4.3 stars, viewing a 4.0 as “good enough” could mean recommending a product that’s trailing the market. Conversely, if the leading products in a category sit around 3.9, a 4.0 could signal a standout option worth recommending or featuring in your practice.
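The peer-group approach above amounts to grouping products by category and price band before computing a benchmark. A minimal sketch, using entirely hypothetical product records:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical product records: (category, price_band, avg_star_rating).
products = [
    ("eye cream", "luxury", 4.2),
    ("eye cream", "luxury", 4.4),
    ("eye cream", "luxury", 4.3),
    ("moisturizer", "drugstore", 4.0),
    ("moisturizer", "drugstore", 3.8),
]

# Group by (category, price band) so a luxury eye cream is only
# benchmarked against other luxury eye creams.
peer_groups = defaultdict(list)
for category, band, rating in products:
    peer_groups[(category, band)].append(rating)

benchmarks = {group: round(mean(ratings), 2)
              for group, ratings in peer_groups.items()}

print(benchmarks[("eye cream", "luxury")])       # peer-group average
print(benchmarks[("moisturizer", "drugstore")])  # peer-group average
```

With the right peer group in hand, the question becomes whether a product sits above or below its own group’s average, not whether it clears 4.0.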
FINAL THOUGHTS
With something as nuanced as skin care products, relying on a flat 4.0-star average rating as a benchmark for quality is often more misleading than it is helpful. Without context, it oversimplifies performance and obscures what really matters to users. When product ratings are read in context, they become less about numbers and more about delivering what customers actually want.
References
- “Why a 4.0 Star Rating Just Won’t Cut It: The New Era of Smarter Benchmarking.” Meetyogi.com, 2018. https://www.meetyogi.com/post/why-a-4-0-star-rating-just-wont-cut-it-the-new-era-of-smarter-benchmarking.
Written by Sogyel Lhungay, vice president of insights at Yogi, an artificial intelligence-driven consumer insights platform that transforms mountains of messy feedback into real-time insights, revealing hidden trends and unmet needs. By unifying data from ratings, reviews, customer care interactions, social discussions, surveys, and more into a single source of truth, Yogi helps brands make smarter decisions faster.