This is a good question, and I suspect there is no one right answer, as it may depend on the context of the situation, the target audience, and what you are trying to assess.
I think one target to aim for is items and activities that ask students to demonstrate an applied understanding of a disciplinary core idea within an authentic, real-life context that is relevant to many.
For example, I remember attending a presentation at ITEEA delivered by Cary Sneider (NSTA's 2018 Robert E. Carleton Award winner), where he provided a nice review of several NAEP digital online assessment items (one gold standard to review). The items focused on technology and engineering, but these are in part included in the STEM and 3D learning espoused by NGSS, right? The item we reviewed was about an iguana: students were asked to design its cage to help sustain life based on parameters germane to the iguana's needs (applied science principles intertwined with engineering design). The item and national results are available here: https://www.nationsreportcard.gov/tel_2014/#tasks/iguana. This is one "gold standard" example I feel comfortable sharing.
I'd add one other note directly to your question: IMHO, the answer might be a combination of both hands-on AND online assessments (such as the NAEP example). In the hands-on component, students demonstrate abilities to manipulate variables or designed models/solutions, observe interactions/outcomes in an investigation or solution, collect data for analysis, and use that data as evidence to support their arguments/claims/explanations.
The same could be said for an online assessment (simulation, parsed existing data sets, visualizations), where students might not readily have access to a physical instantiation of the same, or where the affordances of the simulation allow observations, data analysis, or modeling that would otherwise be impossible to do, manipulate, replicate, or repeat often under varied conditions, etc.
In the blended approach (hands-on and online), students can also compare/contrast physical models/experiments against a simulation of the same. Students might then be able to extrapolate effects over a longer time span or a larger/varied set of data, and analyze the integrity of the digital model/simulation at hand (looking under the hood at the computational thinking/algorithm) to see where the model/simulation holds up, etc.
My 2 cents! Thanks for asking the question.
This response comes at the question more from an instructional technology viewpoint, and I'm sure other NGSS assessment experts may embellish, disregard, or greatly enhance this contribution!