Comment by shadowgovt

6 months ago

Scoping the data collection to Google domains is a reasonable security measure because you don't want to leak it to everybody. And in general, Google does operate under the security model that if you trust them to drop a binary on your machine that provides a security sandbox (i.e. the browser), you trust them with your data because from that vantage point, they could be exfiltrating your bank account if they wanted to be.

But yes, I don't doubt that the data collection was pretty vital for getting Hangouts to the point it got to. And I do strongly suspect that it got us to browser-based video conferencing sooner than we would have been otherwise; the data collected got fed into the eventual standards that enable video conferencing in browsers today.

"Could not have" is too strong, but I think "could not have this soon" might be quite true. There was an explosion of successful technologies in a brief amount of time that were enabled by Google and other online service providers doing big data collection to solve some problems that had dogged academic research for decades.

To be more clear:

After your infelicitous contribution, you were politely invited to consider that _a client side web API only on Google domains for CPU metrics_ isn't necessary for _collecting client metrics_.

To be perfectly clear: they're orthogonal. Completely unrelated.

For some reason, you instead read it as an invitation to continue fantasizing about WebRTC failing to exist without it.

  • What would the alternative be?

    (Worth noting: Google Hangouts predates WebRTC. I think a case can be made that big data collection of real users' machine performance in the real world was instrumental for hammering out the last-mile issues in Hangouts, which informed WebRTC's design. I'm sure we would have gotten there eventually; my contention is that it would have taken longer without concrete metrics about performance.)

    • I made this.

        +------------------+
        |   Web Browser    |
        | +--------------+ |
        | |  WebRTC      | |
        | |  Components  | |
        | +------+-------+ |
        |        |         |
        | +------v-------+ |    +---------------+
        | | Browser's    | |    |   Website     |
        | | Internal     | |    | (e.g. Google  |
        | | Telemetry    | |    |  Meet)        |
        | +------+-------+ |    |               |
        |        |         |    |  (No direct   |
        | +------v-------+ |    |   access to   |
        | |  CPU Stats   | |    |   CPU stats)  |
        | |  (Internal)  | |    |               |
        +------------------+    +---------------+
                 |
                 | WebRTC metrics
                 | (including CPU stats as needed)
                 v
        +------------------+
        |  Google Servers  |
        | (Collect WebRTC  |
        |    metrics)      |
        +------------------+
      

      Another attempt, in prose:

      I am referring to two alternatives to consider:

      A) Chrome sends CPU usage metrics, for any WebRTC domain, in C++

      B) as described in TFA: JavaScript, running on allow-listed Google sites only, collects CPU usage via a JavaScript web API

      There's no need to do B) to launch/improve/instrument WebRTC. In fact, it would be bad to only do B), given that WebRTC implementers as a whole are a much less biased sample for WebRTC metrics than Google's implementers of WebRTC.

      I've tried to avoid guessing at what you're missing, but since this has dragged out for a day, I hope you can forgive me for guessing here:

      I think you think there's a _C++ metrics API for WebRTC in Chrome-only, no web app access_ that _only collects WebRTC on Google domains_, and from there we can quibble about whether it's better to have an unbiased sample or if it's Google attempting to be a good citizen via collecting data from Google domains.

      That's not the case.

      We are discussing a _JavaScript API_ available only to _JavaScript running on Google domains_ to access CPU metrics.
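
To make the A) versus B) distinction concrete, here is a minimal TypeScript sketch. It is a toy model, not Chrome's actual code: every name in it (`ALLOWED_SUFFIXES`, `isAllowListed`, `collectInternally`, `pageFacingCpu`, `collectViaPageApi`) and the example hostnames are hypothetical and exist only for illustration. It only shows why an origin-gated page API yields a narrower sample than browser-internal telemetry.

```typescript
// Toy model only. None of these names are real Chrome or WebRTC APIs.

type CpuSample = { host: string; cpuPercent: number };

// Assumed allow-list, for illustration.
const ALLOWED_SUFFIXES: string[] = ["google.com"];

// True when the host is an allow-listed domain or one of its subdomains.
function isAllowListed(host: string): boolean {
  return ALLOWED_SUFFIXES.some((suffix) => {
    const dotted = "." + suffix;
    return host === suffix || host.slice(-dotted.length) === dotted;
  });
}

// Option A: the browser records CPU usage itself for every host using WebRTC.
// Page script never sees the value.
function collectInternally(hosts: string[], readCpu: () => number): CpuSample[] {
  return hosts.map((host) => ({ host, cpuPercent: readCpu() }));
}

// Option B: a page-facing API that returns a value only to allow-listed hosts.
function pageFacingCpu(host: string, readCpu: () => number): number | undefined {
  return isAllowListed(host) ? readCpu() : undefined;
}

// Under B, only pages that can read the value are able to report it.
function collectViaPageApi(hosts: string[], readCpu: () => number): CpuSample[] {
  return hosts
    .filter((host) => pageFacingCpu(host, readCpu) !== undefined)
    .map((host) => ({ host, cpuPercent: readCpu() }));
}

const hosts: string[] = ["meet.google.com", "video.example.org", "calls.example.net"];
const fakeCpu = (): number => 42; // stand-in for a real CPU reading

const samplesA = collectInternally(hosts, fakeCpu);
const samplesB = collectViaPageApi(hosts, fakeCpu);

console.log(samplesA.length, samplesB.length); // 3 1
```

In this model, A) samples every WebRTC site while B) samples only the allow-listed one, which is the bias point made above.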

      Additional color commentary to further shore up that there isn't some WebRTC improvement loop this helps with:

      - I worked at Google, and it would be incredibly bizarre to collect metrics for improvements via B) instead of A).

      - We can see via the rest of the thread that this is used _not for metrics_, but for features such as G Suite admins seeing CPU usage metrics on VC, and CPU usage displayed in Meet in a "Having a problem?" section that provides debug info.
