Skip to content

wgpu: Implement Tqueries #8669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

wgpu: Implement Tqueries #8669

wants to merge 2 commits into from

Conversation

jc3265
Copy link
Contributor

@jc3265 jc3265 commented Apr 25, 2025

BUGS=[407961733]

This PR Implements Timer Queries. It does it by taking a timestamp before it beings a frame and after it ends processing

@jc3265 jc3265 added the internal Issue/PR does not affect clients label Apr 25, 2025
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 2a92144 to 7cfcd2d Compare April 25, 2025 20:51
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 7cfcd2d to 658390f Compare April 25, 2025 20:53
std::atomic<uint64_t> elapsed{ 0 };// only valid if available is true
};

std::shared_ptr<Status> status;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does status need a std::shared_ptr? From what I can tell, only this class owns and uses this? Thus, if a smart pointer were needed I would think to reach for std::unique_ptr, but I don't see the need for a pointer at all. Why can't Status just be a member variable of this class, or even just its members directory (available and elapsed)? In any of these cases, should the underlying status be private, since only this class needs it?

if (!status->available.load()) {
return false;
}
if (outElapsedTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this should not ever be true right? If the status is available, the time should be. Perhaps an assert_invariant(...) makes more sense? Second, even if this were true we would return that the results are available, but we never set anything in outElapsedTime which could have undefined behavior, such as junk in that integer.


void WebGPUDriver::endTimerQuery(Handle<HwTimerQuery> tqh) {
auto* tq = handleCast<WGPUTimerQuery>(tqh);
tq->endTimeElapsedQuery();
Copy link
Contributor

@AndyHovingh AndyHovingh Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly one way to do it, and maybe we should review this with some of the other backend engineers, to see if this gets to the spirit of what is needed by these queries. But, what I originally had in mind was something more like:

...
// somewhere in the driver initiation... maybe the constructor:
    ...
    mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::AllowSpontaneous, [this](auto const& status) {
        if (status == wgpu::QueueWorkDoneStatus::Success) {
          mCurrentTimerQuery.end();
        } else {
          ... log something ...
        }
      });
      ...
}

void WebGPUDriver::commit(Handle<HwSwapChain> sch) {
    mCommandEncoder = nullptr;
    mCurrentTimerQuery.begin();
    mQueue.Submit(1, &mCommandBuffer);
   ...

beginTimerQuery(...) and endTimerQuery(...) maybe could be used to set (and unset) mCurrentTimerQuery.

You also have to account for finish, where we may need to add additional time if we submit multiple command buffers in a frame (we can discuss offline if you need clarity on that). In that case, you might want some kind of state for intermediate queues to account for in OnSubmittedWorkDone and maybe another argument to the timer query's "end" call.

#include "WebGPUHandles.h"

#include <chrono>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to include (include-what-you-use) <memory> for std::weak_ptr and <cstdint> for uint64_t.

Copy link
Contributor

@AndyHovingh AndyHovingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least time the command buffer queue submission time (I believe close to what Metal is doing) instead of the approach as-is at this point. We need to prioritize the approach of timing all the render/compute passes, accumulating those, and reporting that, which is a bit more complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal Issue/PR does not affect clients
2 participants