Skip to content

I2C/TWI library: Race Condition with Repeated STARTs #2173

Closed
@earlyprogrammer

Description

@earlyprogrammer

I believe I've identified a race condition with repeated STARTS in the TWI library.
(As Master, doesn't seem related to the TWI slave reports I found searching online)
I'm using a modified version of Wire and TWI, I'll discuss my changes later.

The Flow:
Instead of sending a stop, send a repeated start. Note: TWIE is set to 0, so the ISR should not get called when the hardware flags that sending the START has completed.
(ISR)

if (twi_sendStop)
    twi_stop();
else {
  twi_inRepStart = true;    // we're gonna send the START
  // don't enable the interrupt. We'll generate the start, but we 
  // avoid handling the interrupt until we're in the next transaction,
  // at the point where we would normally issue the start.
  TWCR = _BV(TWINT) | _BV(TWSTA)| _BV(TWEN) ;
  twi_state = TWI_READY;
}

Next, the flow skips the wait loop, believing that the state is READY.
(readFrom or writeTo)

while(TWI_READY != twi_state){
    continue;
  }
//various prep variable assignments

Finally, put the address in TWDR, then initiate the transmission and reenable the interrupts
(readFrom or writeTo)

if (true == twi_inRepStart) {
    // if we're in the repeated start state, then we've already sent the start,
    // (@@@ we hope), and the TWI statemachine is just waiting for the address byte.
    // We need to remove ourselves from the repeated start state before we enable interrupts,
    // since the ISR is ASYNC, and we could get confused if we hit the ISR before cleaning
    // up. Also, don't enable the START interrupt. There may be one pending from the 
    // repeated start that we sent outselves, and that would really confuse things.
    twi_inRepStart = false;         // remember, we're dealing with an ASYNC ISR
    TWDR = twi_slarw;               
    TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);  // enable INTs, but not START
  }

CASE 1: intended

The hardware finishes the repeated start before TWDR = twi_slarw;

Hardware sets TWINT to 0, but because TWIE is disabled, ISR will not be called.
Address will be put in the TWDR, then the hardware will be initiated by setting TWINT to 1.

CASE 2: works

The hardware finishes the repeated start after TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);

The attempt to TWDR = twi_slarw; broke because TWINT was still 1, flagging TWWC.
The TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE); was presumably ignored because the hardware was already working on a request.
Note: TWIE has been set.
When the hardware returns, the ISR will be called because interrupts were enabled.
leading to...

    case TW_REP_START: // sent repeated start condition
      // copy device address and r/w bit to output register and ack
      TWDR = twi_slarw;
      twi_reply(1);
      break;

twi_reply(1) == TWCR = _BV(TWEN) | _BV(TWIE) | _BV(TWINT) | _BV(TWEA);
meaning the address is put in TWDR, then hardware initiated, just like CASE 1.
This will also clear TWWC, meaning no error is noticed.
It works, so that's fine.

CASE 3: broken

The hardware finishes the repeated start between cases 1 and 2.

The attempt to TWDR = twi_slarw; broke because TWINT was still 1, flagging TWWC.
Hardware returns, and because ISR was disabled when the repeated start was constructed, the TWINT flag is ignored as it comes in.
Finally, TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE); runs, starting the hardware.
But the TWDR was never updated, so the wrong data gets sent (whatever was previously in the TWDR).
TWWC will also stay flagged.

My sniffer would notice an address nack once in about 50 repeat starts. (the address is the first byte sent after the repeated start, so that's the one that is sent incorrectly, and there was no slave at that wrong address.)

My Fix:
I ended up adding this...

twi_inRepStart = false;         // remember, we're dealing with an ASYNC ISR
do
  {
    TWDR = twi_slarw;
  } while (TWCR & _BV(TWWC));               
TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);  // enable INTs, but not START

polling TWWC until it accepts the TWDR change, effectively removing cases 2 and 3.
(I tried polling TWINT with while (TWCR & _BV(TWINT)); before updating the data register, but I kept hitting an infinite loop. Not sure why, not completely relevant, but if anyone knows, I am curious.)
This fix doesn't quite fit the nice interrupt driven design, but I was having trouble fixing it in a good way.

The end, I'd love to hear from you, hope this was understandable and accurate.

PS: Why I was exploring the TWI and Wire libraries.
We wanted to fit PMBus protocol, meaning we needed 256 byte block write/read. changing the static buffers in Wire and TWI to 256 increases the static memory load from 32_5 to 256_5 > 1250 bytes, a big hit while using the uno.
We don't intend to fully use the asynchronous capabilities of the TWI library, and Wire blocks anyway, so I hacked out the buffers, made TWI write to a buffer created by the application. I also removed most of the slave code (using as demo board for other products, only master is needed.)
I'll attach those files as soon as I figure out how. The simplified versions might be necessary to consistently recreate the issue. I'm hoping a theoretical conversation will be useful either way though.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions